Agentic Memory Research

Research collection on agent memory architectures, persistence patterns, and output quality maintenance for LLM-based agent systems.

Citation

If you reference this repo’s summaries/analyses in academic or professional work, please cite:

@misc{lin_agentic_memory_2026,
  author       = {Leonard Lin},
  title        = {agentic-memory: Agentic Memory Research Collection (Summaries and Analyses)},
  year         = {2026},
  howpublished = {GitHub repository},
  url          = {https://github.com/lhl/agentic-memory},
}

Reference Summaries

Document	Author	Description
jumperz-agent-memory-stack	@jumperz	31-piece memory architecture split across 3 phases (Core → Reliability → Intelligence). Complete prompt/spec breakdowns for write pipeline, read pipeline, decay, knowledge graph, episodic memory, trust scoring, echo/fizzle feedback loops. The foundational reference that others build on.
joelhooks-adr-0077-memory-system-next-phase	@joelhooks	ADR for joelclaw (personal AI Mac Mini). Maps existing production system (~6 days running, Qdrant 1,343 points) against jumperz's 31 pieces. Plans 3 increments: retrieval quality (score decay, query rewriting), storage quality (dedup, nightly maintenance), feedback loop (echo/fizzle). Includes detailed gap analysis.
coolmanns-openclaw-memory-architecture	coolmanns	12-layer production memory stack for OpenClaw with 14 agents. SQLite+FTS5 knowledge graph (3,108 facts), llama.cpp GPU embeddings (768d, 7ms), three runtime plugins (continuity, stability, graph-memory). 100% recall on 60-query benchmark. Includes activation/decay system, domain RAG, session boot sequences.
drag88-agent-output-degradation	@drag88 (Aswin)	"Why Your Agent's Output Gets Worse Over Time" — multi-agent convergence problem. 4-tier memory (working → episodic → semantic → procedural). 3-layer enforcement pipeline (YAML regex → Gemini LLM judge → self-learning loop). Core insight: convert expensive runtime LLM checks into free static regex rules over time.
versatly-clawvault	Versatly (@drag88)	ClawVault npm CLI tool — structured markdown memory vault with observation pipeline, knowledge graph, session lifecycle (wake/sleep/checkpoint), task/project primitives, Obsidian integration, OpenClaw hooks. 449+ tests. v2.6.1.
vstorm-memv	vstorm-co	memv (PyPI: `memvee`) — Nemori-inspired predict-calibrate extraction + episode segmentation, plus Graphiti-style bi-temporal validity and hybrid retrieval (sqlite-vec + FTS5 + RRF) on SQLite.
supermemory	Dhravya Shah / supermemoryai	Supermemory memory-as-a-service API: memory versioning (linked-list chains), typed relationships (`updates`/`extends`/`derives`), static/dynamic profile synthesis, time-based forgetting with reason tracking, multi-model embedding storage. Critical caveat: open-source repo is frontend/SDK only; core engine is proprietary backend at `api.supermemory.ai`.
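
Several entries above describe hybrid retrieval that fuses a vector ranking with a keyword ranking (memv lists sqlite-vec + FTS5 + RRF). The block below is a minimal sketch of Reciprocal Rank Fusion under stated assumptions, not code from any of the listed projects: the function name, the `k=60` constant, and the sample memory ids are all illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one ranking.

    Each id earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); `k` dampens the influence of top ranks.
    The value 60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector index and a full-text index.
vector_hits = ["m3", "m1", "m7"]
keyword_hits = ["m1", "m9", "m3"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Ids that appear in both lists rise to the top without any need to normalize the underlying similarity and BM25 scores, which is the usual reason to prefer rank-based fusion.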

Paper Reference Summaries (Academic / Industry)

Document	Author	Description
hu-evermembench	Hu et al.	EverMemBench benchmark for >1M-token multi-party, multi-group interleaved conversations; diagnoses multi-hop collapse, temporal/versioning difficulty, and retrieval-bottlenecked “memory awareness”.
zhang-live-evo	Zhang et al.	Live-Evo: online self-evolving agent memory with an experience bank + meta-guideline bank, contrastive “memory-on vs memory-off” feedback, and weight-based reinforcement/forgetting; evaluated on Prophet Arena + deep research (as reported).
shutova-structmemeval	Shutova et al.	StructMemEval benchmark for whether agents can organize memory into useful structures (trees/ledgers/state tracking), not just retrieve facts; includes hint vs no-hint evaluation to isolate “structure recognition” failures.
yan-gam	Yan et al.	GAM: just-in-time agent memory via lightweight memos + a universal page-store, plus a deep-research researcher that plans/searches/integrates/reflects over history to compile optimized context at runtime; strong long-context QA gains with higher latency (as reported).
yang-graph-based-agent-memory-taxonomy	Yang et al.	Graph-based Agent Memory survey: graph-centric taxonomy + lifecycle (extract/store/retrieve/evolve), storage structures (KG/temporal/hyper/hierarchical/hybrid), retrieval operators, evolution/maintenance, and resources/benchmarks; useful shared vocabulary for shisad.
zhang-survey-memory-mechanism	Zhang et al.	Survey on memory mechanisms for LLM agents: definitions, why memory, design axes (sources/forms/ops), evaluation approaches, and application domains; good baseline checklist alongside newer benchmarks/systems.
hu-memory-age-ai-agents	Hu et al.	Memory in the Age of AI Agents survey: proposes unified lenses of forms (token/parametric/latent), functions (factual/experiential/working), and dynamics (formation/evolution/retrieval), plus benchmarks/frameworks and trustworthiness frontiers.
li-locomoplus	Li et al.	LoCoMo-Plus: evaluates beyond-factual “cognitive memory” (latent constraints like state/goals/values) under cue–trigger semantic disconnect, using constraint-consistency + LLM-judge evaluation.
maharana-locomo	Maharana et al.	LoCoMo dataset + benchmark for very long-term multi-session conversations (300 turns, multimodal) grounded in personas + temporal event graphs; evaluates QA + event summarization + multimodal generation.
wu-longmemeval	Wu et al.	LongMemEval benchmark + design decomposition (indexing → retrieval → reading) and system optimizations (value granularity, key expansion, time-aware query expansion).
packer-memgpt	Packer et al.	MemGPT: OS-inspired hierarchical memory + paging between a fixed-context LLM prompt and external stores (recall + archival), with function-call memory ops and event-driven control flow; foundational baseline for external agent memory.
chhikara-mem0	Chhikara et al.	Mem0: production-oriented long-term memory pipeline with explicit ops (ADD/UPDATE/DELETE/NOOP) and an optional graph memory variant; reports quality + token/latency tradeoffs on LoCoMo.
liu-simplemem	Liu et al.	SimpleMem: write-time semantic structured compression + online synthesis + intent-aware retrieval planning (multi-view dense/BM25/symbolic retrieval with union+dedup) to improve LoCoMo/LongMemEval quality while cutting token cost (as reported).
xu-a-mem	Xu et al.	A‑Mem: Zettelkasten-inspired note network with LLM-driven link generation and “memory evolution” (updating older note attributes as new evidence arrives); strong LoCoMo multi-hop/temporal gains with far lower token lengths than full-context (as reported).
salama-meminsight	Salama et al.	MemInsight: autonomous memory augmentation that mines/annotates attributes (entity-centric + conversation-centric; turn/session granularity) and uses attribute-guided retrieval; large LoCoMo retrieval recall gains vs DPR RAG baseline (as reported).
rasmussen-zep	Rasmussen et al.	Zep: production memory layer built on Graphiti, a bi-temporal knowledge graph (episodes → entities/facts → communities) with validity intervals and invalidation-based corrections; evaluated on DMR + LongMemEval.
nan-nemori	Nan et al.	Nemori: cognitively-inspired self-organizing agent memory with semantic episode boundary detection + episodic narratives and a predict-calibrate loop that distills semantic knowledge from prediction gaps; strong LoCoMo + LongMemEvalS results (as reported).
li-memos	Li et al.	MemOS: OS-like memory control plane with MemCube (payload+metadata), lifecycle/scheduling, governance (ACL/TTL/audit), and multi-substrate memory (plaintext/activation/KV/parameter/LoRA).
yan-memory-r1	Yan et al.	Memory-R1: reinforcement-learned memory manager (ADD/UPDATE/DELETE/NOOP) + answer agent with learned memory distillation; data-efficient RL (PPO/GRPO) training with exact-match reward.
jonelagadda-mnemosyne	Jonelagadda et al.	Mnemosyne: edge-friendly graph memory with substance/redundancy filters, probabilistic recall with decay/refresh, and a fixed-budget “core summary” for persona-level context.
patel-engram	Patel et al.	ENGRAM: lightweight typed memory (episodic/semantic/procedural) with simple dense retrieval + strict evidence budgets; strong LoCoMo + LongMemEval results with low token usage.
wei-evo-memory	Wei et al.	Evo-Memory: streaming benchmark + framework for self-evolving memory and experience reuse; introduces ExpRAG and ReMem (Think/Act/Refine) baselines and robustness/efficiency metrics.
cao-remember-me-refine-me	Cao et al.	ReMe: dynamic procedural memory lifecycle (acquire→reuse→refine) with multi-faceted distillation from success/failure trajectories, scenario-aware retrieval, and utility-based pruning; strong BFCL‑V3/AppWorld results (as reported).
sarin-memoria	Sarin et al.	Memoria: personalization memory layer combining session summaries + KG triplets (persona) with exponential recency weighting; SQLite + ChromaDB architecture and LongMemEval-S subset results.
latimer-hindsight	Latimer et al.	Hindsight: retain/recall/reflect architecture separating evidence vs beliefs vs summaries; temporal+entity memory graph with multi-channel retrieval fusion and belief confidence updates; very strong LongMemEval/LoCoMo results (as reported).
yu-agentic-memory	Yu et al.	AgeMem: RL-trained unified LTM+STM controller exposing memory ops as tool actions (add/update/delete/retrieve/summarize/filter) with a 3-stage curriculum and step-wise GRPO for credit assignment.
hu-evermemos	Hu et al.	EverMemOS: self-organizing “memory OS” with MemCells→MemScenes lifecycle, user profile consolidation, and necessity/sufficiency-guided recollection (verifier + query rewrite); strong LoCoMo/LongMemEval results (as reported).
li-timem	Li et al.	TiMem: temporal-hierarchical memory consolidation (segment→session→day→week→profile) with query-complexity recall planning + gating; strong LoCoMo/LongMemEval-S accuracy with low recalled tokens (as reported).
zhang-himem	Zhang et al.	HiMem: hierarchical long-term memory split (Episode Memory + Note Memory) with topic+surprise episode segmentation, note-first “best-effort” retrieval w/ sufficiency checks, and conflict-aware reconsolidation; strong LoCoMo results (as reported).
behrouz-nested-learning	Behrouz et al.	Nested Learning / CMS / Hope: reframes memory as multi-timescale update dynamics (continuum memory blocks updated at different frequencies) with implications for consolidation and “corrections without forgetting”.
zhang-recursive-language-models	Zhang et al.	Recursive Language Models (RLMs): inference-time recursion + REPL state treats long prompts as an external environment; processes multi‑million-token inputs with sub-calls and programmatic slicing, often beating long-context scaffolds at comparable average cost (as reported).
wang-m-plus	Wang et al.	M+: latent-space long-term memory extension to MemoryLLM that stores dropped memory tokens in an LTM pool and retrieves them during generation with a co-trained retriever; extends retention to >160k tokens at similar GPU memory cost (as reported).
dong-minja	Dong et al.	MINJA: practical memory injection attack on “memory-as-demonstrations” agents via query-only interaction (bridging steps + progressive shortening); motivates write-time gates, isolation, and safer memory representations.
sunil-memory-poisoning-attack-defense	Sunil et al.	Memory poisoning attack & defense: empirical MINJA follow-up in EHR agents; shows pre-existing benign memory can reduce ASR, and that trust-score defenses can fail via over-conservatism or overconfidence.
anokhin-arigraph	Anokhin et al.	AriGraph: knowledge-graph world model that links episodic observation nodes to extracted semantic triplets; two-stage retrieval (semantic→episodic) for planning/exploration in text-game environments.
behrouz-titans	Behrouz et al.	Titans: long-context architecture with an online-updated neural memory module (test-time learning) plus persistent task memory; provides explicit primitives for surprise-based salience and forgetting.
ahn-hema	Ahn	HEMA: hippocampus-inspired dual memory for long conversations (running compact summary + FAISS episodic vector store) with explicit prompt budgeting, pruning (“semantic forgetting”), and summary-of-summaries consolidation.
tan-membench	Tan et al.	MemBench: benchmark/dataset for agent memory covering participation vs observation scenarios and factual vs reflective memory, with metrics for accuracy/recall/capacity and read/write-time efficiency.
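
Several of the systems above apply recency weighting or decay at retrieval time (Memoria, for example, is summarized as using exponential recency weighting). The block below is a minimal sketch of that idea, not any paper's actual formula: the half-life parameterization, the function names, and the sample hits are assumptions made for illustration.

```python
def recency_weight(age_days, half_life_days=30.0):
    """Exponential decay: the weight halves every `half_life_days`."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def rank_with_recency(hits, now_day, half_life_days=30.0):
    """Re-rank (memory_id, similarity, created_day) hits.

    The final score is similarity multiplied by the recency weight,
    so an older memory needs a higher similarity to stay on top.
    """
    scored = [
        (memory_id, similarity * recency_weight(now_day - created_day, half_life_days))
        for memory_id, similarity, created_day in hits
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical hits: a strong match from 60 days ago and a weaker one from today.
hits = [("old", 0.9, 0.0), ("new", 0.6, 60.0)]
ranked = rank_with_recency(hits, now_day=60.0)
print(ranked)
```

With a 30-day half-life the 60-day-old memory keeps a quarter of its similarity, so the fresher but weaker match ranks first. Tuning the half-life is the main design decision, and it is one that benchmarks such as those listed above could help validate.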

Deep Dive Analyses

Root-level critical analyses intended for synthesis work. These reference the summaries above, but focus on coherence, evidence quality, risks, and synthesis-ready claim framing.

Synthesis	Based on	Focus
ANALYSIS	`ANALYSIS-*.md` + shisad docs + Mem0/Letta baselines	Cross-system comparison (techniques + memory types), plus mapping to shisad and “traditional” RAG-ish memory
ANALYSIS-academic-industry	paper `ANALYSIS-arxiv-*.md` + shisad plan	Academic/industry synthesis: benchmarks vs systems vs attacks, with “what’s missing in shisad” framing
Benchmarks best practices	Public disputes, audits, our analysis	Known pitfalls, metric confusion, dataset quality issues, per-benchmark limitations
MELT benchmark design	ANALYSIS.md systems + Reality Check epistemic docs	Memory Evaluation for Lifecycle Testing — session-replay benchmark testing full memory lifecycle (decay, consolidation, contradiction, core stability, inference) at 6 scale tiers over simulated time. Separate repo; draft.

Analysis	Based on	Focus
ANALYSIS-jumperz-agent-memory-stack	`references/jumperz-agent-memory-stack.md`	Checklist critique (semantics, failure modes, missing evaluation), synthesis-ready takeaways + claims table
ANALYSIS-joelhooks-adr-0077-memory-system-next-phase	`references/joelhooks-adr-0077-memory-system-next-phase.md`	Increment plan critique (decay, rewrite, dedup, echo/fizzle), validation plan + claims
ANALYSIS-coolmanns-openclaw-memory-architecture	`references/coolmanns-openclaw-memory-architecture.md` + `vendor/openclaw-memory-architecture/`	Layered stack critique with benchmark-method verification, operational risks, doc drift notes
ANALYSIS-drag88-agent-output-degradation	`references/drag88-agent-output-degradation.md`	Convergence + enforcement pattern critique (judge→rule distillation), measurement gaps, risks
ANALYSIS-versatly-clawvault	`references/versatly-clawvault.md` + `vendor/clawvault/`	Product/tooling critique (surface area, hooks, qmd dependency), security posture, missing benchmarks
ANALYSIS-vstorm-memv	`references/vstorm-memv.md` + `vendor/memv/`	Implementation critique of Nemori-inspired predict-calibrate extraction + bi-temporal validity + hybrid retrieval, with gaps/risks and shisad mapping
ANALYSIS-openviking	`vendor/openviking/` + Hermes provider docs	Open-source context database: `viking://` filesystem, L0/L1/L2 tiered loading, session-commit extraction across 8 memory categories, and hierarchical typed retrieval over memory/resources/skills; strong observability with heavier operational complexity
ANALYSIS-byterover-cli	`vendor/byterover-cli/` + `vendor/byterover-cli/paper/`	Agent-native coding-agent memory/runtime: daemon + per-project agent pool, markdown context tree with explicit relations and lifecycle, 5-tier progressive retrieval with cache/OOD detection, and strong self-reported benchmarks with caveats
ANALYSIS-mira-OSS	`vendor/mira-OSS/`	Full-stack event-driven agent (v1 rev 2): activity-day sigmoid decay, hub discovery + 3-axis linking (vector+entity+TF-IDF), Text-Based LoRA + user model synthesis with critic validation, background forage agent (sub-agent collaboration), portrait synthesis, 16 tools, context overflow remediation, immutable domain models, multi-user RLS + Vault; gaps in write gating, external benchmarks, taint tracking, and sub-agent capability scoping
ANALYSIS-claude-code-memory	Source: `/home/lhl/Downloads/claude-code/src`	Claude Code memory subsystem (Anthropic): first-party production-scale memory system; flat-file MEMORY.md + typed topic files (user/feedback/project/reference) + background extraction via forked agent with mutual exclusion + LLM-based relevance selection (Sonnet) + team memory with OAuth sync + auto dream consolidation + KAIROS daily-log mode + eval-validated prompts with case IDs + security-hardened path validation; no vector search, no graph, no decay scoring
ANALYSIS-codex-memory	openai/codex	Codex memory subsystem (OpenAI): first-party open-source coding agent; two-phase async pipeline (gpt-5.1-codex-mini extraction → gpt-5.3-codex consolidation) + SQLite-backed job coordination (leases/heartbeats/watermarks) + progressive disclosure layout (memory_summary → MEMORY.md → rollout_summaries → skills) + skills as procedural memory + usage-based citation-driven retention + thread-diff incremental forgetting + ~1,400 lines extraction/consolidation prompts; no vector search, no team memory, no real-time extraction
ANALYSIS-google-always-on-memory-agent	`vendor/always-on-memory-agent/`	Official Google ADK sample: always-on daemon with multimodal ingestion (27 file types via Gemini 3.1 Flash-Lite), periodic LLM consolidation, SQLite storage, HTTP API + Streamlit dashboard; no retrieval/search (recency scan LIMIT 50), no decay/dedup/versioning; useful as ADK orchestration reference and multimodal ingestion pattern
ANALYSIS-supermemory	`references/supermemory.md` + `vendor/supermemory/`	Memory-as-a-service startup: memory versioning (linked-list chains via parentMemoryId/rootMemoryId/isLatest), typed relationship ontology (updates/extends/derives), static/dynamic profile synthesis API, time-based forgetting with audit trail, multi-model embedding columns, MemoryBench framework; open-source repo is SDK/frontend only — core engine logic is proprietary hosted backend
ANALYSIS-karta	`vendor/karta/`	Karta (rohithzr): Rust (~10.4K LOC) agentic memory library with Zettelkasten-inspired knowledge graph, 7-type dream engine (deduction/induction/abduction/consolidation/contradiction/episode digest/cross-episode digest) with inference feedback into retrieval, embedding-based query classification (6 modes), retroactive context evolution with drift protection, cross-encoder reranking with abstention, multi-hop BFS traversal, atomic fact decomposition with per-fact embeddings, foresight signals with TTL, structured episode digests; BEAM 100K: 57.7% with 243-failure root cause catalog

Paper Deep Dive Analyses (Academic / Industry)

Analysis	Based on	Focus
ANALYSIS-arxiv-2602.01313-evermembench	`references/hu-evermembench.md` + `references/papers/arxiv-2602.01313.pdf`	Benchmark critique emphasizing version semantics, multi-party fragmentation, oracle diagnostics, and shisad mapping
ANALYSIS-arxiv-2602.02369-live-evo	`references/zhang-live-evo.md` + `references/papers/arxiv-2602.02369.pdf`	System deep dive emphasizing online experience weighting from continuous feedback, meta-guidelines for memory compilation, and memory-on vs memory-off utility measurement; shisad mapping for feedback loops + procedural memory gating
ANALYSIS-arxiv-2602.11243-structmemeval	`references/shutova-structmemeval.md` + `references/papers/arxiv-2602.11243.pdf`	Benchmark deep dive emphasizing memory organization/structure as a distinct capability (trees/ledgers/state), hint vs no-hint diagnostics, and implications for shisad structured-memory primitives
ANALYSIS-arxiv-2602.05665-graph-based-agent-memory-taxonomy	`references/yang-graph-based-agent-memory-taxonomy.md` + `references/papers/arxiv-2602.05665.pdf`	Survey deep dive providing graph-based memory taxonomy and lifecycle (extract/store/retrieve/evolve), with implications for shisad graph-as-derived-view, operator choices, and maintenance jobs
ANALYSIS-arxiv-2404.13501-survey-memory-mechanism	`references/zhang-survey-memory-mechanism.md` + `references/papers/arxiv-2404.13501.pdf`	Survey deep dive providing baseline taxonomy and evaluation checklists for agent memory; useful coverage reference alongside newer benchmarks/systems for shisad’s roadmap
ANALYSIS-arxiv-2512.13564-memory-age-ai-agents	`references/hu-memory-age-ai-agents.md` + `references/papers/arxiv-2512.13564.pdf`	Survey deep dive emphasizing the Forms–Functions–Dynamics taxonomy and frontiers (RL integration, multimodal, multi-agent shared memory, trustworthiness), used as organizing frame for shisad v0.7 memory roadmap
ANALYSIS-arxiv-2402.17753-locomo	`references/maharana-locomo.md` + `references/papers/arxiv-2402.17753.pdf`	Dataset/benchmark critique with episodic-memory implications (event graphs, multimodal, RAG harm) and shisad mapping
ANALYSIS-arxiv-2410.10813-longmemeval	`references/wu-longmemeval.md` + `references/papers/arxiv-2410.10813.pdf`	Benchmark and system-design decomposition (indexing/retrieval/reading), with mapping to shisad primitives
ANALYSIS-arxiv-2310.08560-memgpt	`references/packer-memgpt.md` + `references/papers/arxiv-2310.08560.pdf`	System deep dive emphasizing virtual context management (OS paging), memory tiers (working/queue/recall/archival), function-call memory ops, and implications for shisad versioned corrections + write-policy hardening
ANALYSIS-arxiv-2602.10715-locomoplus	`references/li-locomoplus.md` + `references/papers/arxiv-2602.10715.pdf`	Beyond-factual “cognitive memory” benchmark critique (latent constraints) and implications for safe constraint/procedural memory
ANALYSIS-arxiv-2504.19413-mem0	`references/chhikara-mem0.md` + `references/papers/arxiv-2504.19413.pdf`	System deep dive emphasizing explicit memory ops, graph-memory tradeoffs, deployment metrics (tokens/p95), and shisad mapping (versioned corrections vs delete)
ANALYSIS-arxiv-2601.02553-simplemem	`references/liu-simplemem.md` + `references/papers/arxiv-2601.02553.pdf`	System deep dive emphasizing write-time semantic structured compression, online consolidation, and intent-aware multi-view retrieval planning; mapping to shisad “derived vs raw” memory + retrieval budgeting
ANALYSIS-arxiv-2502.12110-a-mem	`references/xu-a-mem.md` + `references/papers/arxiv-2502.12110.pdf`	System deep dive emphasizing Zettelkasten-style notes + LLM-driven linking + memory evolution, with strong multi-hop/temporal LoCoMo gains but high versioning/audit requirements for shisad
ANALYSIS-arxiv-2503.21760-meminsight	`references/salama-meminsight.md` + `references/papers/arxiv-2503.21760.pdf`	System deep dive emphasizing autonomous attribute mining/annotation as a derived metadata layer to improve retrieval recall and downstream tasks; mapping to shisad schema constraints + provenance/versioning
ANALYSIS-arxiv-2511.18423-gam	`references/yan-gam.md` + `references/papers/arxiv-2511.18423.pdf`	System deep dive emphasizing just-in-time context compilation via memo index + universal page-store and an iterative deep-research researcher; highlights the latency/quality trade-off and mapping to shisad evidence-first episodic storage
ANALYSIS-arxiv-2501.13956-zep	`references/rasmussen-zep.md` + `references/papers/arxiv-2501.13956.pdf`	System deep dive emphasizing bi-temporal validity semantics, episodic+semantic+community graph tiers, hybrid retrieval (BM25/embeddings/BFS), and implications for shisad versioned memory
ANALYSIS-arxiv-2507.03724-memos	`references/li-memos.md` + `references/papers/arxiv-2507.03724.pdf`	System deep dive emphasizing MemCube metadata, multi-substrate memory (plaintext/KV/parameter), lifecycle/scheduling/governance, and mapping to shisad primitives
ANALYSIS-arxiv-2508.19828-memory-r1	`references/yan-memory-r1.md` + `references/papers/arxiv-2508.19828.pdf`	RL deep dive emphasizing learned memory ops (ADD/UPDATE/DELETE/NOOP) + post-retrieval memory distillation, reward design, and what’s required to safely adopt this in shisad
ANALYSIS-arxiv-2508.03341-nemori	`references/nan-nemori.md` + `references/papers/arxiv-2508.03341.pdf`	System deep dive emphasizing episode segmentation (Two-Step Alignment) + predict-calibrate semantic distillation, reported LoCoMo/LongMemEvalS gains, and implications for shisad write gating + correction semantics
ANALYSIS-arxiv-2510.08601-mnemosyne	`references/jonelagadda-mnemosyne.md` + `references/papers/arxiv-2510.08601.pdf`	System deep dive emphasizing edge-first graph memory, redundancy/refresh, probabilistic decay-based recall, and a fixed-budget core/persona summary; includes evaluation-rigor cautions
ANALYSIS-arxiv-2511.12960-engram	`references/patel-engram.md` + `references/papers/arxiv-2511.12960.pdf`	System deep dive emphasizing typed memory (episodic/semantic/procedural), deterministic routing/formatting, strict evidence budgets, and strong token/latency results; mapping to shisad primitives
ANALYSIS-arxiv-2511.20857-evo-memory	`references/wei-evo-memory.md` + `references/papers/arxiv-2511.20857.pdf`	Benchmark deep dive emphasizing streaming task-sequence evaluation for experience reuse, plus refine/prune mechanisms and metrics (robustness, step efficiency) for shisad’s eval harness
ANALYSIS-arxiv-2512.10696-remember-me-refine-me	`references/cao-remember-me-refine-me.md` + `references/papers/arxiv-2512.10696.pdf`	System deep dive emphasizing procedural memory distillation + scenario-aware reuse + utility-based refinement/pruning; mapping to shisad procedural tier + versioned invalidation vs delete
ANALYSIS-arxiv-2512.12686-memoria	`references/sarin-memoria.md` + `references/papers/arxiv-2512.12686.pdf`	System deep dive emphasizing persona KG + session summaries with recency-weighted retrieval; highlights missing governance/versioning primitives needed for shisad
ANALYSIS-arxiv-2512.12818-hindsight	`references/latimer-hindsight.md` + `references/papers/arxiv-2512.12818.pdf`	System deep dive emphasizing retain/recall/reflect with four-network memory (facts/experiences/observations/beliefs), token-budgeted multi-channel retrieval fusion, and belief confidence updates; key shisad mapping
ANALYSIS-arxiv-2601.01885-agentic-memory	`references/yu-agentic-memory.md` + `references/papers/arxiv-2601.01885.pdf`	RL deep dive emphasizing unified LTM+STM memory ops as tool actions, 3-stage training curriculum, step-wise GRPO credit assignment, and implications for shisad’s future learned memory policies
ANALYSIS-arxiv-2601.02163-evermemos	`references/hu-evermemos.md` + `references/papers/arxiv-2601.02163.pdf`	System deep dive emphasizing MemCell→MemScene consolidation lifecycle, user profile/foresight, and sufficiency-verified scene-guided retrieval; mapping to shisad consolidation roadmap
ANALYSIS-arxiv-2601.02845-timem	`references/li-timem.md` + `references/papers/arxiv-2601.02845.pdf`	System deep dive emphasizing temporal-hierarchical consolidation (TMT), query-complexity recall planning/gating, and the accuracy–token frontier; mapping to shisad temporal tiers
ANALYSIS-arxiv-2601.06377-himem	`references/zhang-himem.md` + `references/papers/arxiv-2601.06377.pdf`	System deep dive emphasizing Episode Memory + Note Memory hierarchy, note-first “best-effort” retrieval w/ sufficiency checks, and conflict-aware reconsolidation; mapping to shisad event→knowledge tiers + versioned updates
ANALYSIS-arxiv-2512.24695-nested-learning	`references/behrouz-nested-learning.md` + `references/papers/arxiv-2512.24695.pdf`	Conceptual deep dive on multi-timescale “continuum memory” and consolidation dynamics; mapping to shisad tiered memory + versioned corrections
ANALYSIS-arxiv-2512.24601-recursive-language-models	`references/zhang-recursive-language-models.md` + `references/papers/arxiv-2512.24601.pdf`	Architecture deep dive emphasizing RLM-style programmatic reading/compilation over arbitrarily long evidence stores (REPL + recursion + sub-calls), with implications for shisad sandboxed compilation traces and cost tail management
ANALYSIS-arxiv-2502.00592-m-plus	`references/wang-m-plus.md` + `references/papers/arxiv-2502.00592.pdf`	Architecture deep dive emphasizing latent-space long-term memory tokens + co-trained retrieval for >160k retention, with mapping to shisad’s external evidence-first memory and retrieval diagnostics
ANALYSIS-arxiv-2503.03704-minja	`references/dong-minja.md` + `references/papers/arxiv-2503.03704.pdf`	Security deep dive on query-only memory injection attacks; implications for write-policy, provenance/taint, isolation, and “don’t store demonstrations” patterns
ANALYSIS-arxiv-2601.05504-memory-poisoning-attack-defense	`references/sunil-memory-poisoning-attack-defense.md` + `references/papers/arxiv-2601.05504.pdf`	Security deep dive emphasizing ISR vs ASR under realistic memory conditions, and why trust-score sanitization can fail; concrete shisad hardening takeaways
ANALYSIS-arxiv-2407.04363-arigraph	`references/anokhin-arigraph.md` + `references/papers/arxiv-2407.04363.pdf`	System deep dive emphasizing episodic↔semantic memory linking, graph-structured retrieval for planning/exploration, and implications for shisad episode objects + provenance + correction semantics
ANALYSIS-arxiv-2501.00663-titans	`references/behrouz-titans.md` + `references/papers/arxiv-2501.00663.pdf`	Architecture deep dive emphasizing test-time-learning neural memory (surprise/momentum/forgetting), Titans MAC/MAG/MAL variants, and how to translate salience/decay ideas into shisad’s external memory framework
ANALYSIS-arxiv-2504.16754-hema	`references/ahn-hema.md` + `references/papers/arxiv-2504.16754.pdf`	System deep dive emphasizing dual memory (summary + vector store), explicit prompt budgeting, pruning/consolidation policies, and evaluation-rigor cautions for shisad adoption
ANALYSIS-arxiv-2506.21605-membench	`references/tan-membench.md` + `references/papers/arxiv-2506.21605.pdf`	Benchmark deep dive emphasizing multi-scenario (participant vs observer) and multi-level (factual vs reflective) evaluation, plus latency/capacity metrics and implications for shisad eval harnesses

Source Threads & Links

Source	URL
@jumperz memory stack thread	https://x.com/jumperz/status/2024841165774717031
@joelhooks ADR tweet	https://x.com/joelhooks/status/2024947701738262773
joelclaw ADR-0077	https://joelclaw.com/adrs/0077-memory-system-next-phase
@drag88 article	https://x.com/drag88/status/2022551759491862974
supermemory docs	https://supermemory.ai/docs
supermemory repo	https://github.com/supermemoryai/supermemory
mempalace repo	https://github.com/milla-jovovich/mempalace
karta repo	https://github.com/rohithzr/karta

File Tree

agentic-memory/
├── README.md                          ← this file
├── ANALYSIS.md                         ← synthesis + comparison
├── ANALYSIS-academic-industry.md       ← academic/industry synthesis
├── ANALYSIS-jumperz-agent-memory-stack.md
├── ANALYSIS-joelhooks-adr-0077-memory-system-next-phase.md
├── ANALYSIS-coolmanns-openclaw-memory-architecture.md
├── ANALYSIS-drag88-agent-output-degradation.md
├── ANALYSIS-versatly-clawvault.md
├── ANALYSIS-vstorm-memv.md
├── ANALYSIS-mira-OSS.md
├── ANALYSIS-codex-memory.md
├── ANALYSIS-google-always-on-memory-agent.md
├── ANALYSIS-supermemory.md
├── ANALYSIS-karta.md               ← Karta: Rust agentic memory library with dream engine
├── ANALYSIS-mempalace.md           ← not in ANALYSIS.md (claims-vs-code issues); see REVIEWED.md
├── REVIEWED.md                        ← triage log (examined but not promoted to ANALYSIS)
├── PUNCHLIST-academic-industry.md     ← tracking checklist for paper deep dives
├── templates/                         ← templates for paper analyses/summaries
│
├── references/                        ← summarized reference docs (markdown w/ frontmatter)
│   ├── 1-full-agent-memory-build.jpg  ← jumperz card 1: memory storage
│   ├── 2-feeds-into.jpg               ← jumperz card 2: memory intelligence
│   ├── jumperz-agent-memory-stack.md
│   ├── joelhooks-adr-0077-memory-system-next-phase.md
│   ├── coolmanns-openclaw-memory-architecture.md
│   ├── drag88-agent-output-degradation.md
│   ├── versatly-clawvault.md
│   ├── hu-evermembench.md
│   ├── li-locomoplus.md
│   ├── maharana-locomo.md
│   ├── wu-longmemeval.md
│   ├── chhikara-mem0.md
│   └── papers/                        ← archived PDFs + text snapshots
│       ├── README.md
│       ├── arxiv-*.pdf
│       └── arxiv-*.md
│
└── vendor/                            ← cloned source repos
    ├── mira-OSS/                      ← github.com/taylorsatula/mira-OSS (snapshot, AGPLv3)
    │   ├── README.md
    │   ├── CLAUDE.md                  ← project guide (architecture, patterns, principles)
    │   ├── main.py                    ← FastAPI entry point
    │   ├── cns/                       ← Central Nervous System (conversation orchestration)
    │   │   ├── api/                   ← FastAPI endpoints (chat, actions, data, health)
    │   │   ├── core/                  ← Domain models (Continuum, Message, Events)
    │   │   ├── services/              ← Orchestrator, subcortical, summary, collapse handler
    │   │   └── infrastructure/        ← Repositories, Valkey cache, unit of work
    │   ├── lt_memory/                 ← Long-term memory system
    │   │   ├── scoring_formula.sql    ← Multi-factor activity-day sigmoid importance scoring
    │   │   ├── models.py             ← Memory, Entity, ExtractedMemory, link types
    │   │   ├── hybrid_search.py      ← BM25 + pgvector with RRF
    │   │   ├── proactive.py          ← Dual-path retrieval (similarity + hub discovery)
    │   │   ├── hub_discovery.py      ← Entity-driven memory retrieval via pg_trgm
    │   │   └── processing/           ← Extraction, consolidation, entity GC pipelines
    │   ├── working_memory/           ← System prompt composition via trinkets
    │   ├── tools/                    ← Self-registering tool framework (11 built-in)
    │   ├── config/                   ← Pydantic config + prompt templates
    │   └── auth/                     ← WebAuthn + magic link authentication
    │
    ├── openclaw-memory-architecture/  ← github.com/coolmanns/openclaw-memory-architecture
    │   ├── README.md
    │   ├── PROJECT.md
    │   ├── CHANGELOG.md
    │   ├── docs/
    │   │   ├── ARCHITECTURE.md        ← full 12-layer technical reference
    │   │   ├── knowledge-graph.md     ← graph search pipeline, benchmarks
    │   │   ├── context-optimization.md
    │   │   ├── embedding-setup.md
    │   │   ├── benchmark-process.md
    │   │   ├── benchmark-results.md
    │   │   ├── code-search.md
    │   │   └── COMPARISON.md
    │   ├── schema/
    │   │   └── facts.sql              ← SQLite schema for knowledge graph
    │   ├── scripts/                   ← init, seed, search, ingest, decay, benchmark, telemetry
    │   ├── templates/                 ← starter files (active-context, gating-policies, etc.)
    │   └── plugin-graph-memory/       ← OpenClaw plugin (JS)
    │
    ├── karta/                         ← github.com/rohithzr/karta (submodule, MIT)
    │   ├── Cargo.toml                ← workspace: karta-core + karta-cli
    │   ├── crates/
    │   │   └── karta-core/           ← Core engine (~6.7K LOC Rust)
    │   │       ├── src/
    │   │       │   ├── note.rs       ← MemoryNote, Provenance, NoteStatus, AtomicFact, Episode, EpisodeDigest
    │   │       │   ├── write.rs      ← Write path: index, link, evolve, foresight, facts
    │   │       │   ├── read.rs       ← Read path: classify, search, traverse, rerank, synthesize
    │   │       │   ├── rerank.rs     ← Jina/LLM/noop rerankers
    │   │       │   ├── dream/        ← Dream engine: 7 inference types
    │   │       │   ├── store/        ← LanceDB + SQLite implementations
    │   │       │   └── llm/          ← Provider trait + OpenAI + mock + prompts
    │   │       └── tests/            ← eval, beam_100k, bench_beam (~3.8K LOC)
    │   ├── findings.md               ← BEAM 100K detailed failure analysis
    │   └── plan.md                   ← Experiment plan targeting 90%+
    │
    ├── always-on-memory-agent/        ← GoogleCloudPlatform/generative-ai (official ADK sample)
    │   ├── agent.py                  ← ADK multi-agent daemon (ingest/consolidate/query)
    │   ├── dashboard.py              ← Streamlit UI
    │   └── docs/                     ← Logo/architecture assets
    │
    ├── memv/                          ← github.com/vstorm-co/memv
    │   ├── README.md
    │   ├── CHANGELOG.md
    │   ├── pyproject.toml             ← PyPI: memvee, v0.1.0
    │   ├── docs/                      ← docs site (MkDocs)
    │   ├── src/
    │   │   └── memv/                  ← segmentation, extraction, validity, retrieval, storage
    │   └── tests/
    │
    ├── supermemory/                    ← github.com/supermemoryai/supermemory (lean subset: schemas, SDK, MCP, arch docs)
    │   ├── LICENSE
    │   ├── README.md                  ← provenance + open-source vs hosted-backend split
    │   ├── packages/
    │   │   ├── validation/            ← Zod schemas (data model definitions)
    │   │   │   ├── schemas.ts
    │   │   │   └── api.ts
    │   │   ├── lib/
    │   │   │   ├── api.ts             ← reveals backend dependency (api.supermemory.ai)
    │   │   │   └── similarity.ts      ← client-side cosine sim (visualization only)
    │   │   └── tools/src/shared/
    │   │       └── memory-client.ts   ← SDK client (profile search, prompt formatting)
    │   ├── apps/mcp/src/
    │   │   └── server.ts              ← MCP server (memory/recall/whoAmI tools)
    │   └── skills/supermemory/references/
    │       └── architecture.md        ← claimed design (558 lines)
    │
    └── clawvault/                     ← github.com/Versatly/clawvault
        ├── README.md
        ├── PLAN.md                    ← issue #4: ledger, reflect, replay, archive
        ├── CHANGELOG.md
        ├── SKILL.md
        ├── package.json               ← npm: clawvault, v2.6.1
        ├── src/
        │   ├── commands/              ← archive, context, inject, observe, reflect, replay, wake, sleep, task, project, ...
        │   ├── observer/              ← compressor, reflector, router, session-watcher
        │   ├── lib/                   ← vault, memory-graph, ledger, observation-format, session-utils
        │   └── cli/
        ├── bin/                       ← CLI entry + command registration modules
        ├── hooks/                     ← OpenClaw hook handler
        ├── dashboard/                 ← web dashboard (vault parser, graph diff)
        ├── schemas/
        ├── scripts/
        ├── templates/
        └── tests/

Key Themes Across Sources

Phased build order matters: Core memory first (write/read/decay), reliability second (dedup/maintenance/recovery), intelligence last (graphs/trust/cross-agent). Building out of order amplifies flaws.
Tiered retrieval: Summary files first (fast, cheap), vector search fallback (thorough, expensive). Don't vector-search everything.
Score decay: final_score = relevance × exp(-λ × days), where λ is the decay rate and days is the age of the memory. Recency-weighted relevance appears in every architecture surveyed here.
Feedback loops: Echo/fizzle (track which injected memories get used), behavior loops (extract corrections as lessons), learning loops (convert expensive LLM checks into cheap static rules).
SQLite over hosted vector DBs: At the scales these systems currently run at (1K-5K entries), SQLite + FTS5 with local embeddings outperforms hosted solutions on latency, cost, and operational simplicity.
Multi-agent convergence: Shared memory creates homogenization pressure. Workspace isolation + file routing guards help but don't fully solve it.
Vault index pattern: Single scannable manifest (one-line descriptions) → load individual entries on demand. One file read instead of N.
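
The score-decay theme above can be sketched in a few lines of Python. This is a minimal illustration, not code taken from any of the surveyed systems: the `Memory` dataclass, the `decayed_score` and `rank` helpers, and the 30-day default half-life are all assumptions made for the example. The decay rate λ is derived from a half-life only because that is easier to reason about than a raw rate.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical retrieval candidate; field names are illustrative."""
    text: str
    relevance: float  # similarity score from retrieval, assumed to lie in [0, 1]
    age_days: float   # days since the memory was written


def decayed_score(relevance: float, age_days: float,
                  half_life_days: float = 30.0) -> float:
    """Compute final_score = relevance * exp(-lambda * days).

    lambda is derived from an assumed half-life: after `half_life_days`
    the score is half of the raw relevance.
    """
    decay_rate = math.log(2) / half_life_days  # lambda in the formula
    return relevance * math.exp(-decay_rate * age_days)


def rank(memories: list[Memory], half_life_days: float = 30.0) -> list[Memory]:
    """Sort candidates by decayed score, highest first."""
    return sorted(
        memories,
        key=lambda m: decayed_score(m.relevance, m.age_days, half_life_days),
        reverse=True,
    )


if __name__ == "__main__":
    candidates = [
        Memory("old but highly relevant", relevance=0.9, age_days=90.0),
        Memory("recent and moderately relevant", relevance=0.6, age_days=3.0),
    ]
    for memory in rank(candidates):
        score = decayed_score(memory.relevance, memory.age_days)
        print(f"{score:.3f}  {memory.text}")
```

With a 30-day half-life, a 90-day-old memory keeps one eighth of its raw relevance, so the recent entry ranks first despite its lower similarity score. A longer half-life shifts the balance back toward raw relevance.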
