Research collection on agent memory architectures, persistence patterns, and output quality maintenance for LLM-based agent systems.
If you reference this repo's summaries/analyses in academic or professional work, please cite:
@misc{lin_agentic_memory_2026,
author = {Leonard Lin},
title = {agentic-memory: Agentic Memory Research Collection (Summaries and Analyses)},
year = {2026},
howpublished = {GitHub repository},
url = {https://github.com/lhl/agentic-memory},
}

| Document | Author | Description |
|---|---|---|
| jumperz-agent-memory-stack | @jumperz | 31-piece memory architecture split across 3 phases (Core → Reliability → Intelligence). Complete prompt/spec breakdowns for write pipeline, read pipeline, decay, knowledge graph, episodic memory, trust scoring, echo/fizzle feedback loops. The foundational reference that others build on. |
| joelhooks-adr-0077-memory-system-next-phase | @joelhooks | ADR for joelclaw (personal AI Mac Mini). Maps existing production system (~6 days running, Qdrant 1,343 points) against jumperz's 31 pieces. Plans 3 increments: retrieval quality (score decay, query rewriting), storage quality (dedup, nightly maintenance), feedback loop (echo/fizzle). Includes detailed gap analysis. |
| coolmanns-openclaw-memory-architecture | coolmanns | 12-layer production memory stack for OpenClaw with 14 agents. SQLite+FTS5 knowledge graph (3,108 facts), llama.cpp GPU embeddings (768d, 7ms), three runtime plugins (continuity, stability, graph-memory). 100% recall on 60-query benchmark. Includes activation/decay system, domain RAG, session boot sequences. |
| drag88-agent-output-degradation | @drag88 (Aswin) | "Why Your Agent's Output Gets Worse Over Time": multi-agent convergence problem. 4-tier memory (working → episodic → semantic → procedural). 3-layer enforcement pipeline (YAML regex → Gemini LLM judge → self-learning loop). Core insight: convert expensive runtime LLM checks into free static regex rules over time. |
| versatly-clawvault | Versatly (@drag88) | ClawVault npm CLI tool: structured markdown memory vault with observation pipeline, knowledge graph, session lifecycle (wake/sleep/checkpoint), task/project primitives, Obsidian integration, OpenClaw hooks. 449+ tests. v2.6.1. |
| vstorm-memv | vstorm-co | memv (PyPI: memvee): Nemori-inspired predict-calibrate extraction + episode segmentation, plus Graphiti-style bi-temporal validity and hybrid retrieval (sqlite-vec + FTS5 + RRF) on SQLite. |
| supermemory | Dhravya Shah / supermemoryai | Supermemory memory-as-a-service API: memory versioning (linked-list chains), typed relationships (updates/extends/derives), static/dynamic profile synthesis, time-based forgetting with reason tracking, multi-model embedding storage. Critical caveat: open-source repo is frontend/SDK only; core engine is proprietary backend at api.supermemory.ai. |
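
Several entries above fuse lexical and vector hits, memv explicitly via RRF (sqlite-vec + FTS5 + RRF). A minimal sketch of standard Reciprocal Rank Fusion, assuming each retriever returns a best-first list of memory IDs and using k = 60 (the constant from the original RRF formulation, not necessarily memv's setting); the function name and sample IDs are hypothetical:

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first ranked ID lists with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a full-text (FTS5-style) and a vector retriever.
fts_hits = ["m3", "m1", "m7"]
vec_hits = ["m1", "m9", "m3"]
fused = reciprocal_rank_fusion([fts_hits, vec_hits])
print([doc_id for doc_id, _ in fused])  # ['m1', 'm3', 'm9', 'm7']
```

Because RRF only uses ranks, it needs no score normalization between BM25-style and cosine scores; documents found by both retrievers (m1, m3) rise above single-list hits.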

| Document | Author | Description |
|---|---|---|
| hu-evermembench | Hu et al. | EverMemBench benchmark for >1M-token multi-party, multi-group interleaved conversations; diagnoses multi-hop collapse, temporal/versioning difficulty, and retrieval-bottlenecked "memory awareness". |
| zhang-live-evo | Zhang et al. | Live-Evo: online self-evolving agent memory with an experience bank + meta-guideline bank, contrastive "memory-on vs memory-off" feedback, and weight-based reinforcement/forgetting; evaluated on Prophet Arena + deep research (as reported). |
| shutova-structmemeval | Shutova et al. | StructMemEval benchmark for whether agents can organize memory into useful structures (trees/ledgers/state tracking), not just retrieve facts; includes hint vs no-hint evaluation to isolate "structure recognition" failures. |
| yan-gam | Yan et al. | GAM: just-in-time agent memory via lightweight memos + a universal page-store, plus a deep-research researcher that plans/searches/integrates/reflects over history to compile optimized context at runtime; strong long-context QA gains with higher latency (as reported). |
| yang-graph-based-agent-memory-taxonomy | Yang et al. | Graph-based Agent Memory survey: graph-centric taxonomy + lifecycle (extract/store/retrieve/evolve), storage structures (KG/temporal/hyper/hierarchical/hybrid), retrieval operators, evolution/maintenance, and resources/benchmarks; useful shared vocabulary for shisad. |
| zhang-survey-memory-mechanism | Zhang et al. | Survey on memory mechanisms for LLM agents: definitions, why memory, design axes (sources/forms/ops), evaluation approaches, and application domains; good baseline checklist alongside newer benchmarks/systems. |
| hu-memory-age-ai-agents | Hu et al. | Memory in the Age of AI Agents survey: proposes unified lenses of forms (token/parametric/latent), functions (factual/experiential/working), and dynamics (formation/evolution/retrieval), plus benchmarks/frameworks and trustworthiness frontiers. |
| li-locomoplus | Li et al. | LoCoMo-Plus: evaluates beyond-factual "cognitive memory" (latent constraints like state/goals/values) under cue-trigger semantic disconnect, using constraint-consistency + LLM-judge evaluation. |
| maharana-locomo | Maharana et al. | LoCoMo dataset + benchmark for very long-term multi-session conversations (300 turns, multimodal) grounded in personas + temporal event graphs; evaluates QA + event summarization + multimodal generation. |
| wu-longmemeval | Wu et al. | LongMemEval benchmark + design decomposition (indexing → retrieval → reading) and system optimizations (value granularity, key expansion, time-aware query expansion). |
| packer-memgpt | Packer et al. | MemGPT: OS-inspired hierarchical memory + paging between a fixed-context LLM prompt and external stores (recall + archival), with function-call memory ops and event-driven control flow; foundational baseline for external agent memory. |
| chhikara-mem0 | Chhikara et al. | Mem0: production-oriented long-term memory pipeline with explicit ops (ADD/UPDATE/DELETE/NOOP) and an optional graph memory variant; reports quality + token/latency tradeoffs on LoCoMo. |
| liu-simplemem | Liu et al. | SimpleMem: write-time semantic structured compression + online synthesis + intent-aware retrieval planning (multi-view dense/BM25/symbolic retrieval with union+dedup) to improve LoCoMo/LongMemEval quality while cutting token cost (as reported). |
| xu-a-mem | Xu et al. | A-Mem: Zettelkasten-inspired note network with LLM-driven link generation and "memory evolution" (updating older note attributes as new evidence arrives); strong LoCoMo multi-hop/temporal gains with far lower token lengths than full-context (as reported). |
| salama-meminsight | Salama et al. | MemInsight: autonomous memory augmentation that mines/annotates attributes (entity-centric + conversation-centric; turn/session granularity) and uses attribute-guided retrieval; large LoCoMo retrieval recall gains vs DPR RAG baseline (as reported). |
| rasmussen-zep | Rasmussen et al. | Zep: production memory layer built on Graphiti, a bi-temporal knowledge graph (episodes → entities/facts → communities) with validity intervals and invalidation-based corrections; evaluated on DMR + LongMemEval. |
| nan-nemori | Nan et al. | Nemori: cognitively-inspired self-organizing agent memory with semantic episode boundary detection + episodic narratives and a predict-calibrate loop that distills semantic knowledge from prediction gaps; strong LoCoMo + LongMemEvalS results (as reported). |
| li-memos | Li et al. | MemOS: OS-like memory control plane with MemCube (payload+metadata), lifecycle/scheduling, governance (ACL/TTL/audit), and multi-substrate memory (plaintext/activation/KV/parameter/LoRA). |
| yan-memory-r1 | Yan et al. | Memory-R1: reinforcement-learned memory manager (ADD/UPDATE/DELETE/NOOP) + answer agent with learned memory distillation; data-efficient RL (PPO/GRPO) training with exact-match reward. |
| jonelagadda-mnemosyne | Jonelagadda et al. | Mnemosyne: edge-friendly graph memory with substance/redundancy filters, probabilistic recall with decay/refresh, and a fixed-budget "core summary" for persona-level context. |
| patel-engram | Patel et al. | ENGRAM: lightweight typed memory (episodic/semantic/procedural) with simple dense retrieval + strict evidence budgets; strong LoCoMo + LongMemEval results with low token usage. |
| wei-evo-memory | Wei et al. | Evo-Memory: streaming benchmark + framework for self-evolving memory and experience reuse; introduces ExpRAG and ReMem (Think/Act/Refine) baselines and robustness/efficiency metrics. |
| cao-remember-me-refine-me | Cao et al. | ReMe: dynamic procedural memory lifecycle (acquire → reuse → refine) with multi-faceted distillation from success/failure trajectories, scenario-aware retrieval, and utility-based pruning; strong BFCL-V3/AppWorld results (as reported). |
| sarin-memoria | Sarin et al. | Memoria: personalization memory layer combining session summaries + KG triplets (persona) with exponential recency weighting; SQLite + ChromaDB architecture and LongMemEvals subset results. |
| latimer-hindsight | Latimer et al. | Hindsight: retain/recall/reflect architecture separating evidence vs beliefs vs summaries; temporal+entity memory graph with multi-channel retrieval fusion and belief confidence updates; very strong LongMemEval/LoCoMo results (as reported). |
| yu-agentic-memory | Yu et al. | AgeMem: RL-trained unified LTM+STM controller exposing memory ops as tool actions (add/update/delete/retrieve/summarize/filter) with a 3-stage curriculum and step-wise GRPO for credit assignment. |
| hu-evermemos | Hu et al. | EverMemOS: self-organizing "memory OS" with MemCells → MemScenes lifecycle, user profile consolidation, and necessity/sufficiency-guided recollection (verifier + query rewrite); strong LoCoMo/LongMemEval results (as reported). |
| li-timem | Li et al. | TiMem: temporal-hierarchical memory consolidation (segment → session → day → week → profile) with query-complexity recall planning + gating; strong LoCoMo/LongMemEval-S accuracy with low recalled tokens (as reported). |
| zhang-himem | Zhang et al. | HiMem: hierarchical long-term memory split (Episode Memory + Note Memory) with topic+surprise episode segmentation, note-first "best-effort" retrieval w/ sufficiency checks, and conflict-aware reconsolidation; strong LoCoMo results (as reported). |
| behrouz-nested-learning | Behrouz et al. | Nested Learning / CMS / Hope: reframes memory as multi-timescale update dynamics (continuum memory blocks updated at different frequencies) with implications for consolidation and "corrections without forgetting". |
| zhang-recursive-language-models | Zhang et al. | Recursive Language Models (RLMs): inference-time recursion + REPL state treats long prompts as an external environment; processes multi-million-token inputs with sub-calls and programmatic slicing, often beating long-context scaffolds at comparable average cost (as reported). |
| wang-m-plus | Wang et al. | M+: latent-space long-term memory extension to MemoryLLM that stores dropped memory tokens in an LTM pool and retrieves them during generation with a co-trained retriever; extends retention to >160k tokens at similar GPU memory cost (as reported). |
| dong-minja | Dong et al. | MINJA: practical memory injection attack on "memory-as-demonstrations" agents via query-only interaction (bridging steps + progressive shortening); motivates write-time gates, isolation, and safer memory representations. |
| sunil-memory-poisoning-attack-defense | Sunil et al. | Memory poisoning attack & defense: empirical MINJA follow-up in EHR agents; shows pre-existing benign memory can reduce ASR, and that trust-score defenses can fail via over-conservatism or overconfidence. |
| anokhin-arigraph | Anokhin et al. | AriGraph: knowledge-graph world model that links episodic observation nodes to extracted semantic triplets; two-stage retrieval (semantic → episodic) for planning/exploration in text-game environments. |
| behrouz-titans | Behrouz et al. | Titans: long-context architecture with an online-updated neural memory module (test-time learning) plus persistent task memory; provides explicit primitives for surprise-based salience and forgetting. |
| ahn-hema | Ahn | HEMA: hippocampus-inspired dual memory for long conversations (running compact summary + FAISS episodic vector store) with explicit prompt budgeting, pruning ("semantic forgetting"), and summary-of-summaries consolidation. |
| tan-membench | Tan et al. | MemBench: benchmark/dataset for agent memory covering participation vs observation scenarios and factual vs reflective memory, with metrics for accuracy/recall/capacity and read/write-time efficiency. |
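
Zep/Graphiti (and memv's Graphiti-style layer) keep facts with validity intervals and correct them by invalidation rather than deletion. A minimal sketch of that idea, assuming one (subject, predicate) slot holds one value at a time; the class and field names are hypothetical, the system-time fields are kept for audit only, and backdated corrections and "as known at" queries are deliberately out of scope:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: datetime                    # world time: when the fact became true
    valid_to: Optional[datetime] = None     # world time: when it stopped being true (None = open)
    recorded_at: Optional[datetime] = None  # system time: when the store learned it (audit only)
    expired_at: Optional[datetime] = None   # system time: when a newer fact superseded it (audit only)


class ValidityStore:
    """Append-only fact store: corrections close old intervals instead of deleting rows."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def assert_fact(self, new: Fact, now: datetime) -> None:
        new.recorded_at = now
        for old in self.facts:
            same_slot = (old.subject, old.predicate) == (new.subject, new.predicate)
            if (same_slot and old.expired_at is None and old.value != new.value
                    and old.valid_from <= new.valid_from):
                old.valid_to = new.valid_from  # invalidate, but keep the row for provenance
                old.expired_at = now
        self.facts.append(new)

    def value_at(self, subject: str, predicate: str, when: datetime) -> Optional[str]:
        """Return the value whose validity interval [valid_from, valid_to) contains `when`."""
        for fact in self.facts:
            if ((fact.subject, fact.predicate) == (subject, predicate)
                    and fact.valid_from <= when
                    and (fact.valid_to is None or when < fact.valid_to)):
                return fact.value
        return None


store = ValidityStore()
store.assert_fact(Fact("alice", "lives_in", "Berlin", datetime(2023, 1, 1)), now=datetime(2023, 1, 5))
store.assert_fact(Fact("alice", "lives_in", "Tokyo", datetime(2024, 6, 1)), now=datetime(2024, 6, 3))
print(store.value_at("alice", "lives_in", datetime(2024, 1, 1)))  # Berlin
print(store.value_at("alice", "lives_in", datetime(2025, 1, 1)))  # Tokyo
```

The Berlin row survives with a closed interval, so historical questions stay answerable; this is the property the shisad mappings refer to as "versioned corrections vs delete".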

Root-level critical analyses intended for synthesis work. These reference the summaries above but focus on coherence, evidence quality, risks, and synthesis-ready claim framing.

| Synthesis | Based on | Focus |
|---|---|---|
| ANALYSIS | ANALYSIS-*.md + shisad docs + Mem0/Letta baselines | Cross-system comparison (techniques + memory types), plus mapping to shisad and "traditional" RAG-ish memory |
| ANALYSIS-academic-industry | paper ANALYSIS-arxiv-*.md + shisad plan | Academic/industry synthesis: benchmarks vs systems vs attacks, with "what's missing in shisad" framing |
| Benchmarks best practices | Public disputes, audits, our analysis | Known pitfalls, metric confusion, dataset quality issues, per-benchmark limitations |
| MELT benchmark design | ANALYSIS.md systems + Reality Check epistemic docs | Memory Evaluation for Lifecycle Testing: session-replay benchmark testing the full memory lifecycle (decay, consolidation, contradiction, core stability, inference) at 6 scale tiers over simulated time. Separate repo; draft. |

| Analysis | Based on | Focus |
|---|---|---|
| ANALYSIS-jumperz-agent-memory-stack | references/jumperz-agent-memory-stack.md | Checklist critique (semantics, failure modes, missing evaluation), synthesis-ready takeaways + claims table |
| ANALYSIS-joelhooks-adr-0077-memory-system-next-phase | references/joelhooks-adr-0077-memory-system-next-phase.md | Increment plan critique (decay, rewrite, dedup, echo/fizzle), validation plan + claims |
| ANALYSIS-coolmanns-openclaw-memory-architecture | references/coolmanns-openclaw-memory-architecture.md + vendor/openclaw-memory-architecture/ | Layered stack critique with benchmark-method verification, operational risks, doc drift notes |
| ANALYSIS-drag88-agent-output-degradation | references/drag88-agent-output-degradation.md | Convergence + enforcement pattern critique (judge → rule distillation), measurement gaps, risks |
| ANALYSIS-versatly-clawvault | references/versatly-clawvault.md + vendor/clawvault/ | Product/tooling critique (surface area, hooks, qmd dependency), security posture, missing benchmarks |
| ANALYSIS-vstorm-memv | references/vstorm-memv.md + vendor/memv/ | Implementation critique of Nemori-inspired predict-calibrate extraction + bi-temporal validity + hybrid retrieval, with gaps/risks and shisad mapping |
| ANALYSIS-openviking | vendor/openviking/ + Hermes provider docs | Open-source context database: viking:// filesystem, L0/L1/L2 tiered loading, session-commit extraction across 8 memory categories, and hierarchical typed retrieval over memory/resources/skills; strong observability with heavier operational complexity |
| ANALYSIS-byterover-cli | vendor/byterover-cli/ + vendor/byterover-cli/paper/ | Agent-native coding-agent memory/runtime: daemon + per-project agent pool, markdown context tree with explicit relations and lifecycle, 5-tier progressive retrieval with cache/OOD detection, and strong self-reported benchmarks with caveats |
| ANALYSIS-mira-OSS | vendor/mira-OSS/ | Full-stack event-driven agent (v1 rev 2): activity-day sigmoid decay, hub discovery + 3-axis linking (vector+entity+TF-IDF), Text-Based LoRA + user model synthesis with critic validation, background forage agent (sub-agent collaboration), portrait synthesis, 16 tools, context overflow remediation, immutable domain models, multi-user RLS + Vault; gaps in write gating, external benchmarks, taint tracking, and sub-agent capability scoping |
| ANALYSIS-claude-code-memory | Source: /home/lhl/Downloads/claude-code/src | Claude Code memory subsystem (Anthropic): first-party production-scale memory system; flat-file MEMORY.md + typed topic files (user/feedback/project/reference) + background extraction via forked agent with mutual exclusion + LLM-based relevance selection (Sonnet) + team memory with OAuth sync + auto dream consolidation + KAIROS daily-log mode + eval-validated prompts with case IDs + security-hardened path validation; no vector search, no graph, no decay scoring |
| ANALYSIS-codex-memory | openai/codex | Codex memory subsystem (OpenAI): first-party open-source coding agent; two-phase async pipeline (gpt-5.1-codex-mini extraction → gpt-5.3-codex consolidation) + SQLite-backed job coordination (leases/heartbeats/watermarks) + progressive disclosure layout (memory_summary → MEMORY.md → rollout_summaries → skills) + skills as procedural memory + usage-based citation-driven retention + thread-diff incremental forgetting + ~1,400 lines extraction/consolidation prompts; no vector search, no team memory, no real-time extraction |
| ANALYSIS-google-always-on-memory-agent | vendor/always-on-memory-agent/ | Official Google ADK sample: always-on daemon with multimodal ingestion (27 file types via Gemini 3.1 Flash-Lite), periodic LLM consolidation, SQLite storage, HTTP API + Streamlit dashboard; no retrieval/search (recency scan LIMIT 50), no decay/dedup/versioning; useful as ADK orchestration reference and multimodal ingestion pattern |
| ANALYSIS-supermemory | references/supermemory.md + vendor/supermemory/ | Memory-as-a-service startup: memory versioning (linked-list chains via parentMemoryId/rootMemoryId/isLatest), typed relationship ontology (updates/extends/derives), static/dynamic profile synthesis API, time-based forgetting with audit trail, multi-model embedding columns, MemoryBench framework; open-source repo is SDK/frontend only, and the core engine logic is a proprietary hosted backend |
| ANALYSIS-karta | vendor/karta/ | Karta (rohithzr): Rust (~10.4K LOC) agentic memory library with Zettelkasten-inspired knowledge graph, 7-type dream engine (deduction/induction/abduction/consolidation/contradiction/episode digest/cross-episode digest) with inference feedback into retrieval, embedding-based query classification (6 modes), retroactive context evolution with drift protection, cross-encoder reranking with abstention, multi-hop BFS traversal, atomic fact decomposition with per-fact embeddings, foresight signals with TTL, structured episode digests; BEAM 100K: 57.7% with 243-failure root cause catalog |
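
ANALYSIS-supermemory describes memory versioning as linked-list chains via parentMemoryId/rootMemoryId/isLatest. Because the core engine is proprietary, the following is only a hypothetical sketch of that data structure: the snake_case field names, integer IDs, and single-writer in-memory store are assumptions, not Supermemory's implementation:

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MemoryVersion:
    id: int
    content: str
    parent_id: Optional[int]  # previous version in the chain (None for the first)
    root_id: int              # first version of the chain, shared by every version
    is_latest: bool = True


class VersionedMemory:
    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.versions: Dict[int, MemoryVersion] = {}

    def create(self, content: str) -> MemoryVersion:
        new_id = next(self._next_id)
        version = MemoryVersion(new_id, content, parent_id=None, root_id=new_id)
        self.versions[new_id] = version
        return version

    def update(self, memory_id: int, content: str) -> MemoryVersion:
        """Append a new version instead of overwriting; only the chain head may be updated."""
        old = self.versions[memory_id]
        if not old.is_latest:
            raise ValueError("update the latest version of the chain, not an old one")
        old.is_latest = False
        new_id = next(self._next_id)
        version = MemoryVersion(new_id, content, parent_id=old.id, root_id=old.root_id)
        self.versions[new_id] = version
        return version

    def history(self, memory_id: int) -> List[str]:
        """Follow parent links from a version back to the root (newest first)."""
        chain: List[str] = []
        current: Optional[MemoryVersion] = self.versions[memory_id]
        while current is not None:
            chain.append(current.content)
            current = self.versions[current.parent_id] if current.parent_id is not None else None
        return chain


memory = VersionedMemory()
v1 = memory.create("prefers tea")
v2 = memory.update(v1.id, "prefers green tea")
print(memory.history(v2.id))  # ['prefers green tea', 'prefers tea']
```

Retrieval would normally filter on `is_latest`, while `root_id` lets every version of one memory be fetched without walking the chain.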

| Analysis | Based on | Focus |
|---|---|---|
| ANALYSIS-arxiv-2602.01313-evermembench | references/hu-evermembench.md + references/papers/arxiv-2602.01313.pdf | Benchmark critique emphasizing version semantics, multi-party fragmentation, oracle diagnostics, and shisad mapping |
| ANALYSIS-arxiv-2602.02369-live-evo | references/zhang-live-evo.md + references/papers/arxiv-2602.02369.pdf | System deep dive emphasizing online experience weighting from continuous feedback, meta-guidelines for memory compilation, and memory-on vs memory-off utility measurement; shisad mapping for feedback loops + procedural memory gating |
| ANALYSIS-arxiv-2602.11243-structmemeval | references/shutova-structmemeval.md + references/papers/arxiv-2602.11243.pdf | Benchmark deep dive emphasizing memory organization/structure as a distinct capability (trees/ledgers/state), hint vs no-hint diagnostics, and implications for shisad structured-memory primitives |
| ANALYSIS-arxiv-2602.05665-graph-based-agent-memory-taxonomy | references/yang-graph-based-agent-memory-taxonomy.md + references/papers/arxiv-2602.05665.pdf | Survey deep dive providing graph-based memory taxonomy and lifecycle (extract/store/retrieve/evolve), with implications for shisad graph-as-derived-view, operator choices, and maintenance jobs |
| ANALYSIS-arxiv-2404.13501-survey-memory-mechanism | references/zhang-survey-memory-mechanism.md + references/papers/arxiv-2404.13501.pdf | Survey deep dive providing baseline taxonomy and evaluation checklists for agent memory; useful coverage reference alongside newer benchmarks/systems for shisad's roadmap |
| ANALYSIS-arxiv-2512.13564-memory-age-ai-agents | references/hu-memory-age-ai-agents.md + references/papers/arxiv-2512.13564.pdf | Survey deep dive emphasizing the Forms-Functions-Dynamics taxonomy and frontiers (RL integration, multimodal, multi-agent shared memory, trustworthiness), used as organizing frame for shisad v0.7 memory roadmap |
| ANALYSIS-arxiv-2402.17753-locomo | references/maharana-locomo.md + references/papers/arxiv-2402.17753.pdf | Dataset/benchmark critique with episodic-memory implications (event graphs, multimodal, RAG harm) and shisad mapping |
| ANALYSIS-arxiv-2410.10813-longmemeval | references/wu-longmemeval.md + references/papers/arxiv-2410.10813.pdf | Benchmark and system-design decomposition (indexing/retrieval/reading), with mapping to shisad primitives |
| ANALYSIS-arxiv-2310.08560-memgpt | references/packer-memgpt.md + references/papers/arxiv-2310.08560.pdf | System deep dive emphasizing virtual context management (OS paging), memory tiers (working/queue/recall/archival), function-call memory ops, and implications for shisad versioned corrections + write-policy hardening |
| ANALYSIS-arxiv-2602.10715-locomoplus | references/li-locomoplus.md + references/papers/arxiv-2602.10715.pdf | Beyond-factual "cognitive memory" benchmark critique (latent constraints) and implications for safe constraint/procedural memory |
| ANALYSIS-arxiv-2504.19413-mem0 | references/chhikara-mem0.md + references/papers/arxiv-2504.19413.pdf | System deep dive emphasizing explicit memory ops, graph-memory tradeoffs, deployment metrics (tokens/p95), and shisad mapping (versioned corrections vs delete) |
| ANALYSIS-arxiv-2601.02553-simplemem | references/liu-simplemem.md + references/papers/arxiv-2601.02553.pdf | System deep dive emphasizing write-time semantic structured compression, online consolidation, and intent-aware multi-view retrieval planning; mapping to shisad "derived vs raw" memory + retrieval budgeting |
| ANALYSIS-arxiv-2502.12110-a-mem | references/xu-a-mem.md + references/papers/arxiv-2502.12110.pdf | System deep dive emphasizing Zettelkasten-style notes + LLM-driven linking + memory evolution, with strong multi-hop/temporal LoCoMo gains but high versioning/audit requirements for shisad |
| ANALYSIS-arxiv-2503.21760-meminsight | references/salama-meminsight.md + references/papers/arxiv-2503.21760.pdf | System deep dive emphasizing autonomous attribute mining/annotation as a derived metadata layer to improve retrieval recall and downstream tasks; mapping to shisad schema constraints + provenance/versioning |
| ANALYSIS-arxiv-2511.18423-gam | references/yan-gam.md + references/papers/arxiv-2511.18423.pdf | System deep dive emphasizing just-in-time context compilation via memo index + universal page-store and an iterative deep-research researcher; highlights the latency/quality trade-off and mapping to shisad evidence-first episodic storage |
| ANALYSIS-arxiv-2501.13956-zep | references/rasmussen-zep.md + references/papers/arxiv-2501.13956.pdf | System deep dive emphasizing bi-temporal validity semantics, episodic+semantic+community graph tiers, hybrid retrieval (BM25/embeddings/BFS), and implications for shisad versioned memory |
| ANALYSIS-arxiv-2507.03724-memos | references/li-memos.md + references/papers/arxiv-2507.03724.pdf | System deep dive emphasizing MemCube metadata, multi-substrate memory (plaintext/KV/parameter), lifecycle/scheduling/governance, and mapping to shisad primitives |
| ANALYSIS-arxiv-2508.19828-memory-r1 | references/yan-memory-r1.md + references/papers/arxiv-2508.19828.pdf | RL deep dive emphasizing learned memory ops (ADD/UPDATE/DELETE/NOOP) + post-retrieval memory distillation, reward design, and what's required to safely adopt this in shisad |
| ANALYSIS-arxiv-2508.03341-nemori | references/nan-nemori.md + references/papers/arxiv-2508.03341.pdf | System deep dive emphasizing episode segmentation (Two-Step Alignment) + predict-calibrate semantic distillation, reported LoCoMo/LongMemEvalS gains, and implications for shisad write gating + correction semantics |
| ANALYSIS-arxiv-2510.08601-mnemosyne | references/jonelagadda-mnemosyne.md + references/papers/arxiv-2510.08601.pdf | System deep dive emphasizing edge-first graph memory, redundancy/refresh, probabilistic decay-based recall, and a fixed-budget core/persona summary; includes evaluation-rigor cautions |
| ANALYSIS-arxiv-2511.12960-engram | references/patel-engram.md + references/papers/arxiv-2511.12960.pdf | System deep dive emphasizing typed memory (episodic/semantic/procedural), deterministic routing/formatting, strict evidence budgets, and strong token/latency results; mapping to shisad primitives |
| ANALYSIS-arxiv-2511.20857-evo-memory | references/wei-evo-memory.md + references/papers/arxiv-2511.20857.pdf | Benchmark deep dive emphasizing streaming task-sequence evaluation for experience reuse, plus refine/prune mechanisms and metrics (robustness, step efficiency) for shisad's eval harness |
| ANALYSIS-arxiv-2512.10696-remember-me-refine-me | references/cao-remember-me-refine-me.md + references/papers/arxiv-2512.10696.pdf | System deep dive emphasizing procedural memory distillation + scenario-aware reuse + utility-based refinement/pruning; mapping to shisad procedural tier + versioned invalidation vs delete |
| ANALYSIS-arxiv-2512.12686-memoria | references/sarin-memoria.md + references/papers/arxiv-2512.12686.pdf | System deep dive emphasizing persona KG + session summaries with recency-weighted retrieval; highlights missing governance/versioning primitives needed for shisad |
| ANALYSIS-arxiv-2512.12818-hindsight | references/latimer-hindsight.md + references/papers/arxiv-2512.12818.pdf | System deep dive emphasizing retain/recall/reflect with four-network memory (facts/experiences/observations/beliefs), token-budgeted multi-channel retrieval fusion, and belief confidence updates; key shisad mapping |
| ANALYSIS-arxiv-2601.01885-agentic-memory | references/yu-agentic-memory.md + references/papers/arxiv-2601.01885.pdf | RL deep dive emphasizing unified LTM+STM memory ops as tool actions, 3-stage training curriculum, step-wise GRPO credit assignment, and implications for shisad's future learned memory policies |
| ANALYSIS-arxiv-2601.02163-evermemos | references/hu-evermemos.md + references/papers/arxiv-2601.02163.pdf | System deep dive emphasizing MemCell → MemScene consolidation lifecycle, user profile/foresight, and sufficiency-verified scene-guided retrieval; mapping to shisad consolidation roadmap |
| ANALYSIS-arxiv-2601.02845-timem | references/li-timem.md + references/papers/arxiv-2601.02845.pdf | System deep dive emphasizing temporal-hierarchical consolidation (TMT), query-complexity recall planning/gating, and the accuracy-token frontier; mapping to shisad temporal tiers |
| ANALYSIS-arxiv-2601.06377-himem | references/zhang-himem.md + references/papers/arxiv-2601.06377.pdf | System deep dive emphasizing Episode Memory + Note Memory hierarchy, note-first "best-effort" retrieval w/ sufficiency checks, and conflict-aware reconsolidation; mapping to shisad event → knowledge tiers + versioned updates |
| ANALYSIS-arxiv-2512.24695-nested-learning | references/behrouz-nested-learning.md + references/papers/arxiv-2512.24695.pdf | Conceptual deep dive on multi-timescale "continuum memory" and consolidation dynamics; mapping to shisad tiered memory + versioned corrections |
| ANALYSIS-arxiv-2512.24601-recursive-language-models | references/zhang-recursive-language-models.md + references/papers/arxiv-2512.24601.pdf | Architecture deep dive emphasizing RLM-style programmatic reading/compilation over arbitrarily long evidence stores (REPL + recursion + sub-calls), with implications for shisad sandboxed compilation traces and cost tail management |
| ANALYSIS-arxiv-2502.00592-m-plus | references/wang-m-plus.md + references/papers/arxiv-2502.00592.pdf | Architecture deep dive emphasizing latent-space long-term memory tokens + co-trained retrieval for >160k retention, with mapping to shisad's external evidence-first memory and retrieval diagnostics |
| ANALYSIS-arxiv-2503.03704-minja | references/dong-minja.md + references/papers/arxiv-2503.03704.pdf | Security deep dive on query-only memory injection attacks; implications for write-policy, provenance/taint, isolation, and "don't store demonstrations" patterns |
| ANALYSIS-arxiv-2601.05504-memory-poisoning-attack-defense | references/sunil-memory-poisoning-attack-defense.md + references/papers/arxiv-2601.05504.pdf | Security deep dive emphasizing ISR vs ASR under realistic memory conditions, and why trust-score sanitization can fail; concrete shisad hardening takeaways |
| ANALYSIS-arxiv-2407.04363-arigraph | references/anokhin-arigraph.md + references/papers/arxiv-2407.04363.pdf | System deep dive emphasizing episodic-semantic memory linking, graph-structured retrieval for planning/exploration, and implications for shisad episode objects + provenance + correction semantics |
| ANALYSIS-arxiv-2501.00663-titans | references/behrouz-titans.md + references/papers/arxiv-2501.00663.pdf | Architecture deep dive emphasizing test-time-learning neural memory (surprise/momentum/forgetting), Titans MAC/MAG/MAL variants, and how to translate salience/decay ideas into shisad's external memory framework |
| ANALYSIS-arxiv-2504.16754-hema | references/ahn-hema.md + references/papers/arxiv-2504.16754.pdf | System deep dive emphasizing dual memory (summary + vector store), explicit prompt budgeting, pruning/consolidation policies, and evaluation-rigor cautions for shisad adoption |
| ANALYSIS-arxiv-2506.21605-membench | references/tan-membench.md + references/papers/arxiv-2506.21605.pdf | Benchmark deep dive emphasizing multi-scenario (participant vs observer) and multi-level (factual vs reflective) evaluation, plus latency/capacity metrics and implications for shisad eval harnesses |
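
Memoria is summarized above as using exponential recency weighting, and Mnemosyne applies decay/refresh to recall. A minimal sketch of one common formulation, a multiplicative half-life decay over retrieval similarity; the half-life parametrisation, the 30-day default, and the multiplicative combination are assumptions for illustration, not values taken from either paper:

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    text: str
    similarity: float  # relevance from the retriever, e.g. cosine similarity
    age_days: float    # time since the memory was written (or last refreshed)


def recency_weight(age_days: float, half_life_days: float) -> float:
    """Exponential decay exp(-ln2 * age / half_life): the weight halves every half-life."""
    return math.exp(-math.log(2) * age_days / half_life_days)


def rerank(candidates: List[Candidate], half_life_days: float = 30.0) -> List[Tuple[float, str]]:
    """Score = similarity * recency weight, highest first."""
    scored = [(c.similarity * recency_weight(c.age_days, half_life_days), c.text)
              for c in candidates]
    return sorted(scored, reverse=True)


results = rerank([
    Candidate("works at Acme", similarity=0.9, age_days=90),
    Candidate("works at Initech", similarity=0.8, age_days=2),
])
print([text for _, text in results])  # ['works at Initech', 'works at Acme']
```

At three half-lives the older memory keeps only 1/8 of its similarity (0.9 becomes 0.1125), so a slightly less similar but fresh memory wins. Pure decay like this silently buries old-but-still-true facts, which is one reason several analyses above prefer explicit invalidation for corrections.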

| Source | URL |
|---|---|
| @jumperz memory stack thread | https://x.com/jumperz/status/2024841165774717031 |
| @joelhooks ADR tweet | https://x.com/joelhooks/status/2024947701738262773 |
| joelclaw ADR-0077 | https://joelclaw.com/adrs/0077-memory-system-next-phase |
| @drag88 article | https://x.com/drag88/status/2022551759491862974 |
| supermemory docs | https://supermemory.ai/docs |
| supermemory repo | https://github.com/supermemoryai/supermemory |
| mempalace repo | https://github.com/milla-jovovich/mempalace |
| karta repo | https://github.com/rohithzr/karta |
agentic-memory/
├── README.md ← this file
├── ANALYSIS.md ← synthesis + comparison
├── ANALYSIS-academic-industry.md ← academic/industry synthesis
├── ANALYSIS-jumperz-agent-memory-stack.md
├── ANALYSIS-joelhooks-adr-0077-memory-system-next-phase.md
├── ANALYSIS-coolmanns-openclaw-memory-architecture.md
├── ANALYSIS-drag88-agent-output-degradation.md
├── ANALYSIS-versatly-clawvault.md
├── ANALYSIS-vstorm-memv.md
├── ANALYSIS-mira-OSS.md
├── ANALYSIS-codex-memory.md
├── ANALYSIS-google-always-on-memory-agent.md
├── ANALYSIS-supermemory.md
├── ANALYSIS-karta.md ← Karta: Rust agentic memory library with dream engine
├── ANALYSIS-mempalace.md ← not in ANALYSIS.md (claims-vs-code issues); see REVIEWED.md
├── REVIEWED.md ← triage log (examined but not promoted to ANALYSIS)
├── PUNCHLIST-academic-industry.md ← tracking checklist for paper deep dives
├── templates/ ← templates for paper analyses/summaries
│
├── references/ ← summarized reference docs (markdown w/ frontmatter)
│   ├── 1-full-agent-memory-build.jpg ← jumperz card 1: memory storage
│   ├── 2-feeds-into.jpg ← jumperz card 2: memory intelligence
│   ├── jumperz-agent-memory-stack.md
│   ├── joelhooks-adr-0077-memory-system-next-phase.md
│   ├── coolmanns-openclaw-memory-architecture.md
│   ├── drag88-agent-output-degradation.md
│   ├── versatly-clawvault.md
│   ├── hu-evermembench.md
│   ├── li-locomoplus.md
│   ├── maharana-locomo.md
│   ├── wu-longmemeval.md
│   ├── chhikara-mem0.md
│   └── papers/ ← archived PDFs + text snapshots
│       ├── README.md
│       ├── arxiv-*.pdf
│       └── arxiv-*.md
│
└── vendor/ ← cloned source repos
    ├── mira-OSS/ ← github.com/taylorsatula/mira-OSS (snapshot, AGPLv3)
    │   ├── README.md
    │   ├── CLAUDE.md ← project guide (architecture, patterns, principles)
    │   ├── main.py ← FastAPI entry point
    │   ├── cns/ ← Central Nervous System (conversation orchestration)
    │   │   ├── api/ ← FastAPI endpoints (chat, actions, data, health)
    │   │   ├── core/ ← Domain models (Continuum, Message, Events)
    │   │   ├── services/ ← Orchestrator, subcortical, summary, collapse handler
    │   │   └── infrastructure/ ← Repositories, Valkey cache, unit of work
    │   ├── lt_memory/ ← Long-term memory system
    │   │   ├── scoring_formula.sql ← Multi-factor activity-day sigmoid importance scoring
    │   │   ├── models.py ← Memory, Entity, ExtractedMemory, link types
    │   │   ├── hybrid_search.py ← BM25 + pgvector with RRF
    │   │   ├── proactive.py ← Dual-path retrieval (similarity + hub discovery)
    │   │   ├── hub_discovery.py ← Entity-driven memory retrieval via pg_trgm
    │   │   └── processing/ ← Extraction, consolidation, entity GC pipelines
    │   ├── working_memory/ ← System prompt composition via trinkets
    │   ├── tools/ ← Self-registering tool framework (11 built-in)
    │   ├── config/ ← Pydantic config + prompt templates
    │   └── auth/ ← WebAuthn + magic link authentication
    │
    ├── openclaw-memory-architecture/ ← github.com/coolmanns/openclaw-memory-architecture
    │   ├── README.md
    │   ├── PROJECT.md
    │   ├── CHANGELOG.md
    │   ├── docs/
    │   │   ├── ARCHITECTURE.md ← full 12-layer technical reference
    │   │   ├── knowledge-graph.md ← graph search pipeline, benchmarks
    │   │   ├── context-optimization.md
    │   │   ├── embedding-setup.md
    │   │   ├── benchmark-process.md
    │   │   ├── benchmark-results.md
    │   │   ├── code-search.md
    │   │   └── COMPARISON.md
    │   ├── schema/
    │   │   └── facts.sql ← SQLite schema for knowledge graph
    │   ├── scripts/ ← init, seed, search, ingest, decay, benchmark, telemetry
    │   ├── templates/ ← starter files (active-context, gating-policies, etc.)
    │   └── plugin-graph-memory/ ← OpenClaw plugin (JS)
    │
    ├── karta/ ← github.com/rohithzr/karta (submodule, MIT)
    │   ├── Cargo.toml ← workspace: karta-core + karta-cli
    │   ├── crates/
    │   │   └── karta-core/ ← Core engine (~6.7K LOC Rust)
    │   │       ├── src/
    │   │       │   ├── note.rs ← MemoryNote, Provenance, NoteStatus, AtomicFact, Episode, EpisodeDigest
    │   │       │   ├── write.rs ← Write path: index, link, evolve, foresight, facts
    │   │       │   ├── read.rs ← Read path: classify, search, traverse, rerank, synthesize
    │   │       │   ├── rerank.rs ← Jina/LLM/noop rerankers
    │   │       │   ├── dream/ ← Dream engine: 7 inference types
    │   │       │   ├── store/ ← LanceDB + SQLite implementations
    │   │       │   └── llm/ ← Provider trait + OpenAI + mock + prompts
    │   │       └── tests/ ← eval, beam_100k, bench_beam (~3.8K LOC)
    │   ├── findings.md ← BEAM 100K detailed failure analysis
    │   └── plan.md ← Experiment plan targeting 90%+
    │
    ├── always-on-memory-agent/ ← GoogleCloudPlatform/generative-ai (official ADK sample)
    │   ├── agent.py ← ADK multi-agent daemon (ingest/consolidate/query)
    │   ├── dashboard.py ← Streamlit UI
    │   └── docs/ ← Logo/architecture assets
    │
    ├── memv/ ← github.com/vstorm-co/memv
    │   ├── README.md
    │   ├── CHANGELOG.md
    │   ├── pyproject.toml ← PyPI: memvee, v0.1.0
    │   ├── docs/ ← docs site (MkDocs)
    │   ├── src/
    │   │   └── memv/ ← segmentation, extraction, validity, retrieval, storage
    │   └── tests/
    │
    ├── supermemory/ ← github.com/supermemoryai/supermemory (lean subset: schemas, SDK, MCP, arch docs)
    │   ├── LICENSE
    │   ├── README.md ← provenance + open-source vs hosted-backend split
    │   ├── packages/
    │   │   ├── validation/ ← Zod schemas (data model definitions)
    │   │   │   ├── schemas.ts
    │   │   │   └── api.ts
    │   │   ├── lib/
    │   │   │   ├── api.ts ← reveals backend dependency (api.supermemory.ai)
    │   │   │   └── similarity.ts ← client-side cosine sim (visualization only)
    │   │   └── tools/src/shared/
    │   │       └── memory-client.ts ← SDK client (profile search, prompt formatting)
    │   ├── apps/mcp/src/
    │   │   └── server.ts ← MCP server (memory/recall/whoAmI tools)
    │   └── skills/supermemory/references/
    │       └── architecture.md ← claimed design (558 lines)
    │
    └── clawvault/ ← github.com/Versatly/clawvault
        ├── README.md
        ├── PLAN.md ← issue #4: ledger, reflect, replay, archive
        ├── CHANGELOG.md
        ├── SKILL.md
        ├── package.json ← npm: clawvault, v2.6.1
        ├── src/
        │   ├── commands/ ← archive, context, inject, observe, reflect, replay, wake, sleep, task, project, ...
        │   ├── observer/ ← compressor, reflector, router, session-watcher
        │   ├── lib/ ← vault, memory-graph, ledger, observation-format, session-utils
        │   └── cli/
        ├── bin/ ← CLI entry + command registration modules
        ├── hooks/ ← OpenClaw hook handler
        ├── dashboard/ ← web dashboard (vault parser, graph diff)
        ├── schemas/
        ├── scripts/
        ├── templates/
        └── tests/
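The mira-OSS `hybrid_search.py` entry above fuses BM25 and pgvector results with RRF (reciprocal rank fusion). For readers unfamiliar with RRF, here is a minimal, generic Python sketch. It is not mira-OSS's code: the constant `k = 60` is a commonly used default rather than a value read from that repo, and the memory IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in (rank is
    1-based), so items ranked well by several retrievers rise to the top.
    k = 60 is an assumed default, not taken from any surveyed repo.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["m3", "m1", "m7"]     # hypothetical keyword-search results
vector_hits = ["m1", "m4", "m3"]   # hypothetical embedding-search results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print([doc_id for doc_id, _ in fused])  # ['m1', 'm3', 'm4', 'm7']
```

RRF only needs ranks, not raw scores, which is why it is a convenient way to combine BM25 and cosine-similarity results whose score scales are not comparable.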
- Phased build order matters: Core memory first (write/read/decay), reliability second (dedup/maintenance/recovery), intelligence last (graphs/trust/cross-agent). Building out of order amplifies flaws.
- Tiered retrieval: Summary files first (fast, cheap), vector search fallback (thorough, expensive). Don't vector-search everything.
- Score decay: `final_score = relevance × exp(-λ × days)`. Recency-weighted relevance is universal across the architectures surveyed.
- Feedback loops: Echo/fizzle (track which injected memories get used), behavior loops (extract corrections as lessons), learning loops (convert expensive LLM checks into cheap static rules).
- SQLite over hosted vector DBs: At current scales (1K-5K entries), SQLite + FTS5 + local embeddings outperforms hosted solutions on latency, cost, and operational simplicity.
- Multi-agent convergence: Shared memory creates homogenization pressure. Workspace isolation + file routing guards help but don't fully solve it.
- Vault index pattern: Single scannable manifest (one-line descriptions) → load individual entries on demand. One file read instead of N.
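The score-decay formula in the list above translates directly into code. This sketch assumes a per-day decay rate `lam` (λ) of 0.05, an illustrative value not taken from any of the surveyed systems, and hypothetical memory IDs. With exponential decay, a memory's weight halves every ln(2)/λ days (about 13.9 days here).

```python
import math


def decayed_score(relevance: float, days: float, lam: float = 0.05) -> float:
    """final_score = relevance × exp(-λ × days).

    lam (λ) is an assumed per-day decay rate; the half-life is ln(2) / λ.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return relevance * math.exp(-lam * days)


# Hypothetical candidates: (memory_id, relevance from retrieval, age in days)
candidates = [("old-but-relevant", 0.92, 30.0), ("fresh", 0.70, 1.0), ("stale", 0.80, 90.0)]
ranked = sorted(candidates, key=lambda c: decayed_score(c[1], c[2]), reverse=True)
print([mem_id for mem_id, _, _ in ranked])  # ['fresh', 'old-but-relevant', 'stale']
```

The choice of λ is the main tuning knob: a larger λ favors recent context aggressively, while a smaller one lets older but highly relevant memories keep competing.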
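The echo/fizzle loop named in the list above needs some way to decide whether an injected memory was actually used. A minimal sketch, assuming a crude word-overlap heuristic (real implementations may use an LLM judge or explicit citation tracking instead). The class names, thresholds, and Laplace-smoothed echo rate are hypothetical choices, not jumperz's spec.

```python
from dataclasses import dataclass


@dataclass
class MemoryStats:
    injected: int = 0
    echoed: int = 0

    @property
    def echo_rate(self) -> float:
        # Laplace smoothing: an unseen memory starts at 0.5 instead of 0/0.
        return (self.echoed + 1) / (self.injected + 2)


class EchoFizzleTracker:
    """Count how often each injected memory is used ("echo") vs ignored ("fizzle")."""

    def __init__(self) -> None:
        self.stats: dict[str, MemoryStats] = {}

    def record_turn(self, injected: dict[str, str], response: str, min_overlap: float = 0.5) -> None:
        # Assumed heuristic: a memory echoes if at least `min_overlap` of its
        # distinct words reappear in the agent's response.
        response_words = set(response.lower().split())
        for mem_id, text in injected.items():
            words = set(text.lower().split())
            stats = self.stats.setdefault(mem_id, MemoryStats())
            stats.injected += 1
            if words and len(words & response_words) / len(words) >= min_overlap:
                stats.echoed += 1

    def fizzled(self, threshold: float = 0.25, min_injections: int = 3) -> list[str]:
        """Memories injected often but rarely used: candidates for down-weighting."""
        return sorted(
            mem_id
            for mem_id, s in self.stats.items()
            if s.injected >= min_injections and s.echo_rate < threshold
        )


tracker = EchoFizzleTracker()
memories = {"m1": "user prefers dark mode", "m2": "project deadline is friday"}
for _ in range(4):
    tracker.record_turn(memories, "Sure, I enabled dark mode since the user prefers it.")
print(tracker.fizzled())  # ['m2']
```

Feeding the fizzle list back into retrieval (for example as a score penalty) closes the loop, so memories that never help stop crowding the context window.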
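Tiered retrieval and the vault index pattern in the list above fit together: scan one manifest of one-line descriptions first, load only the matching entries, and pay for vector search only when the manifest finds nothing. A minimal sketch, assuming naive keyword overlap for the manifest tier and a caller-supplied `vector_search` stand-in; `tiered_retrieve` and the manifest contents are hypothetical.

```python
from collections.abc import Callable


def tiered_retrieve(
    query: str,
    manifest: dict[str, str],
    load_entry: Callable[[str], str],
    vector_search: Callable[[str], list[str]],
    max_hits: int = 3,
) -> tuple[str, list[str]]:
    """Tier 1: scan a one-line-per-entry manifest. Tier 2: fall back to vector search.

    Returns the tier that answered plus the loaded entry bodies.
    """
    terms = set(query.lower().split())
    # Tier 1: one pass over the manifest (a single file read in practice),
    # ranking entries by how many query terms their description shares.
    scored = []
    for entry_id, description in manifest.items():
        overlap = len(terms & set(description.lower().split()))
        if overlap:
            scored.append((overlap, entry_id))
    if scored:
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        ids = [entry_id for _, entry_id in scored[:max_hits]]
        return "manifest", [load_entry(i) for i in ids]  # load only matched entries
    # Tier 2: the cheap index found nothing, so run the thorough search.
    return "vector", [load_entry(i) for i in vector_search(query)[:max_hits]]


manifest = {
    "deploy-runbook": "steps to deploy the api service",
    "db-schema": "postgres schema for user accounts",
}
bodies = {k: f"<full text of {k}>" for k in manifest}
fake_vector_search = lambda q: ["db-schema"]  # hypothetical stand-in

first = tiered_retrieve("how do I deploy", manifest, bodies.__getitem__, fake_vector_search)
second = tiered_retrieve("billing limits", manifest, bodies.__getitem__, fake_vector_search)
print(first)   # ('manifest', ['<full text of deploy-runbook>'])
print(second)  # ('vector', ['<full text of db-schema>'])
```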
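The learning loop in the list above turns repeated LLM-judge findings into free static checks. A minimal sketch of that promotion step, assuming a judge that returns flagged phrases and a simple count threshold. `RulePromoter`, `promote_after`, and `fake_judge` are hypothetical names, and this is not the ClawVault or @drag88 pipeline itself.

```python
import re
from collections import Counter
from collections.abc import Callable


class RulePromoter:
    """Promote phrases an expensive LLM judge keeps flagging into free regex rules."""

    def __init__(self, promote_after: int = 3) -> None:
        self.promote_after = promote_after  # sightings before promotion (assumed)
        self.flag_counts: Counter[str] = Counter()
        self.rules: list[re.Pattern[str]] = []

    def static_check(self, text: str) -> list[str]:
        # Cheap tier: run every promoted regex; no model call.
        return [rule.pattern for rule in self.rules if rule.search(text)]

    def record_judge_flag(self, phrase: str) -> None:
        # On the Nth sighting, compile the phrase into a case-insensitive
        # literal regex so future checks can skip the judge.
        key = phrase.lower()
        self.flag_counts[key] += 1
        if self.flag_counts[key] == self.promote_after:
            self.rules.append(re.compile(re.escape(key), re.IGNORECASE))

    def check(self, text: str, llm_judge: Callable[[str], list[str]]) -> list[str]:
        hits = self.static_check(text)
        if hits:
            return hits  # caught by a static rule; the judge is not called
        flagged = llm_judge(text)  # expensive tier
        for phrase in flagged:
            self.record_judge_flag(phrase)
        return flagged


calls: list[str] = []


def fake_judge(text: str) -> list[str]:
    # Hypothetical stand-in for an LLM judge call; records each invocation.
    calls.append(text)
    return ["delve into"] if "delve into" in text.lower() else []


promoter = RulePromoter(promote_after=3)
for _ in range(4):
    promoter.check("Let us delve into the results.", fake_judge)
print(len(calls))  # 3: the fourth check was answered by the promoted regex
```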
