Research collection on agent memory architectures, persistence patterns, and output quality maintenance for LLM-based agent systems.
If you reference this repo's summaries/analyses in academic or professional work, please cite:
@misc{lin_agentic_memory_2026,
author = {Leonard Lin},
title = {agentic-memory: Agentic Memory Research Collection (Summaries and Analyses)},
year = {2026},
howpublished = {GitHub repository},
url = {https://github.com/lhl/agentic-memory},
}

| Document | Author | Description |
|---|---|---|
| jumperz-agent-memory-stack | @jumperz | 31-piece memory architecture split across 3 phases (Core → Reliability → Intelligence). Complete prompt/spec breakdowns for write pipeline, read pipeline, decay, knowledge graph, episodic memory, trust scoring, echo/fizzle feedback loops. The foundational reference that others build on. |
| joelhooks-adr-0077-memory-system-next-phase | @joelhooks | ADR for joelclaw (personal AI Mac Mini). Maps existing production system (~6 days running, Qdrant 1,343 points) against jumperz's 31 pieces. Plans 3 increments: retrieval quality (score decay, query rewriting), storage quality (dedup, nightly maintenance), feedback loop (echo/fizzle). Includes detailed gap analysis. |
| coolmanns-openclaw-memory-architecture | coolmanns | 12-layer production memory stack for OpenClaw with 14 agents. SQLite+FTS5 knowledge graph (3,108 facts), llama.cpp GPU embeddings (768d, 7ms), three runtime plugins (continuity, stability, graph-memory). 100% recall on 60-query benchmark. Includes activation/decay system, domain RAG, session boot sequences. |
| drag88-agent-output-degradation | @drag88 (Aswin) | "Why Your Agent's Output Gets Worse Over Time": multi-agent convergence problem. 4-tier memory (working → episodic → semantic → procedural). 3-layer enforcement pipeline (YAML regex → Gemini LLM judge → self-learning loop). Core insight: convert expensive runtime LLM checks into free static regex rules over time. |
| versatly-clawvault | Versatly (@drag88) | ClawVault npm CLI tool: structured markdown memory vault with observation pipeline, knowledge graph, session lifecycle (wake/sleep/checkpoint), task/project primitives, Obsidian integration, OpenClaw hooks. 449+ tests. v2.6.1. |
| vstorm-memv | vstorm-co | memv (PyPI: memvee): Nemori-inspired predict-calibrate extraction + episode segmentation, plus Graphiti-style bi-temporal validity and hybrid retrieval (sqlite-vec + FTS5 + RRF) on SQLite. |
| supermemory | Dhravya Shah / supermemoryai | Supermemory memory-as-a-service API: memory versioning (linked-list chains), typed relationships (updates/extends/derives), static/dynamic profile synthesis, time-based forgetting with reason tracking, multi-model embedding storage. Critical caveat: open-source repo is frontend/SDK only; core engine is proprietary backend at api.supermemory.ai. |
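
Several entries above (memv, for one) combine keyword and vector results with Reciprocal Rank Fusion (RRF). The fusion step can be sketched as follows, assuming two already-ranked lists of memory IDs. The function name `rrf_fuse`, the sample IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions, not code taken from any of the listed projects.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value
    reported by any system in this collection.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a full-text index and a vector index.
fts_hits = ["m3", "m1", "m7"]
vec_hits = ["m1", "m9", "m3"]
fused = rrf_fuse([fts_hits, vec_hits])
print(fused)  # ['m1', 'm3', 'm9', 'm7']
```

IDs that appear near the top of both lists (`m1`, `m3`) outrank IDs that appear in only one, which is the property that makes RRF attractive when the two retrievers produce scores on incomparable scales.
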

| Document | Author | Description |
|---|---|---|
| hu-evermembench | Hu et al. | EverMemBench benchmark for >1M-token multi-party, multi-group interleaved conversations; diagnoses multi-hop collapse, temporal/versioning difficulty, and retrieval-bottlenecked "memory awareness". |
| zhang-live-evo | Zhang et al. | Live-Evo: online self-evolving agent memory with an experience bank + meta-guideline bank, contrastive "memory-on vs memory-off" feedback, and weight-based reinforcement/forgetting; evaluated on Prophet Arena + deep research (as reported). |
| shutova-structmemeval | Shutova et al. | StructMemEval benchmark for whether agents can organize memory into useful structures (trees/ledgers/state tracking), not just retrieve facts; includes hint vs no-hint evaluation to isolate "structure recognition" failures. |
| yan-gam | Yan et al. | GAM: just-in-time agent memory via lightweight memos + a universal page-store, plus a deep-research researcher that plans/searches/integrates/reflects over history to compile optimized context at runtime; strong long-context QA gains with higher latency (as reported). |
| yang-graph-based-agent-memory-taxonomy | Yang et al. | Graph-based Agent Memory survey: graph-centric taxonomy + lifecycle (extract/store/retrieve/evolve), storage structures (KG/temporal/hyper/hierarchical/hybrid), retrieval operators, evolution/maintenance, and resources/benchmarks; useful shared vocabulary for shisad. |
| zhang-survey-memory-mechanism | Zhang et al. | Survey on memory mechanisms for LLM agents: definitions, why memory, design axes (sources/forms/ops), evaluation approaches, and application domains; good baseline checklist alongside newer benchmarks/systems. |
| hu-memory-age-ai-agents | Hu et al. | Memory in the Age of AI Agents survey: proposes unified lenses of forms (token/parametric/latent), functions (factual/experiential/working), and dynamics (formation/evolution/retrieval), plus benchmarks/frameworks and trustworthiness frontiers. |
| li-locomoplus | Li et al. | LoCoMo-Plus: evaluates beyond-factual "cognitive memory" (latent constraints like state/goals/values) under cue-trigger semantic disconnect, using constraint-consistency + LLM-judge evaluation. |
| maharana-locomo | Maharana et al. | LoCoMo dataset + benchmark for very long-term multi-session conversations (300 turns, multimodal) grounded in personas + temporal event graphs; evaluates QA + event summarization + multimodal generation. |
| wu-longmemeval | Wu et al. | LongMemEval benchmark + design decomposition (indexing → retrieval → reading) and system optimizations (value granularity, key expansion, time-aware query expansion). |
| packer-memgpt | Packer et al. | MemGPT: OS-inspired hierarchical memory + paging between a fixed-context LLM prompt and external stores (recall + archival), with function-call memory ops and event-driven control flow; foundational baseline for external agent memory. |
| chhikara-mem0 | Chhikara et al. | Mem0: production-oriented long-term memory pipeline with explicit ops (ADD/UPDATE/DELETE/NOOP) and an optional graph memory variant; reports quality + token/latency tradeoffs on LoCoMo. |
| liu-simplemem | Liu et al. | SimpleMem: write-time semantic structured compression + online synthesis + intent-aware retrieval planning (multi-view dense/BM25/symbolic retrieval with union+dedup) to improve LoCoMo/LongMemEval quality while cutting token cost (as reported). |
| xu-a-mem | Xu et al. | A-Mem: Zettelkasten-inspired note network with LLM-driven link generation and "memory evolution" (updating older note attributes as new evidence arrives); strong LoCoMo multi-hop/temporal gains with far lower token lengths than full-context (as reported). |
| salama-meminsight | Salama et al. | MemInsight: autonomous memory augmentation that mines/annotates attributes (entity-centric + conversation-centric; turn/session granularity) and uses attribute-guided retrieval; large LoCoMo retrieval recall gains vs DPR RAG baseline (as reported). |
| rasmussen-zep | Rasmussen et al. | Zep: production memory layer built on Graphiti, a bi-temporal knowledge graph (episodes → entities/facts → communities) with validity intervals and invalidation-based corrections; evaluated on DMR + LongMemEval. |
| nan-nemori | Nan et al. | Nemori: cognitively-inspired self-organizing agent memory with semantic episode boundary detection + episodic narratives and a predict-calibrate loop that distills semantic knowledge from prediction gaps; strong LoCoMo + LongMemEvalS results (as reported). |
| li-memos | Li et al. | MemOS: OS-like memory control plane with MemCube (payload+metadata), lifecycle/scheduling, governance (ACL/TTL/audit), and multi-substrate memory (plaintext/activation/KV/parameter/LoRA). |
| yan-memory-r1 | Yan et al. | Memory-R1: reinforcement-learned memory manager (ADD/UPDATE/DELETE/NOOP) + answer agent with learned memory distillation; data-efficient RL (PPO/GRPO) training with exact-match reward. |
| jonelagadda-mnemosyne | Jonelagadda et al. | Mnemosyne: edge-friendly graph memory with substance/redundancy filters, probabilistic recall with decay/refresh, and a fixed-budget "core summary" for persona-level context. |
| patel-engram | Patel et al. | ENGRAM: lightweight typed memory (episodic/semantic/procedural) with simple dense retrieval + strict evidence budgets; strong LoCoMo + LongMemEval results with low token usage. |
| wei-evo-memory | Wei et al. | Evo-Memory: streaming benchmark + framework for self-evolving memory and experience reuse; introduces ExpRAG and ReMem (Think/Act/Refine) baselines and robustness/efficiency metrics. |
| cao-remember-me-refine-me | Cao et al. | ReMe: dynamic procedural memory lifecycle (acquire → reuse → refine) with multi-faceted distillation from success/failure trajectories, scenario-aware retrieval, and utility-based pruning; strong BFCL-V3/AppWorld results (as reported). |
| sarin-memoria | Sarin et al. | Memoria: personalization memory layer combining session summaries + KG triplets (persona) with exponential recency weighting; SQLite + ChromaDB architecture and LongMemEvals subset results. |
| latimer-hindsight | Latimer et al. | Hindsight: retain/recall/reflect architecture separating evidence vs beliefs vs summaries; temporal+entity memory graph with multi-channel retrieval fusion and belief confidence updates; very strong LongMemEval/LoCoMo results (as reported). |
| yu-agentic-memory | Yu et al. | AgeMem: RL-trained unified LTM+STM controller exposing memory ops as tool actions (add/update/delete/retrieve/summarize/filter) with a 3-stage curriculum and step-wise GRPO for credit assignment. |
| hu-evermemos | Hu et al. | EverMemOS: self-organizing "memory OS" with MemCells → MemScenes lifecycle, user profile consolidation, and necessity/sufficiency-guided recollection (verifier + query rewrite); strong LoCoMo/LongMemEval results (as reported). |
| li-timem | Li et al. | TiMem: temporal-hierarchical memory consolidation (segment → session → day → week → profile) with query-complexity recall planning + gating; strong LoCoMo/LongMemEval-S accuracy with low recalled tokens (as reported). |
| zhang-himem | Zhang et al. | HiMem: hierarchical long-term memory split (Episode Memory + Note Memory) with topic+surprise episode segmentation, note-first "best-effort" retrieval w/ sufficiency checks, and conflict-aware reconsolidation; strong LoCoMo results (as reported). |
| behrouz-nested-learning | Behrouz et al. | Nested Learning / CMS / Hope: reframes memory as multi-timescale update dynamics (continuum memory blocks updated at different frequencies) with implications for consolidation and "corrections without forgetting". |
| zhang-recursive-language-models | Zhang et al. | Recursive Language Models (RLMs): inference-time recursion + REPL state treats long prompts as an external environment; processes multi-million-token inputs with sub-calls and programmatic slicing, often beating long-context scaffolds at comparable average cost (as reported). |
| wang-m-plus | Wang et al. | M+: latent-space long-term memory extension to MemoryLLM that stores dropped memory tokens in an LTM pool and retrieves them during generation with a co-trained retriever; extends retention to >160k tokens at similar GPU memory cost (as reported). |
| dong-minja | Dong et al. | MINJA: practical memory injection attack on "memory-as-demonstrations" agents via query-only interaction (bridging steps + progressive shortening); motivates write-time gates, isolation, and safer memory representations. |
| sunil-memory-poisoning-attack-defense | Sunil et al. | Memory poisoning attack & defense: empirical MINJA follow-up in EHR agents; shows pre-existing benign memory can reduce ASR, and that trust-score defenses can fail via over-conservatism or overconfidence. |
| anokhin-arigraph | Anokhin et al. | AriGraph: knowledge-graph world model that links episodic observation nodes to extracted semantic triplets; two-stage retrieval (semantic → episodic) for planning/exploration in text-game environments. |
| behrouz-titans | Behrouz et al. | Titans: long-context architecture with an online-updated neural memory module (test-time learning) plus persistent task memory; provides explicit primitives for surprise-based salience and forgetting. |
| ahn-hema | Ahn | HEMA: hippocampus-inspired dual memory for long conversations (running compact summary + FAISS episodic vector store) with explicit prompt budgeting, pruning ("semantic forgetting"), and summary-of-summaries consolidation. |
| tan-membench | Tan et al. | MemBench: benchmark/dataset for agent memory covering participation vs observation scenarios and factual vs reflective memory, with metrics for accuracy/recall/capacity and read/write-time efficiency. |
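
Several summaries above mention score decay or exponential recency weighting at retrieval time. A minimal sketch of that idea, assuming a similarity score in [0, 1] and a memory age in days; the function names and the 30-day half-life are hypothetical choices for illustration and are not parameters reported by any of the papers listed.

```python
import math

def recency_weight(age_days, half_life_days=30.0):
    """Exponential decay weight: 1.0 at age 0, 0.5 after one half-life."""
    return math.exp(-math.log(2) * age_days / half_life_days)

def decayed_score(similarity, age_days, half_life_days=30.0):
    """Scale a retrieval similarity by how recently the memory was written."""
    return similarity * recency_weight(age_days, half_life_days)

# A fresh, moderately similar memory can outrank an old, highly similar one.
fresh = decayed_score(0.6, age_days=0)
stale = decayed_score(0.9, age_days=60)
```

With these assumed values the 60-day-old memory has passed two half-lives, so its score is scaled by 0.25 and it ranks below the fresh one. Whether that trade-off is desirable depends on the workload, which is one reason the benchmarks above evaluate temporal questions separately.
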
Root-level critical analyses intended for synthesis work. These reference the summaries above, but focus on coherence, evidence quality, risks, and synthesis-ready claim framing.

| Synthesis | Based on | Focus |
|---|---|---|
| ANALYSIS | ANALYSIS-*.md + shisad docs + Mem0/Letta baselines | Cross-system comparison (techniques + memory types), plus mapping to shisad and "traditional" RAG-ish memory |
| ANALYSIS-academic-industry | paper ANALYSIS-arxiv-*.md + shisad plan | Academic/industry synthesis: benchmarks vs systems vs attacks, with "what's missing in shisad" framing |
| Benchmarks best practices | Public disputes, audits, our analysis | Known pitfalls, metric confusion, dataset quality issues, per-benchmark limitations |
| MELT benchmark design | ANALYSIS.md systems + Reality Check epistemic docs | Memory Evaluation for Lifecycle Testing: session-replay benchmark testing full memory lifecycle (decay, consolidation, contradiction, core stability, inference) at 6 scale tiers over simulated time. Separate repo; draft. |

| Analysis | Based on | Focus |
|---|---|---|
| ANALYSIS-jumperz-agent-memory-stack | references/jumperz-agent-memory-stack.md | Checklist critique (semantics, failure modes, missing evaluation), synthesis-ready takeaways + claims table |
| ANALYSIS-joelhooks-adr-0077-memory-system-next-phase | references/joelhooks-adr-0077-memory-system-next-phase.md | Increment plan critique (decay, rewrite, dedup, echo/fizzle), validation plan + claims |
| ANALYSIS-coolmanns-openclaw-memory-architecture | references/coolmanns-openclaw-memory-architecture.md + vendor/openclaw-memory-architecture/ | Layered stack critique with benchmark-method verification, operational risks, doc drift notes |
| ANALYSIS-drag88-agent-output-degradation | references/drag88-agent-output-degradation.md | Convergence + enforcement pattern critique (judge → rule distillation), measurement gaps, risks |
| ANALYSIS-versatly-clawvault | references/versatly-clawvault.md + vendor/clawvault/ | Product/tooling critique (surface area, hooks, qmd dependency), security posture, missing benchmarks |
| ANALYSIS-vstorm-memv | references/vstorm-memv.md + vendor/memv/ | Implementation critique of Nemori-inspired predict-calibrate extraction + bi-temporal validity + hybrid retrieval, with gaps/risks and shisad mapping |
| ANALYSIS-openviking | vendor/openviking/ + Hermes provider docs | Open-source context database: viking:// filesystem, L0/L1/L2 tiered loading, session-commit extraction across 8 memory categories, and hierarchical typed retrieval over memory/resources/skills; strong observability with heavier operational complexity |
| ANALYSIS-byterover-cli | vendor/byterover-cli/ + vendor/byterover-cli/paper/ | Agent-native coding-agent memory/runtime: daemon + per-project agent pool, markdown context tree with explicit relations and lifecycle, 5-tier progressive retrieval with cache/OOD detection, and strong self-reported benchmarks with caveats |
| ANALYSIS-mira-OSS | vendor/mira-OSS/ | Full-stack event-driven agent (v1 rev 2): activity-day sigmoid decay, hub discovery + 3-axis linking (vector+entity+TF-IDF), Text-Based LoRA + user model synthesis with critic validation, background forage agent (sub-agent collaboration), portrait synthesis, 16 tools, context overflow remediation, immutable domain models, multi-user RLS + Vault; gaps in write gating, external benchmarks, taint tracking, and sub-agent capability scoping |
| ANALYSIS-claude-code-memory | Source: /home/lhl/Downloads/claude-code/src | Claude Code memory subsystem (Anthropic): first-party production-scale memory system; flat-file MEMORY.md + typed topic files (user/feedback/project/reference) + background extraction via forked agent with mutual exclusion + LLM-based relevance selection (Sonnet) + team memory with OAuth sync + auto dream consolidation + KAIROS daily-log mode + eval-validated prompts with case IDs + security-hardened path validation; no vector search, no graph, no decay scoring |
| ANALYSIS-codex-memory | openai/codex | Codex memory subsystem (OpenAI): first-party open-source coding agent; two-phase async pipeline (gpt-5.1-codex-mini extraction → gpt-5.3-codex consolidation) + SQLite-backed job coordination (leases/heartbeats/watermarks) + progressive disclosure layout (memory_summary → MEMORY.md → rollout_summaries → skills) + skills as procedural memory + usage-based citation-driven retention + thread-diff incremental forgetting + ~1,400 lines extraction/consolidation prompts; no vector search, no team memory, no real-time extraction |
| ANALYSIS-google-always-on-memory-agent | vendor/always-on-memory-agent/ | Official Google ADK sample: always-on daemon with multimodal ingestion (27 file types via Gemini 3.1 Flash-Lite), periodic LLM consolidation, SQLite storage, HTTP API + Streamlit dashboard; no retrieval/search (recency scan LIMIT 50), no decay/dedup/versioning; useful as ADK orchestration reference and multimodal ingestion pattern |
| ANALYSIS-supermemory | references/supermemory.md + vendor/supermemory/ | Memory-as-a-service startup: memory versioning (linked-list chains via parentMemoryId/rootMemoryId/isLatest), typed relationship ontology (updates/extends/derives), static/dynamic profile synthesis API, time-based forgetting with audit trail, multi-model embedding columns, MemoryBench framework; open-source repo is SDK/frontend only; core engine logic is proprietary hosted backend |
| ANALYSIS-karta | vendor/karta/ | Karta (rohithzr): Rust (~10.4K LOC) agentic memory library with Zettelkasten-inspired knowledge graph, 7-type dream engine (deduction/induction/abduction/consolidation/contradiction/episode digest/cross-episode digest) with inference feedback into retrieval, embedding-based query classification (6 modes), retroactive context evolution with drift protection, cross-encoder reranking with abstention, multi-hop BFS traversal, atomic fact decomposition with per-fact embeddings, foresight signals with TTL, structured episode digests; BEAM 100K: 57.7% with 243-failure root cause catalog |

| Analysis | Based on | Focus |
|---|---|---|
| ANALYSIS-arxiv-2602.01313-evermembench | references/hu-evermembench.md + references/papers/arxiv-2602.01313.pdf | Benchmark critique emphasizing version semantics, multi-party fragmentation, oracle diagnostics, and shisad mapping |
| ANALYSIS-arxiv-2602.02369-live-evo | references/zhang-live-evo.md + references/papers/arxiv-2602.02369.pdf | System deep dive emphasizing online experience weighting from continuous feedback, meta-guidelines for memory compilation, and memory-on vs memory-off utility measurement; shisad mapping for feedback loops + procedural memory gating |
| ANALYSIS-arxiv-2602.11243-structmemeval | references/shutova-structmemeval.md + references/papers/arxiv-2602.11243.pdf | Benchmark deep dive emphasizing memory organization/structure as a distinct capability (trees/ledgers/state), hint vs no-hint diagnostics, and implications for shisad structured-memory primitives |
| ANALYSIS-arxiv-2602.05665-graph-based-agent-memory-taxonomy | references/yang-graph-based-agent-memory-taxonomy.md + references/papers/arxiv-2602.05665.pdf | Survey deep dive providing graph-based memory taxonomy and lifecycle (extract/store/retrieve/evolve), with implications for shisad graph-as-derived-view, operator choices, and maintenance jobs |
| ANALYSIS-arxiv-2404.13501-survey-memory-mechanism | references/zhang-survey-memory-mechanism.md + references/papers/arxiv-2404.13501.pdf | Survey deep dive providing baseline taxonomy and evaluation checklists for agent memory; useful coverage reference alongside newer benchmarks/systems for shisad's roadmap |
| ANALYSIS-arxiv-2512.13564-memory-age-ai-agents | references/hu-memory-age-ai-agents.md + references/papers/arxiv-2512.13564.pdf | Survey deep dive emphasizing the Forms-Functions-Dynamics taxonomy and frontiers (RL integration, multimodal, multi-agent shared memory, trustworthiness), used as organizing frame for shisad v0.7 memory roadmap |
| ANALYSIS-arxiv-2402.17753-locomo | references/maharana-locomo.md + references/papers/arxiv-2402.17753.pdf | Dataset/benchmark critique with episodic-memory implications (event graphs, multimodal, RAG harm) and shisad mapping |
| ANALYSIS-arxiv-2410.10813-longmemeval | references/wu-longmemeval.md + references/papers/arxiv-2410.10813.pdf | Benchmark and system-design decomposition (indexing/retrieval/reading), with mapping to shisad primitives |
| ANALYSIS-arxiv-2310.08560-memgpt | references/packer-memgpt.md + references/papers/arxiv-2310.08560.pdf | System deep dive emphasizing virtual context management (OS paging), memory tiers (working/queue/recall/archival), function-call memory ops, and implications for shisad versioned corrections + write-policy hardening |
| ANALYSIS-arxiv-2602.10715-locomoplus | references/li-locomoplus.md + references/papers/arxiv-2602.10715.pdf | Beyond-factual "cognitive memory" benchmark critique (latent constraints) and implications for safe constraint/procedural memory |
| ANALYSIS-arxiv-2504.19413-mem0 | references/chhikara-mem0.md + references/papers/arxiv-2504.19413.pdf | System deep dive emphasizing explicit memory ops, graph-memory tradeoffs, deployment metrics (tokens/p95), and shisad mapping (versioned corrections vs delete) |
| ANALYSIS-arxiv-2601.02553-simplemem | references/liu-simplemem.md + references/papers/arxiv-2601.02553.pdf | System deep dive emphasizing write-time semantic structured compression, online consolidation, and intent-aware multi-view retrieval planning; mapping to shisad "derived vs raw" memory + retrieval budgeting |
| ANALYSIS-arxiv-2502.12110-a-mem | references/xu-a-mem.md + references/papers/arxiv-2502.12110.pdf | System deep dive emphasizing Zettelkasten-style notes + LLM-driven linking + memory evolution, with strong multi-hop/temporal LoCoMo gains but high versioning/audit requirements for shisad |
| ANALYSIS-arxiv-2503.21760-meminsight | references/salama-meminsight.md + references/papers/arxiv-2503.21760.pdf | System deep dive emphasizing autonomous attribute mining/annotation as a derived metadata layer to improve retrieval recall and downstream tasks; mapping to shisad schema constraints + provenance/versioning |
| ANALYSIS-arxiv-2511.18423-gam | references/yan-gam.md + references/papers/arxiv-2511.18423.pdf | System deep dive emphasizing just-in-time context compilation via memo index + universal page-store and an iterative deep-research researcher; highlights the latency/quality trade-off and mapping to shisad evidence-first episodic storage |
| ANALYSIS-arxiv-2501.13956-zep | references/rasmussen-zep.md + references/papers/arxiv-2501.13956.pdf | System deep dive emphasizing bi-temporal validity semantics, episodic+semantic+community graph tiers, hybrid retrieval (BM25/embeddings/BFS), and implications for shisad versioned memory |
| ANALYSIS-arxiv-2507.03724-memos | references/li-memos.md + references/papers/arxiv-2507.03724.pdf | System deep dive emphasizing MemCube metadata, multi-substrate memory (plaintext/KV/parameter), lifecycle/scheduling/governance, and mapping to shisad primitives |
| ANALYSIS-arxiv-2508.19828-memory-r1 | references/yan-memory-r1.md + references/papers/arxiv-2508.19828.pdf | RL deep dive emphasizing learned memory ops (ADD/UPDATE/DELETE/NOOP) + post-retrieval memory distillation, reward design, and what's required to safely adopt this in shisad |
| ANALYSIS-arxiv-2508.03341-nemori | references/nan-nemori.md + references/papers/arxiv-2508.03341.pdf | System deep dive emphasizing episode segmentation (Two-Step Alignment) + predict-calibrate semantic distillation, reported LoCoMo/LongMemEvalS gains, and implications for shisad write gating + correction semantics |
| ANALYSIS-arxiv-2510.08601-mnemosyne | references/jonelagadda-mnemosyne.md + references/papers/arxiv-2510.08601.pdf | System deep dive emphasizing edge-first graph memory, redundancy/refresh, probabilistic decay-based recall, and a fixed-budget core/persona summary; includes evaluation-rigor cautions |
| ANALYSIS-arxiv-2511.12960-engram | references/patel-engram.md + references/papers/arxiv-2511.12960.pdf | System deep dive emphasizing typed memory (episodic/semantic/procedural), deterministic routing/formatting, strict evidence budgets, and strong token/latency results; mapping to shisad primitives |
| ANALYSIS-arxiv-2511.20857-evo-memory | references/wei-evo-memory.md + references/papers/arxiv-2511.20857.pdf | Benchmark deep dive emphasizing streaming task-sequence evaluation for experience reuse, plus refine/prune mechanisms and metrics (robustness, step efficiency) for shisad's eval harness |
| ANALYSIS-arxiv-2512.10696-remember-me-refine-me | references/cao-remember-me-refine-me.md + references/papers/arxiv-2512.10696.pdf | System deep dive emphasizing procedural memory distillation + scenario-aware reuse + utility-based refinement/pruning; mapping to shisad procedural tier + versioned invalidation vs delete |
| ANALYSIS-arxiv-2512.12686-memoria | references/sarin-memoria.md + references/papers/arxiv-2512.12686.pdf | System deep dive emphasizing persona KG + session summaries with recency-weighted retrieval; highlights missing governance/versioning primitives needed for shisad |
| ANALYSIS-arxiv-2512.12818-hindsight | references/latimer-hindsight.md + references/papers/arxiv-2512.12818.pdf | System deep dive emphasizing retain/recall/reflect with four-network memory (facts/experiences/observations/beliefs), token-budgeted multi-channel retrieval fusion, and belief confidence updates; key shisad mapping |
| ANALYSIS-arxiv-2601.01885-agentic-memory | references/yu-agentic-memory.md + references/papers/arxiv-2601.01885.pdf | RL deep dive emphasizing unified LTM+STM memory ops as tool actions, 3-stage training curriculum, step-wise GRPO credit assignment, and implications for shisad's future learned memory policies |
| ANALYSIS-arxiv-2601.02163-evermemos | references/hu-evermemos.md + references/papers/arxiv-2601.02163.pdf | System deep dive emphasizing MemCell → MemScene consolidation lifecycle, user profile/foresight, and sufficiency-verified scene-guided retrieval; mapping to shisad consolidation roadmap |
| ANALYSIS-arxiv-2601.02845-timem | references/li-timem.md + references/papers/arxiv-2601.02845.pdf | System deep dive emphasizing temporal-hierarchical consolidation (TMT), query-complexity recall planning/gating, and the accuracy-token frontier; mapping to shisad temporal tiers |
| ANALYSIS-arxiv-2601.06377-himem | references/zhang-himem.md + references/papers/arxiv-2601.06377.pdf | System deep dive emphasizing Episode Memory + Note Memory hierarchy, note-first "best-effort" retrieval w/ sufficiency checks, and conflict-aware reconsolidation; mapping to shisad event → knowledge tiers + versioned updates |
| ANALYSIS-arxiv-2512.24695-nested-learning | references/behrouz-nested-learning.md + references/papers/arxiv-2512.24695.pdf | Conceptual deep dive on multi-timescale "continuum memory" and consolidation dynamics; mapping to shisad tiered memory + versioned corrections |
| ANALYSIS-arxiv-2512.24601-recursive-language-models | references/zhang-recursive-language-models.md + references/papers/arxiv-2512.24601.pdf | Architecture deep dive emphasizing RLM-style programmatic reading/compilation over arbitrarily long evidence stores (REPL + recursion + sub-calls), with implications for shisad sandboxed compilation traces and cost tail management |
| ANALYSIS-arxiv-2502.00592-m-plus | references/wang-m-plus.md + references/papers/arxiv-2502.00592.pdf | Architecture deep dive emphasizing latent-space long-term memory tokens + co-trained retrieval for >160k retention, with mapping to shisad's external evidence-first memory and retrieval diagnostics |
| ANALYSIS-arxiv-2503.03704-minja | references/dong-minja.md + references/papers/arxiv-2503.03704.pdf | Security deep dive on query-only memory injection attacks; implications for write-policy, provenance/taint, isolation, and "don't store demonstrations" patterns |
| ANALYSIS-arxiv-2601.05504-memory-poisoning-attack-defense | references/sunil-memory-poisoning-attack-defense.md + references/papers/arxiv-2601.05504.pdf | Security deep dive emphasizing ISR vs ASR under realistic memory conditions, and why trust-score sanitization can fail; concrete shisad hardening takeaways |
| ANALYSIS-arxiv-2407.04363-arigraph | references/anokhin-arigraph.md + references/papers/arxiv-2407.04363.pdf | System deep dive emphasizing episodic-semantic memory linking, graph-structured retrieval for planning/exploration, and implications for shisad episode objects + provenance + correction semantics |
| ANALYSIS-arxiv-2501.00663-titans | references/behrouz-titans.md + references/papers/arxiv-2501.00663.pdf | Architecture deep dive emphasizing test-time-learning neural memory (surprise/momentum/forgetting), Titans MAC/MAG/MAL variants, and how to translate salience/decay ideas into shisad's external memory framework |
| ANALYSIS-arxiv-2504.16754-hema | references/ahn-hema.md + references/papers/arxiv-2504.16754.pdf | System deep dive emphasizing dual memory (summary + vector store), explicit prompt budgeting, pruning/consolidation policies, and evaluation-rigor cautions for shisad adoption |
| ANALYSIS-arxiv-2506.21605-membench | references/tan-membench.md + references/papers/arxiv-2506.21605.pdf | Benchmark deep dive emphasizing multi-scenario (participant vs observer) and multi-level (factual vs reflective) evaluation, plus latency/capacity metrics and implications for shisad eval harnesses |

| Source | URL |
|---|---|
| @jumperz memory stack thread | https://x.com/jumperz/status/2024841165774717031 |
| @joelhooks ADR tweet | https://x.com/joelhooks/status/2024947701738262773 |
| joelclaw ADR-0077 | https://joelclaw.com/adrs/0077-memory-system-next-phase |
| @drag88 article | https://x.com/drag88/status/2022551759491862974 |
| supermemory docs | https://supermemory.ai/docs |
| supermemory repo | https://github.com/supermemoryai/supermemory |
| mempalace repo | https://github.com/milla-jovovich/mempalace |
| karta repo | https://github.com/rohithzr/karta |
agentic-memory/
├── README.md  ← this file
├── ANALYSIS.md  ← synthesis + comparison
├── ANALYSIS-academic-industry.md  ← academic/industry synthesis
├── ANALYSIS-jumperz-agent-memory-stack.md
├── ANALYSIS-joelhooks-adr-0077-memory-system-next-phase.md
├── ANALYSIS-coolmanns-openclaw-memory-architecture.md
├── ANALYSIS-drag88-agent-output-degradation.md
├── ANALYSIS-versatly-clawvault.md
├── ANALYSIS-vstorm-memv.md
├── ANALYSIS-mira-OSS.md
├── ANALYSIS-codex-memory.md
├── ANALYSIS-google-always-on-memory-agent.md
├── ANALYSIS-supermemory.md
├── ANALYSIS-karta.md  ← Karta: Rust agentic memory library with dream engine
├── ANALYSIS-mempalace.md  ← not in ANALYSIS.md (claims-vs-code issues); see REVIEWED.md
├── REVIEWED.md  ← triage log (examined but not promoted to ANALYSIS)
├── PUNCHLIST-academic-industry.md  ← tracking checklist for paper deep dives
├── templates/  ← templates for paper analyses/summaries
│
├── references/  ← summarized reference docs (markdown w/ frontmatter)
│   ├── 1-full-agent-memory-build.jpg  ← jumperz card 1: memory storage
│   ├── 2-feeds-into.jpg  ← jumperz card 2: memory intelligence
│   ├── jumperz-agent-memory-stack.md
│   ├── joelhooks-adr-0077-memory-system-next-phase.md
│   ├── coolmanns-openclaw-memory-architecture.md
│   ├── drag88-agent-output-degradation.md
│   ├── versatly-clawvault.md
│   ├── hu-evermembench.md
│   ├── li-locomoplus.md
│   ├── maharana-locomo.md
│   ├── wu-longmemeval.md
│   ├── chhikara-mem0.md
│   └── papers/  ← archived PDFs + text snapshots
│       ├── README.md
│       ├── arxiv-*.pdf
│       └── arxiv-*.md
│
└── vendor/  ← cloned source repos
    ├── mira-OSS/  ← github.com/taylorsatula/mira-OSS (snapshot, AGPLv3)
    │   ├── README.md
    │   ├── CLAUDE.md  ← project guide (architecture, patterns, principles)
    │   ├── main.py  ← FastAPI entry point
    │   ├── cns/  ← Central Nervous System (conversation orchestration)
    │   │   ├── api/  ← FastAPI endpoints (chat, actions, data, health)
    │   │   ├── core/  ← Domain models (Continuum, Message, Events)
    │   │   ├── services/  ← Orchestrator, subcortical, summary, collapse handler
    │   │   └── infrastructure/  ← Repositories, Valkey cache, unit of work
    │   ├── lt_memory/  ← Long-term memory system
    │   │   ├── scoring_formula.sql  ← Multi-factor activity-day sigmoid importance scoring
    │   │   ├── models.py  ← Memory, Entity, ExtractedMemory, link types
    │   │   ├── hybrid_search.py  ← BM25 + pgvector with RRF
    │   │   ├── proactive.py  ← Dual-path retrieval (similarity + hub discovery)
    │   │   ├── hub_discovery.py  ← Entity-driven memory retrieval via pg_trgm
    │   │   └── processing/  ← Extraction, consolidation, entity GC pipelines
    │   ├── working_memory/  ← System prompt composition via trinkets
    │   ├── tools/  ← Self-registering tool framework (11 built-in)
    │   ├── config/  ← Pydantic config + prompt templates
    │   └── auth/  ← WebAuthn + magic link authentication
    │
    ├── openclaw-memory-architecture/  ← github.com/coolmanns/openclaw-memory-architecture
    │   ├── README.md
    │   ├── PROJECT.md
    │   ├── CHANGELOG.md
    │   ├── docs/
    │   │   ├── ARCHITECTURE.md  ← full 12-layer technical reference
    │   │   ├── knowledge-graph.md  ← graph search pipeline, benchmarks
    │   │   ├── context-optimization.md
    │   │   ├── embedding-setup.md
    │   │   ├── benchmark-process.md
    │   │   ├── benchmark-results.md
    │   │   ├── code-search.md
    │   │   └── COMPARISON.md
    │   ├── schema/
    │   │   └── facts.sql  ← SQLite schema for knowledge graph
    │   ├── scripts/  ← init, seed, search, ingest, decay, benchmark, telemetry
    │   ├── templates/  ← starter files (active-context, gating-policies, etc.)
    │   └── plugin-graph-memory/  ← OpenClaw plugin (JS)
    │
    ├── karta/  ← github.com/rohithzr/karta (submodule, MIT)
    │   ├── Cargo.toml  ← workspace: karta-core + karta-cli
    │   ├── crates/
    │   │   └── karta-core/  ← Core engine (~6.7K LOC Rust)
    │   │       ├── src/
    │   │       │   ├── note.rs  ← MemoryNote, Provenance, NoteStatus, AtomicFact, Episode, EpisodeDigest
    │   │       │   ├── write.rs  ← Write path: index, link, evolve, foresight, facts
    │   │       │   ├── read.rs  ← Read path: classify, search, traverse, rerank, synthesize
    │   │       │   ├── rerank.rs  ← Jina/LLM/noop rerankers
    │   │       │   ├── dream/  ← Dream engine: 7 inference types
    │   │       │   ├── store/  ← LanceDB + SQLite implementations
    │   │       │   └── llm/  ← Provider trait + OpenAI + mock + prompts
    │   │       └── tests/  ← eval, beam_100k, bench_beam (~3.8K LOC)
    │   ├── findings.md  ← BEAM 100K detailed failure analysis
    │   └── plan.md  ← Experiment plan targeting 90%+
    │
    ├── always-on-memory-agent/  ← GoogleCloudPlatform/generative-ai (official ADK sample)
    │   ├── agent.py  ← ADK multi-agent daemon (ingest/consolidate/query)
    │   ├── dashboard.py  ← Streamlit UI
    │   └── docs/  ← Logo/architecture assets
    │
    ├── memv/  ← github.com/vstorm-co/memv
    │   ├── README.md
    │   ├── CHANGELOG.md
    │   ├── pyproject.toml  ← PyPI: memvee, v0.1.0
    │   ├── docs/  ← docs site (MkDocs)
    │   ├── src/
    │   │   └── memv/  ← segmentation, extraction, validity, retrieval, storage
    │   └── tests/
    │
    ├── supermemory/  ← github.com/supermemoryai/supermemory (lean subset: schemas, SDK, MCP, arch docs)
    │   ├── LICENSE
    │   ├── README.md  ← provenance + open-source vs hosted-backend split
    │   ├── packages/
    │   │   ├── validation/  ← Zod schemas (data model definitions)
    │   │   │   ├── schemas.ts
    │   │   │   └── api.ts
    │   │   ├── lib/
    │   │   │   ├── api.ts  ← reveals backend dependency (api.supermemory.ai)
    │   │   │   └── similarity.ts  ← client-side cosine sim (visualization only)
    │   │   └── tools/src/shared/
    │   │       └── memory-client.ts  ← SDK client (profile search, prompt formatting)
    │   ├── apps/mcp/src/
    │   │   └── server.ts  ← MCP server (memory/recall/whoAmI tools)
    │   └── skills/supermemory/references/
    │       └── architecture.md  ← claimed design (558 lines)
    │
    └── clawvault/  ← github.com/Versatly/clawvault
        ├── README.md
        ├── PLAN.md  ← issue #4: ledger, reflect, replay, archive
        ├── CHANGELOG.md
        ├── SKILL.md
        ├── package.json  ← npm: clawvault, v2.6.1
        ├── src/
        │   ├── commands/  ← archive, context, inject, observe, reflect, replay, wake, sleep, task, project, ...
        │   ├── observer/  ← compressor, reflector, router, session-watcher
        │   ├── lib/  ← vault, memory-graph, ledger, observation-format, session-utils
        │   └── cli/
        ├── bin/  ← CLI entry + command registration modules
        ├── hooks/  ← OpenClaw hook handler
        ├── dashboard/  ← web dashboard (vault parser, graph diff)
        ├── schemas/
        ├── scripts/
        ├── templates/
        └── tests/
- Phased build order matters: Core memory first (write/read/decay), reliability second (dedup/maintenance/recovery), intelligence last (graphs/trust/cross-agent). Building out of order amplifies flaws.
- Tiered retrieval: Summary files first (fast, cheap), vector search fallback (thorough, expensive). Don't vector-search everything.
- Score decay: `final_score = relevance × exp(-λ × days)`; recency-weighted relevance is universal across all architectures.
- Feedback loops: Echo/fizzle (track which injected memories get used), behavior loops (extract corrections as lessons), learning loops (convert expensive LLM checks into cheap static rules).
- SQLite over hosted vector DBs: At current scales (1K-5K entries), SQLite + FTS5 + local embeddings outperforms hosted solutions on latency, cost, and operational simplicity.
- Multi-agent convergence: Shared memory creates homogenization pressure. Workspace isolation + file routing guards help but don't fully solve it.
- Vault index pattern: Single scannable manifest (one-line descriptions) → load individual entries on demand. One file read instead of N.
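
The score-decay formula in the list above can be illustrated with a minimal sketch. The `Memory` class, the `decay_rate` default of 0.05, and the reading of "days" as age since the memory was written are assumptions made for this example; none of them is taken from the surveyed systems.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    relevance: float  # retrieval similarity, assumed to lie in [0, 1]
    age_days: float   # days since the memory was written (assumed meaning of "days")


def decayed_score(relevance: float, age_days: float, decay_rate: float = 0.05) -> float:
    """final_score = relevance * exp(-lambda * days), with lambda = decay_rate."""
    return relevance * math.exp(-decay_rate * age_days)


def rank(memories: list[Memory], decay_rate: float = 0.05) -> list[Memory]:
    """Sort memories by decayed score, highest first."""
    return sorted(
        memories,
        key=lambda m: decayed_score(m.relevance, m.age_days, decay_rate),
        reverse=True,
    )


# Hypothetical data: an older, slightly more relevant memory vs. a fresh one.
memories = [
    Memory("prefers tabs over spaces", relevance=0.90, age_days=60),
    Memory("switched to spaces last week", relevance=0.80, age_days=7),
]

for m in rank(memories):
    print(f"{decayed_score(m.relevance, m.age_days):.3f}  {m.text}")
```

With this assumed rate, a memory loses half its weight roughly every 14 days (ln 2 / 0.05 ≈ 13.9), so the week-old memory outranks the 60-day-old one despite its lower raw relevance. The right rate depends on how quickly facts go stale in a given deployment.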

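
The tiered-retrieval and vault-index takeaways above combine naturally: scan a one-line manifest first, and pay for vector search only when the manifest yields nothing. The sketch below assumes a crude keyword-overlap match and uses hypothetical names throughout (`tiered_retrieve`, `index`, `load_entry`, `vector_search`); a real system would substitute its own matcher and search backend.

```python
from typing import Callable


def tiered_retrieve(
    query: str,
    index: dict[str, str],
    load_entry: Callable[[str], str],
    vector_search: Callable[[str], list[str]],
    min_hits: int = 1,
) -> list[str]:
    """Try the cheap manifest scan first; fall back to vector search."""
    terms = set(query.lower().split())
    # Tier 1: match query words against the one-line summaries (assumed matcher).
    hits = [
        name
        for name, summary in index.items()
        if terms & set(summary.lower().split())
    ]
    if len(hits) >= min_hits:
        # Load only the entries that matched, on demand.
        return [load_entry(name) for name in hits]
    # Tier 2: nothing in the manifest matched, so use the expensive path.
    return vector_search(query)


# Hypothetical manifest: entry name -> one-line description.
index = {
    "deploy.md": "deployment checklist for the staging cluster",
    "style.md": "code style preferences and lint rules",
}
entries = {
    "deploy.md": "Full deployment notes",
    "style.md": "Full style notes",
}

fallback_calls: list[str] = []


def fake_vector_search(query: str) -> list[str]:
    """Stand-in for a real vector search; records that it was called."""
    fallback_calls.append(query)
    return ["(vector search result)"]


print(tiered_retrieve("staging deployment", index, entries.__getitem__, fake_vector_search))
print(tiered_retrieve("database migration", index, entries.__getitem__, fake_vector_search))
print("fallback used for:", fallback_calls)
```

The first query is answered from the manifest alone, while the second falls through to the stubbed vector search. Plain word overlap will match on common words, so a practical version would filter stop words or use a stronger matcher.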