Agentic Memory Research

Research collection on agent memory architectures, persistence patterns, and output quality maintenance for LLM-based agent systems.

Citation

If you reference this repo’s summaries/analyses in academic or professional work, please cite:

@misc{lin_agentic_memory_2026,
  author       = {Leonard Lin},
  title        = {agentic-memory: Agentic Memory Research Collection (Summaries and Analyses)},
  year         = {2026},
  howpublished = {GitHub repository},
  url          = {https://github.com/lhl/agentic-memory},
}

Reference Summaries

Document Author Description
jumperz-agent-memory-stack @jumperz 31-piece memory architecture split across 3 phases (Core → Reliability → Intelligence). Complete prompt/spec breakdowns for write pipeline, read pipeline, decay, knowledge graph, episodic memory, trust scoring, echo/fizzle feedback loops. The foundational reference that others build on.
joelhooks-adr-0077-memory-system-next-phase @joelhooks ADR for joelclaw (personal AI Mac Mini). Maps existing production system (~6 days running, Qdrant 1,343 points) against jumperz's 31 pieces. Plans 3 increments: retrieval quality (score decay, query rewriting), storage quality (dedup, nightly maintenance), feedback loop (echo/fizzle). Includes detailed gap analysis.
coolmanns-openclaw-memory-architecture coolmanns 12-layer production memory stack for OpenClaw with 14 agents. SQLite+FTS5 knowledge graph (3,108 facts), llama.cpp GPU embeddings (768d, 7ms), three runtime plugins (continuity, stability, graph-memory). 100% recall on 60-query benchmark. Includes activation/decay system, domain RAG, session boot sequences.
drag88-agent-output-degradation @drag88 (Aswin) "Why Your Agent's Output Gets Worse Over Time": the multi-agent convergence problem. 4-tier memory (working → episodic → semantic → procedural). 3-layer enforcement pipeline (YAML regex → Gemini LLM judge → self-learning loop). Core insight: convert expensive runtime LLM checks into free static regex rules over time.
versatly-clawvault Versatly (@drag88) ClawVault npm CLI tool: structured markdown memory vault with observation pipeline, knowledge graph, session lifecycle (wake/sleep/checkpoint), task/project primitives, Obsidian integration, OpenClaw hooks. 449+ tests. v2.6.1.
vstorm-memv vstorm-co memv (PyPI: memvee) β€” Nemori-inspired predict-calibrate extraction + episode segmentation, plus Graphiti-style bi-temporal validity and hybrid retrieval (sqlite-vec + FTS5 + RRF) on SQLite.
supermemory Dhravya Shah / supermemoryai Supermemory memory-as-a-service API: memory versioning (linked-list chains), typed relationships (updates/extends/derives), static/dynamic profile synthesis, time-based forgetting with reason tracking, multi-model embedding storage. Critical caveat: open-source repo is frontend/SDK only; core engine is proprietary backend at api.supermemory.ai.
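
Several entries above mention hybrid retrieval fused with RRF (reciprocal rank fusion). As a rough illustration of the idea only, and not the code of any listed project, the sketch below fuses two ranked ID lists. The function name `rrf_fuse`, the sample IDs, and the constant `k=60` (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists with reciprocal rank fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a vector index and a keyword index.
vector_hits = ["m3", "m1", "m7"]
keyword_hits = ["m1", "m9", "m3"]
fused = rrf_fuse([vector_hits, keyword_hits])
print(fused)
```

IDs that rank well in both lists rise to the top, while an ID seen by only one retriever still survives in the fused list. That is why this kind of fusion is a common way to combine keyword and vector search without calibrating their raw scores against each other.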

Paper Reference Summaries (Academic / Industry)

Document Author Description
hu-evermembench Hu et al. EverMemBench benchmark for >1M-token multi-party, multi-group interleaved conversations; diagnoses multi-hop collapse, temporal/versioning difficulty, and retrieval-bottlenecked “memory awareness”.
zhang-live-evo Zhang et al. Live-Evo: online self-evolving agent memory with an experience bank + meta-guideline bank, contrastive “memory-on vs memory-off” feedback, and weight-based reinforcement/forgetting; evaluated on Prophet Arena + deep research (as reported).
shutova-structmemeval Shutova et al. StructMemEval benchmark for whether agents can organize memory into useful structures (trees/ledgers/state tracking), not just retrieve facts; includes hint vs no-hint evaluation to isolate “structure recognition” failures.
yan-gam Yan et al. GAM: just-in-time agent memory via lightweight memos + a universal page-store, plus a deep-research researcher that plans/searches/integrates/reflects over history to compile optimized context at runtime; strong long-context QA gains with higher latency (as reported).
yang-graph-based-agent-memory-taxonomy Yang et al. Graph-based Agent Memory survey: graph-centric taxonomy + lifecycle (extract/store/retrieve/evolve), storage structures (KG/temporal/hyper/hierarchical/hybrid), retrieval operators, evolution/maintenance, and resources/benchmarks; useful shared vocabulary for shisad.
zhang-survey-memory-mechanism Zhang et al. Survey on memory mechanisms for LLM agents: definitions, why memory, design axes (sources/forms/ops), evaluation approaches, and application domains; good baseline checklist alongside newer benchmarks/systems.
hu-memory-age-ai-agents Hu et al. Memory in the Age of AI Agents survey: proposes unified lenses of forms (token/parametric/latent), functions (factual/experiential/working), and dynamics (formation/evolution/retrieval), plus benchmarks/frameworks and trustworthiness frontiers.
li-locomoplus Li et al. LoCoMo-Plus: evaluates beyond-factual “cognitive memory” (latent constraints like state/goals/values) under cue–trigger semantic disconnect, using constraint-consistency + LLM-judge evaluation.
maharana-locomo Maharana et al. LoCoMo dataset + benchmark for very long-term multi-session conversations (300 turns, multimodal) grounded in personas + temporal event graphs; evaluates QA + event summarization + multimodal generation.
wu-longmemeval Wu et al. LongMemEval benchmark + design decomposition (indexing → retrieval → reading) and system optimizations (value granularity, key expansion, time-aware query expansion).
packer-memgpt Packer et al. MemGPT: OS-inspired hierarchical memory + paging between a fixed-context LLM prompt and external stores (recall + archival), with function-call memory ops and event-driven control flow; foundational baseline for external agent memory.
chhikara-mem0 Chhikara et al. Mem0: production-oriented long-term memory pipeline with explicit ops (ADD/UPDATE/DELETE/NOOP) and an optional graph memory variant; reports quality + token/latency tradeoffs on LoCoMo.
liu-simplemem Liu et al. SimpleMem: write-time semantic structured compression + online synthesis + intent-aware retrieval planning (multi-view dense/BM25/symbolic retrieval with union+dedup) to improve LoCoMo/LongMemEval quality while cutting token cost (as reported).
xu-a-mem Xu et al. A‑Mem: Zettelkasten-inspired note network with LLM-driven link generation and “memory evolution” (updating older note attributes as new evidence arrives); strong LoCoMo multi-hop/temporal gains with far lower token lengths than full-context (as reported).
salama-meminsight Salama et al. MemInsight: autonomous memory augmentation that mines/annotates attributes (entity-centric + conversation-centric; turn/session granularity) and uses attribute-guided retrieval; large LoCoMo retrieval recall gains vs DPR RAG baseline (as reported).
rasmussen-zep Rasmussen et al. Zep: production memory layer built on Graphiti, a bi-temporal knowledge graph (episodes → entities/facts → communities) with validity intervals and invalidation-based corrections; evaluated on DMR + LongMemEval.
nan-nemori Nan et al. Nemori: cognitively-inspired self-organizing agent memory with semantic episode boundary detection + episodic narratives and a predict-calibrate loop that distills semantic knowledge from prediction gaps; strong LoCoMo + LongMemEvalS results (as reported).
li-memos Li et al. MemOS: OS-like memory control plane with MemCube (payload+metadata), lifecycle/scheduling, governance (ACL/TTL/audit), and multi-substrate memory (plaintext/activation/KV/parameter/LoRA).
yan-memory-r1 Yan et al. Memory-R1: reinforcement-learned memory manager (ADD/UPDATE/DELETE/NOOP) + answer agent with learned memory distillation; data-efficient RL (PPO/GRPO) training with exact-match reward.
jonelagadda-mnemosyne Jonelagadda et al. Mnemosyne: edge-friendly graph memory with substance/redundancy filters, probabilistic recall with decay/refresh, and a fixed-budget “core summary” for persona-level context.
patel-engram Patel et al. ENGRAM: lightweight typed memory (episodic/semantic/procedural) with simple dense retrieval + strict evidence budgets; strong LoCoMo + LongMemEval results with low token usage.
wei-evo-memory Wei et al. Evo-Memory: streaming benchmark + framework for self-evolving memory and experience reuse; introduces ExpRAG and ReMem (Think/Act/Refine) baselines and robustness/efficiency metrics.
cao-remember-me-refine-me Cao et al. ReMe: dynamic procedural memory lifecycle (acquire→reuse→refine) with multi-faceted distillation from success/failure trajectories, scenario-aware retrieval, and utility-based pruning; strong BFCL‑V3/AppWorld results (as reported).
sarin-memoria Sarin et al. Memoria: personalization memory layer combining session summaries + KG triplets (persona) with exponential recency weighting; SQLite + ChromaDB architecture and LongMemEval-S subset results.
latimer-hindsight Latimer et al. Hindsight: retain/recall/reflect architecture separating evidence vs beliefs vs summaries; temporal+entity memory graph with multi-channel retrieval fusion and belief confidence updates; very strong LongMemEval/LoCoMo results (as reported).
yu-agentic-memory Yu et al. AgeMem: RL-trained unified LTM+STM controller exposing memory ops as tool actions (add/update/delete/retrieve/summarize/filter) with a 3-stage curriculum and step-wise GRPO for credit assignment.
hu-evermemos Hu et al. EverMemOS: self-organizing “memory OS” with MemCells→MemScenes lifecycle, user profile consolidation, and necessity/sufficiency-guided recollection (verifier + query rewrite); strong LoCoMo/LongMemEval results (as reported).
li-timem Li et al. TiMem: temporal-hierarchical memory consolidation (segment→session→day→week→profile) with query-complexity recall planning + gating; strong LoCoMo/LongMemEval-S accuracy with low recalled tokens (as reported).
zhang-himem Zhang et al. HiMem: hierarchical long-term memory split (Episode Memory + Note Memory) with topic+surprise episode segmentation, note-first “best-effort” retrieval w/ sufficiency checks, and conflict-aware reconsolidation; strong LoCoMo results (as reported).
behrouz-nested-learning Behrouz et al. Nested Learning / CMS / Hope: reframes memory as multi-timescale update dynamics (continuum memory blocks updated at different frequencies) with implications for consolidation and “corrections without forgetting”.
zhang-recursive-language-models Zhang et al. Recursive Language Models (RLMs): inference-time recursion + REPL state treats long prompts as an external environment; processes multi‑million-token inputs with sub-calls and programmatic slicing, often beating long-context scaffolds at comparable average cost (as reported).
wang-m-plus Wang et al. M+: latent-space long-term memory extension to MemoryLLM that stores dropped memory tokens in an LTM pool and retrieves them during generation with a co-trained retriever; extends retention to >160k tokens at similar GPU memory cost (as reported).
dong-minja Dong et al. MINJA: practical memory injection attack on “memory-as-demonstrations” agents via query-only interaction (bridging steps + progressive shortening); motivates write-time gates, isolation, and safer memory representations.
sunil-memory-poisoning-attack-defense Sunil et al. Memory poisoning attack & defense: empirical MINJA follow-up in EHR agents; shows pre-existing benign memory can reduce ASR, and that trust-score defenses can fail via over-conservatism or overconfidence.
anokhin-arigraph Anokhin et al. AriGraph: knowledge-graph world model that links episodic observation nodes to extracted semantic triplets; two-stage retrieval (semantic→episodic) for planning/exploration in text-game environments.
behrouz-titans Behrouz et al. Titans: long-context architecture with an online-updated neural memory module (test-time learning) plus persistent task memory; provides explicit primitives for surprise-based salience and forgetting.
ahn-hema Ahn HEMA: hippocampus-inspired dual memory for long conversations (running compact summary + FAISS episodic vector store) with explicit prompt budgeting, pruning (“semantic forgetting”), and summary-of-summaries consolidation.
tan-membench Tan et al. MemBench: benchmark/dataset for agent memory covering participation vs observation scenarios and factual vs reflective memory, with metrics for accuracy/recall/capacity and read/write-time efficiency.
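
Two of the systems summarized above (Mem0 and Memory-R1) describe an explicit operation set of ADD/UPDATE/DELETE/NOOP for maintaining memory. The following is a minimal sketch of what applying such operations to a key-value store could look like. The `MemoryOp` type, the `apply_op` helper, and the string keys are hypothetical and are not taken from either paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class MemoryOp:
    kind: str                   # "ADD", "UPDATE", "DELETE", or "NOOP"
    key: Optional[str] = None   # hypothetical memory identifier
    text: Optional[str] = None  # new content for ADD / UPDATE

def apply_op(store: Dict[str, str], op: MemoryOp) -> Dict[str, str]:
    """Return a new store with one operation applied (input is not mutated)."""
    updated = dict(store)
    if op.kind == "ADD":
        if op.key in updated:
            raise KeyError(f"{op.key!r} already exists; use UPDATE")
        updated[op.key] = op.text
    elif op.kind == "UPDATE":
        if op.key not in updated:
            raise KeyError(f"{op.key!r} not found; use ADD")
        updated[op.key] = op.text
    elif op.kind == "DELETE":
        updated.pop(op.key, None)  # deleting a missing key is a no-op
    elif op.kind != "NOOP":
        raise ValueError(f"unknown op kind: {op.kind!r}")
    return updated

# Hypothetical operation stream, e.g. as proposed by an LLM memory manager.
store: Dict[str, str] = {}
for op in [
    MemoryOp("ADD", "city", "User lives in Berlin"),
    MemoryOp("NOOP"),
    MemoryOp("UPDATE", "city", "User lives in Tokyo"),
    MemoryOp("ADD", "pet", "User has a cat"),
    MemoryOp("DELETE", "pet"),
]:
    store = apply_op(store, op)
print(store)
```

In this sketch UPDATE and DELETE overwrite or drop the old value. Designs that need an audit trail would keep the previous version and mark it superseded instead.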

Deep Dive Analyses

Root-level critical analyses intended for synthesis work. These reference the summaries above, but focus on coherence, evidence quality, risks, and synthesis-ready claim framing.

Synthesis Based on Focus
ANALYSIS ANALYSIS-*.md + shisad docs + Mem0/Letta baselines Cross-system comparison (techniques + memory types), plus mapping to shisad and “traditional” RAG-ish memory
ANALYSIS-academic-industry paper ANALYSIS-arxiv-*.md + shisad plan Academic/industry synthesis: benchmarks vs systems vs attacks, with “what’s missing in shisad” framing
Benchmarks best practices Public disputes, audits, our analysis Known pitfalls, metric confusion, dataset quality issues, per-benchmark limitations
MELT benchmark design ANALYSIS.md systems + Reality Check epistemic docs Memory Evaluation for Lifecycle Testing: session-replay benchmark testing full memory lifecycle (decay, consolidation, contradiction, core stability, inference) at 6 scale tiers over simulated time. Separate repo; draft.
Analysis Based on Focus
ANALYSIS-jumperz-agent-memory-stack references/jumperz-agent-memory-stack.md Checklist critique (semantics, failure modes, missing evaluation), synthesis-ready takeaways + claims table
ANALYSIS-joelhooks-adr-0077-memory-system-next-phase references/joelhooks-adr-0077-memory-system-next-phase.md Increment plan critique (decay, rewrite, dedup, echo/fizzle), validation plan + claims
ANALYSIS-coolmanns-openclaw-memory-architecture references/coolmanns-openclaw-memory-architecture.md + vendor/openclaw-memory-architecture/ Layered stack critique with benchmark-method verification, operational risks, doc drift notes
ANALYSIS-drag88-agent-output-degradation references/drag88-agent-output-degradation.md Convergence + enforcement pattern critique (judge→rule distillation), measurement gaps, risks
ANALYSIS-versatly-clawvault references/versatly-clawvault.md + vendor/clawvault/ Product/tooling critique (surface area, hooks, qmd dependency), security posture, missing benchmarks
ANALYSIS-vstorm-memv references/vstorm-memv.md + vendor/memv/ Implementation critique of Nemori-inspired predict-calibrate extraction + bi-temporal validity + hybrid retrieval, with gaps/risks and shisad mapping
ANALYSIS-openviking vendor/openviking/ + Hermes provider docs Open-source context database: viking:// filesystem, L0/L1/L2 tiered loading, session-commit extraction across 8 memory categories, and hierarchical typed retrieval over memory/resources/skills; strong observability with heavier operational complexity
ANALYSIS-byterover-cli vendor/byterover-cli/ + vendor/byterover-cli/paper/ Agent-native coding-agent memory/runtime: daemon + per-project agent pool, markdown context tree with explicit relations and lifecycle, 5-tier progressive retrieval with cache/OOD detection, and strong self-reported benchmarks with caveats
ANALYSIS-mira-OSS vendor/mira-OSS/ Full-stack event-driven agent (v1 rev 2): activity-day sigmoid decay, hub discovery + 3-axis linking (vector+entity+TF-IDF), Text-Based LoRA + user model synthesis with critic validation, background forage agent (sub-agent collaboration), portrait synthesis, 16 tools, context overflow remediation, immutable domain models, multi-user RLS + Vault; gaps in write gating, external benchmarks, taint tracking, and sub-agent capability scoping
ANALYSIS-claude-code-memory Source: /home/lhl/Downloads/claude-code/src Claude Code memory subsystem (Anthropic): first-party production-scale memory system; flat-file MEMORY.md + typed topic files (user/feedback/project/reference) + background extraction via forked agent with mutual exclusion + LLM-based relevance selection (Sonnet) + team memory with OAuth sync + auto dream consolidation + KAIROS daily-log mode + eval-validated prompts with case IDs + security-hardened path validation; no vector search, no graph, no decay scoring
ANALYSIS-codex-memory openai/codex Codex memory subsystem (OpenAI): first-party open-source coding agent; two-phase async pipeline (gpt-5.1-codex-mini extraction → gpt-5.3-codex consolidation) + SQLite-backed job coordination (leases/heartbeats/watermarks) + progressive disclosure layout (memory_summary → MEMORY.md → rollout_summaries → skills) + skills as procedural memory + usage-based citation-driven retention + thread-diff incremental forgetting + ~1,400 lines extraction/consolidation prompts; no vector search, no team memory, no real-time extraction
ANALYSIS-google-always-on-memory-agent vendor/always-on-memory-agent/ Official Google ADK sample: always-on daemon with multimodal ingestion (27 file types via Gemini 3.1 Flash-Lite), periodic LLM consolidation, SQLite storage, HTTP API + Streamlit dashboard; no retrieval/search (recency scan LIMIT 50), no decay/dedup/versioning; useful as ADK orchestration reference and multimodal ingestion pattern
ANALYSIS-supermemory references/supermemory.md + vendor/supermemory/ Memory-as-a-service startup: memory versioning (linked-list chains via parentMemoryId/rootMemoryId/isLatest), typed relationship ontology (updates/extends/derives), static/dynamic profile synthesis API, time-based forgetting with audit trail, multi-model embedding columns, MemoryBench framework; open-source repo is SDK/frontend only; core engine logic is proprietary hosted backend
ANALYSIS-karta vendor/karta/ Karta (rohithzr): Rust (~10.4K LOC) agentic memory library with Zettelkasten-inspired knowledge graph, 7-type dream engine (deduction/induction/abduction/consolidation/contradiction/episode digest/cross-episode digest) with inference feedback into retrieval, embedding-based query classification (6 modes), retroactive context evolution with drift protection, cross-encoder reranking with abstention, multi-hop BFS traversal, atomic fact decomposition with per-fact embeddings, foresight signals with TTL, structured episode digests; BEAM 100K: 57.7% with 243-failure root cause catalog
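
The supermemory entry above describes memory versioning as linked-list chains via `parentMemoryId`/`rootMemoryId`/`isLatest`. The sketch below shows one way such a chain could behave. Only those three field names come from the entry; the `MemoryVersion` class, the `add_version` helper, and the update semantics are assumptions for illustration, since the core engine is described as proprietary.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MemoryVersion:
    id: str
    text: str
    rootMemoryId: str                     # first version in the chain
    parentMemoryId: Optional[str] = None  # previous version, None for the root
    isLatest: bool = True                 # exactly one version per chain is latest

def add_version(chain: List[MemoryVersion], new_id: str, text: str) -> MemoryVersion:
    """Append a version to a chain (ordered oldest to newest) and return it."""
    if not chain:
        version = MemoryVersion(new_id, text, rootMemoryId=new_id)
    else:
        prev = chain[-1]
        prev.isLatest = False  # the old version is kept, not deleted
        version = MemoryVersion(
            new_id, text,
            rootMemoryId=prev.rootMemoryId,
            parentMemoryId=prev.id,
        )
    chain.append(version)
    return version

chain: List[MemoryVersion] = []
add_version(chain, "v1", "User lives in Berlin")
add_version(chain, "v2", "User lives in Tokyo")
latest = [m for m in chain if m.isLatest]
print(latest[0].text, "<-", latest[0].parentMemoryId)
```

Keeping superseded versions reachable through the parent link lets a reader answer what was believed before a correction, which plain overwrite-style updates cannot do.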

Paper Deep Dive Analyses (Academic / Industry)

Analysis Based on Focus
ANALYSIS-arxiv-2602.01313-evermembench references/hu-evermembench.md + references/papers/arxiv-2602.01313.pdf Benchmark critique emphasizing version semantics, multi-party fragmentation, oracle diagnostics, and shisad mapping
ANALYSIS-arxiv-2602.02369-live-evo references/zhang-live-evo.md + references/papers/arxiv-2602.02369.pdf System deep dive emphasizing online experience weighting from continuous feedback, meta-guidelines for memory compilation, and memory-on vs memory-off utility measurement; shisad mapping for feedback loops + procedural memory gating
ANALYSIS-arxiv-2602.11243-structmemeval references/shutova-structmemeval.md + references/papers/arxiv-2602.11243.pdf Benchmark deep dive emphasizing memory organization/structure as a distinct capability (trees/ledgers/state), hint vs no-hint diagnostics, and implications for shisad structured-memory primitives
ANALYSIS-arxiv-2602.05665-graph-based-agent-memory-taxonomy references/yang-graph-based-agent-memory-taxonomy.md + references/papers/arxiv-2602.05665.pdf Survey deep dive providing graph-based memory taxonomy and lifecycle (extract/store/retrieve/evolve), with implications for shisad graph-as-derived-view, operator choices, and maintenance jobs
ANALYSIS-arxiv-2404.13501-survey-memory-mechanism references/zhang-survey-memory-mechanism.md + references/papers/arxiv-2404.13501.pdf Survey deep dive providing baseline taxonomy and evaluation checklists for agent memory; useful coverage reference alongside newer benchmarks/systems for shisad’s roadmap
ANALYSIS-arxiv-2512.13564-memory-age-ai-agents references/hu-memory-age-ai-agents.md + references/papers/arxiv-2512.13564.pdf Survey deep dive emphasizing the Forms–Functions–Dynamics taxonomy and frontiers (RL integration, multimodal, multi-agent shared memory, trustworthiness), used as organizing frame for shisad v0.7 memory roadmap
ANALYSIS-arxiv-2402.17753-locomo references/maharana-locomo.md + references/papers/arxiv-2402.17753.pdf Dataset/benchmark critique with episodic-memory implications (event graphs, multimodal, RAG harm) and shisad mapping
ANALYSIS-arxiv-2410.10813-longmemeval references/wu-longmemeval.md + references/papers/arxiv-2410.10813.pdf Benchmark and system-design decomposition (indexing/retrieval/reading), with mapping to shisad primitives
ANALYSIS-arxiv-2310.08560-memgpt references/packer-memgpt.md + references/papers/arxiv-2310.08560.pdf System deep dive emphasizing virtual context management (OS paging), memory tiers (working/queue/recall/archival), function-call memory ops, and implications for shisad versioned corrections + write-policy hardening
ANALYSIS-arxiv-2602.10715-locomoplus references/li-locomoplus.md + references/papers/arxiv-2602.10715.pdf Beyond-factual “cognitive memory” benchmark critique (latent constraints) and implications for safe constraint/procedural memory
ANALYSIS-arxiv-2504.19413-mem0 references/chhikara-mem0.md + references/papers/arxiv-2504.19413.pdf System deep dive emphasizing explicit memory ops, graph-memory tradeoffs, deployment metrics (tokens/p95), and shisad mapping (versioned corrections vs delete)
ANALYSIS-arxiv-2601.02553-simplemem references/liu-simplemem.md + references/papers/arxiv-2601.02553.pdf System deep dive emphasizing write-time semantic structured compression, online consolidation, and intent-aware multi-view retrieval planning; mapping to shisad “derived vs raw” memory + retrieval budgeting
ANALYSIS-arxiv-2502.12110-a-mem references/xu-a-mem.md + references/papers/arxiv-2502.12110.pdf System deep dive emphasizing Zettelkasten-style notes + LLM-driven linking + memory evolution, with strong multi-hop/temporal LoCoMo gains but high versioning/audit requirements for shisad
ANALYSIS-arxiv-2503.21760-meminsight references/salama-meminsight.md + references/papers/arxiv-2503.21760.pdf System deep dive emphasizing autonomous attribute mining/annotation as a derived metadata layer to improve retrieval recall and downstream tasks; mapping to shisad schema constraints + provenance/versioning
ANALYSIS-arxiv-2511.18423-gam references/yan-gam.md + references/papers/arxiv-2511.18423.pdf System deep dive emphasizing just-in-time context compilation via memo index + universal page-store and an iterative deep-research researcher; highlights the latency/quality trade-off and mapping to shisad evidence-first episodic storage
ANALYSIS-arxiv-2501.13956-zep references/rasmussen-zep.md + references/papers/arxiv-2501.13956.pdf System deep dive emphasizing bi-temporal validity semantics, episodic+semantic+community graph tiers, hybrid retrieval (BM25/embeddings/BFS), and implications for shisad versioned memory
ANALYSIS-arxiv-2507.03724-memos references/li-memos.md + references/papers/arxiv-2507.03724.pdf System deep dive emphasizing MemCube metadata, multi-substrate memory (plaintext/KV/parameter), lifecycle/scheduling/governance, and mapping to shisad primitives
ANALYSIS-arxiv-2508.19828-memory-r1 references/yan-memory-r1.md + references/papers/arxiv-2508.19828.pdf RL deep dive emphasizing learned memory ops (ADD/UPDATE/DELETE/NOOP) + post-retrieval memory distillation, reward design, and what’s required to safely adopt this in shisad
ANALYSIS-arxiv-2508.03341-nemori references/nan-nemori.md + references/papers/arxiv-2508.03341.pdf System deep dive emphasizing episode segmentation (Two-Step Alignment) + predict-calibrate semantic distillation, reported LoCoMo/LongMemEvalS gains, and implications for shisad write gating + correction semantics
ANALYSIS-arxiv-2510.08601-mnemosyne references/jonelagadda-mnemosyne.md + references/papers/arxiv-2510.08601.pdf System deep dive emphasizing edge-first graph memory, redundancy/refresh, probabilistic decay-based recall, and a fixed-budget core/persona summary; includes evaluation-rigor cautions
ANALYSIS-arxiv-2511.12960-engram references/patel-engram.md + references/papers/arxiv-2511.12960.pdf System deep dive emphasizing typed memory (episodic/semantic/procedural), deterministic routing/formatting, strict evidence budgets, and strong token/latency results; mapping to shisad primitives
ANALYSIS-arxiv-2511.20857-evo-memory references/wei-evo-memory.md + references/papers/arxiv-2511.20857.pdf Benchmark deep dive emphasizing streaming task-sequence evaluation for experience reuse, plus refine/prune mechanisms and metrics (robustness, step efficiency) for shisad’s eval harness
ANALYSIS-arxiv-2512.10696-remember-me-refine-me references/cao-remember-me-refine-me.md + references/papers/arxiv-2512.10696.pdf System deep dive emphasizing procedural memory distillation + scenario-aware reuse + utility-based refinement/pruning; mapping to shisad procedural tier + versioned invalidation vs delete
ANALYSIS-arxiv-2512.12686-memoria references/sarin-memoria.md + references/papers/arxiv-2512.12686.pdf System deep dive emphasizing persona KG + session summaries with recency-weighted retrieval; highlights missing governance/versioning primitives needed for shisad
ANALYSIS-arxiv-2512.12818-hindsight references/latimer-hindsight.md + references/papers/arxiv-2512.12818.pdf System deep dive emphasizing retain/recall/reflect with four-network memory (facts/experiences/observations/beliefs), token-budgeted multi-channel retrieval fusion, and belief confidence updates; key shisad mapping
ANALYSIS-arxiv-2601.01885-agentic-memory references/yu-agentic-memory.md + references/papers/arxiv-2601.01885.pdf RL deep dive emphasizing unified LTM+STM memory ops as tool actions, 3-stage training curriculum, step-wise GRPO credit assignment, and implications for shisad’s future learned memory policies
ANALYSIS-arxiv-2601.02163-evermemos references/hu-evermemos.md + references/papers/arxiv-2601.02163.pdf System deep dive emphasizing MemCell→MemScene consolidation lifecycle, user profile/foresight, and sufficiency-verified scene-guided retrieval; mapping to shisad consolidation roadmap
ANALYSIS-arxiv-2601.02845-timem references/li-timem.md + references/papers/arxiv-2601.02845.pdf System deep dive emphasizing temporal-hierarchical consolidation (TMT), query-complexity recall planning/gating, and the accuracy–token frontier; mapping to shisad temporal tiers
ANALYSIS-arxiv-2601.06377-himem references/zhang-himem.md + references/papers/arxiv-2601.06377.pdf System deep dive emphasizing Episode Memory + Note Memory hierarchy, note-first “best-effort” retrieval w/ sufficiency checks, and conflict-aware reconsolidation; mapping to shisad event→knowledge tiers + versioned updates
ANALYSIS-arxiv-2512.24695-nested-learning references/behrouz-nested-learning.md + references/papers/arxiv-2512.24695.pdf Conceptual deep dive on multi-timescale “continuum memory” and consolidation dynamics; mapping to shisad tiered memory + versioned corrections
ANALYSIS-arxiv-2512.24601-recursive-language-models references/zhang-recursive-language-models.md + references/papers/arxiv-2512.24601.pdf Architecture deep dive emphasizing RLM-style programmatic reading/compilation over arbitrarily long evidence stores (REPL + recursion + sub-calls), with implications for shisad sandboxed compilation traces and cost tail management
ANALYSIS-arxiv-2502.00592-m-plus references/wang-m-plus.md + references/papers/arxiv-2502.00592.pdf Architecture deep dive emphasizing latent-space long-term memory tokens + co-trained retrieval for >160k retention, with mapping to shisad’s external evidence-first memory and retrieval diagnostics
ANALYSIS-arxiv-2503.03704-minja references/dong-minja.md + references/papers/arxiv-2503.03704.pdf Security deep dive on query-only memory injection attacks; implications for write-policy, provenance/taint, isolation, and “don’t store demonstrations” patterns
ANALYSIS-arxiv-2601.05504-memory-poisoning-attack-defense references/sunil-memory-poisoning-attack-defense.md + references/papers/arxiv-2601.05504.pdf Security deep dive emphasizing ISR vs ASR under realistic memory conditions, and why trust-score sanitization can fail; concrete shisad hardening takeaways
ANALYSIS-arxiv-2407.04363-arigraph references/anokhin-arigraph.md + references/papers/arxiv-2407.04363.pdf System deep dive emphasizing episodic↔semantic memory linking, graph-structured retrieval for planning/exploration, and implications for shisad episode objects + provenance + correction semantics
ANALYSIS-arxiv-2501.00663-titans references/behrouz-titans.md + references/papers/arxiv-2501.00663.pdf Architecture deep dive emphasizing test-time-learning neural memory (surprise/momentum/forgetting), Titans MAC/MAG/MAL variants, and how to translate salience/decay ideas into shisad’s external memory framework
ANALYSIS-arxiv-2504.16754-hema references/ahn-hema.md + references/papers/arxiv-2504.16754.pdf System deep dive emphasizing dual memory (summary + vector store), explicit prompt budgeting, pruning/consolidation policies, and evaluation-rigor cautions for shisad adoption
ANALYSIS-arxiv-2506.21605-membench references/tan-membench.md + references/papers/arxiv-2506.21605.pdf Benchmark deep dive emphasizing multi-scenario (participant vs observer) and multi-level (factual vs reflective) evaluation, plus latency/capacity metrics and implications for shisad eval harnesses
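
Several analyses above refer to bi-temporal validity and versioned corrections. As a hedged illustration of the general concept, and not the Zep/Graphiti schema, the sketch below tracks two independent time axes per fact: when the fact held in the world, and when the system held the record. All field and function names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Fact:
    text: str
    valid_from: datetime                       # when the fact became true in the world
    recorded_at: datetime                      # when the system stored this record
    valid_to: Optional[datetime] = None        # when it stopped being true (None = open)
    invalidated_at: Optional[datetime] = None  # when the record was superseded

def visible(fact: Fact, world_time: datetime, system_time: datetime) -> bool:
    """True if the fact held at world_time according to what was stored at system_time."""
    held_in_world = fact.valid_from <= world_time and (
        fact.valid_to is None or world_time < fact.valid_to
    )
    known_to_system = fact.recorded_at <= system_time and (
        fact.invalidated_at is None or system_time < fact.invalidated_at
    )
    return held_in_world and known_to_system

d = datetime
# On 2024-06-01 the system learns the user moved to Tokyo on 2024-05-01.
# The old open-ended record is invalidated rather than deleted, and a
# closed-interval replacement plus the new fact are recorded.
facts: List[Fact] = [
    Fact("User lives in Berlin", d(2023, 1, 1), d(2023, 2, 1),
         invalidated_at=d(2024, 6, 1)),
    Fact("User lives in Berlin", d(2023, 1, 1), d(2024, 6, 1),
         valid_to=d(2024, 5, 1)),
    Fact("User lives in Tokyo", d(2024, 5, 1), d(2024, 6, 1)),
]

def query(world_time: datetime, system_time: datetime) -> List[str]:
    return [f.text for f in facts if visible(f, world_time, system_time)]

believed_then = query(d(2024, 5, 15), d(2024, 5, 20))  # before the correction arrived
believed_now = query(d(2024, 5, 15), d(2024, 7, 1))    # after the correction
print(believed_then, believed_now)
```

The same question about 2024-05-15 gives different answers depending on the system time used, so a correction never has to erase what the system previously believed.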

Source Threads & Links

Source URL
@jumperz memory stack thread https://x.com/jumperz/status/2024841165774717031
@joelhooks ADR tweet https://x.com/joelhooks/status/2024947701738262773
joelclaw ADR-0077 https://joelclaw.com/adrs/0077-memory-system-next-phase
@drag88 article https://x.com/drag88/status/2022551759491862974
supermemory docs https://supermemory.ai/docs
supermemory repo https://github.com/supermemoryai/supermemory
mempalace repo https://github.com/milla-jovovich/mempalace
karta repo https://github.com/rohithzr/karta

File Tree

agentic-memory/
β”œβ”€β”€ README.md                          ← this file
β”œβ”€β”€ ANALYSIS.md                         ← synthesis + comparison
β”œβ”€β”€ ANALYSIS-academic-industry.md       ← academic/industry synthesis
β”œβ”€β”€ ANALYSIS-jumperz-agent-memory-stack.md
β”œβ”€β”€ ANALYSIS-joelhooks-adr-0077-memory-system-next-phase.md
β”œβ”€β”€ ANALYSIS-coolmanns-openclaw-memory-architecture.md
β”œβ”€β”€ ANALYSIS-drag88-agent-output-degradation.md
β”œβ”€β”€ ANALYSIS-versatly-clawvault.md
β”œβ”€β”€ ANALYSIS-vstorm-memv.md
β”œβ”€β”€ ANALYSIS-mira-OSS.md
β”œβ”€β”€ ANALYSIS-codex-memory.md
β”œβ”€β”€ ANALYSIS-google-always-on-memory-agent.md
β”œβ”€β”€ ANALYSIS-supermemory.md
β”œβ”€β”€ ANALYSIS-karta.md               ← Karta: Rust agentic memory library with dream engine
β”œβ”€β”€ ANALYSIS-mempalace.md           ← not in ANALYSIS.md (claims-vs-code issues); see REVIEWED.md
β”œβ”€β”€ REVIEWED.md                        ← triage log (examined but not promoted to ANALYSIS)
β”œβ”€β”€ PUNCHLIST-academic-industry.md     ← tracking checklist for paper deep dives
β”œβ”€β”€ templates/                         ← templates for paper analyses/summaries
β”‚
├── references/                        ← summarized reference docs (markdown w/ frontmatter)
│   ├── 1-full-agent-memory-build.jpg  ← jumperz card 1: memory storage
│   ├── 2-feeds-into.jpg               ← jumperz card 2: memory intelligence
│   ├── jumperz-agent-memory-stack.md
│   ├── joelhooks-adr-0077-memory-system-next-phase.md
│   ├── coolmanns-openclaw-memory-architecture.md
│   ├── drag88-agent-output-degradation.md
│   ├── versatly-clawvault.md
│   ├── hu-evermembench.md
│   ├── li-locomoplus.md
│   ├── maharana-locomo.md
│   ├── wu-longmemeval.md
│   ├── chhikara-mem0.md
│   └── papers/                        ← archived PDFs + text snapshots
│       ├── README.md
│       ├── arxiv-*.pdf
│       └── arxiv-*.md
│
└── vendor/                            ← cloned source repos
    ├── mira-OSS/                      ← github.com/taylorsatula/mira-OSS (snapshot, AGPLv3)
    │   ├── README.md
    │   ├── CLAUDE.md                  ← project guide (architecture, patterns, principles)
    │   ├── main.py                    ← FastAPI entry point
    │   ├── cns/                       ← Central Nervous System (conversation orchestration)
    │   │   ├── api/                   ← FastAPI endpoints (chat, actions, data, health)
    │   │   ├── core/                  ← Domain models (Continuum, Message, Events)
    │   │   ├── services/              ← Orchestrator, subcortical, summary, collapse handler
    │   │   └── infrastructure/        ← Repositories, Valkey cache, unit of work
    │   ├── lt_memory/                 ← Long-term memory system
    │   │   ├── scoring_formula.sql    ← Multi-factor activity-day sigmoid importance scoring
    │   │   ├── models.py             ← Memory, Entity, ExtractedMemory, link types
    │   │   ├── hybrid_search.py      ← BM25 + pgvector with RRF
    │   │   ├── proactive.py          ← Dual-path retrieval (similarity + hub discovery)
    │   │   ├── hub_discovery.py      ← Entity-driven memory retrieval via pg_trgm
    │   │   └── processing/           ← Extraction, consolidation, entity GC pipelines
    │   ├── working_memory/           ← System prompt composition via trinkets
    │   ├── tools/                    ← Self-registering tool framework (11 built-in)
    │   ├── config/                   ← Pydantic config + prompt templates
    │   └── auth/                     ← WebAuthn + magic link authentication
    │
    ├── openclaw-memory-architecture/  ← github.com/coolmanns/openclaw-memory-architecture
    │   ├── README.md
    │   ├── PROJECT.md
    │   ├── CHANGELOG.md
    │   ├── docs/
    │   │   ├── ARCHITECTURE.md        ← full 12-layer technical reference
    │   │   ├── knowledge-graph.md     ← graph search pipeline, benchmarks
    │   │   ├── context-optimization.md
    │   │   ├── embedding-setup.md
    │   │   ├── benchmark-process.md
    │   │   ├── benchmark-results.md
    │   │   ├── code-search.md
    │   │   └── COMPARISON.md
    │   ├── schema/
    │   │   └── facts.sql              ← SQLite schema for knowledge graph
    │   ├── scripts/                   ← init, seed, search, ingest, decay, benchmark, telemetry
    │   ├── templates/                 ← starter files (active-context, gating-policies, etc.)
    │   └── plugin-graph-memory/       ← OpenClaw plugin (JS)
    │
    ├── karta/                         ← github.com/rohithzr/karta (submodule, MIT)
    │   ├── Cargo.toml                ← workspace: karta-core + karta-cli
    │   ├── crates/
    │   │   └── karta-core/           ← Core engine (~6.7K LOC Rust)
    │   │       ├── src/
    │   │       │   ├── note.rs       ← MemoryNote, Provenance, NoteStatus, AtomicFact, Episode, EpisodeDigest
    │   │       │   ├── write.rs      ← Write path: index, link, evolve, foresight, facts
    │   │       │   ├── read.rs       ← Read path: classify, search, traverse, rerank, synthesize
    │   │       │   ├── rerank.rs     ← Jina/LLM/noop rerankers
    │   │       │   ├── dream/        ← Dream engine: 7 inference types
    │   │       │   ├── store/        ← LanceDB + SQLite implementations
    │   │       │   └── llm/          ← Provider trait + OpenAI + mock + prompts
    │   │       └── tests/            ← eval, beam_100k, bench_beam (~3.8K LOC)
    │   ├── findings.md               ← BEAM 100K detailed failure analysis
    │   └── plan.md                   ← Experiment plan targeting 90%+
    │
    ├── always-on-memory-agent/        ← GoogleCloudPlatform/generative-ai (official ADK sample)
    │   ├── agent.py                  ← ADK multi-agent daemon (ingest/consolidate/query)
    │   ├── dashboard.py              ← Streamlit UI
    │   └── docs/                     ← Logo/architecture assets
    │
    ├── memv/                          ← github.com/vstorm-co/memv
    │   ├── README.md
    │   ├── CHANGELOG.md
    │   ├── pyproject.toml             ← PyPI: memvee, v0.1.0
    │   ├── docs/                      ← docs site (MkDocs)
    │   ├── src/
    │   │   └── memv/                  ← segmentation, extraction, validity, retrieval, storage
    │   └── tests/
    │
    ├── supermemory/                    ← github.com/supermemoryai/supermemory (lean subset: schemas, SDK, MCP, arch docs)
    │   ├── LICENSE
    │   ├── README.md                  ← provenance + open-source vs hosted-backend split
    │   ├── packages/
    │   │   ├── validation/            ← Zod schemas (data model definitions)
    │   │   │   ├── schemas.ts
    │   │   │   └── api.ts
    │   │   ├── lib/
    │   │   │   ├── api.ts             ← reveals backend dependency (api.supermemory.ai)
    │   │   │   └── similarity.ts      ← client-side cosine sim (visualization only)
    │   │   └── tools/src/shared/
    │   │       └── memory-client.ts   ← SDK client (profile search, prompt formatting)
    │   ├── apps/mcp/src/
    │   │   └── server.ts              ← MCP server (memory/recall/whoAmI tools)
    │   └── skills/supermemory/references/
    │       └── architecture.md        ← claimed design (558 lines)
    │
    └── clawvault/                     ← github.com/Versatly/clawvault
        ├── README.md
        ├── PLAN.md                    ← issue #4: ledger, reflect, replay, archive
        ├── CHANGELOG.md
        ├── SKILL.md
        ├── package.json               ← npm: clawvault, v2.6.1
        ├── src/
        │   ├── commands/              ← archive, context, inject, observe, reflect, replay, wake, sleep, task, project, ...
        │   ├── observer/              ← compressor, reflector, router, session-watcher
        │   ├── lib/                   ← vault, memory-graph, ledger, observation-format, session-utils
        │   └── cli/
        ├── bin/                       ← CLI entry + command registration modules
        ├── hooks/                     ← OpenClaw hook handler
        ├── dashboard/                 ← web dashboard (vault parser, graph diff)
        ├── schemas/
        ├── scripts/
        ├── templates/
        └── tests/

Key Themes Across Sources

  • Phased build order matters: Core memory first (write/read/decay), reliability second (dedup/maintenance/recovery), intelligence last (graphs/trust/cross-agent). Building out of order amplifies flaws.
  • Tiered retrieval: Summary files first (fast, cheap), vector search fallback (thorough, expensive). Don't vector-search everything.
  • Score decay: final_score = relevance × exp(-λ × days). Recency-weighted relevance recurs across the architectures surveyed here.
  • Feedback loops: Echo/fizzle (track which injected memories get used), behavior loops (extract corrections as lessons), learning loops (convert expensive LLM checks into cheap static rules).
  • SQLite over hosted vector DBs: At current scales (1K-5K entries), SQLite + FTS5 + local embeddings outperforms hosted solutions on latency, cost, and operational simplicity.
  • Multi-agent convergence: Shared memory creates homogenization pressure. Workspace isolation + file routing guards help but don't fully solve it.
  • Vault index pattern: Single scannable manifest (one-line descriptions) → load individual entries on demand. One file read instead of N.
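
The score decay formula in the list above can be sketched in a few lines of Python. This is a minimal illustration, not code taken from any of the surveyed systems: the function names, the default decay rate of 0.01 per day, and the sample candidates are all assumptions chosen for the example.

```python
import math


def decayed_score(relevance: float, age_days: float, decay_rate: float = 0.01) -> float:
    """Apply final_score = relevance * exp(-lambda * days).

    decay_rate (lambda) defaults to 0.01 per day, which is an assumed
    value for illustration, not one taken from the sources.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return relevance * math.exp(-decay_rate * age_days)


def half_life_days(decay_rate: float) -> float:
    """Number of days after which a score drops to half its original value."""
    return math.log(2) / decay_rate


# Hypothetical candidates: (label, raw relevance, age in days).
candidates = [
    ("old but relevant", 0.90, 120.0),
    ("fresh, less relevant", 0.70, 2.0),
]

# Rank by decayed score, highest first.
ranked = sorted(candidates, key=lambda c: decayed_score(c[1], c[2]), reverse=True)
```

With these assumed numbers the fresher memory ranks first even though its raw relevance is lower. Expressing λ as a half-life (`half_life_days`) tends to make the parameter easier to reason about when tuning.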
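
Tiered retrieval and the vault index pattern from the list above fit together naturally: scan one manifest first, and fall back to vector search only when the manifest yields nothing. The sketch below assumes a hypothetical manifest format of `path: one-line description` per line and uses naive word overlap as the matching rule; `vector_search` is a placeholder callable standing in for whatever embedding search a real system uses.

```python
from typing import Callable


def parse_index(text: str) -> dict[str, str]:
    """Parse a manifest of 'path: description' lines (assumed format)."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        path, sep, description = line.partition(":")
        if sep:
            entries[path.strip()] = description.strip()
    return entries


def tiered_retrieve(
    query: str,
    index: dict[str, str],
    vector_search: Callable[[str], list[str]],
    min_hits: int = 1,
) -> tuple[str, list[str]]:
    """Try the cheap manifest scan first; fall back to vector search."""
    terms = set(query.lower().split())
    hits = [
        path
        for path, description in index.items()
        if terms & set(description.lower().split())
    ]
    if len(hits) >= min_hits:
        return "index", hits
    return "vector", vector_search(query)


# Hypothetical manifest contents and a stubbed fallback.
manifest = "notes/db.md: sqlite schema decisions\nnotes/auth.md: login flow lessons\n"
index = parse_index(manifest)
tier, paths = tiered_retrieve("sqlite schema", index, lambda q: ["fallback-result"])
```

Only the entries returned in `paths` would then be loaded in full, which is where the "one file read instead of N" saving comes from.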
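
The echo/fizzle loop described above tracks which injected memories actually get used. A rough sketch follows, assuming the simplest possible usage signal: a memory "echoes" if its text appears verbatim in the response and "fizzles" otherwise. Real systems likely use a softer signal (an LLM judge or embedding similarity); the `MemoryStats` class, the smoothing formula, and the sample data are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class MemoryStats:
    echoes: int = 0   # times the injected memory was used in a response
    fizzles: int = 0  # times it was injected but ignored

    @property
    def usefulness(self) -> float:
        # Laplace-smoothed echo rate, so unseen memories start at 0.5.
        return (self.echoes + 1) / (self.echoes + self.fizzles + 2)


def record_feedback(
    stats: dict[str, MemoryStats],
    injected: dict[str, str],
    response: str,
) -> None:
    """Update echo/fizzle counts for every memory injected into a prompt.

    Verbatim substring matching is an assumed stand-in for a real
    usage detector.
    """
    response_lower = response.lower()
    for memory_id, text in injected.items():
        entry = stats.setdefault(memory_id, MemoryStats())
        if text.lower() in response_lower:
            entry.echoes += 1
        else:
            entry.fizzles += 1


# Hypothetical injected memories and model response.
stats: dict[str, MemoryStats] = {}
injected = {"m1": "prefers dark mode", "m2": "lives in Tokyo"}
record_feedback(stats, injected, "Since the user prefers dark mode, I enabled it.")
```

The resulting `usefulness` value could then feed back into ranking, for example as a multiplier alongside the decayed relevance score.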

Release History

Version | Changes | Urgency | Date
0.0.0 | No release found; using repo HEAD | High | 4/11/2026
main@2026-04-11 | Latest activity on main branch | High | 4/11/2026
main@2026-04-11 | Latest activity on main branch | Medium | 4/11/2026
