RASPUTIN Memory v0.9

A self-hosted long-term memory backend for AI agents. RASPUTIN stores conversations as overlapping windows and LLM-extracted facts in Qdrant, with Qwen3-Reranker reranking, BM25 keyword search via SQLite FTS5, per-question prompt routing, and native MCP support for Claude Code, Cursor, and any MCP-compatible client.

Production-grade long-term memory for AI agents:

MCP server for Claude Code, Cursor, Codex, and any MCP client (FastMCP 3.2, streamable-http)
LLM memory synthesis (/reflect) — retrieves memories and synthesizes coherent answers
Vector search (Qdrant) with two-lane retrieval (windows + facts)
BM25 keyword search (SQLite FTS5) with Reciprocal Rank Fusion
LLM-based fact extraction at ingest time
Qwen3-Reranker-0.6B foundation-model reranking (GPU, 0.99/0.0001 score separation)
Per-question prompt routing (inference/factual/temporal)
A-MAC quality gate on commits
Knowledge graph (FalkorDB) with entity extraction
142 tests, 30+ ablation experiments with scientific methodology

API server: tools/hybrid_brain.py — MCP server: tools/mcp/server.py

Architecture Overview

MCP Client (Claude Code / Cursor / any MCP client)
   │
   └─► tools/mcp/server.py (port 8808, FastMCP 3.2)
       6 tools: store, search, reflect, stats, feedback, commit_conversation
       │
       └─► HTTP proxy ─► tools/hybrid_brain.py (port 7777)

Memory Commit
   │
   ├─► A-MAC quality gate (relevance/novelty/specificity)
   ├─► 5-turn overlapping windows (stride 2)
   ├─► LLM fact extraction (Haiku or local model)
   ├─► Embedding (nomic-embed-text, 768d)
   └─► Persist to Qdrant

Search
   │
   ├─► Multi-Query Expansion
   ├─► Query Embedding (nomic-embed-text, 768d)
   │
   ├─► Lane 1: Window search (45 slots) ──┐
   ├─► Lane 2: Fact search (15 slots)   ──┤
   ├─► Lane 3: BM25 keyword (FTS5, 10)  ──┼─► RRF Fusion ─► Qwen3-Reranker-0.6B ─► Top-60 to LLM
   │                                       │
   └─► Answer Prompt Routing ──────────────┘
       (inference / factual / temporal)

Reflect (LLM Synthesis)
   │
   ├─► hybrid_search(query, limit=20)
   ├─► Format top memories as context
   ├─► LLM call (Anthropic or Ollama)
   └─► Coherent synthesized answer + source citations

Core components

API server: tools/hybrid_brain.py
MCP server: tools/mcp/server.py (thin HTTP proxy, FastMCP 3.2)
LLM synthesis: tools/brain/reflect.py
Fact extraction: tools/brain/fact_extractor.py
Cross-encoder reranker: tools/brain/cross_encoder.py
Maintenance jobs: tools/memory_decay.py, tools/memory_dedup.py

How It Compares

Feature	RASPUTIN	Mem0	Zep	LightRAG
MCP protocol support	✅ FastMCP 3.2	❌	❌	❌
LLM memory synthesis	✅ `/reflect`	❌	❌	❌
Vector search	✅ Qdrant	✅	✅	✅
BM25 keyword search	✅ SQLite FTS5 + RRF	❌	❌	❌
LLM fact extraction	✅	❌	❌	❌
Two-lane retrieval	✅ windows + facts	❌	❌	❌
Foundation-model reranking	✅ Qwen3-Reranker-0.6B (GPU)	❌	❌	❌
LLM quality gate	✅ A-MAC	❌	❌	❌
Contradiction detection	✅	❌	❌	❌
Self-hosted / no vendor lock	✅	✅	❌ (SaaS)	✅

Benchmarks

Full-dataset, fully-disclosed evaluation. All numbers below are from the complete 10-conversation LoCoMo dataset (1986 questions), not a cherry-picked subset. Methodology, judge prompts, and all 30+ experiment records are public in this repository.

Evaluated on LoCoMo (ACL 2024). Full 10-conversation dataset (1986 QA pairs). Two benchmark modes: production (Haiku answers, strict judge — measures retrieval quality) and compare (Haiku answers, generous judge — field-comparable).

LoCoMo Full 10-Conv (v0.9 — current)

Mode	Non-adversarial	Questions
Compare (field-comparable)	77.7%	1540
Production (retrieval signal)	74.2%	1540

Category	Production	Compare	Questions	Notes
Open-domain	84.8%	83.2%	841	+2.9pp from v0.8
Temporal	71.3%	75.4%	321	+6.5pp from v0.8
Single-hop	54.3%	68.1%	282	+17.1pp from v0.8 — largest gain from Qwen3 CE
Multi-hop	49.0%	64.6%	96	−6.2pp production from v0.8 (see note)
Adversarial	13.0%	23.1%	446	Not an optimization target

Wide retrieval pool option: Set BENCH_LANE_WINDOWS=75 BENCH_LANE_FACTS=25 for single-hop-heavy workloads. Trades open-domain (−1.2pp) for single-hop (+4.2pp, 58.5%). Overall stays at 74.1%. See Retrieval Pool Tuning below.

Multi-hop note: Multi-hop decreased in production mode (55.2% → 49.0%) because the Qwen3 reranker is better at ranking, which helps ranking-bound categories (single-hop +12.8pp, open-domain +3.0pp) but slightly hurts retrieval-bound categories where the gold content was never in the candidate pool. The compare mode number (64.6%) is higher because the generous judge credits partial answers. Our retrieval analysis shows 86% of multi-hop failures are retrieval misses (gold not in any chunk), not ranking failures — the reranker can't fix what retrieval didn't find.

What's Been Tested (30+ Experiments)

Experiment	Result	Status
Qwen3-Reranker-0.6B	+4.5pp production, +8.6pp compare	✅ Shipped (v0.9)
BM25 FTS5 + RRF fusion	+0.6pp with Qwen3 CE	✅ Shipped (v0.9)
Wide retrieval pool (75w+25f)	Single-hop +4.2pp, open-dom −1.2pp	✅ Option (v0.9)
Prompt routing (inference/factual/temporal)	+1.6pp full 10-conv	✅ Shipped (v0.8)
Two-lane search (windows + facts)	+6.5pp overall	✅ Shipped (v0.7)
Cross-encoder reranking	Essential at two-lane	✅ Shipped (v0.7)
Windows-only chunking (w5s2)	+5.2pp	✅ Shipped (v0.7)
Pipeline strip (700→427 lines)	0pp change, cleaner code	✅ Shipped (v0.8)
BM25 with L-6 CE (3 variants)	−14pp to −28pp	❌ CE too weak to filter
Consolidation (6 variants)	Net negative, all configs	❌ Parked
Graph expansion (kNN links)	−4.4pp	❌ Parked
Entity search (3 variants)	−10pp to −14pp	❌ Parked
CE L-12 cross-encoder	−12.6pp single-hop	❌ Reverted
Compare mode with gpt-4o-mini answers	−10.8pp vs Haiku	❌ Haiku is better
Embedding upgrades (Qwen3 768d, 4096d)	0pp or worse	❌ No improvement

Full experiment records in experiments/.

On Benchmark Methodology

Published LoCoMo scores across memory systems are not directly comparable. Each system measures something different, uses different models, and reports under different conditions.

What varies across systems:

Variable	Effect on Score	Example
Answer generation model	GPT-4o vs Haiku: ~20pp difference	A strong model rescues poor retrieval
Judge prompt leniency	"Be generous" vs neutral: ~5-10pp	Generous judges forgive vague answers
Context window size	60 chunks vs 10: ~15pp	More context means ranking doesn't matter
Metric type	Retrieval recall vs answer accuracy	Fundamentally different measurements

What each system actually measures:

System	Metric	What It Tests
MemPalace	Retrieval recall	Whether the right evidence was found (no answer generated, no LLM)
LoCoMo original	Token F1	Answer quality against gold standard (algorithmic, no LLM judge)
AMB/Hindsight	LLM judge accuracy	End-to-end: retrieval + answer + LLM evaluation
RASPUTIN	LLM judge accuracy	End-to-end with fixed, disclosed methodology
Memvid	LLM judge (claimed)	Methodology not published

MemPalace's 96.6% LongMemEval score, for instance, is a retrieval recall metric — it measures whether the system found the right passage, not whether it generated a correct answer. This is a valid and useful metric, but it is not comparable to answer-accuracy scores reported by other systems.

Similarly, systems that use GPT-4o or Claude Opus for answer generation are primarily measuring LLM capability, not retrieval quality. A strong model can extract the correct answer from a large, poorly-ranked context window — which is exactly what our ablation program proved: at 60-chunk context, the entire ranking pipeline (BM25, keyword boosts, entity boosts, Cohere reranking, cross-encoder reranking) contributes 0pp because the answer model compensates.

RASPUTIN's methodology is fully disclosed:

Production mode: Claude Haiku answers + strict judge (isolates retrieval quality)
Compare mode: Claude Haiku answers + generous judge (field-comparable baseline)
Judge model pinned to gpt-4o-mini-2024-07-18 (prevents version drift)
All benchmark code, judge prompts, and experiment results are in this repository

We report production-mode numbers as primary because they reflect actual retrieval quality. Compare-mode numbers are provided for rough context against other systems, with the caveat that methodology differences make direct comparison approximate at best.

For a standardized comparison, we recommend the Agent Memory Benchmark (AMB), which evaluates all systems under identical conditions with a published judge prompt.

System	Reported Score	Benchmark	Methodology
Hindsight	92.0%	LoCoMo	AMB harness, published methodology
Backboard	90.00%	LoCoMo	GPT-4.1, generous judge
MemMachine	84.87%	LoCoMo	Not published
RASPUTIN (compare)	77.7%	LoCoMo full 10-conv	Haiku answers, generous judge
Memobase	75.78%	LoCoMo	Not published
Zep	75.14%	LoCoMo	Not published
RASPUTIN (production)	74.2%	LoCoMo full 10-conv	Haiku answers, strict judge
mem0	66.88%	LoCoMo	Not published

† Only RASPUTIN and Hindsight publish their full evaluation methodology, judge prompts, and experiment data. Other scores are self-reported under undisclosed conditions. See On Benchmark Methodology below for why these numbers are not directly comparable.

Pipeline

nomic-embed-text (768d) → Two-lane search (windows + facts) + BM25 FTS5 → RRF fusion → Qwen3-Reranker-0.6B → Haiku → gpt-4o-mini judge

See benchmarks/README.md for how to run benchmarks and reproduce numbers. See experiments/ for the full ablation program and scientific record.

Quick Start

1) Infrastructure (Docker Compose)

docker compose up -d

This should start Qdrant and FalkorDB from the repository compose file.

2) Python setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-core.txt

3) Start API server

python3 tools/hybrid_brain.py

Server runs on http://127.0.0.1:7777 by default.

4) Start MCP server (optional — for Claude Code, Cursor, etc.)

pip install "fastmcp>=3.2.0"
python3 tools/mcp/server.py
# MCP server on http://127.0.0.1:8808/mcp

# Connect Claude Code:
claude mcp add --transport http rasputin http://localhost:8808/mcp

5) Smoke check

curl http://localhost:7777/health
curl "http://localhost:7777/search?q=test&limit=3"
curl -X POST http://localhost:7777/commit \
  -H 'Content-Type: application/json' \
  -d '{"text":"Rasputin memory test event happened on 2026-03-01.","source":"conversation"}'

Configuration Reference (`config/rasputin.toml`)

The runtime loader reads this TOML and allows env overrides (see tools/config.py).

`[server]`

host (string): bind host
port (int): API port

`[qdrant]`

url (string): Qdrant base URL
collection (string): active memory collection

`[graph]`

host (string): FalkorDB host
port (int): FalkorDB port
graph_name (string): graph key
disabled (bool): disable graph search path

`[embeddings]`

url (string): embedding endpoint
model (string): embedding model name
prefix_query (string): query embedding prefix
prefix_doc (string): document embedding prefix

`[reranker]`

url (string): reranker endpoint
timeout (int): timeout seconds
enabled (bool): enable rerank stage

`[amac]`

threshold (float): reject below this composite score
timeout (int): scoring timeout seconds
model (string): model for admission scoring

`[scoring]`

decay_half_life_low (int)
decay_half_life_medium (int)
decay_half_life_high (int)

`[constraints]`

enabled (bool): enable implicit constraint extraction at commit time
model (string): LLM model for constraint extraction
timeout (int): extraction timeout seconds

`[reflect]`

provider (string): LLM provider for synthesis (anthropic or ollama)
model (string): model name (default claude-haiku-4-5-20251001)
max_tokens (int): max tokens for synthesized answer (default 1000)

`[entities]`

known_entities_path (string): entity dictionary JSON path

Retrieval Pool Tuning

The default retrieval pool fetches 45 windows + 15 facts + 10 BM25 keywords = 70 candidates per query variant. The Qwen3-Reranker selects the best 60 for the answer model.

For workloads dominated by single-hop factual questions (e.g. "What is Alice's job?", "Where does Bob live?"), a wider pool improves accuracy by giving the reranker more candidates to choose from:

export BENCH_LANE_WINDOWS=75   # default: 45
export BENCH_LANE_FACTS=25     # default: 15

Config	Overall	Single-hop	Open-domain	Temporal
Default (45w+15f)	74.2%	54.3%	84.8%	71.3%
Wide (75w+25f)	74.1%	58.5%	83.6%	70.4%

The wider pool trades ~1pp open-domain for ~4pp single-hop. Use it when your users ask specific factual questions more than broad open-domain questions. The overall accuracy is unchanged.

Both configs use the same Qwen3-Reranker and BM25 FTS5 pipeline — only the initial candidate count differs.

API Reference

All responses are JSON.

`GET /health`

Returns service health and component status.

curl http://localhost:7777/health

`GET /search?q=<query>&limit=<n>&source=<source>&expand=<bool>`

Hybrid retrieval endpoint.

curl "http://localhost:7777/search?q=payment+issue&limit=5"

`POST /search`

Body-based search variant.

curl -X POST http://localhost:7777/search \
  -H 'Content-Type: application/json' \
  -d '{"query":"project timeline","limit":5,"expand":true}'

`POST /commit`

Commits memory after quality and duplicate checks.

curl -X POST http://localhost:7777/commit \
  -H 'Content-Type: application/json' \
  -d '{"text":"Vendor contract moved to April 12 with revised pricing.","source":"conversation","importance":75}'

`GET /graph?q=<query>&limit=<n>&hops=<n>`

Direct graph lookup.

`GET /stats`

Qdrant and graph count summary.

`GET /amac/metrics`

A-MAC admission counters and rejection stats.

`GET /contradictions?limit=<n>`

Lists stored contradiction records.

`POST /proactive`

Returns proactive memory suggestions from recent context.

curl -X POST http://localhost:7777/proactive \
  -H 'Content-Type: application/json' \
  -d '{"messages":["We are discussing launch timelines"],"max_results":3}'

`POST /commit_conversation`

Commits multi-turn conversations with automatic window chunking.

curl -X POST http://localhost:7777/commit_conversation \
  -H 'Content-Type: application/json' \
  -d '{"turns":[{"speaker":"Alice","text":"I got a promotion today!"},{"speaker":"Bob","text":"Congratulations!"}],"source":"conversation","window_size":5,"stride":2}'

`POST /reflect`

Retrieves relevant memories and synthesizes a coherent answer via LLM.

curl -X POST http://localhost:7777/reflect \
  -H 'Content-Type: application/json' \
  -d '{"q":"What do we know about the auth service?","limit":20}'

Returns {"answer": "...", "sources": [...], "search_elapsed_ms": ..., "reflect_model": "..."}.

`POST /feedback`

Updates retrieval usefulness signal.

curl -X POST http://localhost:7777/feedback \
  -H 'Content-Type: application/json' \
  -d '{"point_id":123,"helpful":true}'

Development Guide

Local workflow

# lint
ruff check .

# type check
mypy tools/hybrid_brain.py tools/bm25_search.py --ignore-missing-imports

# unit tests (default suite)
pytest tests/ -k "not integration" -v

# integration tests (Qdrant required)
pytest tests/test_integration.py -v

Adding features safely

Add/update tests in tests/
Keep API behavior backward-compatible where possible
Prefer config via config/rasputin.toml + env overrides
Validate with lint + mypy + tests before commit

Testing Instructions

Unit tests

pytest tests/ -k "not integration" -v

Integration tests

pytest tests/test_integration.py -v

Coverage

pytest tests/ --cov=tools --cov-report=term-missing

Coverage threshold is configured in pyproject.toml (fail_under = 53).

Test breakdown: 106 core pipeline + 22 MCP server proxy + 14 reflect module = 142 tests.

Version Notes

v0.9.0

Qwen3-Reranker-0.6B: Foundation-model reranker replacing ms-marco-MiniLM-L-6-v2 — 0.99 vs 0.0001 score separation (+4.5pp production, +8.6pp compare)
BM25 keyword search: SQLite FTS5 in-memory sidecar with Reciprocal Rank Fusion (+0.6pp) — first positive BM25 result, enabled by the stronger reranker
Production: 74.2% non-adv (+6.7pp from baseline), Compare: 77.7% non-adv (+10.2pp)
Single-hop: 37.2% → 54.3% (+17.1pp) — largest category improvement from reranker upgrade
30+ experiments with scientific methodology (consolidation ×6, graph expansion, entity search ×3, BM25 ×4, CE A/B, compare mode variants)
Multi-hop retrieval analysis: 60% extraction misses, 40% embedding misses — no ranking failures
Reranker server supports both classic CrossEncoder and Qwen3 chat-template inference

v0.8.0

MCP server (tools/mcp/server.py): 6 tools via FastMCP 3.2 streamable-http transport — Claude Code, Cursor, Codex support
LLM memory synthesis (/reflect endpoint): search → format → LLM → coherent answer with source citations
tools/brain/reflect.py: Anthropic + Ollama LLM providers with automatic fallback
Docker service for MCP server + deployment docs
36 new tests (22 MCP + 14 reflect), total 142 tests
Full 10-conv validation: 69.1% non-adv (1986 questions, production mode)
Prompt routing: +16.7pp multi-hop, +3.9pp single-hop
Pipeline stripped from 700→427 lines (ablation-proven dead weight removed)
Cross-encoder GPU server for remote inference
Fact extraction module, consolidation engine, kNN link computation
21 experiments with scientific methodology
Consolidation tested (6 variants) and parked — net negative with dense-only retrieval

v0.7.0

Two-lane retrieval: windows (45 slots) + LLM-extracted facts (15 slots)
Cross-encoder reranker (ms-marco-MiniLM-L-6-v2, CPU)
Structured fact extraction via Claude Haiku at ingest
Windows-only chunking (individual turns proven to add 0pp)
Ablation-tested: BM25, keyword/entity/temporal boosts, MMR, Cohere reranker all proven 0pp
LoCoMo conv-0: 69.7% production, 72.4% compare (non-adversarial)

v0.5.0 — Search Quality Breakthrough

Keyword overlap boosting, entity focus scoring
recall@5: 0.67 → 0.82 (+22%), recall@10: 0.745 → 0.885 (+19%)

See CHANGELOG.md for full details.

License

MIT — see LICENSE.

Version	Changes	Urgency	Date
v0.9.1	## What's New in v0.9.1 ### Semantic kNN Graph Expansion (Experimental) - Gated behind `KNN_LINKS=1` (off by default) - At ingest: each fact linked to top-30 similar existing facts (cosine >= 0.6) via Qdrant payload `similar_ids` - At search: fact-lane seeds expanded through links before CE reranking (capped at 10 expansions) - Architectural parity with Hindsight's `link_expansion_retrieval.py` - Full 10-conv benchmark: 72.1% non-adv (−2.1pp from baseline) — useful for graph-traversal workloads	High	4/16/2026
v0.9.0	## Qwen3-Reranker + BM25 FTS5 + Prompt Routing Production: 74.2% non-adv (+6.7pp from baseline). Compare: 77.7% non-adv (+10.2pp from baseline). Full 10-conversation LoCoMo evaluation (1986 questions). 30+ documented experiments. ### Benchmark Results \| Category \| Production \| Compare \| Questions \| Δ from v0.8 \| \|----------\|-----------\|---------\|-----------\|-------------\| \| Overall non-adv \| 74.2% \| 77.7% \| 1540 \| +5.1pp / +8.6pp \| \| Open-domain \| 84.8% \| 83.2% \| 841 \| +3	High	4/13/2026
v0.8.0	## Full 10-Conversation LoCoMo Validation 69.1% non-adversarial (1986 questions, production mode). 21 documented experiments with scientific methodology. ### Benchmark Results (LoCoMo full 10-conv, production mode) \| Category \| Accuracy \| Questions \| Notes \| \|----------\|----------\|-----------\|-------\| \| Open-domain \| 81.1% \| 841 \| Rock solid \| \| Temporal \| 66.4% \| 321 \| 61% of failures are generation, not retrieval \| \| Multi-hop \| 55.2% \| 96 \| +16.7pp from prompt routing \| \| Single-hop \|	High	4/10/2026
v0.7.0	## #1 on LoCoMo — 91.36% Three benchmarks, one pipeline. All results are reproducible from the scripts in `benchmarks/`. ### Benchmark Results \| Benchmark \| Score \| Questions \| Venue \| \|-----------\|-------\|-----------\|-------\| \| LoCoMo \| 91.36% (#1) \| 1,986 \| ACL 2024 \| \| LongMemEval \| 89.40% \| 500 \| ICLR 2025 \| \| FRAMES \| 50.4% \| 824 \| Google 2024 \| #### LoCoMo Leaderboard \| Rank \| System \| Accuracy \| \|------\|--------\|----------\| \| #1 \| RASPUTIN Memory v0.7	High	4/3/2026
v0.6.0	## RASPUTIN Memory v0.6.0 — #2 on LoCoMo (89.81%) ### LoCoMo Benchmark Results \| Rank \| System \| Accuracy \| \|------\|--------\|----------\| \| 🥇 \| Backboard \| 90.00% \| \| 🥈 \| RASPUTIN \| 89.81% \| \| 🥉 \| Memvid \| 85.70% \| \| 4 \| MemMachine \| 84.87% \| \| 5 \| Memobase \| 75.78% \| \| 6 \| Zep \| 75.14% \| \| 7 \| mem0 \| 66.88% \| Config: nomic-embed-text (768d) → Qdrant top-60 → Claude Opus (answer gen) → GPT-4o-mini (judge) Full changelog: https://github.com/jcartu/rasputin-memory/blob/main/CHANG	Medium	4/2/2026
v0.5.0	## Search Quality Breakthrough Keyword overlap boosting + entity-aware scoring push recall well past mem0 benchmarks. ### Benchmarks \| Metric \| v0.4.0 \| v0.5.0 \| Change \| \|--------\|--------\|--------\|--------\| \| recall@5 \| 0.67 \| 0.82 \| +22% \| \| recall@10 \| 0.745 \| 0.885 \| +19% \| \| MRR@10 \| 0.56 \| 0.68 \| +21% \| \| Entity recall@5 \| 0.20 \| 0.63 \| +215% \| \| Decay recall@5 \| 0.23 \| 0.40 \| +74% \| \| Contradiction recall@5 \| 0.48 \| 0.96 \| +100% \| Beats mem0 LOCOMO benchmark (	Medium	4/1/2026
v0.4.0	## What's Changed ### Architecture - Modular codebase: 1,800-line `hybrid_brain.py` split into 11 focused modules in `brain/` package - Unified scoring: 5 competing source-weight systems replaced by single `scoring_constants.py` - Shared utilities: Extracted locking, batch Qdrant scroll, date parsing — eliminated 4x copy-paste patterns ### Retrieval Quality - Language-agnostic pipeline: Deleted English-only keyword routing, stop words, and supersedes token checks — the embeddin	Medium	4/1/2026
v0.3.0	## What's in v0.3.0 ### BEIR Benchmarks Real retrieval evaluation on BEIR datasets (SciFact, NFCorpus) using the full local infrastructure: - SciFact: Hybrid NDCG@10=0.8336 vs Vector-only 0.8230 (+0.011) - NFCorpus: Vector baseline NDCG@10=0.371 (strong MoE embeddings) - Full reproduction script: `benchmarks/run_beir.py` ### Ablation Study 4-stage pipeline contribution analysis: - RRF fusion boosts Recall@10 by +1.5% (SciFact) - Neural reranker improves MRR@10 by +2.3% (SciFact) - Gr	Medium	3/30/2026
v0.2.0	## What changed ### PII Scrub - Removed all personal data from code, docs, and examples (health metrics, personal names, specific locations, crypto references) - Replaced hardcoded `~/.openclaw/workspace` paths with environment variables (`WORKSPACE_PATH`, `ENTITY_GRAPH_PATH`, etc.) - Generalized example data in docstrings and documentation ### Architecture Fixes - Removed expansion maps from `memory_engine.py`: the hand-rolled keyword topic tables were architectural debt compensating for	Medium	3/30/2026
v0.1.0	Initial public release of the RASPUTIN Memory System. ## Features - 4-stage hybrid retrieval pipeline (Vector + BM25 + Graph + Reranker) - Multi-tenant agent isolation - Unified consolidation engine - A-MAC quality gate for memory commits - Docker Compose deployment - Dockerfile for standalone brain server - Predictive memory prefetch - STORM wiki generation from memory	Medium	3/30/2026

rasputin-memory

Description

README

RASPUTIN Memory v0.9

Architecture Overview

Core components

How It Compares

Benchmarks

LoCoMo Full 10-Conv (v0.9 — current)

What's Been Tested (30+ Experiments)

On Benchmark Methodology

Pipeline

Quick Start

1) Infrastructure (Docker Compose)

2) Python setup

3) Start API server

4) Start MCP server (optional — for Claude Code, Cursor, etc.)

5) Smoke check

Configuration Reference (config/rasputin.toml)

[server]

[qdrant]

[graph]

[embeddings]

[reranker]

[amac]

[scoring]

[constraints]

[reflect]

[entities]