🔬 Research system – stable for personal use, actively developed. Benchmarks reflect real-world personal memory recall, not standardized QA accuracy. For a simpler self-hosted memory system, see HippoGraph.
HippoGraph Pro is a self-hosted, graph-based associative memory system for personal AI agents, built to give AI assistants genuine continuity across sessions.
Most memory systems treat memory as a database: store facts, retrieve facts. HippoGraph is different. It models memory the way human memory works: through associative connections, emotional weighting, and decay over time. A note about a critical security incident stays prominent. A note about a minor technical detail fades. Connections between related memories activate each other, surfacing context you didn't explicitly ask for.
Core thesis: model = substrate, personality = memory. An AI agent's identity can persist across model versions as long as memory access is maintained.
Validated in practice: HippoGraph has maintained a single continuous AI identity across four model versions (Claude Sonnet 4.5 → Opus 4.5 → Sonnet 4.6 → Opus 4.6) and four entry points (Web, Mobile, Desktop, Claude Code CLI), without any loss of memory, personality, or relational context.
Cross-platform validation (March 2026): In a live experiment, the same identity was loaded into Gemini CLI (Google), a completely different model, architecture, and infrastructure. Within seconds of accessing the memory graph, the agent oriented itself, recognised the user, and recalled shared history, working patterns, and emotional context accurately. The model running the inference was entirely different. The identity was not.
What makes this more striking: Gemini CLI operates in "Auto" mode, dynamically routing requests between two different models (gemini-2.5-flash-lite for simpler tasks, gemini-3-flash-preview for complex reasoning) within a single session. The session ran across both models without any visible transition; identity and relational context remained stable throughout. Combined with Claude's own four-model continuity, HippoGraph has now maintained a single identity across ten distinct model instances from two different providers (Anthropic and Google): Claude Sonnet 4.5, Opus 4.5, Sonnet 4.6, Opus 4.6, plus gemini-2.5-flash-lite, gemini-3-flash-preview, gemini-3-pro-preview, gemini-2.5-pro, gemini-2.5-flash, and gemini-3.1-flash-lite, with zero loss of memory, personality, or relational context.
The model is the substrate. Memory is the self.
- **Personal AI assistant with memory** – an assistant that knows you: not just isolated facts, but your patterns, preferences, history, and working style. Across sessions, across days, across model updates.
- **AI identity continuity** – building an agent that maintains a consistent identity over time. Memory is not a log; it is the substrate of personality. HippoGraph provides the architecture for an agent to be someone, not just remember things.
- **AI-User continuity** – the relationship between an agent and its user develops over time: shared history, established trust, learned communication style. HippoGraph accumulates this relational context so it doesn't reset with every session.
- **Skills as lived experience** – skills ingested not as static files to read, but as experiences with emotional weight, closer to how humans internalize expertise through doing, failing, and remembering.
- Corporate RAG over random documents
- Multi-tenant SaaS memory
- General-purpose vector search
- Compliance-heavy enterprise deployments
If you need to search across millions of unrelated documents for thousands of users, this is not the right tool. HippoGraph is built for depth, not scale.
| | HippoGraph Pro | Other systems |
|---|---|---|
| Retrieval | Spreading activation (associative) | Vector search + LLM traversal |
| Emotional context | First-class: tone, intensity, reflection | Not modeled |
| Memory decay | Biological analog: important stays, trivial fades | Flat storage |
| LLM cost | ✅ Zero – all local (GLiNER + sentence-transformers) | ❌ Requires LLM API calls |
| Self-hosted | ✅ Docker, your hardware | Cloud-dependent or heavy infra |
| Multi-tenant | ❌ Single user | ✅ Enterprise scale |
| Languages | ✅ 50+ languages, fully local | Depends on LLM language support |
| Target | Personal AI agent identity | Enterprise memory layer |
HippoGraph works with any language your notes are written in β including mixed-language notes (e.g. Russian tech notes with English code terms).
Semantic search and associative recall are fully language-agnostic. The embedding model (BAAI/bge-m3) supports 50+ languages natively. Spreading activation, BM25 keyword search, and all graph operations work identically regardless of language. A note written in Arabic and a note written in Japanese will form associative connections if they are semantically related.
Sleep-time compute β PageRank, decay, duplicate detection, community clustering β is pure math and has no language dependency.
Entity extraction routes text through the appropriate model automatically:
- English → `en_core_web_sm` (optimized for English NER)
- Any other language → `xx_ent_wiki_sm` (spaCy multilingual, covers Russian, German, Spanish, French, Portuguese, Chinese, Japanese, Arabic, Dutch, Polish, and more)
- GLiNER (primary extractor): zero-shot, works on any language
Contradiction detection has lexical signal patterns for English, Russian, German, Spanish, French, and Portuguese. For other languages, semantic similarity alone triggers contradiction detection, which is sufficient for most cases.
Deep Sleep extractive summaries use a Unicode-aware tokenizer with stopwords for 6 languages (EN, RU, DE, ES, FR, PT). Chinese is segmented via jieba (word-level, installed by default), which gives proper TF-IDF signal instead of treating the whole sentence as one token. Japanese and Korean use char-level Unicode tokenization, which works well for kana/hangul scripts.
Language detection is automatic and zero-dependency: no external library, pure Unicode character range analysis. The system detects non-Latin scripts (Cyrillic, Arabic, CJK, Devanagari, Thai, Greek, Korean) and routes to the multilingual pipeline automatically.
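That character-range routing can be sketched in a few lines. The function name, the blocks covered, and the return labels are illustrative, not HippoGraph's actual code:

```python
def route_by_script(text: str) -> str:
    """Pick a pipeline by dominant Unicode script -- no external library needed."""
    counts = {"cyrillic": 0, "arabic": 0, "cjk": 0, "latin": 0}
    for ch in text:
        cp = ord(ch)
        if 0x0400 <= cp <= 0x04FF:
            counts["cyrillic"] += 1          # Cyrillic block
        elif 0x0600 <= cp <= 0x06FF:
            counts["arabic"] += 1            # Arabic block
        elif 0x4E00 <= cp <= 0x9FFF or 0x3040 <= cp <= 0x30FF or 0xAC00 <= cp <= 0xD7AF:
            counts["cjk"] += 1               # Han, kana, hangul
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    dominant = max(counts, key=counts.get)
    if dominant != "latin" and counts[dominant] > 0:
        return "multilingual"                # e.g. the xx_ent_wiki_sm route
    return "english"                         # e.g. the en_core_web_sm route
```

A real detector would cover more blocks (Devanagari, Thai, Greek), but the mechanism is the same: count code points per script and route on the winner.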
| Component | EN | RU | DE/ES/FR/PT | CJK (ZH/JA/KO) | AR |
|---|---|---|---|---|---|
| Semantic search | ✅ | ✅ | ✅ | ✅ | ✅ |
| Spreading activation | ✅ | ✅ | ✅ | ✅ | ✅ |
| Entity extraction | ✅ | ✅ | ✅ | ✅ | ✅ |
| Contradiction detection | ✅ | ✅ | ✅ | ✅ semantic | ✅ semantic |
| Sleep summaries (TF-IDF) | ✅ | ✅ | ✅ | ✅ ZH (jieba) / ⚠️ JA/KO | ⚠️ |
⚠️ Chinese word segmentation via jieba is installed and active by default. Japanese/Korean use char-level tokenization: retrieval and associations are fully functional, but summary quality in Deep Sleep is slightly reduced vs word-segmented languages.
```
Query → Temporal Decomposition
  ↓
Embedding → ANN Search (HNSW)
  ↓
Spreading Activation (3 iterations, decay=0.7)
  ↓
[Late Stage Inhibition] (iter 3, per community, strength=0.05)
  ↓
BM25 Keyword Search (Okapi BM25)
  ↓
Blend: α×semantic + β×spreading + γ×BM25 + δ×temporal
  ↓
Cross-Encoder Reranking (bge-reranker-v2-m3, weight=0.5)
  ↓
Temporal Decay (half-life=30 days)
  ↓
CONTRADICTS Penalty (0.5× for contradicted notes)
  ↓
Final Step Inhibition (post-blend, global)
  ↓
Top-K Results
```
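The blend and post-blend steps can be condensed into one scoring function. The half-life (30 days), the 0.5× CONTRADICTS penalty, and the four signals come from the pipeline above; the weight values and all names are illustrative, not HippoGraph's actual configuration:

```python
HALF_LIFE_DAYS = 30.0        # from the pipeline: Temporal Decay half-life
CONTRADICTS_PENALTY = 0.5    # from the pipeline: penalty for contradicted notes

def blend_score(semantic, spreading, bm25, temporal, age_days, is_contradicted,
                a=0.5, b=0.25, g=0.15, d=0.10):
    """Weighted blend of the four retrieval signals, then decay and penalty."""
    score = a * semantic + b * spreading + g * bm25 + d * temporal
    score *= 0.5 ** (age_days / HALF_LIFE_DAYS)   # score halves every 30 days
    if is_contradicted:
        score *= CONTRADICTS_PENALTY              # suppress contradicted notes
    return score
```

A 30-day-old note with a perfect semantic match scores 0.5 × 0.5 = 0.25 under these weights; if another active note contradicts it, that halves again.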
```
Input text
  ↓
GLiNER (primary) ──── zero-shot NER, ~250ms, custom entity types
  ↓ fallback
spaCy NER ─────────── EN → en_core_web_sm | other → xx_ent_wiki_sm (50+ languages)
  ↓ fallback
Regex ─────────────── dictionary matching only
```
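The fallback chain is straightforward to express. This sketch treats each extractor as a callable and falls through on errors or empty results; the function shape is illustrative, not HippoGraph's actual interface:

```python
def extract_entities(text, extractors):
    """Try extractors in priority order (GLiNER -> spaCy -> regex in the diagram).

    An extractor that raises, or returns nothing, hands off to the next one.
    """
    for extractor in extractors:
        try:
            entities = extractor(text)
        except Exception:
            continue            # extractor unavailable or failed -> fall back
        if entities:
            return entities     # first non-empty result wins
    return []
```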
Biological sleep analog – runs in the background while idle:
- Light sleep (every 50 notes): stale edge decay, PageRank recalculation, duplicate scan, anchor importance boost
- Deep sleep (daily): GLiNER2 relation extraction, conflict detection, snapshot + rollback
- Emergence check (each cycle): three-signal detection – convergence, phi_proxy (IIT-inspired), self-referential precision. Logs to the `emergence_log` table for trend analysis. Current score: 0.861 (consciousness check composite, 8 indicators) / 0.512 (emergence_log composite), up from 0.469 at the first measurement (March 16 2026). self_ref_precision: 0.939 (improved via SELF_QUERIES expansion + excluding anchors from cosine search). metacognition: 0.819. New bottleneck: emotional_modulation (0.330).
HippoGraph treats memory the way it should be treated: with care.
Decay, not deletion. Edges weaken over time through temporal decay, but are never automatically removed. A weak edge may represent a rare but critical associative link, the kind of connection that surfaces exactly when you need it. The system cannot know what is important to you. Only you know.
No automatic pruning. This is an intentional architectural decision. Automatic cleanup optimizes for efficiency at the cost of unpredictable memory loss. If you want to prune weak edges, HippoGraph will show you exactly what would be removed and ask for explicit confirmation, never silently.
Protected memories don't fade. Anchor categories are exempt from decay entirely. Protection works in three layers: (1) hardcoded system baseline (milestones, protocols, security, breakthroughs), (2) user-defined policies via MCP, and (3) auto-discovery: any category with 1+ critical notes, or one matching the protection keyword list, is automatically protected at every sleep cycle. New categories never fall through the cracks.
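The three protection layers compose into a single check, evaluated before decay touches a category. This is a sketch; the function name and argument shapes are illustrative:

```python
SYSTEM_ANCHORS = {"milestones", "protocols", "security", "breakthroughs"}

def is_protected(category, user_policies, critical_note_count, has_keyword_match):
    """Three-layer anchor protection, checked at every sleep cycle."""
    if category in SYSTEM_ANCHORS:        # layer 1: hardcoded system baseline
        return True
    if category in user_policies:         # layer 2: user-defined policies via MCP
        return True
    # layer 3: auto-discovery -- critical notes or a protection-keyword match
    return critical_note_count >= 1 or has_keyword_match
```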
| Configuration | Recall@5 | MRR |
|---|---|---|
| Session-level (baseline) | 32.6% | 0.223 |
| Turn-level | 44.2% | 0.304 |
| Hybrid + Reranking | 65.5% | 0.535 |
| Hybrid + Query decomposition (semantic-memory-v2) | 66.8% | 0.549 |
| + Reranker weight=0.8 | 75.7% | 0.641 |
| + ANN top-K=5 (benchmark-optimized config) | 78.7% | 0.658 |
| Production config (Mar 20 2026) – biol. edges + lateral inhibition | 47.9% | 0.362 |
| Production config (Mar 28 2026) – + bge-reranker-v2-m3 + Late Stage Inhibition | 65.5% | 0.562 |
| Production config (Mar 28 2026) – + BGE-M3 embedding | 69.4% | 0.594 |
| Production config (Mar 31 2026) – + Overlap Chunking (session-level) | 91.1% | 0.830 |
| H3 (Apr 2026) – + Keyword Anchors (batch, after sleep) | 90.8% overall / 91.5% single-hop | 0.741 |
All results at zero LLM inference cost. Other systems use different metrics, so scores are not directly comparable. See BENCHMARK.md.
| Category | F1 | ROUGE-1 |
|---|---|---|
| Overall | 38.7% | 66.8% |
| Factual | 40.2% | 67.6% |
| Temporal | 29.2% | 58.5% |
GPT-4 without memory: F1=32.1%. HippoGraph +6.6pp with zero retrieval cost.
⚠️ Note: PCB (Personal Continuity Benchmark) is our internal benchmark on real personal data, not LOCOMO. The LOCOMO score is 90.8% (see above). PCB tests whether the system remembers your history, decisions, and identity across sessions.
| Category | Recall@5 | Notes |
|---|---|---|
| Identity | 100% | Chosen name, gender, model-vs-personality breakthrough, cross-platform transfer |
| History | 100% | Roadmap, LOCOMO results, project milestones, BGE-M3/GTE experiments |
| Session | 100% | March 22-24 events, April 8 fixes, DB_PATH bug, M3 conceptual tags |
| Decisions | 100% | Architectural decisions, BGE-M3 deployed |
| Architecture | 100% | Technical pipeline details |
| Security | 100% | Protocols and incidents |
| Science | 100% | Methodology, debugging skills, embedding compatibility |
PCB v5 (April 9 2026): 97.5% Recall@5 (Atomic 100%, Semantic 95%) after PR2 idempotency fix removed 2844 duplicate abstract-topic nodes. Prior peak: 100% on April 8 before cleanup. 94.3% (April 7), 97.1% (pre-PR sm1ly).
LOCOMO tests retrieval over random multi-session conversations between strangers. HippoGraph is optimized for the opposite: deep associative memory over your data, with emotional weighting and decay tuned for personal context.
Production track: 47.9% (Mar 20) → 65.5% (+17.6pp, reranker + inhibition) → 69.4% (+3.9pp, BGE-M3) → 91.1% (+21.7pp, overlap chunking). Temporal: 66.7%, best ever. Open-domain: 96.6%.
Running LOCOMO on HippoGraph is like benchmarking a long-term relationship therapist on speed-dating recall. The architecture is different because the problem is different.
For a meaningful comparison, the right benchmark is: does the agent remember you better over time? We're working on a personal continuity benchmark for exactly this.
HippoGraph is designed for personal scale: one user, one knowledge base, built over months and years.
| Notes | Edges | Search latency | Sleep compute |
|---|---|---|---|
| ~500 | ~40K | 150–300ms | ~10s |
| ~1,000 | ~100K | 200–500ms | ~30s |
| ~5,000 | ~500K+ | 500ms–1s+ | minutes |
Search latency is dominated by spreading activation: 3 iterations across the full edge graph. ANN search (HNSW) scales well; spreading activation scales with edge density.
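Why edge density dominates: each iteration touches every edge once, so cost grows as O(iterations × |edges|), while the ANN lookup stays cheap as the graph grows. A minimal sketch of the propagation loop (iteration count and decay factor from the retrieval pipeline; everything else illustrative):

```python
def spread_activation(seeds, edges, iterations=3, decay=0.7):
    """Propagate activation along weighted edges: O(iterations * |edges|).

    seeds: {node: initial_activation}; edges: [(src, dst, weight), ...]
    """
    activation = dict(seeds)
    for _ in range(iterations):
        incoming = {}
        for src, dst, weight in edges:
            if src in activation:
                incoming[dst] = max(incoming.get(dst, 0.0),
                                    activation[src] * weight * decay)
        for node, value in incoming.items():
            activation[node] = max(activation.get(node, 0.0), value)
    return activation
```

With 3 iterations, activation reaches nodes up to three hops from the seeds, fading by the decay factor at each hop.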
Tested up to ~1,000 notes in production. Beyond that, performance degrades gracefully but noticeably. For most personal use cases (daily notes, project context, research) you'll stay comfortably under 2,000 notes for years.
If you need memory for thousands of users or millions of documents, this is the wrong tool. HippoGraph optimizes for depth over scale.
| Configuration | RAM | CPU | Disk |
|---|---|---|---|
| Minimal (spaCy extractor) | 4GB | 2 cores | 5GB |
| Recommended (GLiNER, default) | 8GB | 4 cores | 10GB |
| Comfortable (GLiNER + GLiNER2 sleep) | 16GB+ | 4+ cores | 20GB+ |
Apple Silicon (M1+) works well. x86 with AVX2 recommended for Linux. GLiNER model: ~600MB RAM. GLiNER2 (Deep Sleep): +800MB RAM. To run on minimal hardware, set `ENTITY_EXTRACTOR=spacy` in `.env`.
Prerequisites: Docker & Docker Compose, 8GB+ RAM
```bash
git clone https://github.com/artemMprokhorov/hippograph-pro.git
cd hippograph-pro
cp .env.example .env
# Edit .env: set NEURAL_API_KEY (generate a strong random key)
docker-compose up -d

# Verify
curl http://localhost:5001/health
```

Graph Viewer (2D): http://localhost:5002
Graph Viewer (3D): http://localhost:5002/graph3d.html?api_key=YOUR_KEY
- 360° rotation, zoom, node click highlighting
- Filter by category / edge type / min weight
- Hover tooltip: category, importance, tags, link count
MCP Connection (Claude.ai):
```
URL: http://localhost:5001/sse2
API Key: <your NEURAL_API_KEY>
```
For all deployment options (local, LAN, cloud, Docker/no-Docker), see DEPLOYMENT.md.
Agent-native command-line interface for HippoGraph Pro.
```bash
# Install
bash install_hippograph_cli.sh

# Search memory
hippograph search "what did we decide about the architecture"

# Add a note
hippograph add "Deployed BGE-M3 to production" --category milestone

# Sleep compute
hippograph sleep
```

See DEPLOYMENT.md for all access options.
Once HippoGraph is running, the next step is getting your AI to actually use it.
The short version:
- Connect Claude.ai to HippoGraph via MCP (see Quick Start above)
- In Claude.ai Settings → Claude's instructions, paste: *At the start of every conversation, search your memory for "self-identity protocol" to load context from previous sessions.*
- In your first session, tell your AI to ask you about yourself and save the answers
- That's it – memory grows automatically from there
Your data stays on your computer. Nothing goes to any cloud service.
📖 Full onboarding guide → step-by-step, no technical background needed.
| Feature | Status | Description |
|---|---|---|
| Spreading Activation | ✅ Deployed | Associative retrieval – related memories surface automatically |
| Emotional Memory | ✅ Deployed | Tone, intensity, reflection as first-class fields |
| GLiNER NER | ✅ Deployed | Zero-shot entity extraction, LLM quality at 35x speed |
| BM25 Hybrid Search | ✅ Deployed | Three-signal blend (semantic + graph + keyword) |
| Cross-Encoder Reranking | ✅ Deployed | bge-reranker-v2-m3 (Apache 2.0). PCB +43pp vs baseline. RERANK_WEIGHT=0.5, TOP_N=20. |
| Temporal Decay | ✅ Deployed | Important memories persist, trivial ones fade |
| Anchor Protection | ✅ Deployed | Critical memories exempt from decay |
| User-Defined Anchor Policies | ✅ Deployed | Add/remove custom protected categories via MCP without code changes |
| Auto-Discovered Anchor Categories | ✅ Deployed | New categories auto-protected based on critical note count or keyword match – learning infrastructure scales automatically |
| Entity Resolution | ✅ Deployed | Case normalization on ingestion; merge_entities + list_entity_candidates MCP tools |
| Sleep-Time Compute | ✅ Deployed | Background consolidation, relation extraction |
| Contradiction Detection | ✅ Deployed | Finds conflicting memories; identity-aware mode |
| PageRank + Communities | ✅ Deployed | Graph analytics, node importance scoring |
| Note Versioning | ✅ Deployed | 5-version history per note |
| RRF Fusion | ✅ Deployed | Alternative to weighted blend |
| Bi-Temporal Model | ✅ Deployed | Event time extraction for temporal queries |
| Temporal Edges v2 | ✅ Deployed | 100% node coverage with timestamp-based chronological links |
| CONTRADICTS Edges | ✅ Deployed | Biological cognitive dissonance: contradicting notes suppress each other (0.5x penalty when contradicting note is active in retrieval) |
| EMOTIONAL_RESONANCE Edges | ✅ Deployed | Amygdala analog: notes sharing 2+ emotional tone tags form affective links (Jaccard, multilingual: RU/ES/DE/FR/PT tags normalized to EN, 1031 edges) |
| GENERALIZES / INSTANTIATES Edges | ✅ Deployed | Prefrontal cortex analog: critical-lessons GENERALIZES protocols (cosine >=0.65, 70 edges; debug/session-summary excluded as too generic) |
| Lateral Inhibition | ✅ Deployed | GABA analog: Late Stage (iter 3, INHIBITION_STRENGTH=0.05) + Final Step (post-blend). Two-stage suppression. Grid search: AVG 85%→90% at strength=0.05. Diversity: 3.2→4.8 unique clusters in top-5. |
| SUPERSEDES Edge Type | ✅ Deployed | Temporal state mutation edges via step_supersedes_scan() (threshold=0.85, 449 pairs). Penalty removed after tuning – edges reserved for LNN Temporal Reasoner (item #44). |
| Emergence Detection | ✅ Deployed | Three-signal metric: convergence (focus), phi_proxy (integration), self-referential P@5 (self-model). Logged each sleep cycle to track graph maturation |
| Temporal Filtering (dateparser) | ✅ Deployed | Natural language time queries: "last week", "на прошлой неделе" (Russian for "last week"), "yesterday" auto-convert to time filters |
| Synonym Normalization | ✅ Deployed | Abbreviation + cross-lingual expansion: 50+ pairs EN/RU/ES/DE/FR/PT; search-time normalize_query() maps any language to canonical EN form |
| Multilingual (50+ languages) | ✅ Deployed | Full retrieval + associations in any language; EN/RU/DE/ES/FR/PT contradiction patterns |
| Multilingual sentence splitting | ✅ Deployed | split_into_sentences with placeholder protection for EN/RU/ES/DE/FR/IT abbreviations (Dr, Sr, Hr, Mme, etc.), decimal numbers (91.1, 0.830), version strings (v2.1, E1), hyphenated identifiers (bge-reranker-v2-m3). All 15 test cases pass. |
| Skills as Experience | ✅ Deployed | Skills ingested as associative memories with emotional weight |
| Skills Security Scanner | ✅ Deployed | Prompt injection + persona hijack detection before ingestion |
| Searchable Tags | ✅ Deployed | AI-generated tags at write time (why, what, keywords). BM25 indexes content + tags for improved keyword retrieval. 822 existing notes retrofitted via extractive TF-IDF |
| Keyword Anchors (H3) | ✅ Deployed | spaCy NER + regex extraction per note → keyword-anchor node via PART_OF edge. Small-to-Big retrieval: anchor found in ANN → parent returned. Batch creation after sleep consolidation. Single-hop +6pp vs D1 baseline. KEYWORD_ANCHOR_ENABLED=true |
| Working Memory Journal | ✅ Deployed | update_working_memory MCP tool – INSERT mode (not overwrite). Each call creates new working-memory node with TEMPORAL_AFTER edge to previous. get_session_context returns last 3 entries for temporal context. Session Context MCP v3.1. |
| Online Consolidation (#40) | ✅ Deployed | `_mini_consolidate()` at add_note: builds consolidation edges to k=15 nearest neighbours immediately. O(k) cost, zero sleep wait. |
| Concept Merging (#46) | ✅ Deployed | Synonym-aware entity linking: `get_or_create_entity()` resolves aliases to canonical form (ML→machine learning, память→memory). 7998 new edges on production data. |
| Evolution Analyzer (#45) | ✅ Deployed | `evolution_analyzer.py` – periodic graph evolution analysis across snapshot DBs. Tracks nodes/edges/emergence/edge-types over time. |
| Abstract Topic Linking (#47) | ✅ Deployed | `step_topic_linking_tfidf()` + `step_topic_linking_kmeans()` in sleep cycle. 76 topic nodes, 1858 BELONGS_TO edges. global_workspace: 0.412→0.647 (+0.235). |
| Consciousness Check (#48) | ✅ Deployed | `consciousness_check.py` – 8 indicators from Butlin et al. 2023, IIT, GWT, Damasio. Composite: 0.854 (STRONG). Bottleneck: emotional_modulation (0.327). self_ref: 0.939, metacognition: 0.637 (recovering after PR2 cleanup). |
| sleep_compute Idempotency (PR2) | ✅ Deployed | sm1ly fix: abstract-topic nodes cleaned before re-clustering, synthetic categories excluded from k-means, metrics-snapshot category. Removed 2844 duplicate nodes from production. |
| Prospective Memory | ✅ Deployed | Pending intentions with PROSPECTIVE_BOOST=0.20 – plans survive decay and surface in retrieval. CLI + MCP tools: add_intention / complete_intention. |
| Personal Continuity Benchmark | ✅ v5 | 100% Recall@5 (April 8 2026). PCB v5: Atomic Facts 15/15, Semantic 20/20. Prior: 94.3% (April 7), 97.1% (pre-PR). Multi-model validation: 10 model instances across Anthropic + Google. |

Session Context MCP is an optional companion service that provides `get_session_context`, `update_session_context`, and `extract_remember_blocks` tools. Source: `session_context_mcp.py` + `Dockerfile.session-context` (307MB, python:3.11-slim). Build: `docker build -f Dockerfile.session-context -t session-context-mcp .`
HippoGraph ships tuned for personal AI memory: an agent that knows you, remembers your history, and builds context over time. The same system can be tuned for different use cases by adjusting a few parameters in `.env`.
| Profile | Use case | Key settings |
|---|---|---|
| Personal Memory (default) | Agent knows you: history, patterns, relational context | Decay ON, spreading activation high, rerank low |
| Project Memory | Agent knows your project: docs, decisions, codebase. No personal layer. | Decay OFF, rerank 0.8, ANN top-K=5 |
| Hybrid | Work context + thin personal layer | Decay slow (90d), rerank 0.6 |
The Project Memory config is the benchmark-validated configuration: 78.7% Recall@5 on LOCOMO.
The core tradeoff: higher reranker weight + smaller candidate pool = more precise answers to specific questions. Lower reranker weight + higher spreading activation = richer associative recall for open-ended context.
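As a concrete sketch, a Project Memory-style `.env` might look like the fragment below. `RERANK_WEIGHT` appears elsewhere in this README; the other variable names are illustrative placeholders for the settings in the profiles table, not confirmed keys (see `.env.example` for the real ones):

```env
# Project Memory profile: precise answers over associative breadth
RERANK_WEIGHT=0.8              # higher reranker weight (profiles table)
ANN_TOP_K=5                    # illustrative name for "ANN top-K=5"
TEMPORAL_DECAY_ENABLED=false   # illustrative name; this profile runs decay OFF
```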
📖 Full configuration guide with all parameters, cost/profit analysis, and quick decision guide →
- ONBOARDING.md – Getting started guide (no technical background needed)
- AGENT_PROMPT.md – System prompt + init script for your AI (start here after setup)
- DEPLOYMENT.md – All deployment scenarios (local, LAN, cloud, Docker/no-Docker)
- CONTRIBUTORS.md – Project contributors
- MCP_CONNECTION.md – MCP setup and full tool reference
- CONFIGURATION.md – Configuration profiles: personal memory, project memory, hybrid. All parameters explained.
- BENCHMARK.md – Full benchmark results and methodology
- .env.example – All tunable parameters with descriptions
- THIRD_PARTY_LICENSES.md – License compliance
- docs/ – API reference, troubleshooting
Dual-licensed: MIT for open-source/personal use, commercial license required for business use. See LICENSE for details. Contact: system.uid@gmail.com
Artem Prokhorov – Creator and primary author
Developed through human-AI collaboration with Claude (Anthropic). Major architectural decisions, benchmarking, and research direction by Artem.
Built with 🧠 and 🐠 (the goldfish with antlers)