vektori

Memory that remembers the story, not just the facts. A three-layer sentence graph for AI agents: Facts, Episodes, raw Sentences. One DB. Zero config.

README

Vektori

Memory that remembers the story, not just the facts.

GitHub · Issues · Docs

👋 Questions, ideas, bugs → GitHub Issues · Discussions

If Vektori has been useful, a ⭐ goes a long way.


Why Vektori

Building agents that actually remember people is harder than it looks:

  • Facts aren't enough. Knowing a user prefers WhatsApp is different from knowing they've asked three times and are getting frustrated. Most systems give you the what, not the why or how it changed.
  • Patterns stay invisible. Spotting that someone's tone has been shifting across sessions requires more than point-in-time retrieval: you need to see the trajectory.
  • Context overhead explodes. Stuffing raw conversation history into every prompt doesn't scale. You need structure, not just storage.

Vektori solves this with a three-layer sentence graph. Agents don't just recall preferences; they understand how things got there.

FACT LAYER (L0)      <- vector search surface. Short, crisp statements.
        |
EPISODE LAYER (L1)   <- patterns auto-discovered via graph traversal.
        |
SENTENCE LAYER (L2)  <- raw conversation. Sequential NEXT edges. The full story.

Three-layer memory graph: Facts → Episodes → Sentences

Search hits Facts, the graph discovers Episodes, and results trace back to source Sentences. SQLite by default; swap to Postgres, Neo4j, Qdrant, or Milvus when you're ready to scale.
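
To make the three layers concrete, here is a minimal in-memory sketch of the shape described above. It is an illustration only, not Vektori's internal schema: the class names, fields, and the `trace` and `follow_next` helpers are all hypothetical, and the assumption that a Fact links to Episodes which link to Sentences is inferred from the diagram.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sentence:
    """L2: one raw utterance; `next_id` models the sequential NEXT edge."""
    id: str
    text: str
    next_id: Optional[str] = None


@dataclass
class Episode:
    """L1: a discovered pattern plus the sentences it was derived from."""
    id: str
    text: str
    sentence_ids: List[str] = field(default_factory=list)


@dataclass
class Fact:
    """L0: a short statement plus the episodes it participates in."""
    id: str
    text: str
    episode_ids: List[str] = field(default_factory=list)


def trace(fact: Fact, episodes: Dict[str, Episode],
          sentences: Dict[str, Sentence]) -> List[str]:
    """Walk Fact -> Episodes -> source Sentences and collect the sentence texts."""
    texts = []
    for ep_id in fact.episode_ids:
        for s_id in episodes[ep_id].sentence_ids:
            texts.append(sentences[s_id].text)
    return texts


def follow_next(sentences: Dict[str, Sentence], start_id: str) -> List[str]:
    """Replay the conversation from `start_id` by following NEXT edges."""
    texts, current = [], start_id
    while current is not None:
        sentence = sentences[current]
        texts.append(sentence.text)
        current = sentence.next_id
    return texts


sentences = {
    "s1": Sentence("s1", "I only use WhatsApp, please don't email me.", next_id="s2"),
    "s2": Sentence("s2", "Got it, WhatsApp only."),
}
episodes = {"e1": Episode("e1", "User consistently avoids email", ["s1"])}
fact = Fact("f1", "User prefers WhatsApp communication", ["e1"])

print(trace(fact, episodes, sentences))
print(follow_next(sentences, "s1"))
```

The point of the sketch is the direction of travel: a search lands on a short Fact, and the links let you recover both the pattern behind it and the original wording.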


Benchmarks

Tested on long-horizon memory benchmarks: hundreds of turns, real user details buried deep in history.

| Benchmark | Vektori | Mem0 | Zep | Supermemory | Letta |
| --- | --- | --- | --- | --- | --- |
| LoCoMo | 66% | 66% | 58%† | ~70% | ~83% |
| LongMemEval-S | 73% | - | 64% | 85% | - |

†Zep's self-reported score is 75%; independently re-evaluated at 58%. Scores across systems are not always directly comparable: model choice (GPT-4o vs GPT-4.1-mini vs local) significantly affects results.

We used gemini-2.5-flash-lite to keep token costs down; stronger models improve accuracy considerably. Benchmarks were run at the L1 level.

On LoCoMo and LongMemEval, the retrieved context contains the answer for 95% of questions, so the gap to 66% is a synthesis problem, not a retrieval one. We are actively working on closing it and are exploring RL.
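
A quick back-of-the-envelope split of those two numbers shows where the errors come from. The sketch assumes a question can only be answered correctly when the retrieved context contains the answer, which is a simplification; the `decompose` helper is hypothetical and only does the arithmetic.

```python
def decompose(retrieval_hit_rate: float, end_to_end_accuracy: float) -> dict:
    """Split the error budget into retrieval misses and synthesis misses.

    Assumes correct answers only come from questions whose retrieved
    context contained the answer.
    """
    return {
        # Share of questions where the answer never reached the model.
        "lost_to_retrieval": 1.0 - retrieval_hit_rate,
        # Share where the answer was in context but the model still got it wrong.
        "lost_to_synthesis": retrieval_hit_rate - end_to_end_accuracy,
        # How often the model succeeds once the answer is in context.
        "synthesis_success": end_to_end_accuracy / retrieval_hit_rate,
    }


# 95% retrieval hit rate and the 66% LoCoMo score quoted above.
for name, value in decompose(0.95, 0.66).items():
    print(f"{name}: {value:.1%}")
```

Under that assumption, roughly 5 points are lost to retrieval and 29 points to synthesis, which is why the remaining work targets the answer-generation step.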

Still improving. PRs and evals welcome. Run your own: /benchmarks


Install

pip install vektori                      # SQLite + Postgres
pip install 'vektori[neo4j]'             # + Neo4j support
pip install 'vektori[qdrant]'            # + Qdrant support
pip install 'vektori[milvus]'            # + Milvus support
pip install 'vektori[neo4j,qdrant,milvus]'  # all backends

No Docker, no external services. SQLite by default.


30-Second Quickstart

import asyncio
from vektori import Vektori

async def main():
    v = Vektori(
        embedding_model="openai:text-embedding-3-small",
        extraction_model="openai:gpt-4o-mini",
    )

    await v.add(
        messages=[
            {"role": "user", "content": "I only use WhatsApp, please don't email me."},
            {"role": "assistant", "content": "Got it, WhatsApp only."},
            {"role": "user", "content": "My outstanding amount is ₹45,000 and I can pay by Friday."},
        ],
        session_id="call-001",
        user_id="user-123",
    )

    results = await v.search(
        query="How does this user prefer to communicate?",
        user_id="user-123",
        depth="l1",  # facts + episodes
    )

    for fact in results["facts"]:
        print(f"[{fact['score']:.2f}] {fact['text']}")
    for episode in results["episodes"]:
        print(f"episode: {episode['text']}")

    await v.close()

asyncio.run(main())

Output:

[0.94] User prefers WhatsApp communication
[0.81] Outstanding balance of ₹45,000, payment expected Friday
episode: User consistently avoids email, route all comms to WhatsApp

Retrieval Depths

Pick how deep you want to go.

| Depth | Returns | ~Tokens | When to use |
| --- | --- | --- | --- |
| l0 | Facts only | 50-200 | Fast lookup, agent planning, tool calls |
| l1 | Facts + Episodes + source Sentences | 300-800 | Default. Full answer with context |
| l2 | Facts + Episodes + Sentences + ±N context window | 1000-3000 | Trajectory analysis, full story replay |

# Just the facts
results = await v.search(query, user_id, depth="l0")

# Facts + episodes (recommended)
results = await v.search(query, user_id, depth="l1")

# Everything, with surrounding conversation context
results = await v.search(query, user_id, depth="l2", context_window=3)
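
If the prompt budget varies per call, one option is to choose the depth from the token estimates listed for each level. The helper below is a hypothetical sketch, not part of Vektori: `pick_depth` and `DEPTH_MAX_TOKENS` are made-up names, and the numbers are simply the upper ends of the approximate ranges given for l0, l1, and l2.

```python
# Upper token estimates per depth, taken from the approximate ranges
# listed for each retrieval depth (50-200, 300-800, 1000-3000).
DEPTH_MAX_TOKENS = {"l0": 200, "l1": 800, "l2": 3000}


def pick_depth(token_budget: int) -> str:
    """Return the deepest depth whose upper token estimate fits the budget."""
    chosen = "l0"  # fall back to facts only when nothing else fits
    for depth in ("l0", "l1", "l2"):
        if DEPTH_MAX_TOKENS[depth] <= token_budget:
            chosen = depth
    return chosen


print(pick_depth(100))
print(pick_depth(1000))
print(pick_depth(4000))
```

The returned string can be passed as the `depth` argument to `v.search(...)`. The ranges are estimates, so treat the result as a starting point and measure real token counts for your data.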

Build an Agent with Memory

Three lines to wire memory into any agent loop:

import asyncio
from openai import AsyncOpenAI
from vektori import Vektori

client = AsyncOpenAI()

async def chat(user_id: str):
    v = Vektori(
        embedding_model="openai:text-embedding-3-small",
        extraction_model="openai:gpt-4o-mini",
    )
    session_id = f"session-{user_id}-001"
    history = []

    print("Chat with memory (type 'quit' to exit)\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "quit":
            break

        # 1. Pull relevant memory
        mem = await v.search(query=user_input, user_id=user_id, depth="l1")
        facts = "\n".join(f"- {f['text']}" for f in mem.get("facts", []))
        episodes = "\n".join(f"- {ep['text']}" for ep in mem.get("episodes", []))

        # 2. Inject into system prompt
        system = "You are a helpful assistant with memory.\n"
        if facts:
            system += f"\nKnown facts:\n{facts}"
        if episodes:
            system += f"\nBehavioral episodes:\n{episodes}"

        # 3. Get response
        history.append({"role": "user", "content": user_input})
        resp = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": system}, *history],
        )
        reply = resp.choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        print(f"Assistant: {reply}\n")

        # 4. Store exchange
        await v.add(
            messages=[{"role": "user", "content": user_input},
                      {"role": "assistant", "content": reply}],
            session_id=session_id,
            user_id=user_id,
        )

    await v.close()

asyncio.run(chat("demo-user"))
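
The prompt-assembly step in the loop above can be pulled into a pure function, which makes it easy to unit test without calling any model or database. This is a sketch under one assumption: that search results are a dict with optional "facts" and "episodes" lists whose items carry a "text" key, as in the examples in this README. The name `build_system_prompt` is hypothetical.

```python
def build_system_prompt(mem: dict) -> str:
    """Turn a search result dict into a system prompt string.

    Mirrors steps 1 and 2 of the chat loop: missing or empty layers
    are skipped so the prompt stays short.
    """
    facts = "\n".join(f"- {f['text']}" for f in mem.get("facts", []))
    episodes = "\n".join(f"- {ep['text']}" for ep in mem.get("episodes", []))

    system = "You are a helpful assistant with memory.\n"
    if facts:
        system += f"\nKnown facts:\n{facts}"
    if episodes:
        system += f"\nBehavioral episodes:\n{episodes}"
    return system


example = {
    "facts": [{"text": "User prefers WhatsApp communication", "score": 0.94}],
    "episodes": [{"text": "User consistently avoids email"}],
}
print(build_system_prompt(example))
```

Inside the loop, this would replace the inline string building with `system = build_system_prompt(mem)`.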

More examples in /examples.


Storage Backends

# SQLite (default): zero config, starts instantly
v = Vektori()

# PostgreSQL + pgvector: production scale
v = Vektori(database_url="postgresql://localhost:5432/vektori")

# Neo4j: native graph traversal for Episode layer
v = Vektori(
    storage_backend="neo4j",
    database_url="bolt://localhost:7687",
    embedding_dimension=1024,   # must match your embedding model
)

# Qdrant: dedicated vector DB, cloud-ready
v = Vektori(
    storage_backend="qdrant",
    database_url="http://localhost:6333",
    embedding_dimension=1024,
)

# Qdrant Cloud
v = Vektori(
    storage_backend="qdrant",
    database_url="https://your-cluster.qdrant.io",
    qdrant_api_key="your-api-key",
    embedding_dimension=1024,
)

# Milvus: high-scale vector store with partition-key isolation
v = Vektori(
    storage_backend="milvus",
    database_url="http://localhost:19530",
    embedding_dimension=1024,
)

# Milvus / Zilliz Cloud
v = Vektori(
    storage_backend="milvus",
    database_url="https://your-cluster-endpoint",
    milvus_token="your-api-key-or-token",
    embedding_dimension=1024,
)

# In-memory: tests / CI
v = Vektori(storage_backend="memory")
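
Since `embedding_dimension` must match the embedding model, it can help to keep the pairing in one place instead of hard-coding 1024 everywhere. The sketch below is an assumption-laden example: `KNOWN_DIMENSIONS` and `backend_kwargs` are hypothetical names, the dimensions are the default output sizes published for each model, and passing all four keywords to `Vektori(...)` together is assumed to work because each one appears individually in this README.

```python
# Default output sizes for a few embedding models, keyed by the
# "provider:model" strings used elsewhere in this README.
KNOWN_DIMENSIONS = {
    "openai:text-embedding-3-small": 1536,
    "ollama:nomic-embed-text": 768,
    "sentence-transformers:all-MiniLM-L6-v2": 384,
    "bge:BAAI/bge-m3": 1024,
}


def backend_kwargs(storage_backend: str, database_url: str,
                   embedding_model: str) -> dict:
    """Build constructor keywords with a dimension that matches the model."""
    if embedding_model not in KNOWN_DIMENSIONS:
        raise ValueError(
            f"unknown embedding model {embedding_model!r}; "
            "add its dimension to KNOWN_DIMENSIONS first"
        )
    return {
        "storage_backend": storage_backend,
        "database_url": database_url,
        "embedding_model": embedding_model,
        "embedding_dimension": KNOWN_DIMENSIONS[embedding_model],
    }


kwargs = backend_kwargs("qdrant", "http://localhost:6333", "bge:BAAI/bge-m3")
print(kwargs["embedding_dimension"])
```

With Vektori installed, the result would be used as `v = Vektori(**kwargs)`. Failing early on an unknown model is deliberate: a wrong dimension tends to surface later as a confusing error from the vector store.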

All backends via Docker:

git clone https://github.com/vektori-ai/vektori
cd vektori
docker compose up -d                 # starts Postgres, Neo4j, Qdrant, and Milvus

# Postgres
DATABASE_URL=postgresql://vektori:vektori@localhost:5432/vektori python examples/quickstart_postgres.py

# Neo4j
VEKTORI_STORAGE_BACKEND=neo4j VEKTORI_DATABASE_URL=bolt://localhost:7687 vektori add "I prefer dark mode" --user-id u1

# Qdrant
VEKTORI_STORAGE_BACKEND=qdrant VEKTORI_DATABASE_URL=http://localhost:6333 vektori add "I prefer dark mode" --user-id u1

# Milvus
VEKTORI_STORAGE_BACKEND=milvus VEKTORI_DATABASE_URL=http://localhost:19530 vektori add "I prefer dark mode" --user-id u1

# Milvus Cloud
MILVUS_TOKEN=your-api-key VEKTORI_STORAGE_BACKEND=milvus VEKTORI_DATABASE_URL=https://your-cluster-endpoint vektori add "I prefer dark mode" --user-id u1

CLI storage flags:

vektori config --storage-backend qdrant --database-url http://localhost:6333
vektori config --storage-backend milvus --database-url http://localhost:19530
vektori add "my note" --user-id u1
vektori search "preferences" --user-id u1
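
The environment variables used in the commands above can also be read explicitly in application code. Whether the Python constructor picks them up on its own is not stated here, so this sketch maps them to keyword arguments by hand. The function name `storage_kwargs_from_env` is hypothetical; the variable names and keyword names are the ones that appear in this README.

```python
import os
from typing import Mapping, Optional


def storage_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Map VEKTORI_* environment variables to constructor keywords.

    Unset variables are skipped, so an empty environment yields an empty
    dict and the SQLite default applies.
    """
    env = os.environ if env is None else env
    kwargs = {}

    backend = env.get("VEKTORI_STORAGE_BACKEND")
    url = env.get("VEKTORI_DATABASE_URL")
    token = env.get("MILVUS_TOKEN")

    if backend:
        kwargs["storage_backend"] = backend
    if url:
        kwargs["database_url"] = url
    # Only forward the token when the Milvus backend is selected.
    if backend == "milvus" and token:
        kwargs["milvus_token"] = token
    return kwargs


print(storage_kwargs_from_env({
    "VEKTORI_STORAGE_BACKEND": "milvus",
    "VEKTORI_DATABASE_URL": "https://your-cluster-endpoint",
    "MILVUS_TOKEN": "your-api-key",
}))
```

Accepting the environment as a parameter keeps the helper testable; in an application you would call it with no arguments and pass the result as `Vektori(**storage_kwargs_from_env())`.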

Model Support

Bring whatever model stack you have. Works with 10 providers out of the box.

# OpenAI
v = Vektori(
    embedding_model="openai:text-embedding-3-small",
    extraction_model="openai:gpt-4o-mini",
)

# Azure OpenAI
# Ensure AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are set
# Note: The string after "azure:" must match your specific Azure deployment names
v = Vektori(
    embedding_model="azure:my-embedding-deployment",
    extraction_model="azure:my-gpt-4o-deployment",
)

# GitHub Models (Copilot)
# Requires GITHUB_TOKEN. You can get one by running `./scripts/get_github_token.sh`
v = Vektori(
    embedding_model="github:text-embedding-3-small",
    extraction_model="github:gpt-4o",
)

# Anthropic
v = Vektori(
    embedding_model="anthropic:voyage-3",
    extraction_model="anthropic:claude-haiku-4-5-20251001",
)

# Fully local, no API keys, no internet
v = Vektori(
    embedding_model="ollama:nomic-embed-text",
    extraction_model="ollama:llama3",
)

# Sentence Transformers (local, no Ollama required)
v = Vektori(embedding_model="sentence-transformers:all-MiniLM-L6-v2")

# BGE-M3: multilingual, 1024-dim, best local embeddings we've found
v = Vektori(embedding_model="bge:BAAI/bge-m3")

# LiteLLM: 100+ providers through one interface
v = Vektori(extraction_model="litellm:groq/llama3-8b-8192")

NVIDIA NIM - GPU-optimized models via NVIDIA NIM.

# NVIDIA embedding models (Matryoshka: 384-2048 dimensions)
v = Vektori(
    embedding_model="nvidia:llama-nemotron-embed-1b-v2",
    embedding_dimension=1024,  # Optional: 384, 512, 768, 1024, or 2048
)

# NVIDIA LLM models (nvidia/ prefix auto-added)
v = Vektori(extraction_model="nvidia:llama-3.3-nemotron-super-49b-v1")

# Third-party models hosted on NVIDIA NIM (use full path)
v = Vektori(extraction_model="nvidia:z-ai/glm5")
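
All model strings above follow a "provider:model" pattern, where the model part may itself contain slashes. If you validate configuration before constructing a client, a small parser like the one below can catch typos early. It is a sketch of the naming convention only, not Vektori's own parsing code; `split_model_spec` is a hypothetical name.

```python
from typing import Tuple


def split_model_spec(spec: str) -> Tuple[str, str]:
    """Split "provider:model" on the first colon only.

    Splitting on the first colon keeps model names that contain their
    own colons or slashes intact.
    """
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model


for spec in (
    "openai:text-embedding-3-small",
    "litellm:groq/llama3-8b-8192",
    "nvidia:z-ai/glm5",
):
    print(split_model_spec(spec))
```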

Contributing

Vektori is early and there's a lot of ground to cover. If you're building agents that need memory, your real-world feedback is the most valuable thing you can contribute.

git clone https://github.com/vektori-ai/vektori
cd vektori
pip install -e ".[dev]"
pytest tests/unit/

License

Apache 2.0. See LICENSE.

Release History

| Version | Changes | Urgency | Date |
| --- | --- | --- | --- |
| main@2026-05-26 | Latest activity on main branch | High | 5/26/2026 |
| v0.1.1 | Latest release: v0.1.1 | High | 4/8/2026 |
