AI-native document database with built-in MCP server, file upload (PDF/DOCX/HTML/ODT/RTF/TEX/YAML/Wikipedia XML→Markdown), vector search, RAG pipelines, and 72 MCP tools. Plugs directly into Claude, ChatGPT, Cursor, Windsurf, and any MCP-compatible agent.
MDDB is a document database purpose-built for AI agents and LLM workflows. Upload files (PDF, DOCX, HTML, ODT, RTF, TEX, YAML, TXT) and they are auto-converted to Markdown and embedded for semantic search. Expose everything to AI agents via 72 built-in MCP tools. Integrates with Docling, Langflow, OpenSearch, SSG, and wpexporter for production pipelines. Single ~29MB binary, zero configuration, BoltDB embedded storage, triple-protocol APIs (HTTP + gRPC + GraphQL).
MDDB gives your AI agents a persistent, searchable knowledge base:
- File Upload - Upload PDF, DOCX, HTML, ODT, RTF, TEX, YAML, TXT files; they are auto-converted to Markdown and indexed
- Wikipedia Import - Stream and import MediaWiki XML dumps (.xml.bz2): wikitext auto-converted to Markdown, namespace filtering, handles multi-GB files
- Built-in MCP Server - 72 tools for Claude Desktop, Cursor, Windsurf, or any MCP client
- Vector Search - Auto-embed documents, semantic similarity with 7 index algorithms (Flat, HNSW, IVF, PQ, OPQ, SQ, BQ) + per-collection quantization (int8/int4) + ARM NEON/SME hardware acceleration + goroutine parallel search
- RAG-Ready - Hybrid search (BM25 + vector) for retrieval-augmented generation
- Memory RAG - Conversational memory system: store, recall, and summarize chat sessions with semantic search
- Integrations - Docling, Langflow, OpenSearch, SSG, wpexporter for production pipelines
- Zero-Shot Classification - Classify documents against candidate labels using embeddings, no training data
- Custom AI Tools - Define YAML-based MCP tools for domain-specific workflows
- Full-Text Search - Built-in inverted index with TF-IDF, BM25, BM25F, PMISparse, 7 search modes (simple, boolean, phrase, wildcard, proximity, range, fuzzy), typo tolerance, multi-language stemming (18 languages), synonyms
- Full Revision History - Every update creates a new revision with complete snapshots
- Triple Protocol APIs - HTTP/JSON (easy), gRPC (fast), or GraphQL (flexible)
- Automation - Triggers, crons, webhooks with template variables and sentiment analysis
- Real-Time Events - Server-Sent Events (SSE) for live document change notifications
- MCP Transports - Streamable HTTP (/mcp, 2025-11-25), legacy SSE (/sse), and stdio
- Built-in TLS - Native HTTPS support, connection pooling, pprof profiling
- Zero Configuration - Single ~29MB binary, embedded database, no dependencies
Perfect for: AI agent memory, RAG pipelines, knowledge bases for LLMs, documentation chatbots, semantic search APIs, document processing (PDF/DOCX→Markdown), static site generation, WordPress migration
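The per-collection int8 quantization mentioned under Vector Search can be illustrated with a small sketch. This is a generic symmetric scalar quantizer, not MDDB's implementation; the function names are hypothetical, and the roughly 4x saving assumes the original vectors are stored as 4-byte float32 values.

```python
from typing import List, Tuple


def quantize_int8(vector: List[float]) -> Tuple[List[int], float]:
    """Symmetric scalar quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(x) for x in vector), default=0.0)
    # The scale maps the largest magnitude to 127; all-zero vectors use scale 1.0.
    scale = max_abs / 127.0 if max_abs else 1.0
    return [round(x / scale) for x in vector], scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


codes, scale = quantize_int8([0.5, -1.0, 0.25])
restored = dequantize_int8(codes, scale)
print(codes, restored)
```

Each dimension then needs one byte plus a shared scale per vector, and the reconstruction error per dimension is bounded by half the scale.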
Start all services with one command:
git clone https://github.com/tradik/mddb.git
cd mddb
# Production mode (all services)
docker compose up -d
# Development mode (with hot reload)
make dev-start
# Development + Ollama for embeddings
make dev-start-with-ollama
Services started:
| Service | Port | Image | Description |
|---|---|---|---|
| mddbd | 11023 (HTTP), 11024 (gRPC), 9000 (MCP), 11443 (HTTP/3) | tradik/mddb:latest | Database server with MCP built-in |
| mddb-panel | 3000 | tradik/mddb:panel | React web admin UI |
MDDB has a built-in MCP server, so no extra service is needed. Add to your MCP config:
{
"mcpServers": {
"mddb": {
"command": "docker",
"args": [
"run", "-i", "--rm", "--network", "host",
"-v", "mddb-data:/app/data",
"-e", "MDDB_MCP_STDIO=true",
"tradik/mddb:latest"
]
}
}
}
That's it: your AI agent now has full access to your knowledge base with 72 built-in tools (add, search, vector search, classify, and more).
→ Full MCP setup guide | → MCP server config | → Custom MCP tools
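If your MCP client already has a config file with other servers, the entry above can be merged in programmatically. The following is a minimal sketch using only the standard library; the helper names and the "other-server" entry are hypothetical, and the location of the config file depends on your MCP client.

```python
import json
from typing import Any, Dict


def mddb_mcp_entry(image: str = "tradik/mddb:latest",
                   volume: str = "mddb-data:/app/data") -> Dict[str, Any]:
    """Build the stdio server entry shown in the JSON config above."""
    return {
        "command": "docker",
        "args": [
            "run", "-i", "--rm", "--network", "host",
            "-v", volume,
            "-e", "MDDB_MCP_STDIO=true",
            image,
        ],
    }


def add_mddb_server(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an MCP client config with the 'mddb' server added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["mddb"] = mddb_mcp_entry()
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(json.dumps(add_mddb_server(existing), indent=2))
```

The function copies the input rather than mutating it, so the original config can be kept as a backup before writing the merged result.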
# MDDB Server only
docker run -d --name mddb \
-p 11023:11023 -p 11024:11024 -p 9000:9000 \
-v mddb-data:/data \
tradik/mddb:latest
# Web Panel (connect to existing server)
docker run -d --name mddb-panel \
-p 3000:3000 \
-e VITE_MDDB_SERVER=host.docker.internal:11023 \
tradik/mddb:panel
# MCP stdio mode (for Claude Desktop, Windsurf, etc.)
docker run -i --rm --network host \
-v mddb-data:/app/data \
-e MDDB_MCP_STDIO=true \
tradik/mddb:latest
# Test it
curl http://localhost:11023/health
Docker Hub: https://hub.docker.com/r/tradik/mddb
Linux (Debian/Ubuntu):
wget https://github.com/tradik/mddb/releases/latest/download/mddbd-latest-linux-amd64.deb
sudo dpkg -i mddbd-latest-linux-amd64.deb
sudo systemctl start mddbd
macOS (Apple Silicon):
wget https://github.com/tradik/mddb/releases/latest/download/mddbd-latest-darwin-arm64.tar.gz
tar xzf mddbd-latest-darwin-arm64.tar.gz
sudo mv mddbd-latest-darwin-arm64/mddbd /usr/local/bin/
mddbd
CLI Client:
# Linux
wget https://github.com/tradik/mddb/releases/latest/download/mddb-cli-latest-linux-amd64.deb
sudo dpkg -i mddb-cli-latest-linux-amd64.deb
# Usage
mddb-cli stats
mddb-cli add blog hello en_US -f post.md
mddb-cli search blog -f "tags=tutorial"
mddb-cli fts blog --query="getting started" --algorithm=bm25
Other platforms: See Installation Guide
git clone https://github.com/tradik/mddb.git
cd mddb
make build
./services/mddbd/mddbd
MDDB is a Go monorepo with multiple modules (services/mddbd, services/mddb-cli, tools/bench). A go.work file at the repo root enables Go workspace mode for local development:
- Cross-module refactoring - renaming a symbol in services/mddbd immediately updates references in services/mddb-cli via gopls.
- Unified build - go build ./services/mddbd/... ./services/mddb-cli/... ./tools/bench/... from the repo root.
- IDE "goto definition" works across module boundaries without opening each module separately.
CI runs in module-isolation mode (GOWORK=off in .github/workflows/test.yml and release.yml) so each module builds and tests independently. This catches missing require entries that workspace mode would transparently resolve from sibling modules.
To use the same mode locally for debugging:
GOWORK=off go build ./... # from inside services/mddbd
Regenerating protos (buf generate) and Docker builds are unaffected by go.work; they operate on individual modules.
MDDB ships as a monorepo with multiple packages:
| Package | Language | Location | Description |
|---|---|---|---|
| mddbd | Go | services/mddbd/ | Database server (HTTP + gRPC + GraphQL + MCP) |
| mddb-panel | React/JS | services/mddb-panel/ | Web admin panel |
| mddb-cli | Go | services/mddb-cli/ | Command-line client with GraphQL support |
| mddb-chat | Rust | services/mddb-chat/ | WebSocket chat server with LLM integration |
| mddb-chat-widget | JS/TS | services/mddb-chat-widget/ | Embeddable JS chat widget |
Zero-dependency HTTP clients - copy a single file into your project:
| Library | Language | Location | Install |
|---|---|---|---|
| PHP Extension | PHP 8.0+ | services/php-extension/mddb.php | Copy mddb.php into your project |
| Python Extension | Python 3.8+ | services/python-extension/mddb.py | Copy mddb.py into your project |
PHP:
require_once 'mddb.php';
$db = mddb::connect('localhost:11023', 'write');
$db->collection('blog')->add('hello', 'en_US', ['author' => ['John']], '# Hello');
$results = $db->collection('blog')->vectorSearch('cancel subscription', 5, 0.7);
Python:
from mddb import MDDB
db = MDDB.connect('localhost:11023', 'write').collection('blog')
db.add('hello', 'en_US', {'author': ['John']}, '# Hello')
results = db.vector_search('cancel subscription', top_k=5)
High-performance clients generated from Protocol Buffers:
| Library | Language | Location | Description |
|---|---|---|---|
| Go Client | Go | services/mddbd/proto/ | Native Go gRPC stubs |
| Python gRPC | Python | clients/python/ | Generated Python gRPC client |
| Node.js gRPC | Node.js | clients/nodejs/ | Uses @grpc/grpc-js |
Proto definitions at proto/mddb.proto - generate clients for any language supported by protobuf.
Docker Images (Docker Hub)
| Image | Size | Description |
|---|---|---|
| tradik/mddb:latest | ~29MB | Database server with MCP built-in (Alpine) |
| tradik/mddb:panel | ~88MB | Web admin panel (Node Alpine) |
| tradik/mddb:cli | ~8MB | CLI client (Alpine) |
| Format | Platform | Contents |
|---|---|---|
| .deb | Debian/Ubuntu | mddbd + systemd unit + man page |
| .rpm | RHEL/CentOS/Fedora | mddbd + systemd unit + man page |
| .tar.gz | Any (Linux, macOS, FreeBSD) | Standalone binary |
- ✅ MCP Server - 72 built-in tools via Model Context Protocol 2025-11-25 (stdio + Streamable HTTP + SSE) with tool annotations, prompts, completion, and structured output
- ✅ File Upload - Upload PDF, DOCX, HTML, ODT, RTF, TEX, YAML, TXT, auto-converted to Markdown (single and batch, configurable size limit)
- ✅ Wikipedia Import - Stream MediaWiki XML dumps (.xml.bz2) with wikitext→Markdown conversion, namespace filtering, batch processing
- ✅ Vector Search - Semantic similarity with auto-embeddings (OpenAI, Ollama, Cohere, Voyage), ARM NEON/SME SIMD acceleration
- ✅ Full-Text Search - Built-in inverted index with TF-IDF, BM25, BM25F, PMISparse scoring, 7 search modes (simple, boolean, phrase, wildcard, proximity, range, fuzzy), typo tolerance, metadata pre-filtering, multi-language stemming and stop words (18 languages)
- ✅ Hybrid Search - Sparse (BM25) + dense (vector) fusion with alpha blending or RRF
- ✅ Aggregations - Metadata facets (value counts) and date histograms with optional pre-filtering
- ✅ Zero-Shot Classification - Classify documents against candidate labels using embedding similarity
- ✅ Custom MCP Tools - Define YAML-based AI tools for domain-specific workflows
- ✅ RAG Pipeline - Built-in support for retrieval-augmented generation workflows
- ✅ Integrations - Docling, Langflow, OpenSearch, SSG, wpexporter (guide)
- ✅ Document Management - Full CRUD with metadata and collections
- ✅ Revision History - Complete version control with snapshots
- ✅ Metadata Search - Fast indexed queries with multi-value tags
- ✅ Collection Checksum - Lightweight CRC32 checksum per collection for cache invalidation
- ✅ Partial Document Update - Update metadata and/or content independently
- ✅ Document TTL - Time-to-live with automatic cleanup
- ✅ Temporal Tracking - Document event history (create/update/access), hot-docs leaderboard, activity histograms (env MDDB_TEMPORAL=true)
- ✅ Spell Correction - SymSpell-based FTS spell suggestions, text cleanup, per-collection custom dictionaries (env MDDB_SPELL=true)
- ✅ Automation - Triggers, crons, webhooks with template variables, sentiment analysis, execution logs
- ✅ Multi-language - Same key, multiple languages
- ✅ Schema Validation - JSON Schema validation per collection
- ✅ Per-Collection Storage Backends - Choose BoltDB (default), in-memory (ephemeral), or S3/MinIO per collection
- ✅ HTTP/JSON REST - Easy debugging, extensive docs
- ✅ gRPC/Protobuf - 16x faster, 70% smaller payload
- ✅ GraphQL - Flexible queries, schema introspection, Playground
- ✅ CLI Client - Full-featured command-line with GraphQL support
- ✅ Web Panel - React UI with REST/GraphQL toggle
- ✅ Authentication - JWT tokens and API keys
- ✅ Authorization - Collection-level RBAC (Read/Write/Admin)
- ✅ Per-Protocol Access Modes - MDDB_MCP_MODE=read (MCP read-only), MDDB_API_MODE, MDDB_GRPC_MODE, MDDB_HTTP3_MODE
- ✅ MCP Tool Control - MDDB_MCP_BUILTIN_TOOLS=false to expose only custom YAML tools
- ✅ User Management - Multi-user with admin roles
- ✅ Group Permissions - Organize users into groups
- ✅ Leader-Follower Replication - Binlog streaming for read scaling
- ✅ Automatic Catch-up - Followers pull missing transactions
- ✅ Zero-Downtime Snapshots - Full sync for new followers
- ✅ Cluster Monitoring - Web panel with health and lag metrics
→ See all features | → Compare with alternatives | → Performance benchmarks
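To make the zero-shot classification feature listed above more concrete, here is a minimal sketch of the general technique: embed the document and each candidate label, then rank labels by cosine similarity. The toy 3-dimensional vectors stand in for real embeddings, and the function names are hypothetical; this is not MDDB's internal code.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(doc_vec: List[float],
             label_vecs: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank candidate labels by similarity to the document embedding."""
    scored = [(label, cosine(doc_vec, vec)) for label, vec in label_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings of the labels and the document.
labels = {"billing": [1.0, 0.0, 0.0], "sports": [0.0, 1.0, 0.0]}
ranking = classify([0.9, 0.1, 0.0], labels)
print(ranking[0][0])  # billing
```

No training data is involved: adding a new label only requires embedding its text.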
MDDB supports leader-follower replication allowing you to scale read operations horizontally.
graph LR
C[Clients] -->|Writes/Reads| L[Leader]
C -->|Reads| F1[Follower 1]
C -->|Reads| F2[Follower 2]
L -->|gRPC StreamBinlog| F1
L -->|gRPC StreamBinlog| F2
- Leader: Handles writes, records changes in a binary log, and streams them to followers via gRPC.
- Followers: Read-only; they pull transactions from the leader and reconnect automatically.
→ Read Full Replication Guide
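On the client side, the topology above implies sending writes to the leader and spreading reads across followers. A minimal sketch of such routing follows; the class, its methods, and the host names are hypothetical, and round-robin is just one possible policy.

```python
import itertools
from typing import Iterator, List


class ReplicaRouter:
    """Send writes to the leader and spread reads across followers (round-robin)."""

    def __init__(self, leader: str, followers: List[str]) -> None:
        self.leader = leader
        # Fall back to the leader for reads when no followers are configured.
        self._read_cycle: Iterator[str] = itertools.cycle(followers or [leader])

    def for_write(self) -> str:
        """Base URL to use for any request that modifies data."""
        return self.leader

    def for_read(self) -> str:
        """Base URL of the next follower in rotation."""
        return next(self._read_cycle)


# Hypothetical host names; 11023 is the HTTP port used throughout this README.
router = ReplicaRouter(
    "http://leader:11023",
    ["http://follower1:11023", "http://follower2:11023"],
)
print(router.for_write(), router.for_read(), router.for_read())
```

Because followers apply the binlog after the leader commits, a read that must observe a write made a moment earlier is safer sent to the leader.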
Modern React-based UI for managing documents, users, and search with REST/GraphQL API toggle.
Features: Browse collections, view/edit documents, vector search, user management, API mode switching (REST ↔ GraphQL), live markdown preview.
# Upload a PDF (auto-converted to Markdown)
curl -X POST http://localhost:11023/v1/upload \
-F "file=@report.pdf" \
-F "collection=docs" \
-F "lang=en_US"
# Upload with custom key and metadata
curl -X POST http://localhost:11023/v1/upload \
-F "file=@manual.docx" \
-F "collection=docs" \
-F "key=user-manual" \
-F "lang=en_US" \
-F 'meta={"category":["documentation"]}'
# Batch upload multiple files
curl -X POST http://localhost:11023/v1/upload \
-F "files[]=@doc1.pdf" \
-F "files[]=@doc2.html" \
-F "files[]=@doc3.txt" \
-F "collection=docs" \
-F "lang=en_US"
# Add a document
curl -X POST http://localhost:11023/v1/add \
-H 'Content-Type: application/json' \
-d '{
"collection": "blog",
"key": "hello-world",
"lang": "en_US",
"meta": {"author": ["John"], "tags": ["tutorial"]},
"contentMd": "# Hello World\n\nWelcome to MDDB!"
}'
# Get document
curl -X POST http://localhost:11023/v1/get \
-H 'Content-Type: application/json' \
-d '{"collection": "blog", "key": "hello-world", "lang": "en_US"}'
# Search by metadata
curl -X POST http://localhost:11023/v1/search \
-H 'Content-Type: application/json' \
-d '{"collection": "blog", "filterMeta": {"tags": ["tutorial"]}, "limit": 10}'
# Documents auto-embedded in background
# Search by meaning, not keywords
curl -X POST http://localhost:11023/v1/vector-search \
-H 'Content-Type: application/json' \
-d '{
"collection": "kb",
"query": "how do I cancel my subscription?",
"topK": 5,
"threshold": 0.7,
"includeContent": true
}'
Combine keyword (BM25/BM25F) and semantic (vector) search in a single query. Two merge strategies:
- Alpha Blending: combined = (1-a) * BM25_score + a * vector_score, where a is the configurable alpha weight
- RRF (Reciprocal Rank Fusion): rank-based fusion that is robust to different score distributions
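The two strategies can be sketched in a few lines of Python. This is an illustration of the general techniques, not MDDB's code: the function names are hypothetical, the min-max normalization before blending is an assumption (raw BM25 and cosine scores live on different scales), and k=60 is the constant commonly used for RRF rather than a documented MDDB default.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def alpha_blend(bm25: Dict[str, float], vector: Dict[str, float],
                alpha: float = 0.5) -> Dict[str, float]:
    """combined = (1 - alpha) * BM25_score + alpha * vector_score."""
    b, v = min_max_normalize(bm25), min_max_normalize(vector)
    # A document missing from one result list contributes 0 for that side.
    return {d: (1 - alpha) * b.get(d, 0.0) + alpha * v.get(d, 0.0)
            for d in set(b) | set(v)}


def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused


blended = alpha_blend({"a": 2.0, "b": 1.0}, {"a": 0.2, "b": 0.8}, alpha=0.5)
fused = rrf([["a", "b"], ["a", "c"]])
print(blended, fused)
```

RRF only looks at ranks, which is why it needs no normalization step; alpha blending gives finer control but depends on how the two score ranges are aligned.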
curl -X POST http://localhost:11023/v1/hybrid-search \
-H "Content-Type: application/json" \
-d '{
"collection": "docs",
"query": "machine learning",
"topK": 10,
"strategy": "alpha",
"alpha": 0.5
}'
FTS supports simple, boolean, phrase, wildcard, proximity, range, and fuzzy modes with auto-detection:
# Simple search with metadata pre-filtering
curl -X POST http://localhost:11023/v1/fts \
-H "Content-Type: application/json" \
-d '{
"collection": "blog",
"query": "getting started",
"limit": 10,
"algorithm": "bm25",
"filterMeta": {"category": ["tutorial"]}
}'
# Boolean search (AND, OR, NOT, +required, -excluded)
curl -X POST http://localhost:11023/v1/fts \
-H "Content-Type: application/json" \
-d '{
"collection": "blog",
"query": "rust AND performance NOT garbage",
"mode": "boolean"
}'
# Phrase search (exact sequence)
curl -X POST http://localhost:11023/v1/fts \
-H "Content-Type: application/json" \
-d '{
"collection": "blog",
"query": "\"machine learning\"",
"mode": "phrase"
}'
# Proximity search (terms within N words)
curl -X POST http://localhost:11023/v1/fts \
-H "Content-Type: application/json" \
-d '{
"collection": "blog",
"query": "\"database performance\"~5",
"mode": "proximity",
"distance": 5
}'
# Enable GraphQL
docker run -e MDDB_GRAPHQL_ENABLED=true -p 11023:11023 tradik/mddb
# Query
curl -X POST http://localhost:11023/graphql \
-H 'Content-Type: application/json' \
-d '{
"query": "{ document(collection: \"blog\", key: \"hello-world\", lang: \"en_US\") { contentMd meta } }"
}'
# Interactive Playground
open http://localhost:11023/playground
# Install CLI
wget https://github.com/tradik/mddb/releases/latest/download/mddb-cli-latest-linux-amd64.deb
sudo dpkg -i mddb-cli-latest-linux-amd64.deb
# Use CLI
mddb-cli add blog hello en_US -f post.md -m "author=John,tags=tutorial"
mddb-cli get blog hello en_US
mddb-cli search blog -f "tags=tutorial"
mddb-cli fts blog --query="getting started"
mddb-cli stats
→ More examples | → Use case examples | → Client libraries
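The curl examples above can also be driven from Python's standard library without the bundled client. The sketch below reuses the endpoints and JSON fields from those examples; the helper names are hypothetical, and decoding the response as JSON is an assumption about the server's reply format.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:11023"  # default HTTP port used in the examples above


def build_request(path: str, payload: Dict[str, Any],
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for an MDDB HTTP endpoint without sending it."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the response (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


add_req = build_request("/v1/add", {
    "collection": "blog",
    "key": "hello-world",
    "lang": "en_US",
    "meta": {"author": ["John"], "tags": ["tutorial"]},
    "contentMd": "# Hello World\n\nWelcome to MDDB!",
})
search_req = build_request("/v1/vector-search", {
    "collection": "kb",
    "query": "how do I cancel my subscription?",
    "topK": 5,
    "threshold": 0.7,
    "includeContent": True,
})
# With a server running: print(send(add_req)); print(send(search_req))
```

Keeping request construction separate from sending makes the payloads easy to inspect or log before anything touches the network.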
Official Website - Complete documentation, downloads, examples
- Quick Start Guide - 5-minute setup
- Installation Guide - All platforms (Linux, macOS, FreeBSD, Windows)
- Use Cases - Real-world examples
- HTTP/JSON API - Complete REST API reference
- gRPC API - High-performance protocol guide
- GraphQL API - Flexible query language
- OpenAPI/Swagger - Machine-readable spec
- Swagger UI - Interactive API docs
- Vector Search - Semantic search setup (OpenAI, Cohere, Voyage, Ollama)
- RAG Pipeline - Complete RAG implementation guide
- Search Algorithms - TF-IDF, BM25, BM25F, PMISparse, Flat, HNSW, IVF, PQ, SQ, BQ
- Vector Quantization - Per-collection int8/int4 scalar quantization (4-8x compression)
- Server-Sent Events - Real-time document change notifications with auth and rate limiting
- Full-Text Search - Built-in inverted index with multi-language support
- Zero-Shot Classification - Classify documents against labels using embeddings
- PMISparse - Two-phase BM25 + PPMI query expansion (invented by Tradik Limited)
- Webhooks - Event-driven integration
- Automations - Triggers, crons, webhooks, sentiment, template variables
- Temporal Tracking - Document event history, hot-docs leaderboard, activity histograms
- Spell Correction - SymSpell FTS spell suggestions, text cleanup, custom dictionaries
- Authentication - JWT & API keys, RBAC
- Web Panel - Admin UI guide
- LLM Connections - MCP for Claude, ChatGPT, Ollama, DeepSeek
- Integrations - Docling, Langflow, OpenSearch, SSG, wpexporter
- Bulk Import - Load markdown folders
- Docker Guide - Container deployment
- Deployment - Production setup
- Telemetry - Prometheus metrics, Grafana
- Health Checks - Docker & Kubernetes
- Performance - Benchmarks & tuning
- Architecture - System design
- Client Libraries - PHP, Python, Go, Node.js
- Custom MCP Tools - YAML-defined AI tools
- Examples - Code samples
- Contributing - Development guide
- Changelog - Version history
┌─────────────────────────────────────────────────────┐
│    AI Agents (Claude, ChatGPT, Cursor, Windsurf)    │
│             ↓ MCP (stdio / HTTP :9000)              │
├─────────────────────────────────────────────────────┤
│                    Other Clients                    │
├──────────┬──────────┬──────────┬────────────────────┤
│HTTP/JSON │gRPC/Proto│ GraphQL  │       HTTP/3       │
│  :11023  │  :11024  │ /graphql │       :11443       │
├──────────┴──────────┴──────────┴────────────────────┤
│                  MDDB Server (Go)                   │
│  • File Upload (PDF/DOCX/HTML/TXT → Markdown)       │
│  • Auto-Embeddings (OpenAI, Ollama, Cohere, Voyage) │
│  • Vector + Full-Text + Hybrid Search               │
│  • Zero-Shot Classification                         │
│  • Automation (triggers, crons, webhooks)           │
│  • JWT Auth + RBAC                                  │
├─────────────────────────────────────────────────────┤
│           BoltDB (Embedded ACID Storage)            │
│  • B+Tree index • Single-file • MVCC transactions   │
└─────────────────────────────────────────────────────┘
Contributions welcome! See CONTRIBUTING.md for guidelines.
Security issues: See SECURITY.md
BSD 3-Clause License - see LICENSE
- GitHub - Source code
- Docker Hub - Container images
- Releases - Download binaries
- Documentation - Full docs
- LLM Connections - Claude, ChatGPT, Ollama, DeepSeek, Manus, Bielik.ai
- Issues - Bug reports

