seekdb

The AI-Native Search Database. Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.

Why this rank: Strong adoption, recent release, healthy release cadence

README

็คบๆ„ๅ›พ

🔷 The AI-Native Search Database

Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.


🚀 What is OceanBase seekdb?

OceanBase seekdb is an AI-native search database that unifies relational, vector, text, JSON and GIS in a single engine, enabling hybrid search and in-database AI workflows.


🔥 Why OceanBase seekdb?

| Feature | seekdb | OceanBase | Chroma | Milvus | MySQL 9.0 | PostgreSQL + pgvector | DuckDB | Elasticsearch |
|---|---|---|---|---|---|---|---|---|
| Embedded | ✅ | ❌ | ✅ | ✅ | ❌[1] | ❌ | ✅ | ❌ |
| Single-Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Distributed | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ |
| MySQL Compatible | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ |
| Vector Search | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Full-Text Search | ✅ | ✅ | ✅ | ⚠️ | ✅ | ✅ | ✅ | ✅ |
| Hybrid Search | ✅ | ✅ | ✅ | ✅ | ❌ | ⚠️ | ❌ | ✅ |
| OLTP | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |
| OLAP | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⚠️ |
| License | Apache 2.0 | MulanPubL 2.0 | Apache 2.0 | Apache 2.0 | GPL 2.0 | PostgreSQL License | MIT | AGPLv3 + SSPLv1 + Elastic 2.0 |

[1] Embedded capability was removed in MySQL 8.0.

  • ✅ Supported
  • ❌ Not Supported
  • ⚠️ Limited

✨ Key Features

Build fast + Hybrid search + Multi model

  1. Build fast: Go from prototype to production in minutes. Create AI apps using Python and run VectorDBBench on a 1C2G machine (1 CPU core, 2 GB of memory).
  2. Hybrid Search: Combine vector search, full-text search and relational query in a single statement.
  3. Multi-Model: Support relational, vector, text, JSON and GIS in a single engine.

AI inside + SQL inside

  1. AI Inside: Run embedding, reranking, LLM inference and prompt management inside the database, supporting a complete document-in/data-out RAG workflow.
  2. SQL Inside: Powered by the proven OceanBase engine, delivering real-time writes and queries with full ACID compliance, and seamless MySQL ecosystem compatibility.

🎬 Quick Start

Installation

Choose your platform:

๐Ÿ Python (Recommended for AI/ML)
pip install -U pyseekdb
๐Ÿณ Docker (Quick Testing)
docker run -d \
  --name seekdb \
  -p 2881:2881 \
  -p 2886:2886 \
  -v ./data:/var/lib/oceanbase \
  oceanbase/seekdb:latest

See the documentation for this Docker image for details.
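A container started in detached mode may need a moment before it accepts connections. The following is a minimal sketch, using only the Python standard library, of polling the mapped port before connecting. The helper name `wait_for_port` and its timeout values are illustrative assumptions, not part of seekdb or pyseekdb, and an open port only shows that something is listening, not that the database has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper for illustration; not part of seekdb or pyseekdb.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open,
            # not that the database is ready to serve queries.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the port mapping shown in the docker command, `wait_for_port("127.0.0.1", 2881)` could be called before creating a client in server mode.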

📦 Binary (Standalone)
# Linux
rpm -ivh seekdb-1.x.x.x-xxxxxxx.el8.x86_64.rpm

Please replace the version number with the actual RPM package version.

🎯 AI Search Example

Build a semantic search system in 5 minutes:

๐Ÿ—„๏ธ ๐Ÿ Python SDK
# install sdk first
pip install -U pyseekdb
"""
this example demonstrates the most common operations with embedding functions:
1. Create a client connection
2. Create a collection with embedding function
3. Add data using documents (embeddings auto-generated)
4. Query using query texts (embeddings auto-generated)
5. Print query results

This is a minimal example to get you started quickly with embedding functions.
"""

import pyseekdb
from pyseekdb import DefaultEmbeddingFunction

# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode
# This example uses embedded mode (switch to server or OceanBase mode below)

# Embedded mode (local seekdb)
client = pyseekdb.Client(
    path="./seekdb.db",
    database="test"
)
# Alternative: Server mode (connecting to a remote seekdb server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Alternative: Remote server mode (OceanBase Server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"

# Create collection with default embedding function
# The embedding function will automatically convert documents to embeddings
collection = client.create_collection(
    name=collection_name,
    #embedding_function=DefaultEmbeddingFunction()  # Uses default model (384 dimensions)
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With embedding function, you can add documents directly without providing embeddings
# The embedding function will automatically generate embeddings from documents

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

# Add data with documents only - embeddings will be auto-generated by embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings will be automatically generated
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

print(f"\nAdded {len(documents)} documents to collection")
print("Note: Embeddings were automatically generated from documents using the embedding function")

# ==================== Step 4: Query the Collection ====================
# With embedding function, you can query using text directly
# The embedding function will automatically convert query text to query vector

# Query using text - query vector will be auto-generated by embedding function
query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3  # Return top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f"  ID: {results['ids'][0][i]}")
    print(f"  Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f"  Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f"  Metadata: {results['metadatas'][0][i]}")

# ==================== Step 6: Cleanup ====================
# Delete the collection
client.delete_collection(collection_name)
print(f"\nDeleted collection '{collection_name}'")

Please refer to the User Guide for more details.
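The example above indexes results as `results['ids'][0][i]`, that is, one inner list per query and one entry per hit. As a sketch under that assumption, the helper below reshapes such a result into one dictionary per hit, which is often easier to pass on to application code. `flatten_results` is a hypothetical name, not a pyseekdb function, and the sample values are hand-written for illustration rather than real query output.

```python
from typing import Any


def flatten_results(results: dict, query_index: int = 0) -> list[dict[str, Any]]:
    """Turn column-oriented query results into one dict per hit for a single query.

    Hypothetical helper; assumes the nested-list layout used in the example above.
    """
    rows = []
    for i, doc_id in enumerate(results["ids"][query_index]):
        row: dict[str, Any] = {"id": doc_id}
        # Optional columns may be missing or None, mirroring the results.get(...) checks above.
        for key, name in (
            ("distances", "distance"),
            ("documents", "document"),
            ("metadatas", "metadata"),
        ):
            column = results.get(key)
            if column:
                row[name] = column[query_index][i]
        rows.append(row)
    return rows


# Hand-written sample in the assumed shape (not real output).
sample = {
    "ids": [["id1", "id4"]],
    "distances": [[0.42, 0.77]],
    "documents": [[
        "Machine learning is a subset of artificial intelligence",
        "Neural networks are inspired by the human brain",
    ]],
    "metadatas": None,
}

for hit in flatten_results(sample):
    print(hit["id"], hit["distance"], hit["document"])
```

Keeping the reshaping in one place means the rest of the application does not need to know about the per-query nesting.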

๐Ÿ—„๏ธ SQL
-- Create table with vector column
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(384),
    FULLTEXT INDEX idx_fts (content) WITH PARSER ik,
    VECTOR INDEX idx_vec (embedding) WITH (DISTANCE=l2, TYPE=hnsw, LIB=vsag)
) ORGANIZATION = HEAP;

-- Insert documents with embeddings
-- Note: Embeddings should be pre-computed using your embedding model
INSERT INTO articles (id, title, content, embedding)
VALUES
    (1, 'AI and Machine Learning', 'Artificial intelligence is transforming...', '[0.1, 0.2, ...]'),
    (2, 'Database Systems', 'Modern databases provide high performance...', '[0.3, 0.4, ...]'),
    (3, 'Vector Search', 'Vector databases enable semantic search...', '[0.5, 0.6, ...]');

-- Example: Hybrid search combining vector and full-text
-- Replace '[query_embedding]' with your actual query embedding vector
SELECT
    title,
    content,
    l2_distance(embedding, '[query_embedding]') AS vector_distance,
    MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE) AS text_score
FROM articles
WHERE MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE)
ORDER BY vector_distance APPROXIMATE
LIMIT 10;

Python developers can use SQLAlchemy to access data with SQL.
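The SQL above takes the embedding as a bracketed string literal such as `'[0.1, 0.2, ...]'`. The sketch below shows one way to build that literal and the parameter tuple for the hybrid query from Python, using only the standard library. The names `to_vector_literal` and `hybrid_query_params` are hypothetical, and the `%s` placeholders assume a DB-API driver with the `format` parameter style (as used by common MySQL drivers). Whether seekdb accepts bound parameters in each of these positions is an assumption to verify against the User Guide.

```python
import math
from typing import Sequence

# Same statement as the hybrid search example, with placeholders instead of inline values.
HYBRID_SQL = """
SELECT title, content,
       l2_distance(embedding, %s) AS vector_distance,
       MATCH(content) AGAINST(%s IN NATURAL LANGUAGE MODE) AS text_score
FROM articles
WHERE MATCH(content) AGAINST(%s IN NATURAL LANGUAGE MODE)
ORDER BY vector_distance APPROXIMATE
LIMIT %s
"""


def to_vector_literal(vector: Sequence[float], dim: int = 384) -> str:
    """Format a vector as '[v1, v2, ...]', checking length and finiteness first."""
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("vector contains NaN or infinite values")
    return "[" + ", ".join(repr(float(x)) for x in vector) + "]"


def hybrid_query_params(vector: Sequence[float], keywords: str,
                        limit: int = 10, dim: int = 384) -> tuple:
    """Build the parameter tuple matching the placeholders in HYBRID_SQL."""
    # The keywords appear twice: once for the score, once for the filter.
    return (to_vector_literal(vector, dim), keywords, keywords, limit)
```

The resulting pair would be passed to a driver as `cursor.execute(HYBRID_SQL, hybrid_query_params(query_vector, "your keywords"))`. Validating the dimension up front catches a mismatch with the `VECTOR(384)` column before the statement reaches the database.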

📚 Use Cases

📖 RAG & Knowledge Retrieval

Large language models are limited by their training data. RAG introduces timely and trusted external knowledge to improve answer quality and reduce hallucination. seekdb enhances search accuracy through vector search, full-text search, hybrid search, built-in AI functions, and efficient indexing, while multi-level access control safeguards data privacy across heterogeneous knowledge sources.

  1. Enterprise QA
  2. Customer support
  3. Industry insights
  4. Personal knowledge
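
To make the retrieval step of RAG concrete, here is a minimal sketch of how retrieved passages might be assembled into a prompt for a language model. It is not seekdb functionality: `build_rag_prompt`, the prompt wording, and the character budget are illustrative assumptions, and the passages would come from a search such as the collection query shown earlier.

```python
def build_rag_prompt(question: str, passages: list[str], max_chars: int = 2000) -> str:
    """Assemble a prompt that asks the model to answer only from retrieved passages.

    Hypothetical helper for illustration; the wording and budget are arbitrary choices.
    """
    context_lines = []
    used = 0
    for n, passage in enumerate(passages, start=1):
        line = f"[{n}] {passage.strip()}"
        if used + len(line) > max_chars:
            break  # stop before exceeding the context budget
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


passages = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
]
print(build_rag_prompt("What is machine learning?", passages))
```

Numbering the passages lets the model's answer point back to its sources, and the budget keeps the most relevant (first-ranked) passages when space runs out.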
๐Ÿ” Semantic Search Engine

Traditional keyword search struggles to capture intent. Semantic search leverages embeddings and vector search to understand meaning and connect text, images, and other modalities. seekdb's hybrid search and multi-model querying deliver more precise, context-aware results across complex search scenarios.

  1. Product search
  2. Text-to-image
  3. Image-to-product
🎯 Agentic AI Applications

Agentic AI requires memory, planning, perception, and reasoning. seekdb provides a unified foundation for agents through metadata management, vector/text/mixed queries, multimodal data processing, RAG, built-in AI functions and inference, and robust privacy controls, enabling scalable, production-grade agent systems.

  1. Personal assistants
  2. Enterprise automation
  3. Vertical agents
  4. Agent platforms
💻 AI-Assisted Coding & Development

AI-powered coding combines natural-language understanding and code semantic analysis to enable generation, completion, debugging, testing, and refactoring. seekdb enhances code intelligence with semantic search, multi-model storage for code and documents, isolated multi-project management, and time-travel queries, supporting both local and cloud IDE environments.

  1. IDE plugins
  2. Design-to-web
  3. Local IDEs
  4. Web IDEs
โฌ†๏ธ Enterprise Application Intelligence

AI transforms enterprise systems from passive tools into proactive collaborators. seekdb provides a unified AI-ready storage layer, fully compatible with MySQL syntax and views, and accelerates mixed workloads with parallel execution and hybrid row-column storage. Legacy applications gain intelligent capabilities with minimal migration across office, workflow, and business analytics scenarios.

  1. Document intelligence
  2. Business insights
  3. Finance systems
📱 On-Device & Edge AI Applications

Edge devices, from mobile to vehicle and industrial terminals, operate with constrained compute and storage. seekdb's lightweight architecture supports embedded and micro-server modes, delivering full SQL, JSON, and hybrid search under low resource usage. It integrates seamlessly with OceanBase cloud services to enable unified edge-to-cloud intelligent systems.

  1. Personal assistants
  2. In-vehicle systems
  3. AI education
  4. Companion robots
  5. Healthcare devices

🌟 Ecosystem & Integrations


๐Ÿค Community & Support

Build from Source

Before building, please install the required toolchain and dependencies for your operating system. See Install Toolchain for detailed instructions.

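# Clone the repository
git clone https://github.com/oceanbase/seekdb.git
cd seekdb
# Initialize the build environment and build the debug target
bash build.sh debug --init --make
# Copy the binary into a fresh working directory and start it
mkdir -p ~/seekdb/bin
cp build_debug/src/observer/seekdb ~/seekdb/bin
cd ~/seekdb
./bin/seekdb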

In this example, the working directory is $HOME/seekdb. Please use a fresh directory for testing. See the Developer Guide for detailed instructions.

Contributing

We welcome contributions! See our Contributing Guide to get started.


📄 License

OceanBase seekdb is licensed under the Apache License, Version 2.0.

Release History

| Version | Changes | Urgency | Date |
|---|---|---|---|
| v1.3.0 | Release date: May 25, 2026. Version: V1.3.0. RPM package: seekdb-1.3.0.0-100000092026051510. Overview: seekdb V1.3.0 is a major release for multi-platform coverage and high performance. It introduces async indexes backed by a new Change Stream incremental framework that decouples writes from index builds, delivering extreme ingest throughput and stable retrieval for AI Agent workloads. Windows and Android native build and deployment extend platform cove... | High | 5/25/2026 |
| v1.2.0 | Release date: March 25, 2026. Version: V1.2.0. RPM package: seekdb-1.2.0.0-100000222026032420. Overview: seekdb V1.2.0 is a major milestone that takes seekdb from a single-node database toward a high-availability architecture. This release delivers primary-standby replication for stronger disaster recovery, Fork Database for whole-database versioning, and Diff & Merge syntax for Git-style branch comparison and merging. Under the hood, internal refactori... | Medium | 3/25/2026 |
| v1.1.0 | Release date: January 30, 2026. Version: V1.1.0. RPM version: seekdb-1.1.0.0-100000142026013001. New features: macOS build support: As of V1.1.0, seekdb supports native builds and local development on macOS 15 and later. FORK TABLE (experimental): V1.1.0 introduces `FORK TABLE`, which lets you create a copy of a table without a full data copy. Unlike CTAS (Create Table As Select), `FORK TABLE` reuses table-level storage and uses copy-on-w... | Low | 1/30/2026 |
| v1.0.1 | Release date: December 29, 2025. Version: V1.0.1. RPM version: seekdb-1.0.1.0-100000392025122619. New features: Vector Index Enhancements: HNSW-BQ distance metrics: Added support for IP (Inner Product) and cosine distance metrics for `hnsw_bq` index type. ([7aecf39c](https://github.com/oceanbase/seekdb/commit/7aecf39cc28f25a13c4c9d761f336512085c67ff)) Vector search limit increase: Increased `ef_search` upper limit to 160,000. ([79111e92... | Low | 12/29/2025 |
| v1.0.0 | Release date: November 14, 2025. Version: V1.0.0. RPM version: seekdb-1.0.0.0-100000262025111218. For more information, please see the [seekdb v1.0.0 release notes](https://www.oceanbase.ai/docs/v1.0.0). | Low | 11/13/2025 |

Similar Packages

  • server (mariadb-12.3.2): MariaDB server is a community developed fork of MySQL server. Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, ...
  • endee (1.3.5): Endee.io - A high-performance vector database, designed to handle up to 1B vectors on a single node, delivering significant performance gains through optimized indexing and execution. Also available i...
  • ThemisDB (v1.8.1-rc1): Themis Database System - High-performance C++ hybrid-database (graph-vector-relational-file) with AQL support and MVCC
  • qdrant (v1.18.2): Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
  • uni-db (v2.0.0): Uni is a modern, embedded database that combines property graph (OpenCypher), vector search, and columnar storage (Lance) into a single, cohesive engine. It is designed for applications requiring loca...

More from oceanbase

  • powermem: PowerMem: Your AI-Powered Long-Term Memory - Accurate, Agile, Affordable. Also friendly support for the OpenClaw Memory Plugin.

More in Databases

  • orbit: One API for 20+ LLM providers, your databases, and your files. Self-hosted, open-source AI gateway with RAG, voice, and guardrails.
  • alibabacloud-adb20211201: Alibaba Cloud adb (20211201) SDK Library for Python
  • milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
  • WeKnora: LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.