
qdrant-loader

Description

Enterprise-ready vector database toolkit for building searchable knowledge bases from multiple data sources. Supports multi-project management, automatic ingestion from Confluence/JIRA/Git, intelligent file conversion (PDF/Office/images), and semantic search. Includes MCP server for seamless AI assistant integration.

README

QDrant Loader


📝 Changelog v1.0.0 - Latest improvements and bug fixes

A comprehensive toolkit for loading data into Qdrant vector database with advanced MCP server support for AI-powered development workflows.

🎯 What is QDrant Loader?

QDrant Loader is a data ingestion and retrieval system that collects content from multiple sources, processes and vectorizes it, then provides intelligent search capabilities through a Model Context Protocol (MCP) server for AI development tools.

Perfect for:

  • 🤖 AI-powered development with Cursor, Windsurf, and other MCP-compatible tools
  • 📚 Knowledge base creation from technical documentation
  • 🔍 Intelligent code assistance with contextual information
  • 🏢 Enterprise content integration from multiple data sources

📦 Packages

This monorepo contains three complementary packages:

qdrant-loader: Data ingestion and processing engine

Collects and vectorizes content from multiple sources into QDrant vector database.

Key Features:

  • Multi-source connectors: Git, Confluence (Cloud & Data Center), JIRA (Cloud & Data Center), Public Docs, Local Files
  • File conversion: PDF, Office docs (Word, Excel, PowerPoint), images, audio, EPUB, ZIP, and more using MarkItDown
  • Smart chunking: Modular chunking strategies with intelligent document processing and hierarchical context
  • Incremental updates: Change detection and efficient synchronization
  • Multi-project support: Organize sources into projects with shared collections
  • Provider-agnostic LLM: OpenAI, Azure OpenAI, Ollama, and custom endpoints with unified configuration
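Incremental updates generally work by comparing content fingerprints between runs. As a minimal illustration of hash-based change detection (the function and field names here are assumptions for the sketch, not qdrant-loader's actual internals):

```python
import hashlib

def fingerprint(content: str) -> str:
    """Return a stable SHA-256 fingerprint for a document's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous: dict, current: dict) -> dict:
    """Compare two {doc_id: content} snapshots and classify each document."""
    prev_hashes = {doc_id: fingerprint(text) for doc_id, text in previous.items()}
    changes = {"added": [], "updated": [], "unchanged": [], "deleted": []}
    for doc_id, text in current.items():
        if doc_id not in prev_hashes:
            changes["added"].append(doc_id)
        elif prev_hashes[doc_id] != fingerprint(text):
            changes["updated"].append(doc_id)
        else:
            changes["unchanged"].append(doc_id)
    # Documents present last run but missing now are treated as deleted.
    changes["deleted"] = [d for d in previous if d not in current]
    return changes
```

Only documents classified as added or updated would need re-chunking and re-embedding, which is what makes synchronization efficient.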

qdrant-loader-core: Core library and LLM abstraction layer

Provides the foundational components and provider-agnostic LLM interface used by other packages.

Key Features:

  • LLM Provider Abstraction: Unified interface for OpenAI, Azure OpenAI, Ollama, and custom endpoints
  • Configuration Management: Centralized settings and validation for LLM providers
  • Rate Limiting: Built-in rate limiting and request management
  • Error Handling: Robust error handling and retry mechanisms
  • Logging: Structured logging with configurable levels
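The idea behind such an abstraction layer is a small common interface that concrete providers implement, with cross-cutting concerns like retries wrapped around it. A rough sketch under that assumption (class and method names are hypothetical, not qdrant-loader-core's real API):

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Unified interface that each concrete provider implements."""

    @abstractmethod
    def embed(self, texts: list) -> list:
        """Return one embedding vector per input text."""

    @abstractmethod
    def chat(self, prompt: str) -> str:
        """Return a chat completion for the prompt."""

class EchoProvider(LLMProvider):
    """Stand-in provider so the sketch runs without network access."""

    def embed(self, texts):
        return [[float(len(t))] for t in texts]

    def chat(self, prompt):
        return f"echo: {prompt}"

def with_retry(fn, attempts=3):
    """Naive retry wrapper mirroring the error-handling bullet above."""
    last = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last = exc
    raise last
```

Calling code depends only on `LLMProvider`, so switching between OpenAI, Azure OpenAI, or Ollama becomes a configuration change rather than a code change.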

qdrant-loader-mcp-server: AI development integration layer

Model Context Protocol server providing search capabilities to AI development tools.

Key Features:

  • MCP Protocol 2025-06-18: Latest protocol compliance with dual transport support (stdio + HTTP)
  • Advanced search tools: Semantic search, hierarchy-aware search, attachment discovery, and conflict detection
  • Cross-document intelligence: Document similarity, clustering, relationship analysis, and knowledge graphs
  • Streaming capabilities: Server-Sent Events (SSE) for real-time search results
  • Production-ready: HTTP transport with security, session management, and health checks
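Over the stdio transport, MCP exchanges newline-delimited JSON-RPC 2.0 messages. As a rough illustration of what a `tools/call` request to a search tool could look like (the tool name and argument schema here are assumptions, not this server's documented contract):

```python
import json

def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request, newline-delimited for stdio."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message) + "\n"

# Hypothetical search request an MCP client might write to the server's stdin.
request = build_tool_call(1, "search", {"query": "authentication", "limit": 5})
```

In practice the MCP client library inside Cursor or Windsurf handles this framing; the sketch only shows the wire shape.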

🚀 Quick Start

Installation

# Install both packages
pip install qdrant-loader qdrant-loader-mcp-server

# Or install individually
pip install qdrant-loader          # Data ingestion only
pip install qdrant-loader-mcp-server  # MCP server only

5-Minute Setup

  1. Create a workspace

    mkdir my-workspace && cd my-workspace
  2. Initialize workspace with templates

    qdrant-loader init --workspace .
  3. Configure your environment (edit .env)

    # Qdrant connection
    QDRANT_URL=http://localhost:6333
    QDRANT_COLLECTION_NAME=my_docs
    
    # LLM provider (new unified configuration)
    OPENAI_API_KEY=your_openai_key
    LLM_PROVIDER=openai
    LLM_BASE_URL=https://api.openai.com/v1
    LLM_EMBEDDING_MODEL=text-embedding-3-small
    LLM_CHAT_MODEL=gpt-4o-mini
  4. Configure data sources (edit config.yaml)

    global:
      qdrant:
        url: "http://localhost:6333"
        collection_name: "my_docs"
      llm:
        provider: "openai"
        base_url: "https://api.openai.com/v1"
        api_key: "${OPENAI_API_KEY}"
        models:
          embeddings: "text-embedding-3-small"
          chat: "gpt-4o-mini"
        embeddings:
          vector_size: 1536
    
    projects:
      my-project:
        project_id: "my-project"
        sources:
          git:
            docs-repo:
              base_url: "https://github.com/your-org/your-repo.git"
              branch: "main"
              file_types: ["*.md", "*.rst"]
  5. Load your data

    qdrant-loader ingest --workspace .
  6. Start the MCP server

    mcp-qdrant-loader --env /path/to/your/.env
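The `${OPENAI_API_KEY}` placeholder in the config.yaml of step 4 is expanded from the environment. A minimal sketch of that substitution pattern (qdrant-loader's exact expansion rules may differ):

```python
import os
import re

def expand_env(value, env=None):
    """Replace ${VAR} placeholders in a string with environment values."""
    env = dict(os.environ) if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name!r} is not set")
        return env[name]

    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", replace, value)
```

Keeping secrets in .env and referencing them via placeholders keeps config.yaml safe to commit to version control.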

🔧 Integration with Cursor

Add to your Cursor settings (.cursor/mcp.json):

{
  "mcpServers": {
    "qdrant-loader": {
      "command": "/path/to/venv/bin/mcp-qdrant-loader",
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "QDRANT_COLLECTION_NAME": "my_docs",
        "OPENAI_API_KEY": "your_key"
      }
    }
  }
}

Alternative: Use configuration file (recommended for complex setups):

{
  "mcpServers": {
    "qdrant-loader": {
      "command": "/path/to/venv/bin/mcp-qdrant-loader",
      "args": [
        "--config",
        "/path/to/your/config.yaml",
        "--env",
        "/path/to/your/.env"
      ]
    }
  }
}

Example queries in Cursor:

  • "Find documentation about authentication in our API"
  • "Show me examples of error handling patterns"
  • "What are the deployment requirements for this service?"
  • "Find all attachments related to database schema"

📚 Documentation

Getting Started

User Guides

⚠️ Migration Guide (v0.7.1+)

LLM Configuration Migration Required

  • New unified configuration: global.llm.* replaces legacy global.embedding.* and file_conversion.markitdown.*
  • Provider-agnostic: Now supports OpenAI, Azure OpenAI, Ollama, and custom endpoints
  • Legacy support: Old configuration still works but shows deprecation warnings
  • Action required: Update your config.yaml to use the new syntax (see examples above)
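Conceptually, the migration moves legacy settings under the unified key. A rough sketch of that mapping (top-level key names come from the bullets above; the nested legacy field names, such as `model`, are assumptions, and the real migration is handled by qdrant-loader itself):

```python
def migrate_config(config: dict) -> dict:
    """Move legacy global.embedding.* settings under the unified global.llm.* key."""
    migrated = dict(config)
    global_cfg = dict(migrated.get("global", {}))
    legacy = global_cfg.pop("embedding", None)
    if legacy is not None and "llm" not in global_cfg:
        global_cfg["llm"] = {
            "provider": legacy.get("provider", "openai"),
            "models": {"embeddings": legacy.get("model")},
        }
    migrated["global"] = global_cfg
    return migrated
```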

Migration Resources

Developer Resources

🤝 Contributing

We welcome contributions! See our Contributing Guide for:

  • Development environment setup
  • Code style and standards
  • Pull request process

Quick Development Setup

# Clone and setup
git clone https://github.com/martin-papy/qdrant-loader.git
cd qdrant-loader

# Sync workspace environment (recommended)
uv sync --all-packages --all-extras

# Add a new dependency during development
uv add fastapi
uv sync

📄 License

This project is licensed under the GNU GPLv3 - see the LICENSE file for details.


Ready to get started? Check out our Quick Start Guide or browse the complete documentation.

Release History

qdrant-loader-v1.0.0 (urgency: High, 4/14/2026)

Added (qdrant-loader):

  • Contextual embeddings for enriched chunk context during ingestion [#221]

Fixed (qdrant-loader):

  • Jira Cloud connection failure due to deprecated search API endpoints [#215]
  • Duplicate chunks for Python files by rewriting AST parser [#217]
  • Duplicate document IDs causing missing chunks in metric tracking [#222]
  • Ingestion metrics: aligned size metrics and aggregated project results [#222]
  • JQL injection and query breaking when configuration values in

qdrant-loader-v0.9.0 (urgency: Medium, 3/27/2026)

Added (qdrant-loader):

  • `enable_semantic_analysis` global NLP kill switch in `chunking` config to skip spaCy/LDA processing entirely for faster ingestion [#189]
  • `enable_enhanced_semantic_analysis` opt-in flag in `chunking` config to gate advanced NLP fields (`pos_tags`, `dependencies`, `document_similarity`) [#195]

Added (qdrant-loader-mcp-server):

  • `expand_chunk_context` MCP tool to retrieve surrounding chunks for richer context around a specific chunk [#185]
  • `cluster_session_id` re


Similar Packages

  • YAML-Multi-Agent-Orchestrator 🤖 Define and execute multi-agent AI workflows declaratively using YAML, simplifying orchestration and enhancing collaboration through automatic context handling. (main@2026-04-21)
  • mnemos-mcp 🧠 Transform documentation chaos into a structured memory system with Mnemos, your self-hosted, multi-context knowledge server for developers. (main@2026-04-21)
  • git-notes-memory 🧠 Store and search your notes effectively with Git-native memory storage, enhancing productivity for Claude Code users. (main@2026-04-21)
  • a-mem-mcp-server 🧠 Enhance LLM agents with an agentic memory system, featuring automatic note construction, dynamic memory updates, and intelligent semantic retrieval. (main@2026-04-21)
  • Code2MCP 🚀 Transform existing codebases into MCP services with ease using Code2MCP's intelligent automation and minimal intrusion design. (main@2026-04-21)