
OpenContracts

Humans and AI agents, building knowledge bases together. Self-hosted document annotation, version control, semantic search, and MCP.

agent agentic-ai etl etl-pipeline llm prompt-engineering python unstructured-data vector-database



README

OpenContracts (Demo)

The open source platform for building knowledge bases that humans and AI agents can work with together.



Most knowledge lives in documents. Contracts, regulations, research papers, policies — the stuff that governs how organizations actually work. That knowledge is usually trapped: locked in PDFs, scattered across drives, understood fully by a handful of people who happened to read the right things at the right time.

OpenContracts started in 2019 with a simple conviction: that knowledge needed to be carefully curated, and that machine learning systems were only as good as the data underneath them. It was built as a platform for human collaborators — lawyers, researchers, analysts — to annotate documents together and produce gold-standard training data.

Those collaborators mostly never came. The platform was too early, the problem too niche, the value too invisible.

Then large language models arrived, and the world suddenly needed exactly what OpenContracts had been building all along: structured, annotated, version-controlled knowledge bases that AI could actually reason over. The collaborators the platform was designed for finally showed up — they just turned out to be AI agents.

Today, OpenContracts is a self-hosted platform where teams build knowledge bases from their documents and where AI agents work alongside humans to search, analyze, and extend that knowledge. The core conviction hasn't changed. The best AI systems still need carefully curated data. The difference is that now, the curation and the AI happen in the same place.

AI Agents: Configurable assistants that search, annotate, and reason over your knowledge base
MCP Server: Expose your corpus to Claude, Cursor, and any MCP-compatible AI tool
Multimodal Search: Vector embeddings and full-text search across documents and annotations
Collaboration: Threaded discussions, @mentions, voting, and moderation at every level
Data Extract: Structured extraction across hundreds of documents with LLM-powered queries
Format Preservation: PDF layout fidelity with precise text-to-coordinate mapping via PAWLS

What Makes This Different

Human Knowledge as the Foundation

This is not another "chat with your PDFs" tool. OpenContracts treats human annotation as the ground truth. Teams define custom label schemas, annotate documents with precise selections (including multi-page spans), and map relationships between concepts. AI builds on top of that work — it doesn't replace it.

Knowledge Bases, Not File Cabinets

Documents are organized into corpuses — version-controlled collections with folder hierarchies, fine-grained permissions, and full history. Fork a public corpus to build on someone else's annotations. Restore any previous version. Every change is tracked.

This is git for knowledge: you can branch, build, share, and never lose work.

AI Agents That Work With What You've Built

Configurable AI agents can search your documents, query your annotations, and participate in discussions — all grounded in the structured knowledge your team has created. They don't hallucinate in a vacuum; they reason over real, curated data.

@mention an agent in a discussion thread. Ask it to compare clauses across a hundred contracts. Let it surface patterns your team annotated last quarter. The agent's power comes from the quality of the knowledge base underneath it.

Collaboration Where the Knowledge Lives

Forum-style threaded discussions at every level — global, per-corpus, per-document. @mention documents, corpuses, and AI agents. Upvote the best analysis. Pin critical findings. The conversation happens next to the source material, not in a separate tool.

Shared Knowledge Compounds

Make a corpus public. Others fork it, refine the annotations, add documents, and share their improvements. Leaderboards and badges recognize contributors. Analytics show which knowledge bases are gaining traction and where the community is most active.

This is the DRY principle applied to institutional knowledge: annotate once, build on it forever.

See it in Action

PDF Annotation Flow

Text Format Support

Quick Start

Development

git clone https://github.com/Open-Source-Legal/OpenContracts.git
cd OpenContracts

# Copy sample environment files
mkdir -p .envs/.local
cp ./docs/sample_env_files/backend/local/.django ./.envs/.local/.django
cp ./docs/sample_env_files/backend/local/.postgres ./.envs/.local/.postgres
cp ./docs/sample_env_files/frontend/local/django.auth.env ./.envs/.local/.frontend

# Build and start all services (including frontend)
docker compose -f local.yml build
docker compose -f local.yml --profile fullstack up

Then open http://localhost:3000 and log in with the username admin and the password Openc0ntracts_def@ult.
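The first build can take a while, and the up command above runs in the foreground, so the frontend may not answer right away. Below is a minimal sketch for polling it from a second terminal before you open the browser. The function name wait_for_url and the 300-second default are illustrative choices, not part of OpenContracts. The sketch also assumes curl is installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenContracts): poll a URL until it answers
# with an HTTP status below 400, or give up after a timeout in seconds.
wait_for_url() {
  local url="$1" timeout="${2:-300}" waited=0 status
  while (( waited < timeout )); do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
    # curl prints 000 when it cannot connect; "|| true" ignores its exit code.
    status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
    if [[ "$status" =~ ^[0-9]+$ ]] && (( status > 0 && status < 400 )); then
      echo "ready: $url (HTTP $status)"
      return 0
    fi
    sleep 2
    (( waited += 2 ))
  done
  echo "timed out after ${timeout}s waiting for $url" >&2
  return 1
}
```

Usage, after defining the function in a second terminal: wait_for_url http://localhost:3000 600. The function returns 0 once the frontend responds and 1 if the timeout expires.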

See the full Quick Start guide for details and troubleshooting.

Production

# Apply database migrations first
docker compose -f production.yml --profile migrate up migrate

# Start services
docker compose -f production.yml up -d
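If you script the two production commands above, you probably want the second to run only when migrations succeed. A plain attached docker compose up does not necessarily propagate the migrate container's exit status. This sketch therefore adds Compose's --exit-code-from flag. The function name deploy_production is hypothetical. The sketch assumes the migrate service exits once migrations finish, as the separate first step implies.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with OpenContracts): run migrations, and
# start the production stack only if the migrate container exited with 0.
deploy_production() {
  local compose_file="${1:-production.yml}"

  # --exit-code-from makes `up` return the exit status of the migrate service
  # (it also implies --abort-on-container-exit).
  if ! docker compose -f "$compose_file" --profile migrate \
      up --exit-code-from migrate migrate; then
    echo "migrations failed; not starting services" >&2
    return 1
  fi

  # Same command as the "Start services" step above.
  docker compose -f "$compose_file" up -d
}
```

Because --exit-code-from implies --abort-on-container-exit, any dependency containers started just for the migration are stopped when it finishes. The second command then starts them again with the rest of the stack.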

Documentation

Browse the full documentation at jsv4.github.io/OpenContracts or in the repo:

Guide	Description
Quick Start	Get running with Docker in minutes
Key Concepts	Core workflows and terminology
PDF Data Format	How text maps to PDF coordinates
LLM Framework	PydanticAI integration and agents
Vector Stores	Semantic search architecture
Pipeline Overview	Parser and embedder system
Custom Extractors	Build your own data extraction tasks
v3.0.0.b3 Release Notes	Latest features and migration guide

Architecture

Data Format

OpenContracts uses a standardized format for representing text and layout on PDF pages, enabling portable annotations across tools.

Processing Pipeline

The modular pipeline supports custom parsers, embedders, and thumbnail generators.

Each component inherits from a base class with a defined interface:

Parsers — Extract text and structure from documents
Embedders — Generate vector embeddings for search
Thumbnailers — Create document previews

See the pipeline documentation for details on creating custom components.

Telemetry

OpenContracts collects anonymous usage data to guide development priorities: installation events, feature usage statistics, and aggregate counts. We do not collect document contents, extracted data, user identities, or query contents.

Disable backend telemetry: set TELEMETRY_ENABLED=False in your Django settings.
Disable frontend analytics: leave REACT_APP_POSTHOG_API_KEY unset in frontend/public/env-config.js.
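For the local Docker setup, one way to apply the backend setting is through the Django env file copied during the Quick Start. This sketch rests on an unverified assumption: that the Django settings read TELEMETRY_ENABLED from the environment. Check your settings module first, and if it does not read the variable, set the value directly in the settings file instead. The helper name set_env_var is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenContracts): set KEY=VALUE in a
# dotenv-style file, replacing an existing assignment or appending a new one.
# ASSUMPTION: Django reads TELEMETRY_ENABLED from this env file; verify in your
# settings module. Also assumes KEY and VALUE contain no sed-special characters
# such as | or &.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  mkdir -p "$(dirname "$file")"
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    # Rewrite the matching line through a temp file (avoids non-portable sed -i),
    # then copy back with cat so the original file's permissions are kept.
    tmp="$(mktemp)"
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    # Add a newline first if the file does not already end with one.
    if [[ -s "$file" && -n "$(tail -c1 "$file")" ]]; then
      printf '\n' >> "$file"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

From the repository root, run set_env_var .envs/.local/.django TELEMETRY_ENABLED False. Then stop and restart the stack so the containers pick up the new value.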

Supported Formats

PDF (full layout and annotation support)
Text-based formats (plaintext, Markdown)

Coming soon: DOCX viewing and annotation powered by Docxodus.

Acknowledgements

This project builds on work from:

AllenAI PAWLS — PDF annotation data format and concepts
NLMatics nlm-ingestor — Document parsing pipeline

License

AGPL-3.0 — See LICENSE for details.

Release History

Version	Changes	Urgency	Date
v3.0.0.b4	# OpenContracts v3.0.0.b4 ## Highlights This beta release includes significant new features, security hardening, performance optimizations, and UI modernization across the platform. ### Auth0 Authentication for Django Admin - Django admin now supports Auth0 SSO login with password fallback - Admin claims synchronization via Auth0 token claims (`is_staff`, `is_superuser`) - Open redirect prevention, CSRF protection, and in-memory token storage - 50+ security tests covering edge cases	Low	2/8/2026
v3.0.0.b3	## 🎯 Summary v3.0.0.b3 transforms OpenContracts from a document analysis platform into a collaborative document intelligence hub. This is our largest release ever, introducing social features, AI agents, and a complete versioning system. --- ## ✨ New Features ### 📄 Document Versioning - Version History Panel - Track changes, view metadata, restore previous versions - Time Travel - Query corpus state at any point in history - Soft Delete & Restore - Deleted docum	Low	12/12/2025
v3.0.0.b2	# Description This release brings important mobile improvements, new agent capabilities, and enhanced export functionality ## Major Features & Improvements 1. Modular Agent Instructions - You can now customize agent system prompts per corpus, giving you fine-grained control over agent behavior for different document collections (#521) 2. Enhanced Export Modal - Added pagination and delete functionality to the export modal for better management of export tasks (#507) 3.	Low	10/26/2025
v3.0.0.b1	# Description This release brings substantial but targeted improvements to the annotator UI/UX, specifically 1. Vastly improved UI/UX for navigating notes, annotations and other document info in the view. Work there is still ongoing, but the many tabs and requirements to context switch have been replaced in favor of a single unified context feed where you can sort and filter the various info container types by page and - eventually - we can scroll lock the feed and the document. 2. A	Low	8/25/2025
v3.0.0.a2	Another big update: 1. Supports dynamic length embeddings (per corpus) so you can configure a different embeddings module for different projects. 2. Async and websocket support with improved agents (more work to be done) 3. Source highlights IN document for queries 4. Migrated to vite 5. Migrated to pdf.js 5.* 6. Playwright tests for key pdf interactions (more to come) ## What's Changed * [Snyk] Fix for 32 vulnerabilities by @JSv4 in https://github.com/JSv4/OpenContracts/pull/291	Low	5/6/2025
v3.0.0.a1	3.0.0 Alpha1 Release: This release brings a ton of long-planned and much-needed improvements. Specifically: 1. Brought the frontend up to React 18 2. Completely overhauled state handling in the annotator component to improve performance and cut down on unnecessary re-renders. Using Jotai atoms now instead of contexts. 3. Added modular document processing pipelines that can easily be configured and enabled/disabled via settings module. 4. Added a docling-based processing pipeline	Low	1/6/2025
v2.4.0	This is a pretty significant upgrade vs 2.3.1. We added a number of features: 1. We now support ingesting, rendering and annotating txt-based formats like plaintext, markdown, etc. 2. Our document ingestion pipeline has a parser for txt-based formats. 3. The task decorator for custom tasks will automatically switch from span-based to token-based annotations depending on the underlying format. At the moment this is just pdf vs non-pdf, but could be a richer taxonomy. 4. Substantial styli	Low	11/11/2024
v2.3.1	Two primary improvements in this release: 1. The admin views have been built out with more filters, raw_id renders (to cut down on M2M and FK pulls), and custom actions - including a custom dropdown action on selected Corpus(es) to make them public. 2. We were previously loading ALL annotations for an analysis in each document view. First off, that's really inefficient for large corpuses. Second, it meant that the annotator got cluttered with random annotations that weren't actually in	Low	9/20/2024
v.2.3.0	It is now possible to collect feedback from users on public corpuses where `can_comment` is set to true. Added some nice GUI enhancements to the labels to support more action buttons - including a cool parabolic spiral button cloud that sprouts from an action zone. ## What's Changed * Add User Feedback by @JSv4 in https://github.com/JSv4/OpenContracts/pull/216 Full Changelog: https://github.com/JSv4/OpenContracts/compare/v2.2.0...v.2.3.0	Low	9/17/2024
v2.2.0	This release brings an enormous number of frontend improvements and tweaks, primarily focused on unifying the document annotation and viewer components into a single component that has a single, clean workflow for viewing different extracts and analyses for a given document. ## What's Changed * Finalize 2.1 by @JSv4 in https://github.com/JSv4/OpenContracts/pull/200 * Bump crispy-bootstrap5 from 0.7 to 2024.2 by @dependabot in https://github.com/JSv4/OpenContracts/pull/196 * Bump redis fro	Low	9/12/2024
v2.1.0	## TLDR This release brings the addition of `CorpusActions`, GitHub Action-style automatic analyzers or data extractors that run when a document is uploaded. See more [here](https://jsv4.github.io/OpenContracts/architecture/opencontract-corpus-actions/). ## What's Changed * Upgrade Django App Dependencies to work with Django LTS by @JSv4 in https://github.com/JSv4/OpenContracts/pull/172 * Add Document Analysis Row by @JSv4 in https://github.com/JSv4/OpenContracts/pull/175 * Bump django	Low	8/27/2024
v2.0.0.post1	# Upgrade Dependencies The upgrade from Django 3.2* to 4.2.* introduced a syntax change in the management command that caused two django app dependencies to break. In the process of upgrading these, some other dependency issues cropped up. This release: 1. Upgrades django app dependencies for full Django 4.2.* compatibility 2. Upgrades opencv and related dependencies 3. Introduces additional test cases to improve test coverage. ## What's Changed * Upgrade Django App Dependencies	Low	7/30/2024
v2.0.0	# This release includes: 1. A table-based data extract interface and related models 2. Improved test coverage 3. Upgrade to Django 4.2.* LTS ## What's Changed * Add Data Extraction by @JSv4 in https://github.com/JSv4/OpenContracts/pull/117 * Bump pytest from 6.2.5 to 8.2.2 by @dependabot in https://github.com/JSv4/OpenContracts/pull/126 * v2 Bugfixes by @JSv4 in https://github.com/JSv4/OpenContracts/pull/128 * Bump actions/upload-artifact from 3 to 4 by @dependabot in https://github.	Low	7/27/2024
v2.0.0.b3	# Some PDF-handling-related improvements: 1. Merged some nlm-ingestor changes from upstream repo to fix an issue with missing style tags with certain pdfs 2. Improved test coverage for pdf utils 3. Turn on OCR dynamically for PDFs that appear to need it, avoiding wasting processing power on all PDFs while preventing text-less PDFs when OCR is required. # Also some minor GUI bug-fixes	Low	7/22/2024
v2.0.0.b2	Features: - The data extract tasks are now dynamically loaded and can be applied on a column-by-column basis. So, you can write very specific extract logic for a given column / data field. Newly-registered tasks are displayed automatically on the frontend and can be selected by the user when building a fieldset for a datagrid. - Add a search to the Extracts view and improved various load and performance issues. - Removed the LanguageModel model as it's almost completely subsumed by the abi	Low	6/23/2024
v2.0.0b1	# 2.0.0 Beta 1 Added Grid-based Data Extraction and Corpus Querying This update extends the analytical capabilities of the application, allowing for automated and background extraction of structured data from documents, improving efficiency and scalability. ### We've added a couple models on the backend: Extract: Represents a headless, background annotation task linked to a Corpus and Fieldset. Fieldset: Defines a reusable set of fields for Extracts, linked to Columns. **Co	Low	6/19/2024
v1.3.0	Major feature is addition of nlm ingestor microservice which will eventually totally replace the PAWLs preprocessor (which has some periodic issues for certain doc types). This allows us to import layout blocks along with the document and token layers. ## What's Changed * Add Documentation on Annotation Creation Logic + Component(s) by @JSv4 in https://github.com/JSv4/OpenContracts/pull/113 * Create overview.md by @JSv4 in https://github.com/JSv4/OpenContracts/pull/114 * Add Nlm-ingestor	Low	6/4/2024
v1.2.2	I moved the PAWLs parser to its own repo and am now pointing my dependency there. I also noticed that I had made some changes beyond bug fixes in my work to improve outputs where PDF image quality is bad. While this did improve the results, I inadvertently introduced a scaling issue with the token coordinate system, and the tokens were offset from the image, so labeling was effectively broken. I rolled back the OCR quality workarounds I added to fix the scaling issue in my new repo. These can be	Low	9/13/2023
v1.2.1	Created a new format that encapsulates a document's pdf, its text, its PAWLs tokens and all annotations which can be imported in a single API call. This will be useful for remote clients that might process a document and then want to upload multiple annotations simultaneously. Will also support planned feature to export single annotated documents in addition to entire corpuses. ## What's Changed * Added import task to import a single annotated doc. Also added a test. by @JSv4 in https://gith	Low	5/13/2023
v1.2.0	The main feature addition here is the ability to export documents into FUNSD-style annotations that can easily be loaded into LayoutLM-style models. There is also a LangChain export, but it's not fully baked yet. At the moment, it just exports full document text and metadata. This release also comes with a number of bug fixes. ## What's Changed * Fix Quickstart Docs by @JSv4 in https://github.com/JSv4/OpenContracts/pull/84 * Fix Django Auth by @JSv4 in https://github.com/JSv4/OpenContracts	Low	3/10/2023
v1.1.0	Initial release of a version of OpenContracts that supports "metadata" annotations - essentially data fields the user (or API) can populate. Long-term, it'd be great to support multiple data types, but, for now, this is just string data. I've also rebuilt the document processing pipeline for higher performance and more robust handling of extreme variations in document sizes. Every document is split into single pages and then the pages are added to a queue for processing. I do need to add some do	Low	2/28/2023
v1.0.1	New Features: This release adds an API Token Authorization mechanism so you can more easily integrate OpenContracts into backend services and infrastructure. Chores: A number of packages have been upgraded. See below. ## What's Changed * Updated codecov badge. by @JSv4 in https://github.com/JSv4/OpenContracts/pull/10 * Added frontend .env file samples and guidance. by @JSv4 in https://github.com/JSv4/OpenContracts/pull/11 * Bump actions/checkout from 3.0.2 to 3.1.0 by @dep	Low	11/20/2022
v1.0.0	Initial public release, with sample deployments including Gremlin Analyzers.	Low	10/24/2022


Similar Packages

Cognio: 🧠 Enhance AI conversations with Cognio, a persistent memory server that retains context and enables meaningful semantic search across sessions. (main@2026-06-05)

Ollama-Terminal-Agent: Automate shell tasks using a local Ollama model that plans, executes, and fixes commands without cloud or API dependencies. (main@2026-06-04)

mcp-audit: 🌟 Track token consumption in real-time with MCP Audit. Diagnose context bloat and unexpected spikes across MCP servers and tools efficiently. (main@2026-06-06)

agentscope: Build and run agents you can see, understand and trust. (v2.0.1)

local-rag-system: 🤖 Build your own local Retrieval-Augmented Generation system for private, offline AI memory without ongoing costs or data privacy concerns. (main@2026-06-05)

More in MCP Servers

AstrBot: Agentic IM Chatbot infrastructure that integrates lots of IM platforms, LLMs, plugins and AI feature, and can be your openclaw alternative. ✨

agentscope: Build and run agents you can see, understand and trust.

claude-plugins-official: Official, Anthropic-managed directory of high quality Claude Code Plugins.

langchain4j: LangChain4j is an open-source Java library that simplifies the integration of LLMs into Java applications through a unified API, providing access to popular LLMs and vector databases. It makes impleme