
OpenDQV

Open-source, contract-driven data quality validation. Shift-left enforcement at the point of write — before data enters your pipeline.


README

OpenDQV — Open Data Quality Validation


Quickstart · Rules · Contracts · MCP · API · Security · FAQ

"Trust is easier to build than to repair." That is why OpenDQV exists. A 422 at the point of write is cheaper than a data incident three weeks later.

Beta (v2.x). Public API surface (REST, contract YAML, MCP tools, Python SDK) is stable. Breaking changes follow a one-release deprecation cycle. Security fixes backported to the latest 2.x line. See API Stability for commitments.

OpenDQV is a write-time data validation service. Source systems call it before writing data. Bad records return a 422 with per-field errors. Good records pass through. No payload is stored.

OpenDQV demo — define a contract, send a bad record (get a 422), fix it (get a 200)

flowchart LR
    subgraph Callers
        direction TB
        SF[Salesforce]
        SAP[SAP]
        DYN[Dynamics]
        ORA[Oracle]
        WEB[Web forms]
        ETL1[ETL pipelines]

        DJ[Django clean]
        PY[Python scripts]
        PD[Pandas / ETL]

        CD[Claude Desktop]
        CUR[Cursor]
        LLM[LLM agents]
    end

    subgraph OpenDQV
        direction TB
        API[Validation API\nREST / batch]
        SDK[LocalValidator\nin-process SDK]
        MCP[MCP Server\nAI-native]
        API & SDK & MCP --> CON[Contracts · YAML\nGovernance · RBAC\nAudit trail]
        API & SDK & MCP --> GEN[Code Generator\nApex · JS · SQL]
    end

    subgraph Results
        direction TB
        R1[valid: true / false]
        R2[per-field errors]
        R3[severity levels]
        R4[webhooks on events]
    end

    SF & SAP & DYN & ORA & WEB & ETL1 --> API
    DJ & PY & PD --> SDK
    CD & CUR & LLM --> MCP

    API & SDK & MCP --> R1

    subgraph Importers
        IMP[dbt schema · GX suites\nSoda checks · ODCS · CSV]
    end
    IMP --> CON

    style API fill:#0d3b5e,stroke:#092a44,color:#fff
    style SDK fill:#0d3b5e,stroke:#092a44,color:#fff
    style MCP fill:#0d3b5e,stroke:#092a44,color:#fff
    style CON fill:#1a8aad,stroke:#14708d,color:#fff
    style GEN fill:#1a8aad,stroke:#14708d,color:#fff
    style R1 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R2 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R3 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R4 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style IMP fill:#1a8aad,stroke:#14708d,color:#fff

A 422 at the point of write closes the feedback loop — producers see failures immediately and fix them at source. Rejection rates drop over time because the tool changes the incentive, not just the outcome.

For post-landing monitoring use Great Expectations, Soda, or dbt tests — they're complementary, not competing. OpenDQV owns layer one (write-time enforcement); those tools own layer three (post-ingestion observability).


AI Agents — first-class via MCP

OpenDQV ships a built-in Model Context Protocol server, so Claude Desktop, Cursor, and any other MCP-compatible agent can discover contracts, validate records, and explain failures through tool calls the agent explicitly declares — no hallucinated compliance, no invented rules.

OpenDQV_Marmot_MCP_Demo.mp4

4-minute demo: Claude Desktop uses two MCP servers — OpenDQV for validation, Marmot for catalog lineage — to check a menu item against ppds_menu_item for Natasha's Law allergen compliance, stating which tool calls it makes and why. (Backup: download the MP4 from the repo)

For tool reference, write guardrails, remote/enterprise mode, and the Marmot composition pattern, see docs/mcp.md.
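Registering the server in an MCP client is a short JSON stanza. A hedged sketch for Claude Desktop's `claude_desktop_config.json` — the `command`/`args` entry point and the `OPENDQV_URL` variable shown here are hypothetical; docs/mcp.md has the actual launch command:

```json
{
  "mcpServers": {
    "opendqv": {
      "command": "python",
      "args": ["-m", "opendqv.mcp"],
      "env": { "OPENDQV_URL": "http://localhost:8000" }
    }
  }
}
```

Restart the client after editing the config so it re-reads the `mcpServers` block.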


Install

| I have... | Command |
|---|---|
| Python 3.11+ | `git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && bash install.sh` |
| Docker | `git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && cp .env.example .env && docker compose up -d` |
| Just the SDK/CLI | `pip install opendqv`, then `opendqv init` to bootstrap contracts |
| None of the above | Beginner setup guide → |

`install.sh` creates a virtual environment, installs dependencies, and launches the onboarding wizard. Docker pulls `ghcr.io/opendqv/opendqv:latest` — no build step required.

โš ๏ธ AUTH_MODE=open (the default) has no authentication. Set AUTH_MODE=token and a strong SECRET_KEY in .env before any non-local deployment. See SECURITY.md.


Your First Validation

1. Write a contract — drop a YAML file in your contracts directory (run `opendqv init --all` to copy the 43 bundled contracts, or `opendqv init` for a single starter):

```yaml
contract:
  name: order
  version: "1.0"
  owner: "Data Governance"
  status: active
  rules:
    - name: valid_email
      type: regex
      field: email
      pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
      severity: error
      error_message: "Invalid email format"
    - name: amount_positive
      type: min
      field: amount
      min: 0.01
      severity: error
      error_message: "Order amount must be positive"
    - name: status_valid
      type: allowed_values
      field: status
      allowed_values: [pending, confirmed, shipped, cancelled]
      severity: error
      error_message: "Invalid order status"
```

2. Reload contracts:

```bash
curl -X POST http://localhost:8000/api/v1/contracts/reload
```

3. Send a bad record — OpenDQV rejects it:

```bash
curl -s -X POST http://localhost:8000/api/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"contract": "order", "record": {"email": "not-an-email", "amount": -5, "status": "unknown"}}'
```
```json
{
  "valid": false,
  "errors": [
    {"field": "email",  "rule": "valid_email",     "message": "Invalid email format",          "severity": "error"},
    {"field": "amount", "rule": "amount_positive", "message": "Order amount must be positive", "severity": "error"},
    {"field": "status", "rule": "status_valid",    "message": "Invalid order status",          "severity": "error"}
  ],
  "contract": "order",
  "version": "1.0"
}
```

4. Fix the record — it passes:

```bash
curl -s -X POST http://localhost:8000/api/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"contract": "order", "record": {"email": "alice@example.com", "amount": 49.99, "status": "pending"}}'
```
```json
{"valid": true, "errors": [], "warnings": [], "contract": "order", "version": "1.0"}
```
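Conceptually, the engine's job for this contract fits in a few lines of plain Python. The sketch below is an illustration of what the three rules check — not OpenDQV's actual engine code:

```python
import re

# Conceptual re-implementation of the three "order" rules above.
# Each entry: (field, rule name, predicate, error message).
RULES = [
    ("email", "valid_email",
     lambda v: bool(re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", str(v))),
     "Invalid email format"),
    ("amount", "amount_positive",
     lambda v: isinstance(v, (int, float)) and v >= 0.01,
     "Order amount must be positive"),
    ("status", "status_valid",
     lambda v: v in {"pending", "confirmed", "shipped", "cancelled"},
     "Invalid order status"),
]

def validate(record: dict) -> dict:
    """Collect per-field errors; a record is valid only if none fire."""
    errors = [
        {"field": f, "rule": r, "message": msg, "severity": "error"}
        for f, r, ok, msg in RULES
        if not ok(record.get(f))
    ]
    return {"valid": not errors, "errors": errors}

bad = {"email": "not-an-email", "amount": -5, "status": "unknown"}
good = {"email": "alice@example.com", "amount": 49.99, "status": "pending"}
```

Running `validate(bad)` reproduces the three-error shape shown in step 3; `validate(good)` comes back clean.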

The customer contract ships pre-seeded if you want to skip step 1. The quickstart guide walks through authoring, lifecycle, and batch validation.


Rules

| Type | What it checks |
|---|---|
| `not_empty` | Field is present and non-empty |
| `regex` | Field matches (or does not match) a pattern. Built-ins: `builtin:email`, `builtin:uuid`, `builtin:ipv4`, `builtin:url` |
| `min` / `max` / `range` | Numeric bounds |
| `min_length` / `max_length` | String length |
| `date_format` | Parseable date/datetime. Falls back through common formats if no explicit format is set |
| `allowed_values` | Value must be in a fixed list |
| `lookup` | Value must appear in a local file or HTTP endpoint (with TTL cache) |
| `compare` | Cross-field: field op compare_to — supports gt, lt, gte, lte, eq, neq, and today/now sentinels |
| `required_if` / `forbidden_if` | Conditional: required or forbidden when another field equals a value |
| `checksum` | Check-digit integrity: IBAN, GTIN/GS1, NHS, ISIN, LEI, VIN, CPF, ISRC |
| `unique` | No duplicates within a batch (batch mode only) |
| `cross_field_range` | Value must be between two other fields in the same record |
| `field_sum` | Sum of named fields must equal a target (within optional tolerance) |
| `geospatial_bounds` | Lat/lon pair within a bounding box |
| `date_diff` | Difference between two date fields within a range |
| `age_match` | Declared age consistent with date-of-birth field |
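To make the `checksum` family concrete, here is the IBAN variant: a mod-97 test over the rearranged account string. This is a sketch of the standard ISO 13616 algorithm, not OpenDQV's implementation:

```python
def iban_checksum_ok(iban: str) -> bool:
    """ISO 13616 mod-97 check: move the first four characters to the end,
    map letters A-Z to 10-35, and the resulting integer must equal 1 mod 97."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not s.isalnum():
        return False
    rearranged = s[4:] + s[:4]
    # int(c, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35
    digits = "".join(str(int(c, 36)) for c in rearranged)
    return int(digits) % 97 == 1
```

The well-known example IBAN `GB82 WEST 1234 5698 7654 32` passes; flip any digit and the mod-97 residue shifts, so the check fails.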

Rules have severity: error (blocks the record) or severity: warning (flags but allows). Any rule can include a condition block to apply it only when another field equals a given value.
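For example, a warning-severity rule gated on another field might look like the sketch below. The `condition` key names are illustrative assumptions — check docs/rules/ for the exact schema:

```yaml
- name: po_number_expected
  type: not_empty
  field: po_number
  severity: warning          # flags the record but does not block it
  condition:                 # illustrative shape -- see docs/rules/ for the real keys
    field: channel
    equals: b2b
  error_message: "B2B orders usually carry a PO number"
```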

Full reference: docs/rules/


How it compares

A mature data governance programme operates across three layers, each with a distinct job:

| Layer | Purpose | Tools |
|---|---|---|
| 1. Write-time enforcement | Prevent bad data from entering any system | OpenDQV |
| 2. Catalog / governance / stewardship | Ownership, glossary, lineage, policy, stewardship workflows | Alation, Atlan, Collibra, Purview, DataHub, Marmot |
| 3. Pipeline testing / observability | Detect drift, freshness issues, residual quality after ingestion | Great Expectations, Soda Core, dbt tests, Monte Carlo |

OpenDQV Core owns layer one. Your catalog handles layer two, your pipeline tools handle layer three.

| | Great Expectations / Soda / dbt | OpenDQV |
|---|---|---|
| When | After data lands (in warehouse/lake) | Before data is written (at the door) |
| Where | Data pipelines, batch jobs | Source system integration points |
| Model | Scan data at rest | Validate data in flight |
| Latency | Minutes to hours (batch) | Milliseconds (API call) |
| Who calls it | Data engineers | Data engineers, developers, CRM admins |

They're complementary. Use Great Expectations to monitor your warehouse. Use OpenDQV to stop bad data from getting there in the first place.


Contracts

43 production-ready contracts ship inside the `opendqv` package, covering GDPR, HIPAA, SOX, MiFID II, UK Building Safety Act, Martyn's Law, Natasha's Law, Ofcom Online Safety Act, EU DORA, and 20+ other regulatory frameworks across the UK, EU, and US. `pip install opendqv` gives you all of them — `opendqv list` works with zero configuration.

See docs/compliance-contracts.md for the full list with regulatory context, or browse opendqv/contracts/ directly. 17 minimal starter templates are in examples/starter_contracts/.


Performance

EC2 c6i.large, 2 workers, 12-rule contract, mixed 50/50 workload: ~482 req/s, p99 ~182 ms. Sizing rule: WEB_CONCURRENCY = number of vCPUs.

See docs/benchmark_throughput.md for full platform comparison, methodology, and monthly volume extrapolation.
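As a rough capacity check — assuming the benchmarked ~482 req/s were sustained around the clock, which real workloads rarely are:

```python
req_per_s = 482                      # benchmarked throughput (c6i.large, 2 workers)
per_day = req_per_s * 86_400         # seconds per day
per_month = per_day * 30
print(f"{per_day:,} validations/day, ~{per_month / 1e9:.2f}B/month")
```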


Documentation

- **Quickstart** — Build your first contract in 15 minutes
- **Rules Reference** — All rule types with parameters and examples
- **Compliance Contracts** — 44 contracts with regulatory context
- **API Reference** — REST endpoints, SDK, GraphQL, webhooks
- **Security** — Deployment checklist, threat model, RBAC
- **Production Deployment** — Token auth, TLS, Docker Compose, hardening
- **Integrations** — Salesforce, Kafka, Snowflake, dbt, Databricks, MCP, and more
- **All docs →** 76 documentation files

API Stability

OpenDQV is in Beta as of 2.0.0. The following stability commitments apply to the v2.x series:

- **REST API endpoints** — paths, request bodies, and response shapes are stable within v2.x. Backwards-incompatible changes require a major version bump and follow a deprecation cycle (one minor release of warnings before removal).
- **YAML contract format** — the contract schema (rules, fields, types) is stable within v2.x. New rule types may be added; existing rules will not change semantics without a deprecation cycle.
- **Python SDK** — `OpenDQVClient`, `AsyncOpenDQVClient`, and `LocalValidator` public method signatures are stable within v2.x. Internal helpers (prefixed `_`) are not covered.
- **MCP tools** — tool names and parameters are stable within v2.x.
- **Security fixes** — backported to the latest 2.x line on a best-effort basis.

Known limitations in v2.2.x

- **Rule null handling is inconsistent.** Most format rules fail when the target field is missing; a few (`max_length`, `allowed_values`) pass silently; `field_sum` and `ratio_check` coerce missing operands to 0. Single-record and batch paths disagree in a few cases. See docs/rules/core_rules.md for the full matrix and the safe pattern to use today. v2.3.0 will make this consistent (loud-by-default with an `optional: true` opt-out).
- **Unknown rule types pass silently at runtime.** A typo in `type:` (e.g. `min_lenght`) is caught by `opendqv lint` but not by the engine — a typo'd rule is a disabled rule. Always lint before deploy. v2.3.0 will reject unknown types at contract load.
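Until v2.3.0 lands, a defensive contract pattern is to make presence explicit rather than relying on a format rule to notice a missing field. A sketch, reusing the quickstart's email rules — see docs/rules/core_rules.md for the documented pattern:

```yaml
- name: email_present        # explicit presence check: loud when the field is missing
  type: not_empty
  field: email
  severity: error
  error_message: "email is required"
- name: email_format         # format check is only meaningful once presence is guaranteed
  type: regex
  field: email
  pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
  severity: error
  error_message: "Invalid email format"
```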

Contributing

See CONTRIBUTING.md for setup instructions, coding guidelines, and how to submit changes.

License

MIT — see LICENSE.

Acknowledgements

Led by Sunny Sharma, BGMS Consultants Ltd. The vision, the architecture, every contract, and every design decision in this repository are directed by a human who believes data quality is a write-time responsibility.

OpenDQV is built with a hybrid team. Sunny leads — carbon and silicon. Three AI collaborators execute: Claude Sonnet 4.6 (primary developer), Claude Opus 4.6 (strategic auditor), and Grok (market intelligence). All answer to the same ethos: trust is easier to build than to repair.

Release History

VersionChangesUrgencyDate
v2.2.5### Added - **\`opendqv fork <src> <dst>\`** โ€” copy a contract to a new name as a clean DRAFT. Rewrites \`name:\`, \`version: \"1.0\"\`, \`status: draft\`, and \`asset_id:\` in place while preserving all comments, descriptions, and rules from the source. Replaces the \`cp + edit + reset\` workflow with one command. - **Linter rule \`FILENAME_NAME_MISMATCH\`** โ€” \`opendqv lint\` now errors when the filename stem differs from the YAML's internal \`name:\` field. Catches the footgun where \`cp medHigh4/17/2026
v2.2.4### Shipped - **43 bundled contracts now ship inside the Python package.** `pip install opendqv` followed by `opendqv list` works with zero configuration. Before v2.2.4, pip-install users saw *"No contracts found"* because the `contracts/` directory lived at repo root and never entered the wheel. They now live at `opendqv/contracts/` inside the package. - **`opendqv init --all`** โ€” new flag copies every bundled contract (43+ regulated domains) plus reference lookup files into the target directorHigh4/17/2026
v2.2.3### Fixed - **4 broken `max_length` rules** in banking_transaction, fmcg_product, retail_product, and media_content contracts. Rules used `max:` instead of `max_length:` in YAML โ€” Pydantic alias mapped to wrong field, so rules silently never fired. Found via MCP-driven sample record audit. - **16 sample record files** aligned with v1.1 contracts. 11 full rewrites (v1.0โ†’v1.1 field name changes), 5 minor fixes. 142/142 sample records now validate correctly. - **proof_of_play samples** โ€” panel_id High4/15/2026
v2.2.2### Fixed - MCP server version was hardcoded as "1.8.4" โ€” now reads from `config.ENGINE_VERSION` dynamically Medium4/12/2026
v2.2.1## Highlights - **Security:** Removed yaml.full_load() fallback โ€” eliminated RCE vector from YAML loading path - **Performance:** O(n squared) to O(n) grouped uniqueness โ€” 954x faster at 2,000 records (131s to 0.14s) - **Maintainability:** _check_rule() dispatch table โ€” 417-line god function to 23 handlers + dict lookup - **DX:** 62 broken import paths fixed across 27 docs files for pip users ### Full changelog 15 code quality improvements shipped via PICK methodology (Possible/Implement/ChalMedium4/11/2026
v2.2.0## Highlights - **Security:** Removed yaml.full_load() fallback โ€” eliminated RCE vector from YAML loading path - **Performance:** O(n squared) to O(n) grouped uniqueness โ€” **954x faster** at 2,000 records (131s to 0.14s) - **Maintainability:** _check_rule() dispatch table โ€” 417-line god function to 23 handlers + dict lookup - **DX:** 62 broken import paths fixed across 27 docs files for pip users ### Full changelog 15 code quality improvements shipped via PICK methodology (Possible/Implement/Medium4/11/2026
v2.1.0## What's Changed **Critical distribution fix** โ€” `pip install opendqv` now works correctly. `import opendqv` succeeds, all modules live under the `opendqv/` namespace, and no PyPI package collisions. ### Changes - **Namespace restructure**: All modules moved under `opendqv/` package โ€” eliminates collisions with `sdk`, `security`, `core` top-level PyPI packages - **SEC-001 hardened**: `regex` library is now a required dependency โ€” ReDoS timeout protection guaranteed on every install - **`opendMedium4/11/2026
v2.0.0OpenDQV Core graduates from Alpha to **Beta**. No breaking changes from 1.9.8 โ€” this release is a status milestone, not an API break. Existing 1.9.x deployments upgrade in place. ## What Beta means - **Public API surface is stable.** REST endpoints, contract YAML schema, MCP tool names, and Python SDK signatures will not change without a deprecation cycle (one minor release of warnings before removal). - **Security fixes are backported** to the latest 2.x line. - **Coverage 93%, 3,398 tests** Medium4/7/2026
v1.9.8## Performance **4ร— regex throughput improvement** โ€” `_safe_match()` now calls `compiled_pattern.match(str_val, timeout=...)` directly on the pre-compiled `regex.Pattern` object. Valid-record mean latency: 0.161 ms โ†’ 0.040 ms. Invalid-record: 0.234 ms โ†’ 0.052 ms. ## Bug Fix **Latent ReDoS timeout masking** โ€” `except _regex_lib.TimeoutError:` would have raised `AttributeError` if a regex timeout actually fired, masking the SEC-001 control. Fixed to `except TimeoutError:` (the builtin that `regMedium4/3/2026
v1.9.7## Coverage sprint: 90.87% โ†’ 93.0% **3398 tests** (up from 3314 / +84 tests). `fail_under` raised from 90 to 93. ### What was covered - **JSON decode exception handlers** โ€” `rule_heatmap`, `rule_failure_velocity`, `observation_fields` in `core/quality_analytics.py` - **Auth function edge paths** โ€” open-mode invalid Bearer fallback, non-Bearer 401, `get_current_role` validator fallback in `security/auth.py` - **Batch validation edge cases** โ€” `compare_to="now"` sentinel, date-parse string fallMedium4/2/2026
v1.9.6## v1.9.6 โ€” Coverage sprint continuation (89.8% โ†’ 90.9%) ### Summary This release addresses the three open items from CRT152: 1. **Dead code removed** โ€” `api/routes_contracts.py`: the `except UnknownContextError` block in `generate_code_endpoint` was unreachable. `get_rules_with_context()` logs and falls back to base rules for unknown contexts; it never raises this exception. The dead try/except has been deleted. 2. **`core/onboarding.py` coverage 80.8% โ†’ 91.9%** โ€” New tests in `test_onboardMedium4/2/2026
v1.9.5## Coverage Sprint: 80.4% โ†’ 89.8% Aimed for 100%, landed at 89.8%. Coverage threshold raised from 80% to 89%. **3,251 tests** (up from 2,933 / +318 new tests) ### New Test Files - `tests/test_cli_extended.py` (69 tests) โ€” in-process `cmd_*` function coverage - `tests/test_explainer.py` (63 tests) โ€” all rule type handlers. `core/explainer.py` โ†’ **100%** - `tests/test_linter_extended.py` (31 tests) โ€” required_if, allowed_values, date_diff, age bounds - `tests/test_storage_extended.py` (15 testsMedium4/2/2026
v1.9.4## What's in v1.9.4 ### Coverage raised to 80% - 101 new tests across 7 new/extended test files - Rule types: `field_sum`, `forbidden_if`, `conditional_value`, `date_diff`, 8 checksum algorithms (`mod10_gs1`, `iban_mod97`, `isin_mod11`, `isrc_luhn`, `lei_mod97`, `nhs_mod11`, `cpf_mod11`, `vin_mod11`), `compare` edge cases - Import API `save=True` branches: dbt, soda, csv, CSVW, OTel, NDC, ODCS - Analytics endpoints: rejection-summary, rule-velocity, observation/summary/trend/fields - Worker heaMedium4/2/2026
v1.9.3## CRT150 โ€” Professional Quality Baseline **Beta polish sprint โ€” the trust signals that make a project credible before you read the code.** ### Added - **py.typed markers** (PEP 561) โ€” sdk/, core/, api/, security/ now declare type information. IDEs and type checkers will provide proper autocomplete and type safety for downstream users. - **Coverage threshold** โ€” 77% enforced in CI. Measured baseline is 77.5%; threshold prevents silent regression across sprints. - **SDK unit tests** (`tests/tesMedium4/2/2026
v1.9.2## What's changed ### Security - **N3**: `GET /tokens` now requires `admin` role โ€” token metadata was visible to any authenticated user - **N1**: `SECURITY.md` updated to reflect M1 DNS rebinding fix and L2 token revocation fix ### Bug fixes - **N4**: `encoding="utf-8"` added to 12 `open()` calls in H1-refactored router files (Windows portability) - **N8**: Blocking DNS resolution in webhook `_send()` wrapped in `asyncio.to_thread()` - **N5**: `revoke_system_tokens` open-mode guard harmonised Medium3/31/2026
v1.9.1## Changes ### Refactoring - **H1**: `api/routes.py` (2,764L) split into 8 domain sub-routers + `api/deps.py`. No URL or behaviour changes. Contributor onboarding significantly improved. ### Security - **H2**: Removed dead `require_role()` from `security/auth.py` โ€” unused factory creating false sense of centralised RBAC - **M1**: Webhook SSRF hardened โ€” hostname resolved and IP-checked at *send time*, not just registration. Mitigates DNS rebinding. - **L1**: `init_db()` removed from module impMedium3/31/2026
v1.9.0## Security fix (RT148 C2) **High โ†’ fixed:** Contract state machine now enforces valid lifecycle transitions. `set_status()` previously accepted any transition including `archived โ†’ active`, allowing an approver to bypass the maker-checker review workflow entirely. ### Transition map (enforced at `core/contracts.py`) | From | Allowed to | |------|-----------| | `draft` | `active`, `archived` | | `review` | `active`, `draft`, `archived` | | `active` | `archived`, `draft` | | `archived` | `draMedium3/31/2026
v1.8.9## Security fix (RT148 C1) **Critical:** `POST /tokens/generate` now requires `admin` role in `AUTH_MODE=token`. Previously any authenticated user (even `validator` role) could call this endpoint and generate tokens with any role including `admin`, completely bypassing the RBAC model. ### Changes - `api/routes.py`: `caller_role: str = Depends(get_current_role)` + 403 guard for non-admin callers - `config.py`: `read_text(encoding="utf-8")` โ€” Windows portability fix (M2) - `tests/test_rbac.py`:Medium3/30/2026
v1.8.8## What's new in v1.8.8 ### Observation mode โ€” analytics and workbench Observation-only mode (introduced in v1.8.7) is now fully instrumented with analytics and a dedicated dashboard. **New API endpoints:** - `GET /api/v1/observation/summary?days=7&contract=X` โ€” would_have_failed count, enforcement_readiness_pct, by_contract breakdown - `GET /api/v1/observation/trend?contract=X&days=7` โ€” daily time-series of observation violations - `GET /api/v1/observation/fields?contract=X&days=7` โ€” top faiMedium3/28/2026
v1.8.7## What's new in v1.8.7 ### Observation-Only Mode Run validation without blocking โ€” the pilot entry-point feature. **CLI:** ```bash opendqv validate-file my_contract data.csv --observe-only ``` Exits 0 regardless of violations. Output labelled `OBSERVATION RUN`. `--output-failures` still works to export what would have been rejected. **API:** ```json POST /api/v1/validate {"contract": "my_contract", "record": {...}, "observe_only": true} ``` Returns HTTP 200 with `"mode": "observation_only"`Medium3/27/2026
v1.8.6## What's in this release ### New features - **Typed error codes** โ€” every validation failure carries `error_code: OPENDQV_{RULE_TYPE}_001`. Stable across contract versions. Safe for Kafka DLQ routing, PagerDuty rules, ServiceNow auto-tickets. See [docs/error_codes.md](docs/error_codes.md). - **`opendqv validate-file <contract> <path>`** โ€” validate CSV/TSV/Parquet without starting the API server. Optional `--output-failures failed.csv` flag. - **Benchmark suite** โ€” five standard workloads coverMedium3/27/2026
v1.8.5## Bug fix **`ENGINE_VERSION` was hardcoded as `"1.0.0"` since the project began.** Every audit trail entry produced since v1.1.0 has been stamped with the wrong engine version โ€” a credibility issue for any regulated customer reviewing the hash-chained audit chain. ### Fix `ENGINE_VERSION` now reads from `pyproject.toml` at runtime (source installs) with `importlib.metadata` as fallback (pip installs). The version in audit entries will always match the actual running version. A CI assertion Medium3/26/2026
v1.8.4## What's new - New MCP tool: `get_quality_trend` โ€” daily pass-rate with improving/declining/stable summary - Per-contract latency in `get_quality_metrics` (previously all contracts returned identical global figures) - `agent_id` filter on `get_quality_metrics` for single-source attribution - MCP server icon โ€” OpenDQV logo now displayed in Claude Desktop and MCP-compatible clients - 2,635 tests passing (9 MCP tools total)Medium3/26/2026
v1.8.3## What's new - `agent_id` as first-class analytics dimension โ€” filter quality metrics by source agent - Rule failure velocity โ€” track which rules are failing fastest over time - SQLite fallback for analytics when DuckDB is unavailable - MCP `get_quality_metrics` updated with `agent_id` support - 2,622 tests passingMedium3/26/2026
v1.8.2## What's new - 10 customer demo scripts covering: HR, GDPR DSAR, Healthcare, MiFID II, DORA, SOX, Companies House, Martyn's Law, Building Safety, OOH proof_of_play - `context="demo"` persistence โ€” demo context survives across validation calls - `teardown_demo.py` โ€” one-shot cleanup including Marmot catalog entries - `DELETE /api/v1/quality/stats?context=` endpoint for selective stats reset - Bug fix: `date_diff` rule no longer fires when field is absent - 2,601 tests passingMedium3/26/2026
v1.8.1## What's new - PPDS demo script for food allergen / Natasha's Law validation walkthrough - Contract metadata improvements (`catalog_visible` flag for allereasy_dish) - CHANGELOG updated Patch release โ€” no new API surface; demo tooling and contract metadata only.Medium3/26/2026
v1.8.0## What's new ### DuckDB OLAP analytics layer Completes the OLTP/OLAP split introduced in v1.7.1: - **`core/quality_analytics.py`** โ€” new `QualityAnalytics` class. DuckDB attaches the SQLite `quality_stats` table directly via its built-in SQLite extension โ€” zero data duplication from the OLTP write path. - **`GET /api/v1/analytics/summary?days=N`** โ€” cross-contract pass rate summary, sorted worst-first (most useful for triage). - **`GET /api/v1/analytics/rule-heatmap?days=N`** โ€” top-50 failinMedium3/25/2026
v1.7.1## Bug fixes **Single-record validations now persist to SQLite** โ€” `/validate` (single-record) previously wrote only to in-memory stats, so all per-contract pass rates were lost on API restart. Now also writes to the `quality_stats` SQLite table โ€” same path batch validation already used. **`push_quality_lineage.py` reads from SQLite, not in-memory stats** โ€” replaced `GET /api/v1/stats` (resets on restart) with per-contract `GET /api/v1/contracts/{name}/quality-trend?days=30` (SQLite-backed). PMedium3/25/2026
v1.7.0## What's new **MCP constraint field exposure** โ€” `get_contract` now returns `allowed_values`, `pattern`, `min_value`, `max_value`, `min_length`, `max_length` on every rule. AI agents no longer need to trigger validation failures to discover valid values. **Real `window_hours` filtering** โ€” `get_quality_metrics(window_hours=N)` now actually scopes stats to the last N hours. `ValidationStats` gains a timestamped `_events` deque (`maxlen=10,000`). `GET /api/v1/stats?window_hours=N` added for RESMedium3/25/2026
v1.6.0## What's new ### Marmot downstream consumers Contracts now support `downstream_consumers` โ€” a list of Marmot MRNs for assets that consume the validated dataset (dashboards, dbt models, etc.). `push_quality_lineage.py` stitches direct `downstream` edges in Marmot automatically, completing the full lineage graph: ``` [source] โ†’ [opendqv:validate:X] โ†’ [opendqv:X] โ†’ [tableau/sales_dashboard] ``` ### `catalog_visible` flag Set `catalog_visible: false` on a contract to exclude it from Marmot catalMedium3/25/2026
v1.5.1## What's in this release **Maintenance patch** โ€” no new features, no breaking changes. ### DRY refactoring (PR #30) - 18 copy-paste violations eliminated across `api/routes.py`, `config.py`, `security/auth.py`, `main.py`, `cli.py` - 5 new route helper functions: `_get_contract_or_404`, `_get_contract_versioned_or_404`, `_get_contract_hash`, `_check_validate_in_states`, `_assert_contract_mutable` - `VALID_ROLES` centralised in `auth.py` โ€” single source of truth for both API and CLI - `IS_OPEN_Medium3/24/2026
v1.5.0## What's new ### Workbench UX Overhaul The Streamlit governance workbench has been significantly redesigned: - **Grouped sidebar navigation** โ€” sections now organised under CORE, INTEGRATIONS, and CONTRACT TOOLS headers - **Validate** โ€” "Validate Record" and "Validate Batch" merged into one section with a mode toggle; sample JSON generation is now explicit opt-in (no more auto-reset when switching contracts) - **Audit Trail** โ€” "Version History" renamed to "Audit Trail" - **Catalogs & AI** โ€”Medium3/24/2026
v1.3.3## What changed Two compliance gaps in the `qsr_menu_item` contract closed. This contract enforces all 14 Natasha's Law allergens on Pre-Packed for Direct Sale (PPDS) food items. ### Fixes **`sulphites_ppm` is now required when sulphites are declared** Previously, if `contains_sulphites = "true"` but `sulphites_ppm` was omitted, the `min: 10` threshold rule silently never fired โ€” the record passed with sulphites declared but no concentration recorded. Added `required_if: {field: contains_sulMedium3/23/2026
v1.3.2 ### Windows Compatibility (RT96 โ€” Python 3.13.12, real hardware benchmark) - **Windows test runner** โ€” `scripts/windows_test.bat`: 3-run benchmark (matching RT72 Pi 400 methodology), pre-flight disk space + Python 3.11+ checks, UTF-8 mode, summary block with per-run timing, full cleanup. Verified: 2387 passed, 6 skipped, ~4:48 per run - **UTF-8 encoding** โ€” explicit `encoding="utf-8"` on all `read_text()` / `write_text()` calls touching YAML files across `core/contracts.py`, `core/onboarding.pMedium3/22/2026
v1.3.1## Developer Experience **Postman collection** โ€” explore all 50 API endpoints in one click. Import `postman/OpenDQV.postman_collection.json` + `postman/OpenDQV.postman_environment.json` into Postman. 10 folders, auto-auth wiring, ready to run against `AUTH_MODE=open`. See [docs/postman.md](docs/postman.md). **Demo Docker environment** โ€” pre-seeded, zero configuration. ```bash cp .env.example .env docker compose -f docker-compose.demo.yml up -d ``` ~740 validation events across 7 contracts. Low3/22/2026
v1.3.0## What's new in v1.3.0 17 contracts upgraded from thin/weak presence checklists to production-grade validation with deep domain-specific rules and regulatory commentary. **Contract portfolio:** 0 thin/weak contracts remaining (was 14). ~71% production-grade (was 45%). **Tests:** 2,383 passing (was 2,261). ### Highlights - ISO 3779 VIN validation in `automotive_vehicle` - ClinicalTrials.gov NCT + ICH-GCP rules in `pharma_clinical_trial` - ISIN + LEI + Incoterms 2020 across financial and logLow3/22/2026
v1.2.3 ### Features - **`allowed_values` rule type** โ€” validate that a field value is one of an inline list without needing a separate lookup file. Supports single-record and DuckDB batch validation. ```yaml - name: status_valid field: status type: allowed_values allowed_values: [active, inactive, pending] severity: error error_message: "status must be one of: active, inactive, pending" ``` - **Lifecycle webhooks** โ€” three new webhook events fire on contract lifecycle Low3/22/2026
v1.2.2 ### Fixes - **Code generator โ€” silent gap eliminated:** Rule types not implemented by a generator target previously emitted nothing (silent drop). Now emit an explicit `// NOTE: requires API validation` comment for known API-only types (`required_if`, `lookup`, `compare`, `date_diff`, `checksum`, etc.) and a `// TODO` comment for any unknown future types. Users deploying generated code can now see exactly which rules are enforced and which require the live API. - **Salesforce generatLow3/22/2026
v1.2.1 ### UI - **Governance Audit Trail** โ€” "Version History" tab renamed "Contract Audit & Lifecycle". Now shows hash chain integrity banner (โœ… intact / โŒ broken), timeline view with proposed-by / approved-by / rejected-by / rejection-reason per entry, and raw history table in collapsible expander. All governance fields were already stored in the DB; this release surfaces them. ### Documentation - **`docs/faq.md`** โ€” new FAQ covering: LLM/Claude scripts vs OpenDQV, GE/Soda/dbt comparisoLow3/22/2026
## v1.2.0 (3/21/2026)

### Contracts

- **`dora_ict_incident`** — EU DORA (Digital Operational Resilience Act), Articles 17-19. ICT incident reporting for EU financial entities (in force 17 January 2025). Enforces incident classification, 24h early-warning and 72h notification windows via the `date_diff` rule, root-cause documentation for major/significant incidents, and remediation tracking. 30 rules. 3 new reference files.
- **`hipaa_disclosure_accounting`** — US HIPAA 45 CFR 164.528. Accounting of disclosures…
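The 72h notification window enforced via `date_diff` reduces to a timestamp-difference check. A generic Python sketch of that underlying logic (not OpenDQV's implementation; the function and parameter names are illustrative):

```python
from datetime import datetime

def within_window(detected_at: str, notified_at: str, max_hours: int) -> bool:
    """Generic date-diff window check over ISO 8601 timestamps, the kind
    of comparison a 24h/72h DORA notification rule performs."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    delta = datetime.strptime(notified_at, fmt) - datetime.strptime(detected_at, fmt)
    return delta.total_seconds() <= max_hours * 3600
```

Enforcing this at the point of write means a late notification record is rejected with a 422 rather than silently filed out of window.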
## v1.1.0 (3/21/2026)

### Contracts

- **`gdpr_processing_record`** — UK GDPR Article 30 Record of Processing Activities (ROPA). Enforces lawful-basis declaration (all 6 Article 6 bases), consent-specific fields (mechanism, timestamp, withdrawal) via `required_if`, Legitimate Interests Assessment gating, special-category data basis (Article 9), international transfer safeguards, and a DPO audit trail. 29 rules. 7 new reference files.
- **`gdpr_dsar_request`** — UK GDPR Article 15 Data Subject Access Request ha…
## v1.0.7 (3/21/2026)

### Fixes

**PyPI publishing had been broken since v1.0.1.** All six releases since the initial v1.0.0 failed to publish with `400 Bad Request`: `pyproject.toml` was stuck at `1.0.0`, so every release attempted to re-upload a version that already existed on PyPI. This release fixes it permanently:

- `publish.yml`: added a `poetry version ${GITHUB_REF_NAME#v}` step — the package version is now derived from the git tag automatically on every future release
- `publish.yml`: pinned …
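The version-from-tag step quoted above would look roughly like this in a GitHub Actions workflow (a sketch built around the command given in the notes; the step name and surrounding workflow are assumptions):

```yaml
# Sketch of the publish.yml step: derive the package version from the
# pushed tag so pyproject.toml can never drift out of sync again.
- name: Set package version from git tag
  run: poetry version "${GITHUB_REF_NAME#v}"   # e.g. tag v1.0.7 -> 1.0.7
```

`GITHUB_REF_NAME` is the built-in Actions variable holding the tag name, and `#v` is plain shell prefix stripping, so no extra tooling is needed.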
## v1.0.6 (3/21/2026)

### Contract #35 — `martyns_law_event`

Martyn's Law (Terrorism (Protection of Premises) Act 2025) qualifying-events contract — the follow-on to `martyns_law_venue` for temporary and one-off events where 200 or more persons are expected to attend.

**Key distinctions from `martyns_law_venue`:**

| | `martyns_law_venue` | `martyns_law_event` |
|---|---|---|
| Responsible party | Accountable Person (AP) | Event organiser |
| SIA obligation | Reg… | |
## v1.0.5 (3/21/2026)

### Contracts — two UK corporate and building safety compliance laws

- **`building_safety_golden_thread`** — Building Safety Act 2022. Enforces the Act's own obligation — *"accurate and up-to-date information throughout the building lifecycle"* — for higher-risk buildings (18m+ or 7+ storeys). Mandatory fields: named Accountable Person, Building Safety Manager, BSR registration number, Safety Case documentation, fire and emergency file, residents' engagement strategy, and gold…
## v1.0.4 (3/21/2026)

### Contracts — two named-victim UK compliance laws

- **`qsr_menu_item`** — Natasha's Law (Food Information (Amendment) (England) Regulations 2019, in force 1 October 2021). Allergen-compliance contract for Pre-Packed for Direct Sale (PPDS) food. All 14 major allergens are mandatory fields — omission triggers a 422 before the record enters the system. Named after Natasha Ednan-Laperouse (2001–2016). 49 rules.
- **`martyns_law_venue`** — Terrorism (Protection of Premises) Act…
## v1.0.3 (3/21/2026)

### Fixes

- **Three additional unprotected context endpoints** — `POST /generate`, `GET /export/gx/{name}`, and `GET /export/odcs/{name}` were missing the `UnknownContextError` guard, so an unknown `context` parameter produced an unhandled exception. They now return 422 consistently, matching all six context-accepting endpoints.
- **Regex rule with no `pattern` now fails records** — previously a misconfigured `regex` rule (no `pattern` field) silently passed every value. It now retu…
## v1.0.2 (3/21/2026)

### Security

- **`python-jose` → `PyJWT`** — `python-jose` carried `ecdsa` as a transitive dependency (CVE-2024-23342, the Minerva timing attack on P-256 ECDSA). OpenDQV uses `HS256` exclusively — ECDSA operations are never called. Migrated to `PyJWT>=2.10.0`, which has zero extra dependencies; the packages `ecdsa`, `pyasn1`, and `rsa` are removed from the dependency tree entirely. No API changes — `jwt.encode`/`jwt.decode` signatures are identical.
- **Starlette CVEs dismissed…**
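For context on why the ECDSA stack was droppable: HS256 signing is a single HMAC-SHA256 over the encoded header and payload, nothing more. A stdlib sketch of what PyJWT's `jwt.encode` produces for an HS256 token:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def hs256_jwt(claims: dict, secret: bytes) -> str:
    """What HS256 signing is underneath: one HMAC-SHA256, no ECDSA anywhere."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"},
                               separators=(",", ":")).encode())
    payload = b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"
```

In production you would call `jwt.encode(claims, secret, algorithm="HS256")` and `jwt.decode(token, secret, algorithms=["HS256"])` from PyJWT rather than hand-rolling this; the sketch only shows that symmetric HMAC is the entire cryptographic surface.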
## v1.0.1 (3/21/2026)

Five issues identified through community stress-testing after the v1.0.0 launch, all fixed and fully tested.

### Bug fixes

- **`date_format` ignores `rule.format`** — custom date/datetime formats specified in the contract (e.g. `'%Y-%m-%d %H:%M:%S'` for SQL Server-style timestamps) were silently ignored. The validator now uses `rule.format` first, then falls back to common formats.
- **`/explain` returns 401 in `AUTH_MODE=open`** — the auth-check order was inverted. Calling…
## v1.0.0 (3/14/2026) — Initial Public Release

*Trust is cheaper to build than to repair.*

OpenDQV is an open-source, contract-driven data quality validation platform. Validate records against YAML data contracts at the point of write — before data enters the pipeline.

---

### What's included

- **24 rule types** — regex, min/max, range, not_empty, date_format, compare, lookup, checksum, min_age, max_age, required_if, unique, cross_field_range, field_sum, date_diff, ratio_check, and …
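A minimal rule list, following the rule shape shown in the v1.2.3 `allowed_values` example, might look like this. The specific rule names and the email pattern are illustrative, and any field not shown in that example is not guaranteed to match the real schema:

```yaml
# Illustrative rules only; consult the contract docs for the full schema.
- name: email_format
  field: email
  type: regex
  pattern: '^[^@\s]+@[^@\s]+\.[^@\s]+$'
  severity: error
  error_message: "email must be a valid address"
- name: country_not_empty
  field: country
  type: not_empty
  severity: error
  error_message: "country is required"
```

A record failing either rule comes back as a 422 with per-field errors, which is the whole shift-left premise: the caller fixes the record before it is ever written.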
