databricks-labs-dqx
Data Quality eXtended (DQX) is a Python library for data quality checks and data quality monitoring
Description
DQX by Databricks Labs
===

<p align="center">
    <a href="https://github.com/databrickslabs/dqx">
        <img src="https://raw.githubusercontent.com/databrickslabs/dqx/refs/heads/main/docs/dqx/static/img/logo.svg" class="align-center" width="200" height="200" alt="logo" />
    </a>
</p>

Simplified Data Quality checking at Scale for PySpark Workloads on streaming and standard DataFrames.

[Build status](https://github.com/databrickslabs/dqx/actions/workflows/push.yml) · [Code coverage](https://codecov.io/github/databrickslabs/dqx) · [PyPI](https://pypi.org/project/databricks-labs-dqx/)

# 📖 Documentation

The complete documentation is available at: [https://databrickslabs.github.io/dqx/](https://databrickslabs.github.io/dqx/)

# 🛠️ Contribution

Please see the contribution guidance [here](https://databrickslabs.github.io/dqx/docs/dev/contributing/) on how to contribute to the project (build, test, and submit a PR).

# 💬 Project Support

Please note that this project is provided for your exploration only and is not formally supported by Databricks with Service Level Agreements (SLAs). It is provided AS-IS, and we do not make any guarantees. Please do not submit a support ticket relating to any issues arising from the use of this project.

Any issues discovered through the use of this project should be filed as GitHub [Issues on this repository](https://github.com/databrickslabs/dqx/issues). They will be reviewed as time permits, but no formal SLAs for support exist.
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 0.13.0 | Imported from PyPI (0.13.0) | Low | 4/21/2026 |
| v0.13.0 | ## What's Changed * New DQX Data Quality Dashboard ([#1019](https://github.com/databrickslabs/dqx/issues/1019)). The data quality dashboard has been significantly enhanced to provide a centralized view of data quality metrics across all tables, allowing users to monitor and track data quality issues with greater ease. The dashboard now consists of three tabs - Data Quality Summary, Data Quality by Table (Time Series), and Data Quality by Table (Full Snapshot) - each catering to different monito | Low | 2/9/2026 |
| v0.12.0 | ## What's Changed * AI-Assisted rules generation from data profiles ([#963](https://github.com/databrickslabs/dqx/issues/963)). AI-assisted data quality rule generation was added, leveraging summary statistics from a profiler to create rules. The `DQGenerator` class includes a `generate_dq_rules_ai_assisted` method that can generate rules with or without user-provided input, using summary statistics to inform the rule creation process. This method offers flexibility in rule generation, allowi | Low | 12/20/2025 |
| v0.11.1 | ## What's Changed * Hotfix to update the log level for Spark Connect to suppress DLT telemetry warnings on non-DLT serverless clusters. Contributors: @mwojtyczka | Low | 12/2/2025 |
| v0.11.0 | * Generation of DQX rules from ODCS Data Contracts ([#932](https://github.com/databrickslabs/dqx/issues/932)). The Data Contract Quality Rules Generation feature has been introduced, enabling users to generate data quality rules directly from data contracts following the Open Data Contract Standard (ODCS). This feature supports three types of rule generation: predefined rules derived from schema properties and constraints, explicit DQX rules embedded in the contract, and text-based rules defin | Low | 12/1/2025 |
| v0.10.0 | * Added Data Quality Summary Metrics ([#553](https://github.com/databrickslabs/dqx/issues/553)). The data quality engine has been enhanced with the ability to track and manage summary metrics for data quality validation, leveraging Spark's Observation feature. A new `DQMetricsObserver` class has been introduced to manage Spark observations and track summary metrics on datasets checked with the engine. The `DQEngine` class has been updated to optionally return the Spark observation associated wit | Low | 11/7/2025 |
| v0.9.3 | * Added support for running checks on multiple tables ([#566](https://github.com/databrickslabs/dqx/issues/566)). Added more flexibility and functionality in running data quality checks, allowing users to run checks on multiple tables in a single method call and as part of Workflows execution. Provided options to run checks for all configured run configs or for a specific run config, or for tables/views matching wildcard patterns. The CLI commands for running workflows have been updated to refl | Low | 10/3/2025 |
| v0.9.2 | * Added performance benchmarks ([#548](https://github.com/databrickslabs/dqx/issues/548)). Performance tests are run to ensure that no change degrades performance by more than 25%. Benchmark results are published in the reference section of the documentation. The benchmark covers all check functions, running all functions at once, and applying the same functions at once to multiple columns using foreach column. A new performance GitHub workflow has been introduced to automate performance | Low | 9/5/2025 |
| v0.9.1 | ## 0.9.1 * Added quality checker and end-to-end workflows ([#519](https://github.com/databrickslabs/dqx/issues/519)). This release introduces a no-code solution for applying checks. The following workflows were added: quality-checker (apply checks and save results to tables) and end-to-end (e2e) workflows (profile input data, generate quality checks, apply the checks, save results to tables). The workflows enable quality checking for data at rest without the need for code-level integration. It | Low | 8/25/2025 |
| v0.8.0 | * Added new row-level freshness check ([#495](https://github.com/databrickslabs/dqx/issues/495)). A new data quality check function, `is_data_fresh`, has been introduced to identify stale data resulting from delayed pipelines, enabling early detection of upstream issues. This function assesses whether the values in a specified timestamp column are within a specified number of minutes from a base timestamp column. The function takes three parameters: the column to check, the maximum age in minut | Low | 8/6/2025 |
| v0.7.1 | * Added type validation for apply checks method ([#465](https://github.com/databrickslabs/dqx/issues/465)). The library now enforces stricter type validation for data quality rules, ensuring all elements in the checks list are instances of `DQRule`. If invalid types are encountered, a `TypeError` is raised with a descriptive error message, suggesting alternative methods for passing checks as dictionaries. Additionally, input attribute validation has been enhanced to verify the criticality value | Low | 7/23/2025 |
| v0.7.0 | * Added end-to-end quality checking methods ([#364](https://github.com/databrickslabs/dqx/issues/364)). The library now includes end-to-end quality checking methods, allowing users to read data from a table or view, apply checks, and write the results to a table. The `DQEngine` class has been updated to utilize `InputConfig` and `OutputConfig` objects to handle input and output configurations, providing more flexibility in the quality checking flow. The `apply_checks_and_write_to_table` and `app | Low | 7/9/2025 |
| v0.6.0 | Release v0.6.0 (#395) * Added Dataset-level checks, Foreign Key and SQL Script checks ([#375](https://github.com/databrickslabs/dqx/issues/375)). The data quality library has been enhanced with the introduction of dataset-level checks, which allow users to apply quality checks at the dataset level, in addition to existing row-level checks. Similar to row-level checks, the results of the dataset-level quality checks are reported for each individual row in the result columns. A new `DQDatasetRu | Low | 6/26/2025 |
| v0.5.0 | ## What's Changed * Fix spark remote version detection in CI by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/342 * Fix spark remote installation by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/346 * Load and save checks from a Delta table by @ghanse in https://github.com/databrickslabs/dqx/pull/339 * Handle nulls in uniqueness check for composite keys to conform with the SQL ANSI Standard by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/345 Corrected | Low | 6/6/2025 |
| v0.4.0 | * Added input Spark options and schema for reading from storage ([#312](https://github.com/databrickslabs/dqx/issues/312)). This commit enhances the data quality framework used for profiling and validating data in a Databricks workspace with new options and functionality for reading data from storage. It allows input Spark options and a schema to be supplied, and supports fully qualified Unity Catalog or Hive Metastore table names in the format catalog.schema.table or schema.table. Additionally, | Low | 4/8/2025 |
| v0.3.1 | * Removed usage of lambda in quality checking ([#310](https://github.com/databrickslabs/dqx/issues/310)). We have replaced the usage of lambda functions in the quality checking with a more efficient implementation, and updated the method to handle optional arguments in validation. These changes improve the performance of the quality checking. Contributors: @mwojtyczka | Low | 3/24/2025 |
| v0.3.0 | * Added sampling to the profiler ([#303](https://github.com/databrickslabs/dqx/issues/303)). The profiler's performance has been significantly improved in this release through the addition of sampling and limiting the input data. The profiler now samples input data with a 30% sampling factor and limits the number of records to 1000 by default, reducing the amount of data processed and enhancing performance. These changes are configurable and can be customized. This resolves issue [#215](https:/ | Low | 3/19/2025 |
| v0.2.0 | * Added uniqueness check ([#200](https://github.com/databrickslabs/dqx/issues/200)). A uniqueness check has been added, which reports an issue for each row containing a duplicate value in a specified column. This resolves issue [#154](https://github.com/databrickslabs/dqx/issues/154). * Added SQL expression support for limits in not less and not greater than checks, and updated docs ([#200](https://github.com/databrickslabs/dqx/issues/200)). This commit introduces several changes to simplify and | Low | 3/10/2025 |
| v0.1.13 | * Fixed CLI installation and demo ([#177](https://github.com/databrickslabs/dqx/issues/177)). In this release, changes have been made to adjust the dashboard name, ensuring compliance with new API naming rules. The dashboard name now only contains alphanumeric characters, hyphens, or underscores, and the reference section has been split for clarity. In addition, the demo for the tool has been updated to work regardless of whether a path or UC table is provided in the config. Furthermore, documentation has b | Low | 2/27/2025 |
| v0.1.12 | * Fixed installation process for Serverless ([#150](https://github.com/databrickslabs/dqx/issues/150)). This commit removes the pyspark dependency from the library to avoid Spark version conflicts in Serverless and future DBR versions. The CLI has been updated to install pyspark for local command execution. * Updated demos and documentation ([#169](https://github.com/databrickslabs/dqx/issues/169)). In this release, the quality checks in the demos have been updated to better showcase the capabilitie | Low | 2/13/2025 |
| v0.1.11 | ## What's Changed * Provided option to customize reporting column names ([#127](https://github.com/databrickslabs/dqx/issues/127)). In this release, the DQEngine library has been enhanced to allow for customizable reporting column names. A new constructor has been added to DQEngine, which accepts an optional ExtraParams object for extra configurations. A new Enum class, DefaultColumnNames, has been added to represent the columns used for error and warning reporting. New tests have been added to | Low | 2/12/2025 |
| v0.1.10 | ## What's Changed * Fixed docs-build by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/129 * Patch user agent by @sundarshankar89 in https://github.com/databrickslabs/dqx/pull/121 * New dashboard query, Update to demos and docs by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/133 * Support datetime arguments for column range functions by @ghanse in https://github.com/databrickslabs/dqx/pull/142 * DQX engine refactor and docs update by @mwojtyczka in https://github.com | Low | 2/4/2025 |
| v0.1.9 | ## What's Changed * Fixed docs-build by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/129 * Patch user agent by @sundarshankar89 in https://github.com/databrickslabs/dqx/pull/121 * New dashboard query, Update to demos and docs by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/133 **Full Changelog**: https://github.com/databrickslabs/dqx/compare/v0.1.8...v0.1.9 | Low | 1/24/2025 |
| v0.1.8 | ## What's Changed * Updated docs by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/117 * added search for docs by @sundarshankar89 in https://github.com/databrickslabs/dqx/pull/119 * ✨ improve docs styling by @renardeinside in https://github.com/databrickslabs/dqx/pull/118 * Add Dashboard as Code, DQX Data Quality Summary Dashboard by @nehamilak-db in https://github.com/databrickslabs/dqx/pull/86 * updated profiling documentation with cost consideration by @canan-girgin in https | Low | 1/23/2025 |
| v0.1.7 | ## What's Changed * Set cache invalidation for pypi badge by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/102 * Correct handling of Decimal, Short and Byte types by @alexott in https://github.com/databrickslabs/dqx/pull/103 * ✨ introduce docs by @renardeinside in https://github.com/databrickslabs/dqx/pull/104 * Rollback for readme and contributing by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/112 * 🛠️ fix docs path by @renardeinside in https://github.com/databr | Low | 1/21/2025 |
| v0.1.6 | ## What's Changed * Fix for image links in README on PyPi by @alexott in https://github.com/databrickslabs/dqx/pull/95 * added test methods for InstallationMixin.py, log.py and dlt_rules by @canan-girgin in https://github.com/databrickslabs/dqx/pull/93 * issue 47 - new check is_not_null_and_not_empty_array and fixed timestamp mismatch issue in profiler by @dinbab1984 in https://github.com/databrickslabs/dqx/pull/98 * Updated logo by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/9 | Low | 1/17/2025 |
| v0.1.5 | ## What's Changed * Fix README on PyPi by using `hatch-fancy-pypi-readme` in the build by @alexott in https://github.com/databrickslabs/dqx/pull/81 * Readme update by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/82 * add OIDC codecov by @sundarshankar89 in https://github.com/databrickslabs/dqx/pull/83 * Release 0.1.5 by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/91 **Full Changelog**: https://github.com/databrickslabs/dqx/compare/v0.1.4...v0.1.5 | Low | 1/15/2025 |
| v0.1.4 | Release v0.1.4 | Low | 1/10/2025 |
| v0.1.1 | ## What's Changed * Updated release process * Fixed installation via cli **Full Changelog**: https://github.com/databrickslabs/dqx/compare/v0.1.0...v0.1.1 | Low | 1/10/2025 |
| v0.1.0 | ## What's Changed * Run `make fmt` by @nfx in https://github.com/databrickslabs/dqx/pull/7 * Migration of the framework to the new project by @mwojtyczka in https://github.com/databrickslabs/dqx/pull/9 * Bump actions/checkout from 4.1.3 to 4.1.4 by @dependabot in https://github.com/databrickslabs/dqx/pull/14 * Added new Unit Test Cases, Improved Coverage by @nehamilak-db in https://github.com/databrickslabs/dqx/pull/22 * Temporary remove acceptance tests (until repo is public) by @alexott i | Low | 1/9/2025 |
| v0.0.0 | Initial release. Required for `labs release` to work. | Low | 4/23/2024 |
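
The v0.8.0 entry describes `is_data_fresh` as checking whether the values in a timestamp column are within a given number of minutes of a base timestamp column. The plain-Python sketch below restates that rule for a single row so the comparison is explicit. It is only an illustration under that reading of the entry: DQX evaluates the check on Spark columns, null handling is left out, and `is_row_fresh` with its parameters is a hypothetical name, not part of the DQX API.

```python
from datetime import datetime, timedelta


def is_row_fresh(ts: datetime, base_ts: datetime, max_age_minutes: int) -> bool:
    """Return True if `ts` is at most `max_age_minutes` older than `base_ts`.

    Hypothetical illustration of the freshness rule described for `is_data_fresh`;
    not the DQX implementation.
    """
    # Age of the record relative to the reference (base) timestamp.
    age = base_ts - ts
    # A record is fresh when its age does not exceed the allowed window.
    return age <= timedelta(minutes=max_age_minutes)


base = datetime(2025, 8, 6, 12, 0)
print(is_row_fresh(datetime(2025, 8, 6, 11, 45), base, 30))  # True: 15 minutes old
print(is_row_fresh(datetime(2025, 8, 6, 11, 0), base, 30))   # False: 60 minutes old
```

In this sketch a timestamp later than the base counts as fresh, because its age is negative; how the library treats that case is not stated in the entry.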
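
The v0.2.0 entry says the uniqueness check reports an issue for every row that holds a duplicate value, and v0.5.0 changes null handling for composite keys to conform with the SQL ANSI standard. A minimal plain-Python sketch of those semantics follows, assuming the usual ANSI reading in which NULLs are distinct from each other, so a key with any NULL component is never a duplicate. DQX runs its check on Spark DataFrames; `duplicate_row_flags` is a hypothetical helper, not a DQX function.

```python
from collections import Counter
from typing import Any, Sequence, Tuple


def duplicate_row_flags(keys: Sequence[Tuple[Any, ...]]) -> list[bool]:
    """Return one flag per row: True if that row's composite key is duplicated.

    Hypothetical illustration; None stands in for SQL NULL.
    """
    # Count only keys with no NULL component; under the assumed ANSI semantics
    # a key containing NULL is never equal to any other key.
    counts = Counter(key for key in keys if None not in key)
    # Flag every occurrence of a repeated key, not just the later ones.
    return [None not in key and counts[key] > 1 for key in keys]


rows = [(1, "a"), (1, "a"), (2, None), (2, None), (3, "b")]
print(duplicate_row_flags(rows))  # [True, True, False, False, False]
```

A single-column uniqueness check is the same computation with one-element tuples as keys.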
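
The v0.3.0 entry states that the profiler samples input data with a 30% sampling factor and limits it to 1000 records by default. The sketch below applies that two-step reduction to a plain Python list to show the order of operations. It assumes per-row (Bernoulli) sampling, which is how Spark's `DataFrame.sample` without replacement behaves; whether DQX samples exactly this way is an assumption, and `sample_for_profiling` is a hypothetical name, not a DQX function.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_for_profiling(
    rows: Sequence[T],
    fraction: float = 0.3,
    limit: int = 1000,
    seed: Optional[int] = None,
) -> list[T]:
    """Keep each row with probability `fraction`, then cap the result at `limit` rows.

    Hypothetical illustration of the v0.3.0 defaults; not the DQX profiler.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Per-row sampling: the kept fraction is approximate, not exact.
    sampled = [row for row in rows if rng.random() < fraction]
    # Truncate to at most `limit` rows, preserving input order.
    return sampled[:limit]


print(len(sample_for_profiling(list(range(10_000)), seed=7)))
```

Because per-row sampling only approximates the requested fraction, in this sketch it is the limit, not the fraction, that bounds the output size on large inputs.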
