DQ Config (YAML)
Qarion supports configuration-driven data quality through YAML files. Rather than creating checks one-by-one through the UI or API, you define all your quality rules in a single version-controlled file, then use the CLI or SDK to sync them to the platform and execute them.
This approach brings several benefits:
- Version control — Quality rules live alongside your data transformation code, so every change is tracked in Git with a full audit trail.
- Code review — Rule changes go through pull requests, ensuring that thresholds, queries, and schedules are reviewed before deployment.
- Reproducibility — A new environment can be bootstrapped by running `qarion quality apply` against the same config file.
- CI/CD integration — Quality gates can be embedded directly into deployment pipelines.
File Format
A DQ config file is a YAML document with the following top-level keys:
| Key | Type | Required | Description |
|---|---|---|---|
| version | string | no | Config schema version (default "1.0") |
| space | string | yes | Target space slug — all checks will be created in this space |
| defaults | object | no | Default values inherited by every check in the file |
| checks | list | yes | One or more check definitions (minimum 1) |
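For example, a skeleton showing all four top-level keys together (the slugs, table names, and values are placeholders; each key is documented in the sections below):

version: "1.0"            # optional, defaults to "1.0"
space: my-space           # required: target space slug
defaults:                 # optional: inherited by every check
  connector: my-connector
checks:                   # required: at least one check definition
  - slug: my-first-check
    name: My First Check
    type: sql_metric
    query: "SELECT COUNT(*) FROM my_schema.my_table"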
Defaults
The defaults block lets you set values that every check inherits unless explicitly overridden. This avoids repeating the same connector or schedule across dozens of checks.
| Key | Type | Description |
|---|---|---|
| connector | string | Default connector slug for execution |
| schedule | string | Default cron schedule expression |
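For example, both checks below inherit the default connector; the second overrides only the schedule (illustrative slugs and queries):

defaults:
  connector: warehouse-snowflake
  schedule: "0 6 * * *"

checks:
  - slug: orders-exist        # inherits the connector and the 6 AM schedule
    name: Orders Table Has Rows
    type: sql_metric
    query: "SELECT COUNT(*) FROM analytics.orders"
  - slug: users-exist         # inherits the connector, overrides the schedule
    name: Users Table Has Rows
    type: sql_metric
    query: "SELECT COUNT(*) FROM analytics.users"
    schedule: "0 9 * * *"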
Check Definition
Each entry in the checks list defines a single quality rule:
| Key | Type | Required | API Mapping | Description |
|---|---|---|---|---|
| slug | string | yes | slug | Unique identifier within the space (URL-safe) |
| name | string | yes | name | Human-readable display name |
| type | string | yes | check_type | Check type (see Supported Types) |
| description | string | no | description | Explains the purpose of the check |
| query | string | no | query | SQL query for SQL-based check types |
| connector | string | no | — | Connector slug (overrides the default; resolved at apply time) |
| product | string | no | product_slug | Target data product slug to link the check to |
| schedule | string | no | schedule_cron | Cron expression (overrides the default) |
| thresholds | object | no | threshold_config | Pass/fail threshold configuration (see Thresholds) |
| configuration | object | no | configuration | Type-specific configuration (see per-type docs below) |
| parameters | list | no | parameters | Parameterized query variable definitions (see Parameters) |
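Putting most of these keys together, a single fully specified check might look like the sketch below. The slug, query, schedule, and threshold values are illustrative:

- slug: orders-daily-revenue
  name: Orders Daily Revenue
  type: sql_metric
  description: "Daily revenue should stay above the agreed floor"
  query: "SELECT SUM(amount) FROM analytics.orders WHERE order_date = '{{run_date}}'"
  connector: warehouse-snowflake
  product: orders-table
  schedule: "0 6 * * *"
  thresholds:
    operator: gte
    value: 10000
  parameters:
    - name: run_date
      type: string
      default: "2024-01-15"
      description: "Target date partition"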
Supported Check Types
The type field accepts any of the following values, organized into logical groups.
SQL Checks
These checks execute arbitrary SQL against your data source.
sql_metric
Runs a query that returns a single numeric value, then evaluates it against a threshold. This is the most flexible check type for custom business rules.
- slug: orders-row-count
  name: Orders Row Count
  type: sql_metric
  query: "SELECT COUNT(*) FROM analytics.orders"
  product: orders-table
  thresholds:
    operator: gte
    value: 1000
sql_condition
Runs a query and fails if any rows are returned. Useful for asserting "this should never happen" conditions.
- slug: no-negative-amounts
  name: No Negative Order Amounts
  type: sql_condition
  query: "SELECT * FROM analytics.orders WHERE amount < 0"
  product: orders-table
Field-Level Checks
Field-level checks target a single column in a table. They require a configuration block with field_name and table_name.
null_check
Fails if any null values exist in the specified column. Result is expressed as a percentage (0% nulls = pass).
- slug: users-email-not-null
  name: Users Email Not Null
  type: null_check
  product: users-table
  configuration:
    field_name: email
    table_name: analytics.users
uniqueness
Fails if duplicate values exist in the column. Measures the count of duplicates (0 = pass).
- slug: users-id-unique
  name: Users ID Uniqueness
  type: uniqueness
  product: users-table
  configuration:
    field_name: id
    table_name: analytics.users
type_check
Validates that values in a column conform to an expected data type or format.
- slug: orders-amount-numeric
  name: Orders Amount Type Check
  type: type_check
  product: orders-table
  configuration:
    field_name: amount
    table_name: analytics.orders
    expected_type: numeric
range_check
Validates that numeric values fall within an expected range.
- slug: orders-amount-range
  name: Orders Amount Range
  type: range_check
  product: orders-table
  configuration:
    field_name: amount
    table_name: analytics.orders
    min_value: 0
    max_value: 1000000
pattern_check
Validates that string values match a regular expression pattern.
- slug: users-email-format
  name: Email Format Validation
  type: pattern_check
  product: users-table
  configuration:
    field_name: email
    table_name: analytics.users
    pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
enum_check
Validates that values belong to an allowed set.
- slug: orders-status-enum
  name: Order Status Validation
  type: enum_check
  product: orders-table
  configuration:
    field_name: status
    table_name: analytics.orders
    allowed_values:
      - pending
      - confirmed
      - shipped
      - delivered
      - cancelled
length_check
Validates that string lengths fall within expected bounds.
- slug: users-name-length
  name: User Name Length
  type: length_check
  product: users-table
  configuration:
    field_name: display_name
    table_name: analytics.users
    min_length: 1
    max_length: 255
freshness_check
Validates that the most recent timestamp in a column is not older than a threshold.
- slug: orders-freshness
  name: Orders Table Freshness
  type: freshness_check
  product: orders-table
  configuration:
    field_name: updated_at
    table_name: analytics.orders
    max_age_hours: 24
Composite Checks
field_checks
Bundles multiple field-level assertions against a single table into one check. The platform executes all assertions in a single table scan for optimal performance.
- slug: users-field-suite
  name: Users Field Quality Suite
  type: field_checks
  product: users-table
  configuration:
    table_name: analytics.users
    checks:
      - field: id
        assertion: uniqueness
      - field: email
        assertion: not_null
      - field: created_at
        assertion: not_null
reconciliation
Compares results from two SQL queries across the same or different data sources. Supports tolerance-based comparison for floating point values.
- slug: revenue-reconciliation
  name: Revenue Reconciliation
  type: reconciliation
  configuration:
    source_query: "SELECT SUM(amount) FROM staging.revenue"
    target_query: "SELECT SUM(amount) FROM prod.revenue"
    comparison_mode: percentage # exact | percentage | absolute
    tolerance: 0.01
Other Check Types
| Type | Description |
|---|---|
| anomaly | Statistical anomaly detection on a metric time series |
| custom | Fully custom check logic (typically used with external execution) |
| manual | Human-entered value — prompts for manual input when triggered |
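These types don't have dedicated examples in this guide; their type-specific configuration is covered in the per-type docs. As a sketch, a manual check can be defined with just the common keys, assuming no extra configuration block is required (illustrative slug and product):

- slug: quarterly-audit-signoff
  name: Quarterly Audit Sign-off
  type: manual
  description: "Recorded by the data steward after the quarterly audit"
  product: orders-table
  # no configuration block here — see the per-type docs for any additional keys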
Thresholds
The thresholds object defines the pass/fail criteria for checks that produce a numeric value (primarily sql_metric). If omitted, the platform uses type-specific default evaluation logic.
| Key | Type | Description |
|---|---|---|
| operator | string | Comparison operator |
| value | number | Threshold value |
| warn | number | Optional warning threshold (produces warning instead of fail) |
Supported Operators
| Operator | Meaning | Example |
|---|---|---|
| eq | Equal to | Value must be exactly 100 |
| gte | Greater than or equal | Value must be ≥ 1000 |
| lte | Less than or equal | Value must be ≤ 5 |
| gt | Greater than | Value must be > 0 |
| lt | Less than | Value must be < 100 |
| between | Within range | Requires min and max instead of value |
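For example, a between threshold uses min and max in place of value; the bounds here are illustrative:

thresholds:
  operator: between
  min: 0.95
  max: 1.05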
Example with Warning Threshold
thresholds:
  operator: gte
  value: 1000 # fail below this
  warn: 5000  # warn below this but above 1000
In this example, a value of 800 produces a fail, 3000 produces a warning, and 6000 produces a pass.
Parameters
Parameterized queries let you define variables that are substituted at execution time. This is useful for date-partitioned checks or environment-specific values.
- slug: daily-row-count
  name: Daily Row Count
  type: sql_metric
  query: "SELECT COUNT(*) FROM analytics.events WHERE event_date = '{{run_date}}'"
  thresholds:
    operator: gte
    value: 10000
  parameters:
    - name: run_date
      type: string
      default: "2024-01-15"
      description: "Target date partition"
Variables use double-brace syntax ({{variable_name}}) in the query and are resolved from the parameters list at runtime. You can override parameter values when triggering a run through the SDK or CLI.
Examples
Minimal Config
The simplest possible config file defines a single check:
version: "1.0"
space: acme-analytics
checks:
- slug: orders-exist
name: Orders Table Has Rows
type: sql_metric
query: "SELECT COUNT(*) FROM analytics.orders"
thresholds:
operator: gte
value: 1
Full-Featured Config
A production-grade config file with defaults, multiple check types, and shared connector:
version: "1.0"
space: acme-analytics
defaults:
connector: warehouse-snowflake
schedule: "0 6 * * *"
checks:
# SQL metric with thresholds
- slug: orders-row-count
name: Orders Row Count
type: sql_metric
description: "Ensure orders table is populated"
query: "SELECT COUNT(*) FROM analytics.orders"
product: orders-table
schedule: "0 8 * * *" # overrides the default 6 AM
thresholds:
operator: gte
value: 1000
warn: 500
# Field-level null check
- slug: users-no-null-emails
name: Users Email Null Check
type: null_check
product: users-table
configuration:
field_name: email
table_name: analytics.users
# Condition check — fails if any rows returned
- slug: no-orphaned-orders
name: No Orphaned Orders
type: sql_condition
query: >
SELECT o.id
FROM analytics.orders o
LEFT JOIN analytics.customers c ON o.customer_id = c.id
WHERE c.id IS NULL
# Cross-source reconciliation
- slug: revenue-reconciliation
name: Revenue Reconciliation
type: reconciliation
configuration:
source_query: "SELECT SUM(amount) FROM staging.revenue"
target_query: "SELECT SUM(amount) FROM prod.revenue"
comparison_mode: percentage
tolerance: 0.01
Multi-Product Config
You can define checks across multiple products in the same file, as long as they share the same space:
version: "1.0"
space: acme-analytics
defaults:
connector: warehouse-snowflake
checks:
- slug: customers-freshness
name: Customers Freshness
type: freshness_check
product: customers-table
schedule: "0 7 * * *"
configuration:
field_name: updated_at
table_name: analytics.customers
max_age_hours: 24
- slug: orders-freshness
name: Orders Freshness
type: freshness_check
product: orders-table
schedule: "0 8 * * *"
configuration:
field_name: created_at
table_name: analytics.orders
max_age_hours: 12
- slug: events-uniqueness
name: Events ID Uniqueness
type: uniqueness
product: events-stream
configuration:
field_name: event_id
table_name: analytics.events
Workflow
A typical workflow involves three steps: validate, apply, and run.
1. Validate
Check the file for structural errors and verify that referenced connectors and products exist on the platform:
qarion quality validate -f qarion-dq.yaml
This command parses the YAML, validates all field types, and checks that the target space, connectors, and products are resolvable. It does not create or modify anything.
2. Apply
Sync definitions to the platform — creates missing checks and updates existing ones:
qarion quality apply -f qarion-dq.yaml
The apply command is idempotent: re-running it with an unchanged config produces no further changes. Checks are matched by slug within the target space.
3. Run
Execute all checks defined in the config and record results:
qarion quality run-config -f qarion-dq.yaml
Use --no-record to execute checks without persisting results to the platform (useful for local testing):
qarion quality run-config -f qarion-dq.yaml --no-record
SDK Usage
The same workflow is available programmatically through the Python SDK:
from qarion import QarionSyncClient
from qarion.models.dq_config import DqConfig

# Parse the YAML file
config = DqConfig.from_yaml("qarion-dq.yaml")

client = QarionSyncClient(api_key="qk_...")

# Step 1: Validate
errors = client.quality.validate_config(config)
if errors:
    for err in errors:
        print(f"Error: {err}")

# Step 2: Apply (upsert)
summary = client.quality.apply_config(config)
print(summary)  # {"created": [...], "updated": [...], "unchanged": [...]}

# Step 3: Run
results = client.quality.run_config(config)
for r in results:
    print(f"{r.status}: {r.value}")
Use record_results=False to skip recording:
results = client.quality.run_config(config, record_results=False)
CI/CD Integration
GitHub Actions
Add a quality gate to your deployment pipeline that validates and runs checks after each push:
name: Data Quality Gate

on:
  push:
    branches: [main]
    paths:
      - "qarion-dq.yaml"
      - "dbt/**"

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install qarion-cli
      - name: Validate config
        run: qarion quality validate -f qarion-dq.yaml
        env:
          QARION_API_KEY: ${{ secrets.QARION_API_KEY }}
      - name: Apply check definitions
        run: qarion quality apply -f qarion-dq.yaml
        env:
          QARION_API_KEY: ${{ secrets.QARION_API_KEY }}
      - name: Run quality checks
        run: qarion quality run-config -f qarion-dq.yaml
        env:
          QARION_API_KEY: ${{ secrets.QARION_API_KEY }}
Airflow / Orchestrator
Trigger a config-driven quality gate as a task in your pipeline:
from airflow.operators.python import PythonOperator


def run_quality_gate():
    from qarion import QarionSyncClient
    from qarion.models.dq_config import DqConfig

    config = DqConfig.from_yaml("/opt/airflow/dags/qarion-dq.yaml")
    client = QarionSyncClient(api_key="qk_...")
    results = client.quality.run_config(config)

    failed = [r for r in results if not r.is_passed]
    if failed:
        raise Exception(f"{len(failed)} quality check(s) failed")


quality_gate = PythonOperator(
    task_id="quality_gate",
    python_callable=run_quality_gate,
)
Related
- Quality Framework — Quality dimensions, severity, scheduling, and best practices
- Drift Detection Guide — Implement continuous monitoring for AI systems
- CLI Quality Commands — Full CLI command reference
- SDK Quality Resource — Python SDK method reference
- Quality Automation Tutorial — End-to-end programmatic setup guide