DQ Config (YAML)
Qarion supports configuration-driven data quality through YAML files. Rather than creating checks one-by-one through the UI or API, you define all your quality rules in a single version-controlled file, then use the CLI or SDK to sync them to the platform and execute them.
This approach brings several benefits:
- Version control — Quality rules live alongside your data transformation code, so every change is tracked in Git with a full audit trail.
- Code review — Rule changes go through pull requests, ensuring that thresholds, queries, and schedules are reviewed before deployment.
- Reproducibility — A new environment can be bootstrapped by running `qarion quality apply` against the same config file.
- CI/CD integration — Quality gates can be embedded directly into deployment pipelines.
File Format
A DQ config file is a YAML document with the following top-level keys:
| Key | Type | Required | Description |
|---|---|---|---|
| version | string | no | Config schema version (default "1.0") |
| space | string | yes | Target space slug — all checks will be created in this space |
| defaults | object | no | Default values inherited by every check in the file |
| checks | list | yes | One or more check definitions (minimum 1) |
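For example, a skeleton showing all four top-level keys together (the slugs, table names, and values are placeholders; each key is documented in the sections below):

version: "1.0"            # optional, defaults to "1.0"
space: my-space           # required: target space slug
defaults:                 # optional: inherited by every check
  connector: my-connector
checks:                   # required: at least one check definition
  - slug: my-first-check
    name: My First Check
    type: sql_metric
    query: "SELECT COUNT(*) FROM my_schema.my_table"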
Defaults
The defaults block lets you set values that every check inherits unless explicitly overridden. This avoids repeating the same connector or schedule across dozens of checks.
| Key | Type | Description |
|---|---|---|
| connector | string | Default connector slug for execution |
| schedule | string | Default cron schedule expression |
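For example, both checks below inherit the default connector; the second overrides only the schedule (illustrative slugs and queries):

defaults:
  connector: warehouse-snowflake
  schedule: "0 6 * * *"

checks:
  - slug: orders-exist        # inherits the connector and the 6 AM schedule
    name: Orders Table Has Rows
    type: sql_metric
    query: "SELECT COUNT(*) FROM analytics.orders"
  - slug: users-exist         # inherits the connector, overrides the schedule
    name: Users Table Has Rows
    type: sql_metric
    query: "SELECT COUNT(*) FROM analytics.users"
    schedule: "0 9 * * *"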
Check Definition
Each entry in the checks list defines a single quality rule:
| Key | Type | Required | API Mapping | Description |
|---|---|---|---|---|
| slug | string | yes | slug | Unique identifier within the space (URL-safe) |
| name | string | yes | name | Human-readable display name |
| type | string | yes | check_type | Check type (see Supported Types) |
| description | string | no | description | Explains the purpose of the check |
| query | string | no | query | SQL query for SQL-based check types |
| connector | string | no | — | Connector slug (overrides the default; resolved at apply time) |
| product | string | no | product_slug | Target data product slug to link the check to |
| schedule | string | no | schedule_cron | Cron expression (overrides the default) |
| thresholds | object | no | threshold_config | Pass/fail threshold configuration (see Thresholds) |
| configuration | object | no | configuration | Type-specific configuration (see per-type docs below) |
| parameters | list | no | parameters | Parameterized query variable definitions (see Parameters) |
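Putting most of these keys together, a single fully specified check might look like the sketch below. The slug, query, schedule, and threshold values are illustrative:

- slug: orders-daily-revenue
  name: Orders Daily Revenue
  type: sql_metric
  description: "Daily revenue should stay above the agreed floor"
  query: "SELECT SUM(amount) FROM analytics.orders WHERE order_date = '{{run_date}}'"
  connector: warehouse-snowflake
  product: orders-table
  schedule: "0 6 * * *"
  thresholds:
    operator: gte
    value: 10000
  parameters:
    - name: run_date
      type: string
      default: "2024-01-15"
      description: "Target date partition"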
Supported Check Types
The type field accepts any of the following values, organized into logical groups.
SQL Checks
These checks execute arbitrary SQL against your data source.
sql_metric
Runs a query that returns a single numeric value, then evaluates it against a threshold. This is the most flexible check type for custom business rules.
- slug: orders-row-count
  name: Orders Row Count
  type: sql_metric
  query: "SELECT COUNT(*) FROM analytics.orders"
  product: orders-table
  thresholds:
    operator: gte
    value: 1000
sql_condition
Runs a query and fails if any rows are returned. Useful for asserting "this should never happen" conditions.
- slug: no-negative-amounts
  name: No Negative Order Amounts
  type: sql_condition
  query: "SELECT * FROM analytics.orders WHERE amount < 0"
  product: orders-table
Field-Level Checks
Field-level checks target a single column in a table. They require a configuration block with field_name and table_name.
null_check
Fails if any null values exist in the specified column. Result is expressed as a percentage (0% nulls = pass).
- slug: users-email-not-null
  name: Users Email Not Null
  type: null_check
  product: users-table
  configuration:
    field_name: email
    table_name: analytics.users
uniqueness
Fails if duplicate values exist in the column. Measures the count of duplicates (0 = pass).
- slug: users-id-unique
  name: Users ID Uniqueness
  type: uniqueness
  product: users-table
  configuration:
    field_name: id
    table_name: analytics.users
type_check
Validates that values in a column conform to an expected data type or format.
- slug: orders-amount-numeric
  name: Orders Amount Type Check
  type: type_check
  product: orders-table
  configuration:
    field_name: amount
    table_name: analytics.orders
    expected_type: numeric
range_check
Validates that numeric values fall within an expected range.
- slug: orders-amount-range
  name: Orders Amount Range
  type: range_check
  product: orders-table
  configuration:
    field_name: amount
    table_name: analytics.orders
    min_value: 0
    max_value: 1000000
pattern_check
Validates that string values match a regular expression pattern.
- slug: users-email-format
  name: Email Format Validation
  type: pattern_check
  product: users-table
  configuration:
    field_name: email
    table_name: analytics.users
    pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
enum_check
Validates that values belong to an allowed set.
- slug: orders-status-enum
  name: Order Status Validation
  type: enum_check
  product: orders-table
  configuration:
    field_name: status
    table_name: analytics.orders
    allowed_values:
      - pending
      - confirmed
      - shipped
      - delivered
      - cancelled
length_check
Validates that string lengths fall within expected bounds.
- slug: users-name-length
  name: User Name Length
  type: length_check
  product: users-table
  configuration:
    field_name: display_name
    table_name: analytics.users
    min_length: 1
    max_length: 255
freshness_check
Validates that the most recent timestamp in a column is not older than a threshold.
- slug: orders-freshness
  name: Orders Table Freshness
  type: freshness_check
  product: orders-table
  configuration:
    field_name: updated_at
    table_name: analytics.orders
    max_age_hours: 24
Composite Checks
field_checks
Bundles multiple field-level assertions against a single table into one check. The platform executes all assertions in a single table scan for optimal performance.
- slug: users-field-suite
  name: Users Field Quality Suite
  type: field_checks
  product: users-table
  configuration:
    table_name: analytics.users
    checks:
      - field: id
        assertion: uniqueness
      - field: email
        assertion: not_null
      - field: created_at
        assertion: not_null
reconciliation
Compares results from two SQL queries across the same or different data sources. Supports tolerance-based comparison for floating point values.
- slug: revenue-reconciliation
  name: Revenue Reconciliation
  type: reconciliation
  configuration:
    source_query: "SELECT SUM(amount) FROM staging.revenue"
    target_query: "SELECT SUM(amount) FROM prod.revenue"
    comparison_mode: percentage # exact | percentage | absolute
    tolerance: 0.01
Other Check Types
| Type | Description |
|---|---|
| anomaly | Statistical anomaly detection on a metric time series |
| custom | Fully custom check logic (typically used with external execution) |
| manual | Human-entered value — prompts for manual input when triggered |
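These types don't have dedicated examples in this guide; their type-specific configuration is covered in the per-type docs. As a sketch, a manual check can be defined with just the common keys, assuming no extra configuration block is required (illustrative slug and product):

- slug: quarterly-audit-signoff
  name: Quarterly Audit Sign-off
  type: manual
  description: "Recorded by the data steward after the quarterly audit"
  product: orders-table
  # no configuration block here — see the per-type docs for any additional keys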
Thresholds
The thresholds object defines the pass/fail criteria for checks that produce a numeric value (primarily sql_metric). If omitted, the platform uses type-specific default evaluation logic.
| Key | Type | Description |
|---|---|---|
| operator | string | Comparison operator |
| value | number | Threshold value |
| warn | number | Optional warning threshold (produces warning instead of fail) |
Supported Operators
| Operator | Meaning | Example |
|---|---|---|
| eq | Equal to | Value must be exactly 100 |
| gte | Greater than or equal | Value must be ≥ 1000 |
| lte | Less than or equal | Value must be ≤ 5 |
| gt | Greater than | Value must be > 0 |
| lt | Less than | Value must be < 100 |
| between | Within range | Requires min and max instead of value |
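For example, a between threshold uses min and max in place of value; the bounds here are illustrative:

thresholds:
  operator: between
  min: 0.95
  max: 1.05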
Example with Warning Threshold
thresholds:
  operator: gte
  value: 1000 # fail below this
  warn: 5000  # warn below this but above 1000
In this example, a value of 800 produces a fail, 3000 produces a warning, and 6000 produces a pass.
Parameters
Parameterized queries let you define variables that are substituted at execution time. This is useful for date-partitioned checks or environment-specific values.
- slug: daily-row-count
  name: Daily Row Count
  type: sql_metric
  query: "SELECT COUNT(*) FROM analytics.events WHERE event_date = '{{run_date}}'"
  thresholds:
    operator: gte
    value: 10000
  parameters:
    - name: run_date
      type: string
      default: "2024-01-15"
      description: "Target date partition"
Variables use double-brace syntax ({{variable_name}}) in the query and are resolved from the parameters list at runtime. You can override parameter values when triggering a run through the SDK or CLI.
Examples
Minimal Config
The simplest possible config file defines a single check:
version: "1.0"
space: acme-analytics
checks:
- slug: orders-exist
name: Orders Table Has Rows
type: sql_metric
query: "SELECT COUNT(*) FROM analytics.orders"
thresholds:
operator: gte
value: 1
Full-Featured Config
A production-grade config file with defaults, multiple check types, and shared connector:
version: "1.0"
space: acme-analytics
defaults:
connector: warehouse-snowflake
schedule: "0 6 * * *"
checks:
# SQL metric with thresholds
- slug: orders-row-count
name: Orders Row Count
type: sql_metric
description: "Ensure orders table is populated"
query: "SELECT COUNT(*) FROM analytics.orders"
product: orders-table
schedule: "0 8 * * *" # overrides the default 6 AM
thresholds:
operator: gte
value: 1000
warn: 500
# Field-level null check
- slug: users-no-null-emails
name: Users Email Null Check
type: null_check
product: users-table
configuration:
field_name: email
table_name: analytics.users
# Condition check — fails if any rows returned
- slug: no-orphaned-orders
name: No Orphaned Orders
type: sql_condition
query: >
SELECT o.id
FROM analytics.orders o
LEFT JOIN analytics.customers c ON o.customer_id = c.id
WHERE c.id IS NULL
# Cross-source reconciliation
- slug: revenue-reconciliation
name: Revenue Reconciliation
type: reconciliation
configuration:
source_query: "SELECT SUM(amount) FROM staging.revenue"
target_query: "SELECT SUM(amount) FROM prod.revenue"
comparison_mode: percentage
tolerance: 0.01
Multi-Product Config
You can define checks across multiple products in the same file, as long as they share the same space:
version: "1.0"
space: acme-analytics
defaults:
connector: warehouse-snowflake
checks:
- slug: customers-freshness
name: Customers Freshness
type: freshness_check
product: customers-table
schedule: "0 7 * * *"
configuration:
field_name: updated_at
table_name: analytics.customers
max_age_hours: 24
- slug: orders-freshness
name: Orders Freshness
type: freshness_check
product: orders-table
schedule: "0 8 * * *"
configuration:
field_name: created_at
table_name: analytics.orders
max_age_hours: 12
- slug: events-uniqueness
name: Events ID Uniqueness
type: uniqueness
product: events-stream
configuration:
field_name: event_id
table_name: analytics.events
Workflow
A typical workflow involves three steps: validate, apply, and run.
1. Validate
Check the file for structural errors and verify that referenced connectors and products exist on the platform:
qarion quality validate -f qarion-dq.yaml
This command parses the YAML, validates all field types, and checks that the target space, connectors, and products are resolvable. It does not create or modify anything.
2. Apply
Sync definitions to the platform — creates missing checks and updates existing ones:
qarion quality apply -f qarion-dq.yaml
The apply command is idempotent: re-running it with an unchanged config produces no further changes. Checks are matched by slug within the target space.
3. Run
Execute all checks defined in the config and record results:
qarion quality run-config -f qarion-dq.yaml
Use --no-record to execute checks without persisting results to the platform (useful for local testing):
qarion quality run-config -f qarion-dq.yaml --no-record
SDK Usage
The same workflow is available programmatically through the Python SDK:
from qarion import QarionSyncClient
from qarion.models.dq_config import DqConfig

# Parse the YAML file
config = DqConfig.from_yaml("qarion-dq.yaml")

client = QarionSyncClient(api_key="qk_...")

# Step 1: Validate
errors = client.quality.validate_config(config)
if errors:
    for err in errors:
        print(f"Error: {err}")

# Step 2: Apply (upsert)
summary = client.quality.apply_config(config)
print(summary)  # {"created": [...], "updated": [...], "unchanged": [...]}

# Step 3: Run
results = client.quality.run_config(config)
for r in results:
    print(f"{r.status}: {r.value}")
Use record_results=False to skip recording:
results = client.quality.run_config(config, record_results=False)
CI/CD Integration
GitHub Actions
Add a quality gate to your deployment pipeline that validates and runs checks after each push:
name: Data Quality Gate

on:
  push:
    branches: [main]
    paths:
      - "qarion-dq.yaml"
      - "dbt/**"

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install qarion-cli
      - name: Validate config
        run: qarion quality validate -f qarion-dq.yaml
        env:
          QARION_API_KEY: ${{ secrets.QARION_API_KEY }}
      - name: Apply check definitions
        run: qarion quality apply -f qarion-dq.yaml
        env:
          QARION_API_KEY: ${{ secrets.QARION_API_KEY }}
      - name: Run quality checks
        run: qarion quality run-config -f qarion-dq.yaml
        env:
          QARION_API_KEY: ${{ secrets.QARION_API_KEY }}
Airflow / Orchestrator
Trigger a config-driven quality gate as a task in your pipeline:
from airflow.operators.python import PythonOperator


def run_quality_gate():
    from qarion import QarionSyncClient
    from qarion.models.dq_config import DqConfig

    config = DqConfig.from_yaml("/opt/airflow/dags/qarion-dq.yaml")
    client = QarionSyncClient(api_key="qk_...")
    results = client.quality.run_config(config)

    failed = [r for r in results if not r.is_passed]
    if failed:
        raise Exception(f"{len(failed)} quality check(s) failed")


quality_gate = PythonOperator(
    task_id="quality_gate",
    python_callable=run_quality_gate,
)
Related
- Quality Framework — Quality dimensions, severity, scheduling, and best practices
- Drift Detection Guide — Implement continuous monitoring for AI systems
- CLI Quality Commands — Full CLI command reference
- SDK Quality Resource — Python SDK method reference
- Quality Automation Tutorial — End-to-end programmatic setup guide