Continuous Monitoring & Drift Detection

Qarion's Quality Engine supports continuous monitoring of AI systems and data pipelines through scheduled checks, threshold-based alerting, and automated escalation. This page explains how to configure monitoring for the most common drift and degradation scenarios.

Why Continuous Monitoring?

AI systems and data products degrade silently. A model trained on last year's customer data may lose accuracy as buying patterns shift. A feature pipeline may start producing values outside the distribution the model expects. Without proactive monitoring, these problems are only discovered when business metrics decline — often weeks after the root cause occurred.

Qarion addresses this with the same quality check infrastructure used for traditional data quality, extended to cover AI-specific concerns: data drift, model drift, concept drift, and performance degradation.


Types of Drift

Data Drift

Data drift occurs when the statistical properties of input data change over time. This includes distribution shifts in feature values, unexpected changes in data volume, schema changes or new categories appearing in categorical fields, and changes in null rates or data completeness.

Impact: Even if a model hasn't changed, its predictions become unreliable when the input data no longer resembles the training data.

Model Drift

Model drift (also called model decay) occurs when a model's predictive performance degrades over time. Key indicators include declining accuracy, precision, or recall, increasing prediction error rates, and a growing gap between predicted and actual outcomes.

Impact: Business decisions based on model outputs become progressively less reliable.

Concept Drift

Concept drift occurs when the underlying relationship between inputs and outputs changes. For example, customer churn patterns may shift during an economic downturn, fraud signatures may evolve as attackers adapt, or seasonal patterns may change due to external factors.

Impact: Even with stable data distributions, the model's learned patterns no longer reflect reality.

Performance Degradation

Performance degradation refers to operational issues: increasing inference latency, rising error rates, memory consumption growth, or throughput decline.

Impact: The system becomes unreliable or unusable regardless of prediction quality.


Monitoring with Quality Checks

Each drift type maps directly to existing check types in Qarion.

Data Drift Monitoring

Use SQL Metric checks to track feature distributions and detect shifts:

What to Monitor            Check Type       Example
Volume changes             SQL Metric       Row count with min/max thresholds
Null rate increase         Null Check       Max null percentage per column
Value distribution shift   SQL Metric       Mean, stddev, or percentile queries
Category changes           SQL Condition    New values not in expected set
Schema freshness           Freshness Check  Timestamp age on feature tables
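
For illustration, the checks above might be backed by queries like the following. This is a sketch only: the table and column names (features.customer_transactions, purchase_amount, customer_segment, loaded_at) are placeholders, not a documented schema.

    # Illustrative SQL Metric / SQL Condition queries for the checks above.
    # All table and column names are placeholders for your own feature tables.
    DATA_DRIFT_QUERIES = {
        # Volume: row count over the last load window, with min/max thresholds
        "row_count_last_hour": """
            SELECT COUNT(*) FROM features.customer_transactions
            WHERE loaded_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
        """,
        # Distribution: track the mean of a critical numeric feature
        "purchase_amount_mean": """
            SELECT AVG(purchase_amount) FROM features.customer_transactions
        """,
        # Distribution: track spread as well, since a stable mean can hide drift
        "purchase_amount_stddev": """
            SELECT STDDEV(purchase_amount) FROM features.customer_transactions
        """,
        # Categories: count values outside the expected set (SQL Condition style)
        "unexpected_segments": """
            SELECT COUNT(*) FROM features.customer_transactions
            WHERE customer_segment NOT IN ('consumer', 'smb', 'enterprise')
        """,
    }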

Example: Feature distribution monitoring

Create a SQL Metric check that tracks the mean of a critical feature. Set thresholds based on the expected range from training data:

  1. Navigate to the product's Quality tab
  2. Click Add Quality Check
  3. Select SQL Metric as the check type
  4. Enter a query like SELECT AVG(purchase_amount) FROM features.customer_transactions
  5. Set thresholds: warning at ±1 standard deviation, critical at ±2 standard deviations
  6. Schedule to run hourly or daily depending on data freshness
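
One way to derive the bounds in step 5 is to compute them from the training data's statistics. A minimal sketch, assuming you can load the feature's training values into memory; the ±1σ/±2σ factors mirror the step above:

    import statistics

    def drift_thresholds(training_values, warn_sigma=1.0, crit_sigma=2.0):
        """Derive warning/critical bounds for a SQL Metric check on a feature mean."""
        mean = statistics.fmean(training_values)
        std = statistics.pstdev(training_values)
        return {
            "warning": (mean - warn_sigma * std, mean + warn_sigma * std),
            "critical": (mean - crit_sigma * std, mean + crit_sigma * std),
        }

    # Stand-in values; in practice, pass the purchase_amount column from the
    # training snapshot.
    bounds = drift_thresholds([30.0, 36.0, 42.0, 48.0, 54.0])
    print(bounds["warning"], bounds["critical"])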

Model Drift Monitoring

Model metrics (accuracy, F1 score, AUC) are typically computed outside the platform — in your ML pipeline, evaluation job, or inference service. Push these metrics into Qarion using the External Metric push API:

  1. Create a Custom check for each metric (e.g., "Model Accuracy", "F1 Score")
  2. From your ML pipeline, push results after each evaluation run
  3. Set thresholds that reflect acceptable performance bounds

Pipeline Integration

Use the SDK or CLI to push metrics directly from your training and evaluation pipelines. See the developer guide for code examples.
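
For illustration only, a minimal sketch of pushing an evaluation metric over HTTP. The endpoint path, payload shape, and authentication header here are assumptions, not the documented API; the developer guide covers the supported SDK and CLI calls.

    import os
    import requests  # third-party: pip install requests

    QARION_URL = os.environ["QARION_URL"]          # e.g. https://qarion.example.com
    QARION_TOKEN = os.environ["QARION_API_TOKEN"]  # service-account token

    def push_metric(check_id: str, value: float, run_id: str) -> None:
        """Push one metric value to a Custom check (hypothetical endpoint)."""
        response = requests.post(
            f"{QARION_URL}/api/checks/{check_id}/results",  # assumed path
            headers={"Authorization": f"Bearer {QARION_TOKEN}"},
            json={"value": value, "context": {"evaluation_run": run_id}},
            timeout=10,
        )
        response.raise_for_status()

    # Called at the end of an evaluation run, e.g.:
    # push_metric("model-accuracy", value=0.947, run_id="eval-2024-06-01")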

Concept Drift Monitoring

Concept drift is typically detected by monitoring the relationship between predictions and actual outcomes:

  1. Create a SQL Metric check that computes prediction-vs-actual error rates
  2. Use a query like SELECT AVG(ABS(predicted - actual)) FROM model_predictions WHERE prediction_date >= CURRENT_DATE - INTERVAL '7 days'
  3. Set thresholds based on baseline error rates from your validation set
  4. Schedule to run daily or weekly
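
To set the thresholds in step 3, compare the rolling error against the baseline measured on your validation set. A minimal sketch; the 1.2x and 1.5x multipliers are illustrative assumptions, not recommended defaults:

    def concept_drift_severity(rolling_mae: float, baseline_mae: float) -> str:
        """Classify the 7-day prediction-vs-actual error against the validation baseline."""
        if rolling_mae >= 1.5 * baseline_mae:
            return "critical"
        if rolling_mae >= 1.2 * baseline_mae:
            return "warning"
        return "ok"

    # Example: validation MAE was 3.4, last week's MAE is 4.5 -> "warning"
    print(concept_drift_severity(rolling_mae=4.5, baseline_mae=3.4))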

Performance Monitoring

Track operational metrics by pushing them from your inference service:

  1. Create Custom checks for latency (p50, p95, p99), error rate, and throughput
  2. Push metrics from your monitoring stack (Prometheus, Datadog, CloudWatch)
  3. Set thresholds based on your SLA requirements
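
As a sketch of step 2, the snippet below reads p95 inference latency from Prometheus's HTTP query API and could forward it with the same kind of push helper sketched earlier. The Prometheus address, metric name, and PromQL expression are placeholders for your inference service:

    import requests  # third-party: pip install requests

    PROMETHEUS_URL = "http://prometheus.internal:9090"  # placeholder address

    def read_p95_latency_seconds() -> float:
        """Query Prometheus for p95 inference latency over the last 5 minutes."""
        promql = (
            "histogram_quantile(0.95, "
            "sum(rate(inference_request_duration_seconds_bucket[5m])) by (le))"
        )
        resp = requests.get(
            f"{PROMETHEUS_URL}/api/v1/query", params={"query": promql}, timeout=10
        )
        resp.raise_for_status()
        result = resp.json()["data"]["result"]
        return float(result[0]["value"][1]) if result else float("nan")

    # push_metric("inference-latency-p95", read_p95_latency_seconds(), run_id="scheduled")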

Alert Configuration for Drift

Severity Mapping

Map drift severity to check severity to ensure appropriate response:

Scenario                          Severity   Rationale
Feature mean drifts by 1σ         Warning    May self-correct; worth watching
Feature mean drifts by 2σ         Critical   Likely real drift; investigate
Model accuracy drops below SLA    Critical   Direct business impact
Null rate exceeds 5%              Warning    Data quality issue; may affect model
Inference latency exceeds SLA     Critical   User-facing degradation
Row count drops 20%               Warning    Pipeline may have partial failure

Alert Response Workflow

When a drift alert fires:

  1. Alert Center surfaces the alert with severity and context
  2. Acknowledge the alert to signal you're investigating
  3. Investigate using the execution history and linked product details
  4. If the issue requires formal tracking, Create a Ticket from the alert
  5. If the drift indicates the model or risk assessment is stale, trigger a Risk Re-assessment via the Risk Assessment module

Re-assessment Triggers

Significant drift events should prompt a review of the product's risk classification. Navigate to the product detail page and create a new risk assessment under the Risk Assessments section when drift exceeds predefined thresholds.


Scheduling Recommendations

Monitoring Type        Recommended Schedule           Rationale
Data volume checks     Hourly (0 * * * *)             Catch pipeline failures quickly
Feature distribution   Daily (0 6 * * *)              Balance cost vs detection speed
Model accuracy         After each evaluation          Tied to retraining cadence
Concept drift          Weekly (0 0 * * 1)             Requires accumulated outcomes
Performance metrics    Every 15 min (*/15 * * * *)    Real-time SLA monitoring
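
If you want to sanity-check what a cron expression in the table evaluates to, a small sketch using the croniter package prints the next run time for each; the check names here are illustrative:

    from datetime import datetime
    from croniter import croniter  # third-party: pip install croniter

    SCHEDULES = {
        "data_volume": "0 * * * *",
        "feature_distribution": "0 6 * * *",
        "concept_drift": "0 0 * * 1",
        "performance_metrics": "*/15 * * * *",
    }

    now = datetime.now()
    for name, expr in SCHEDULES.items():
        print(f"{name}: next run at {croniter(expr, now).get_next(datetime)}")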

Multi-Product Monitoring

For AI systems with multiple input datasets, use Qarion's multi-product association feature to link a single drift check to all relevant data products. For example:

  • When a feature store serves multiple models, create one distribution check and link it to all consuming products
  • When a shared pipeline feeds multiple downstream tables, create one freshness check covering all of them
  • When multiple models share training data, a single drift check on the training set protects all downstream models

Best Practices

Establish Baselines First

Before enabling drift alerts, run checks manually for 1–2 weeks to establish a baseline. Use the trend charts to understand normal variance, then set thresholds that avoid false positives while still catching genuine drift.
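
For example, once one to two weeks of results have accumulated, the observed variance can be turned into thresholds. A minimal sketch, assuming you have exported the daily metric values from the trend chart or execution history; the 2σ/3σ factors are assumptions to tune against your own false-positive tolerance:

    import statistics

    # Daily values of a metric observed during the baseline period (stand-ins).
    baseline_values = [41.8, 42.5, 40.9, 43.1, 42.0, 41.5, 42.7,
                       42.2, 41.1, 43.0, 42.4, 41.9, 42.6, 42.1]

    mean = statistics.fmean(baseline_values)
    std = statistics.stdev(baseline_values)

    # Wider bands than the training-data example earlier, to reduce false positives.
    warning = (mean - 2 * std, mean + 2 * std)
    critical = (mean - 3 * std, mean + 3 * std)
    print(f"warning outside {warning}, critical outside {critical}")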

Layer Your Monitoring

Start with the highest-impact, easiest-to-implement checks:

Tier 1: Freshness + Volume         (catch pipeline failures)
Tier 2: Null rates + Distribution  (catch data drift)
Tier 3: Model metrics              (catch model drift)
Tier 4: Prediction-vs-actual       (catch concept drift)

Use Version-Controlled Config

Define all monitoring checks in a YAML config file and manage them through Git. This ensures monitoring rules are reviewed, versioned, and reproducible across environments.
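
A sketch of what this might look like; the YAML schema and field names below are illustrative assumptions, not Qarion's documented configuration format. In a CI job, the parsed definitions would then be created or updated via the SDK or CLI:

    import yaml  # third-party: pip install pyyaml

    # Hypothetical checks.yaml kept in Git and reviewed like any other change.
    CONFIG = """
    checks:
      - name: purchase_amount_mean
        type: sql_metric
        query: SELECT AVG(purchase_amount) FROM features.customer_transactions
        schedule: "0 6 * * *"
        thresholds: {warning: [36.0, 48.0], critical: [30.0, 54.0]}
        products: [customer-features, churn-model]  # multi-product association
    """

    for check in yaml.safe_load(CONFIG)["checks"]:
        # A CI job would call the SDK/CLI here; this sketch only validates and prints.
        print(check["name"], check["type"], check["schedule"], check["products"])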

Connect to Risk Assessment

When continuous monitoring detects sustained drift, use it as a trigger for formal risk re-assessment. This closes the loop between automated detection and governance review, ensuring that risk classifications stay current as systems evolve.


Learn More