Continuous Monitoring & Drift Detection
Qarion's Quality Engine supports continuous monitoring of AI systems and data pipelines through scheduled checks, threshold-based alerting, and automated escalation. This page explains how to configure monitoring for the most common drift and degradation scenarios.
Why Continuous Monitoring?
AI systems and data products degrade silently. A model trained on last year's customer data may lose accuracy as buying patterns shift. A feature pipeline may start producing values outside the distribution the model expects. Without proactive monitoring, these problems are only discovered when business metrics decline — often weeks after the root cause occurred.
Qarion addresses this with the same quality check infrastructure used for traditional data quality, extended to cover AI-specific concerns: data drift, model drift, concept drift, and performance degradation.
Types of Drift
Data Drift
Data drift occurs when the statistical properties of input data change over time. This includes distribution shifts in feature values, unexpected changes in data volume, schema changes or new categories appearing in categorical fields, and changes in null rates or data completeness.
Impact: Even if a model hasn't changed, its predictions become unreliable when the input data no longer resembles the training data.
Model Drift
Model drift (also called model decay) occurs when a model's predictive performance degrades over time. Key indicators include declining accuracy, precision, or recall, increasing prediction error rates, and a growing gap between predicted and actual outcomes.
Impact: Business decisions based on model outputs become progressively less reliable.
Concept Drift
Concept drift occurs when the underlying relationship between inputs and outputs changes. For example, customer churn patterns may shift during an economic downturn, fraud signatures may evolve as attackers adapt, or seasonal patterns may change due to external factors.
Impact: Even with stable data distributions, the model's learned patterns no longer reflect reality.
Performance Degradation
Performance degradation refers to operational issues: increasing inference latency, rising error rates, memory consumption growth, or throughput decline.
Impact: The system becomes unreliable or unusable regardless of prediction quality.
Monitoring with Quality Checks
Each drift type maps directly to existing check types in Qarion.
Data Drift Monitoring
Use SQL Metric checks to track feature distributions and detect shifts:
| What to Monitor | Check Type | Example |
|---|---|---|
| Volume changes | SQL Metric | Row count with min/max thresholds |
| Null rate increase | Null Check | Max null percentage per column |
| Value distribution shift | SQL Metric | Mean, stddev, or percentile queries |
| Category changes | SQL Condition | New values not in expected set |
| Data freshness | Freshness Check | Timestamp age on feature tables |
Example: Feature distribution monitoring
Create a SQL Metric check that tracks the mean of a critical feature. Set thresholds based on the expected range from training data:
- Navigate to the product's Quality tab
- Click Add Quality Check
- Select SQL Metric as the check type
- Enter a query like `SELECT AVG(purchase_amount) FROM features.customer_transactions`
- Set thresholds: warning at ±1 standard deviation, critical at ±2 standard deviations (see the sketch after this list)
- Schedule to run hourly or daily depending on data freshness
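The ±1σ/±2σ bands above amount to a simple z-score comparison against the training baseline. A minimal sketch of that logic follows; the `classify_drift` helper and all numbers are illustrative assumptions, with the current mean in practice coming from the SQL Metric query and the baseline statistics from your training data:

```python
# Sketch: classify drift of a feature mean against its training baseline.
# baseline_mean / baseline_std and the call at the bottom are illustrative;
# current_mean is the value returned by the SQL Metric query above.

def classify_drift(current_mean: float, baseline_mean: float, baseline_std: float) -> str:
    """Map the deviation from the training baseline to a check severity."""
    z = abs(current_mean - baseline_mean) / baseline_std
    if z >= 2.0:   # beyond ±2 standard deviations
        return "critical"
    if z >= 1.0:   # beyond ±1 standard deviation
        return "warning"
    return "ok"

# Illustrative numbers: baseline observed on the training window.
print(classify_drift(current_mean=112.4, baseline_mean=98.7, baseline_std=10.2))  # -> "warning"
```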
Model Drift Monitoring
Model metrics (accuracy, F1 score, AUC) are typically computed outside the platform — in your ML pipeline, evaluation job, or inference service. Push these metrics into Qarion using the External Metric push API:
- Create a Custom check for each metric (e.g., "Model Accuracy", "F1 Score")
- From your ML pipeline, push results after each evaluation run
- Set thresholds that reflect acceptable performance bounds
Use the SDK or CLI to push metrics directly from your training and evaluation pipelines. See the developer guide for code examples.
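As a rough illustration of what that push might look like, here is a hedged sketch that POSTs an evaluation metric over HTTP. The endpoint path, payload fields, and `QARION_API_TOKEN` variable are assumptions made for illustration only; the actual External Metric push API and SDK calls are documented in the developer guide.

```python
# Sketch: push an externally computed model metric into a Custom check.
# The endpoint path, payload fields, and QARION_API_TOKEN variable are
# illustrative assumptions -- see the developer guide for the real API.
import os

import requests


def push_metric(check_id: str, value: float) -> None:
    resp = requests.post(
        f"https://qarion.example.com/api/checks/{check_id}/results",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {os.environ['QARION_API_TOKEN']}"},
        json={"value": value},
        timeout=10,
    )
    resp.raise_for_status()


# Called from the evaluation step of an ML pipeline, for example:
# push_metric(check_id="model-accuracy", value=0.937)
```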
Concept Drift Monitoring
Concept drift is typically detected by monitoring the relationship between predictions and actual outcomes:
- Create a SQL Metric check that computes prediction-vs-actual error rates
- Use a query like `SELECT AVG(ABS(predicted - actual)) FROM model_predictions WHERE prediction_date >= CURRENT_DATE - INTERVAL '7 days'`
- Set thresholds based on baseline error rates from your validation set (see the sketch after this list)
- Schedule to run daily or weekly
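One common way to express those thresholds is relative to the validation-set error. A minimal sketch, assuming the rolling error comes from the query above and the baseline from your validation run (the numbers and tolerance factors are illustrative):

```python
# Sketch: flag concept drift when the rolling prediction error grows past the
# validation baseline. rolling_mae is the value returned by the query above;
# baseline_mae and the tolerance factors are illustrative assumptions.

def concept_drift_severity(rolling_mae: float, baseline_mae: float) -> str:
    ratio = rolling_mae / baseline_mae
    if ratio >= 1.5:   # error 50% above the validation baseline
        return "critical"
    if ratio >= 1.2:   # error 20% above the validation baseline
        return "warning"
    return "ok"

print(concept_drift_severity(rolling_mae=4.1, baseline_mae=3.2))  # -> "warning"
```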
Performance Monitoring
Track operational metrics by pushing them from your inference service:
- Create Custom checks for latency (p50, p95, p99), error rate, and throughput
- Push metrics from your monitoring stack (Prometheus, Datadog, CloudWatch)
- Set thresholds based on your SLA requirements (see the sketch below)
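If your monitoring stack exposes raw request durations rather than precomputed percentiles, you can derive p50/p95/p99 before pushing them. A minimal standard-library sketch; the sample latencies and the 200 ms p95 SLA are illustrative assumptions:

```python
# Sketch: derive latency percentiles from recorded request durations before
# pushing them as Custom check results. Sample data and SLA are illustrative.
import statistics

def latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    # statistics.quantiles with n=100 returns the 1st..99th percentile cut points.
    q = statistics.quantiles(latencies_ms, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

samples = [12.1, 14.8, 15.2, 16.0, 18.3, 21.7, 24.9, 33.5, 48.2, 95.4] * 20
pcts = latency_percentiles(samples)
print(pcts)
print("SLA breach" if pcts["p95"] > 200.0 else "within SLA")  # hypothetical 200 ms p95 SLA
```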
Alert Configuration for Drift
Severity Mapping
Map drift severity to check severity to ensure appropriate response:
| Scenario | Severity | Rationale |
|---|---|---|
| Feature mean drifts by 1σ | Warning | May self-correct; worth watching |
| Feature mean drifts by 2σ | Critical | Likely real drift; investigate |
| Model accuracy drops below SLA | Critical | Direct business impact |
| Null rate exceeds 5% | Warning | Data quality issue; may affect model |
| Inference latency exceeds SLA | Critical | User-facing degradation |
| Row count drops 20% | Warning | Pipeline may have partial failure |
Alert Response Workflow
When a drift alert fires:
- Alert Center surfaces the alert with severity and context
- Acknowledge the alert to signal you're investigating
- Investigate using the execution history and linked product details
- If the issue requires formal tracking, Create a Ticket from the alert
- If the drift indicates the model or risk assessment is stale, trigger a Risk Re-assessment via the Risk Assessment module
Significant drift events should prompt a review of the product's risk classification. Navigate to the product detail page and create a new risk assessment under the Risk Assessments section when drift exceeds predefined thresholds.
Scheduling Recommendations
| Monitoring Type | Recommended Schedule | Rationale |
|---|---|---|
| Data volume checks | Hourly (0 * * * *) | Catch pipeline failures quickly |
| Feature distribution | Daily (0 6 * * *) | Balance cost vs detection speed |
| Model accuracy | After each evaluation | Tied to retraining cadence |
| Concept drift | Weekly (0 0 * * 1) | Requires accumulated outcomes |
| Performance metrics | Every 15 min (*/15 * * * *) | Real-time SLA monitoring |
Multi-Product Monitoring
For AI systems with multiple input datasets, use Qarion's multi-product association feature to link a single drift check to all relevant data products:
- When a feature store serves multiple models, create one distribution check and link it to all consuming products
- When a shared pipeline feeds multiple downstream tables, create one freshness check covering all of them
- When multiple models share training data, a single drift check on the training set protects all downstream models
Best Practices
Establish Baselines First
Before enabling drift alerts, run checks manually for 1–2 weeks to establish a baseline. Use the trend charts to understand normal variance, then set thresholds that avoid false positives while still catching genuine drift.
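For example, once you have a week or two of manually collected check values, the warning and critical bands can be derived directly from their observed spread. A minimal sketch (the sample values are illustrative):

```python
# Sketch: derive warning/critical thresholds from a baseline of manually
# executed check results (the sample values are illustrative).
import statistics

baseline_values = [97.2, 101.5, 99.8, 103.1, 96.4, 100.9, 98.7, 102.2, 99.1, 100.3]

mean = statistics.mean(baseline_values)
std = statistics.stdev(baseline_values)

thresholds = {
    "warning_low": mean - std, "warning_high": mean + std,
    "critical_low": mean - 2 * std, "critical_high": mean + 2 * std,
}
print(thresholds)
```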
Layer Your Monitoring
Start with the highest-impact, easiest-to-implement checks:
Tier 1: Freshness + Volume (catch pipeline failures)
Tier 2: Null rates + Distribution (catch data drift)
Tier 3: Model metrics (catch model drift)
Tier 4: Prediction-vs-actual (catch concept drift)
Use Version-Controlled Config
Define all monitoring checks in a YAML config file and manage them through Git. This ensures monitoring rules are reviewed, versioned, and reproducible across environments.
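As an illustration, a check definition might be kept in a file like the one below and sanity-checked in CI before deployment. The YAML layout, field names, and product identifiers here are hypothetical assumptions, not Qarion's actual configuration schema; note how a single check can list several linked data products, matching the Multi-Product Monitoring pattern above.

```python
# Sketch: a version-controlled check definition, loaded and sanity-checked in CI.
# The YAML layout (field names, products list) is a hypothetical schema -- adapt
# it to the configuration format your Qarion deployment actually accepts.
import yaml  # PyYAML

CHECK_YAML = """
checks:
  - name: purchase_amount_mean
    type: sql_metric
    query: SELECT AVG(purchase_amount) FROM features.customer_transactions
    schedule: "0 6 * * *"
    thresholds:
      warning: [88.5, 108.9]
      critical: [78.3, 119.1]
    products:            # one check linked to every consuming data product
      - churn-model-features
      - ltv-model-features
"""

config = yaml.safe_load(CHECK_YAML)
for check in config["checks"]:
    lo_w, hi_w = check["thresholds"]["warning"]
    lo_c, hi_c = check["thresholds"]["critical"]
    assert lo_c < lo_w < hi_w < hi_c, f"{check['name']}: thresholds must nest"
```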
Connect to Risk Assessment
When continuous monitoring detects sustained drift, use it as a trigger for formal risk re-assessment. This closes the loop between automated detection and governance review, ensuring that risk classifications stay current as systems evolve.
Learn More
- Quality Checks — Creating and configuring quality rules
- Alerts Center — Monitoring and responding to alerts
- Quality Management Overview — Platform quality capabilities