Lineage & Impact
Data lineage is the ability to trace how data flows between products — from raw sources through transformations to final outputs. In Qarion, lineage relationships are first-class entities that you can define manually, import automatically from tools like dbt, or manage programmatically through the API.
What is Lineage?
At its simplest, lineage describes the upstream and downstream relationships between data products:
Raw Events → Cleaned Events → Event Metrics → Dashboard
Here, Raw Events is the furthest upstream product and the Dashboard is the furthest downstream.
Upstream products are the sources that feed data into a product. Downstream products are the consumers that depend on it. By recording these relationships, Qarion builds a directed graph that can be traversed in either direction, giving you a complete picture of how data moves through your organization.
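The directed graph described above can be sketched with plain adjacency lists. This is a minimal illustration using the product names from the example flow; the edge list itself is hypothetical, not an API response:

```python
from collections import defaultdict, deque

# Hypothetical (upstream, downstream) pairs from the example flow above.
edges = [
    ("raw_events", "cleaned_events"),
    ("cleaned_events", "event_metrics"),
    ("event_metrics", "dashboard"),
]

downstream = defaultdict(list)
upstream = defaultdict(list)
for src, dst in edges:
    downstream[src].append(dst)
    upstream[dst].append(src)

def traverse(start, graph):
    """Breadth-first walk collecting every product reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(traverse("cleaned_events", downstream)))  # ['dashboard', 'event_metrics']
print(sorted(traverse("cleaned_events", upstream)))    # ['raw_events']
```

The same edge list answers both questions — impact (walk downstream) and root cause (walk upstream) — which is why Qarion stores lineage as a single directed graph.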
Why Lineage Matters
Impact Assessment
Before making a change to a data product — whether it's a schema modification, a pipeline refactor, or a data migration — you need to know what will be affected downstream. Lineage lets you answer questions like "If I change this table's schema, which dashboards will break?" before you deploy, rather than discovering the answer when something fails in production.
Root Cause Analysis
When something goes wrong — a dashboard shows incorrect numbers, a metric stops updating, or a downstream consumer receives unexpected nulls — lineage allows you to trace the problem back to its source. Instead of manually investigating each step in the pipeline, you can walk the lineage graph upstream to pinpoint where the bad data originated.
Compliance and Auditing
Regulatory requirements often demand that organizations can demonstrate where their data comes from and where it goes. Lineage provides the provenance trail needed to answer audit questions like "Show me everywhere customer PII flows" — a question that would otherwise require extensive manual investigation across teams and systems.
Lineage Relationships
Qarion supports several relationship types to capture the nature of the dependency between products:
| Type | Description |
|---|---|
| transforms | Source data is transformed or aggregated |
| joins | Data is joined with this source |
| consumes | Data is read but unchanged |
| derives | Calculated or derived from the source |
These types are informational rather than functional — the platform treats all lineage edges equally when computing impact or rendering graphs — but they provide valuable context when a developer is trying to understand how data flows between products, not just that it flows.
Here is an example showing how a single raw source can feed multiple downstream products through different relationship types:
customer_events (raw)
├── transforms → customer_events_clean
│   ├── joins → customer_dim
│   └── transforms → customer_metrics
│       └── consumes → executive_dashboard
│
└── consumes → event_monitoring_dashboard
Lineage Graph
Product-Level Lineage
To view the upstream and downstream dependencies of a single product, use the product lineage endpoint:
GET /catalog/spaces/{slug}/products/{id}/lineage?depth=2
The depth parameter controls how many hops from the focal product to include (defaulting to 1), and the direction parameter lets you request only upstream, only downstream, or both.
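As a sketch, the request can be built like this; the base URL and HTTP client wiring are assumptions, while `depth` and `direction` are the parameters described above:

```python
# Build the path and query params for the product lineage endpoint.
# Base URL, auth, and the client library are assumptions of this sketch.

def lineage_request(slug, product_id, depth=1, direction="both"):
    """Return (path, params) for GET .../products/{id}/lineage."""
    path = f"/catalog/spaces/{slug}/products/{product_id}/lineage"
    return path, {"depth": depth, "direction": direction}

path, params = lineage_request("analytics", "prod-123", depth=2, direction="upstream")
# Hand these to your HTTP client of choice, e.g.:
#   requests.get(BASE_URL + path, params=params, headers=auth_headers)
```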
Global Lineage
For a broader view that spans the entire graph, use the global lineage endpoint with a selector expression to define your focus:
GET /lineage/graph?selector=customer_metrics+
This query retrieves customer_metrics and all of its downstream consumers. The selector syntax is described in detail below.
dbt-Style Selectors
Qarion supports a selector syntax inspired by dbt's graph operators, providing a concise way to express which portion of the lineage graph you want to retrieve:
| Selector | Meaning |
|---|---|
| model | Single product |
| +model | All upstream + product |
| model+ | Product + all downstream |
| +model+ | Full lineage chain |
| model+2 | Product + 2 levels downstream |
| 2+model | 2 levels upstream + product |
The + operator indicates direction (prefix for upstream, suffix for downstream), and an optional numeric modifier limits the depth of traversal. These selectors compose naturally — 3+customer_metrics+3 retrieves three levels of upstream dependencies, the focal product, and three levels of downstream consumers.
# Everything that feeds into customer_metrics
curl "/lineage/graph?selector=+customer_metrics"
# customer_metrics and all its consumers
curl "/lineage/graph?selector=customer_metrics+"
# Full lineage chain, 3 levels each direction
curl "/lineage/graph?selector=3+customer_metrics+3"
Note that many servers decode a literal + in a query string as a space, so if selectors are not resolving as expected, percent-encode the operator as %2B or build the query with curl -G --data-urlencode "selector=+customer_metrics".
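To make the grammar concrete, here is a small parser for this selector syntax. It is an illustrative sketch, not Qarion's implementation; in particular, returning 0 to mean "unbounded depth" is a convention invented for this example:

```python
import re

# (optional upstream depth)+ name +(optional downstream depth)
SELECTOR = re.compile(r"^(?:(\d*)\+)?([\w.]+)(?:\+(\d*))?$")

def parse_selector(selector):
    """Parse e.g. '3+customer_metrics+3' into (upstream, name, downstream).

    None means that direction was not requested; 0 (a convention of this
    sketch) means the direction was requested with no depth limit."""
    m = SELECTOR.match(selector)
    if not m:
        raise ValueError(f"invalid selector: {selector!r}")
    up, name, down = m.groups()
    to_depth = lambda g: None if g is None else (0 if g == "" else int(g))
    return to_depth(up), name, to_depth(down)

print(parse_selector("3+customer_metrics+3"))  # (3, 'customer_metrics', 3)
```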
Impact Analysis
Before a Change
The impact analysis endpoint quantifies the blast radius of a potential change to a data product. Given a product ID, it returns the number of affected products, dashboards, contracts, and quality checks, along with a list of stakeholders who should be notified:
GET /lineage/impact?product_id={id}&depth=5
{
  "affected_products": 12,
  "affected_dashboards": 3,
  "affected_contracts": 2,
  "affected_checks": 8,
  "stakeholders": [
    {"name": "Alice (Owner - Customer Metrics)", "email": "..."},
    {"name": "Bob (Steward - Revenue Dashboard)", "email": "..."}
  ]
}
Stakeholder Notification
You can combine impact analysis with programmatic notification to ensure that everyone affected by a change knows about it before it happens:
def notify_before_change(product_id, product_name):
    impact = client.get(f"/lineage/impact?product_id={product_id}").json()
    for stakeholder in impact["stakeholders"]:
        send_email(
            to=stakeholder["email"],
            subject=f"Planned change to {product_name}",
            # affected_products is already a count, not a list
            body=f"This may affect {impact['affected_products']} products...",
        )
This pattern is especially valuable in organizations where changes to shared datasets can have cascading consequences across teams.
Setting Up Lineage
Manual Definition
The most straightforward way to define lineage is by specifying upstream and downstream product IDs directly:
PUT /catalog/spaces/{slug}/products/{id}/lineage
{
  "upstream_ids": ["source-1-uuid", "source-2-uuid"],
  "downstream_ids": ["consumer-uuid"]
}
This approach works well for small catalogs or for relationships that aren't captured by any automated tool.
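A helper that builds the PUT body above can keep manual updates tidy. This is a sketch; in particular, omitting a key rather than sending an empty list assumes the API treats a missing key as "leave this side unchanged":

```python
# Build the PUT body for manual lineage definition (shape matches the
# example above). Omitting empty lists is an assumption about API semantics.

def lineage_payload(upstream_ids=(), downstream_ids=()):
    body = {}
    if upstream_ids:
        body["upstream_ids"] = list(upstream_ids)
    if downstream_ids:
        body["downstream_ids"] = list(downstream_ids)
    return body

payload = lineage_payload(
    upstream_ids=["source-1-uuid", "source-2-uuid"],
    downstream_ids=["consumer-uuid"],
)
# client.put(f"/catalog/spaces/{slug}/products/{product_id}/lineage", json=payload)
```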
Automatic from dbt
For organizations using dbt, lineage can be extracted automatically from the depends_on field in the dbt manifest. When you sync your dbt project with Qarion, the platform reads the dependency graph and creates the corresponding lineage relationships:
# In dbt manifest
"depends_on": {
  "nodes": [
    "model.project.customer_events",
    "model.project.customer_dim"
  ]
}
This is the recommended approach for dbt-based pipelines, since it keeps lineage in sync with your actual transformation logic without any manual intervention. See the dbt Sync Tutorial for a complete walkthrough.
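To show what the sync is doing under the hood, here is a sketch of extracting lineage edges from a dbt manifest.json; the `depends_on.nodes` field is standard dbt, while the idea of pushing each edge to Qarion afterwards is left as a comment:

```python
# Extract (upstream, downstream) model pairs from a dbt manifest dict.
# Only model-to-model edges are kept in this sketch; sources, seeds, etc.
# could be handled the same way.

def edges_from_manifest(manifest):
    for node_id, node in manifest.get("nodes", {}).items():
        if not node_id.startswith("model."):
            continue
        for dep in node.get("depends_on", {}).get("nodes", []):
            if dep.startswith("model."):
                yield dep, node_id  # dep feeds node_id

manifest = {
    "nodes": {
        "model.project.customer_metrics": {
            "depends_on": {"nodes": ["model.project.customer_events",
                                     "model.project.customer_dim"]}
        }
    }
}
for up, down in edges_from_manifest(manifest):
    print(up, "->", down)
    # e.g. POST each edge to the lineage API here
```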
API-Based Discovery
For custom pipelines that don't use dbt, you can update lineage programmatically after each job run. This approach treats your pipeline orchestrator as the source of truth for lineage:
def update_lineage_after_job(job_config):
    for output_table in job_config["outputs"]:
        for input_table in job_config["inputs"]:
            # Add upstream relationship
            client.post(
                f"/catalog/spaces/{space}/products/{output_table}/lineage/upstream",
                json={"product_id": input_table},
            )
Lineage Visualization
Graph Response Format
The lineage graph API returns a structured response with nodes (the products in the graph) and edges (the relationships between them). Each node includes a layer value indicating its distance from the focal product — zero for the focal product itself, negative values for upstream dependencies, and positive values for downstream consumers:
{
  "nodes": [
    {"id": "...", "name": "customer_events", "type": "table", "layer": -2},
    {"id": "...", "name": "customer_metrics", "type": "table", "layer": 0},
    {"id": "...", "name": "dashboard", "type": "dashboard", "layer": 1}
  ],
  "edges": [
    {"source": "customer_events", "target": "customer_metrics", "relationship": "transforms"},
    {"source": "customer_metrics", "target": "dashboard", "relationship": "consumes"}
  ]
}
This format is designed to be easy to render as a graph visualization, with the layer values providing a natural left-to-right ordering.
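For example, arranging nodes into columns by layer gives the left-to-right ordering directly. The node shape below matches the example response; the rendering itself is left to whatever visualization library you use:

```python
from collections import defaultdict

def columns_by_layer(nodes):
    """Group node names into columns ordered from most upstream (most
    negative layer) to most downstream (most positive layer)."""
    cols = defaultdict(list)
    for node in nodes:
        cols[node["layer"]].append(node["name"])
    return [cols[layer] for layer in sorted(cols)]

nodes = [
    {"name": "customer_events", "type": "table", "layer": -2},
    {"name": "customer_metrics", "type": "table", "layer": 0},
    {"name": "dashboard", "type": "dashboard", "layer": 1},
]
print(columns_by_layer(nodes))  # [['customer_events'], ['customer_metrics'], ['dashboard']]
```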
Use Cases
Change Management
Before deploying schema changes to a data product, query the impact analysis endpoint to identify affected downstream assets, notify the stakeholders listed in the response, verify that downstream products are compatible with the planned change, and deploy with confidence that all affected parties are aware.
Incident Response
When a data quality issue is detected, use the lineage graph to check upstream dependencies and identify the root cause. Then trace all affected downstream products to understand the full scope of impact, and coordinate resolution across the teams responsible for each affected product.
Compliance Audit
For regulatory requirements, start by identifying data products tagged with sensitive classifications (such as PII), trace all downstream consumers to see where sensitive data flows, verify that access controls are appropriate at each step, and generate a lineage report that demonstrates data provenance.
Documentation
The lineage graph itself serves as living documentation of your data architecture. By exporting it and rendering it as a visualization, you can include data flow diagrams in product documentation that always reflect the actual state of your pipelines — rather than maintaining static diagrams that drift out of date.
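One lightweight way to do this export is to convert the graph's edges into Graphviz DOT text that a CI job can re-render on every change. The edge shape matches the example response above; the styling choices here are arbitrary:

```python
# Convert lineage edges into a Graphviz DOT digraph. Assumes edges use the
# {"source", "target", "relationship"} shape from the graph response.

def to_dot(edges):
    lines = ["digraph lineage {", "  rankdir=LR;"]
    for edge in edges:
        lines.append(
            f'  "{edge["source"]}" -> "{edge["target"]}" '
            f'[label="{edge["relationship"]}"];'
        )
    lines.append("}")
    return "\n".join(lines)

edges = [
    {"source": "customer_events", "target": "customer_metrics", "relationship": "transforms"},
    {"source": "customer_metrics", "target": "dashboard", "relationship": "consumes"},
]
print(to_dot(edges))  # pipe this to `dot -Tsvg` to produce a diagram
```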
Best Practices
Keep Lineage Current
Stale lineage is worse than no lineage at all, because it creates a false sense of security. Whenever pipelines change — whether through code modifications, new data sources, or retired outputs — update the lineage graph to reflect the new reality.
Use Automation
Manual lineage updates are error-prone and easily forgotten. Wherever possible, automate lineage management by syncing from dbt automatically, updating lineage via CI/CD hooks when pipeline code changes, or using pipeline metadata from your orchestration tool to infer dependencies.
Tag Sensitive Data
Lineage becomes especially powerful when combined with data classification tags. By tagging products that contain PII, confidential data, or sensitive business information, you can use lineage traversal to track how sensitive data propagates through your organization — and ensure that access controls and compliance measures are applied consistently at every stage.
Audit Regularly
Even with automation, lineage graphs can develop inconsistencies over time. Schedule periodic audits to identify orphaned products (those with no upstream sources, which may indicate missing lineage), dead ends (products with no downstream consumers, which may be obsolete), and circular dependencies (which usually indicate a modeling error).
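All three checks fall out of a plain edge list. This sketch uses a simple recursive cycle check, which is fine for modest graph sizes but not tuned for very large catalogs:

```python
def audit(products, edges):
    """Return (orphans, dead_ends, has_cycle) for a lineage edge list.

    Orphans have no upstream edge, dead ends have no downstream edge;
    either may be legitimate (true sources / final outputs) or a gap."""
    has_upstream = {dst for _, dst in edges}
    has_downstream = {src for src, _ in edges}
    orphans = sorted(p for p in products if p not in has_upstream)
    dead_ends = sorted(p for p in products if p not in has_downstream)

    graph = {p: [] for p in products}
    for src, dst in edges:
        graph[src].append(dst)

    def on_cycle(node, path):
        if node in path:
            return True
        return any(on_cycle(nxt, path | {node}) for nxt in graph[node])

    has_cycle = any(on_cycle(p, frozenset()) for p in products)
    return orphans, dead_ends, has_cycle
```

Running this periodically (for example, from a scheduled CI job against the exported graph) turns the audit from an occasional manual chore into a standing check.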
Related
- Lineage API — Endpoint reference
- dbt Sync — Automatic lineage import
- Data Model — Entity relationships