
Data Lineage Overview

Data lineage shows how data flows through your organization: where it comes from, how it is transformed, and where it goes. Understanding lineage is essential for impact analysis, debugging, and governance.

What is Data Lineage?

Data lineage traces the journey of data from source to consumption. It shows upstream sources (where data comes from) and downstream consumers (who uses the data). This information helps answer critical questions such as "If I change this table, what will break?", "Where did this data originally come from?", and "Who should I notify about a data quality issue?"

Lineage in Qarion

Qarion provides two complementary views of data lineage:

Product-Specific Lineage

Found in the Lineage tab of any data product, this view centers on a single product and shows its direct dependencies. It is a focused, detailed view suited to curating dependencies and exploring them at an adjustable depth.

Global Lineage

The Lineage page in the main navigation shows the entire space's data flow in one comprehensive graph. This "bird's-eye view" reveals pipeline structures and helps identify patterns across the organization.

Key Concepts

Upstream Sources

Upstream sources are products that provide data to the current product. For example, a customer_orders table might have customers and orders as upstream sources, or a dashboard might consume data from multiple upstream tables.

Downstream Consumers

Downstream consumers are products that depend on the current product. For instance, a sales_dashboard might be a downstream consumer of a daily_sales table, or multiple reports might consume the same source table.

Dependency Depth

Lineage isn't limited to direct connections. You can explore Depth 1 (immediate dependencies), Depth 2 (dependencies of dependencies), and Depth 3+ for full pipeline tracing.
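Depth-limited exploration can be pictured as a breadth-first walk that stops after a fixed number of hops. The following is a minimal sketch, not Qarion's implementation: the in-memory `LINEAGE` dictionary, the product names, and the `explore` function are all hypothetical and exist only to illustrate the idea.

```python
from collections import deque

# Hypothetical in-memory lineage store: product id -> upstream/downstream ids.
LINEAGE = {
    "customers": {"upstream": [], "downstream": ["customer_orders"]},
    "orders": {"upstream": [], "downstream": ["customer_orders"]},
    "customer_orders": {"upstream": ["customers", "orders"], "downstream": ["daily_sales"]},
    "daily_sales": {"upstream": ["customer_orders"], "downstream": ["sales_dashboard"]},
    "sales_dashboard": {"upstream": ["daily_sales"], "downstream": []},
}


def explore(lineage, start, direction, max_depth):
    """Return {product_id: depth} for products within max_depth hops of start.

    direction is "upstream" or "downstream" (illustrative parameter names).
    """
    depths = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in lineage.get(node, {}).get(direction, []):
            if neighbor not in seen:
                seen.add(neighbor)
                depths[neighbor] = depth + 1
                queue.append((neighbor, depth + 1))
    return depths
```

With this sample data, `explore(LINEAGE, "sales_dashboard", "upstream", 1)` returns only `daily_sales`, while a depth of 3 reaches back to `customers` and `orders`.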

Lineage Data Model

Lineage relationships are stored as metadata on each product:

{
  "upstream": ["product-uuid-1", "product-uuid-2"],
  "downstream": ["product-uuid-3"]
}

Bidirectional Synchronization

When you add Product A as upstream of Product B, B's lineage automatically shows A as upstream, and A's lineage automatically shows B as downstream. This ensures consistency regardless of which product you're viewing.
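To make the two-sided update concrete, here is a minimal sketch of how a single "add upstream" action could keep both records consistent. It assumes each product carries the `upstream` and `downstream` lists described in the data model; the `add_upstream` function and the `products` dictionary are hypothetical and do not represent Qarion's API.

```python
def add_upstream(products, upstream_id, downstream_id):
    """Record that upstream_id feeds downstream_id, updating both products.

    products maps a product id to its lineage metadata (hypothetical store).
    """
    if upstream_id == downstream_id:
        raise ValueError("a product cannot be its own upstream")

    source = products.setdefault(upstream_id, {"upstream": [], "downstream": []})
    consumer = products.setdefault(downstream_id, {"upstream": [], "downstream": []})

    # Write both directions so either product shows the same relationship.
    if upstream_id not in consumer["upstream"]:
        consumer["upstream"].append(upstream_id)
    if downstream_id not in source["downstream"]:
        source["downstream"].append(downstream_id)


products = {}
add_upstream(products, "product-a", "product-b")
```

After the call, `products["product-b"]["upstream"]` contains `product-a` and `products["product-a"]["downstream"]` contains `product-b`. The membership checks make the operation safe to repeat.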

Lineage Visualization

Graph Layout

Qarion uses a hierarchical layout inspired by dbt documentation. Data flows from left to right (sources to consumers), with node positions calculated automatically by the Dagre graph layout library. Edges are color-coded, with upstream links in blue and downstream links in green.
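The core idea of a left-to-right hierarchical layout is that every product sits in a column to the right of all of its upstream sources. The sketch below illustrates that idea with a simple longest-path column assignment. It is a simplification for illustration only, not Dagre's actual implementation, and the `assign_columns` function and sample product names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def assign_columns(upstream_of):
    """Map each node to a column index so sources land in column 0.

    upstream_of maps a node to the list of its upstream nodes.
    """
    columns = {}
    # static_order() yields every node after all of its upstream nodes.
    for node in TopologicalSorter(upstream_of).static_order():
        parents = upstream_of.get(node, [])
        # One column to the right of the right-most upstream node.
        columns[node] = 1 + max((columns[p] for p in parents), default=-1)
    return columns


upstream_of = {
    "customer_orders": ["customers", "orders"],
    "daily_sales": ["customer_orders"],
    "sales_dashboard": ["daily_sales", "customers"],
}
columns = assign_columns(upstream_of)
```

Here `customers` and `orders` land in column 0 and `sales_dashboard` in column 3, even though the dashboard also reads directly from `customers`. A real layout engine additionally orders nodes within each column and routes edges, which this sketch leaves out.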

Node Icons

Products display type-specific icons, such as 📊 Table, 👁️ View, 📈 Dashboard, and 🔌 Stream, among others configured in your space.

Interactive Features

The graph supports pan and zoom for navigating large structures, node dragging for manual repositioning, and click navigation to open detail pages. A minimap provides an overview for orientation in complex graphs.

Use Cases

Impact Analysis

Before modifying a data product, open its Lineage tab and increase the downstream depth. This allows you to identify all consumers that might be affected so you can notify stakeholders before making changes.

Root Cause Investigation

When a data quality issue appears, navigate to the affected product and trace upstream to find potential source issues. Checking the quality status of upstream dependencies helps identify where the problem originated.
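As an illustration of this workflow, the sketch below walks upstream from an affected product and reports failing products that have no failing upstream of their own, which are the likeliest points of origin. It is a heuristic under stated assumptions: the `quality` field, its `"passing"` and `"failing"` values, and the `find_likely_origins` function are hypothetical and not part of Qarion's data model.

```python
def find_likely_origins(products, affected_id):
    """Follow failing upstream links from affected_id and return the
    failing products whose own upstream sources are all healthy."""
    origins = []
    seen = set()
    stack = [affected_id]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        failing_upstream = [
            u for u in products[node]["upstream"]
            if products[u]["quality"] == "failing"
        ]
        if products[node]["quality"] == "failing" and not failing_upstream:
            origins.append(node)  # nothing further upstream is failing
        stack.extend(failing_upstream)
    return sorted(origins)


# Hypothetical sample data: the problem starts in "orders".
products = {
    "customers": {"upstream": [], "quality": "passing"},
    "orders": {"upstream": [], "quality": "failing"},
    "customer_orders": {"upstream": ["customers", "orders"], "quality": "failing"},
    "daily_sales": {"upstream": ["customer_orders"], "quality": "failing"},
    "sales_dashboard": {"upstream": ["daily_sales"], "quality": "failing"},
}
origins = find_likely_origins(products, "sales_dashboard")
```

With this sample data the walk ends at `orders`. Because the heuristic follows only failing products, it would miss a fault hidden behind an upstream product that still reports a passing status, so treat the result as a starting point for investigation.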

Data Discovery

When exploring unfamiliar data, use the Global Lineage view to locate key data products. This helps you understand how they fit into the broader pipeline and identify related products for deeper exploration.

AI Pipeline Traceability

For AI Systems, lineage is essential for regulatory compliance. Use the lineage graph to trace exactly where training data originates, how it is transformed through feature engineering, and which models consume it. This end-to-end provenance supports EU AI Act requirements for traceability of results (Articles 12–14) and helps assessors understand the full data supply chain behind an AI system. When upstream data quality degrades, lineage-based impact analysis identifies which AI models are at risk.

Getting Started

To learn more, check out Product Lineage for details on the product-level tab, or Global Lineage to explore the space-wide lineage view.