Onboarding a New Data Product
This walkthrough shows how to register a new data product in the catalog and set up its governance.
Scenario
You have a new dataset that needs to be added to the data catalog. Your goal is to register it properly with complete metadata, assign governance roles, and set up quality monitoring.
Before You Begin
Before starting the registration process, ensure you have the necessary details ready. You should know the product name and description, its physical location (schema and table), the business domain it belongs to, the individuals who will fulfill governance roles (Owner, Steward, Custodian, Data Architect), and the quality checks required to ensure its reliability.
Step 1: Create the Product
Navigate to Data Catalog
To begin, open your space's Data Catalog and click Add Product (or use the Quick Actions menu). Select the appropriate product type, such as Table, Dashboard, or API.
Basic Information
Fill in the essential details. Enter a technical or business Name and a Description that explains what the data represents and how it is used. Specify the Domain (e.g., Sales, Finance) and add relevant Tags to aid in discovery.
Write the description for someone unfamiliar with the data. Include what the data represents, why it exists, and its key use cases.
If you are registering a machine learning model or AI pipeline, select AI System as the product type. This unlocks AI-specific features including Risk Classification (aligned with the EU AI Act), continuous drift monitoring for training data and model outputs, and enhanced lineage tracing for regulatory traceability.
Step 2: Configure Source Location
Link to Source System
Select the Source System (e.g., Snowflake, BigQuery) and specify the Physical Location by entering the Database/Project, Schema/Dataset, and the specific Table or View name.
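As a rough illustration of the physical location described above, the sketch below models the three parts and joins them into a dotted name. The class and method names are hypothetical and are not part of any catalog API; the dotted form is an assumption that matches how Snowflake and BigQuery commonly reference tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    """Hypothetical local model of a product's physical location."""
    source_system: str   # e.g. "Snowflake" or "BigQuery"
    database: str        # Database or Project
    schema: str          # Schema or Dataset
    table: str           # Table or View name

    def qualified_name(self) -> str:
        """Join the three parts with dots, rejecting empty parts."""
        parts = [self.database.strip(), self.schema.strip(), self.table.strip()]
        if not all(parts):
            raise ValueError("database, schema, and table are all required")
        return ".".join(parts)
```

Writing the location down in this form before opening the registration form makes it easy to spot a missing schema or a misspelled table name.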
Enable Schema Import
If the source system is connected, click Sync Schema to automatically import fields. Review the imported metadata and add descriptions to key columns so that consumers can interpret them correctly.
Step 3: Document the Schema
Important Fields
For each field, document its Description to explain its meaning, identify any Structural Indicators like Primary or Foreign Keys, and note any Business Rules regarding allowed values or formats.
Field Indicators
Mark fields with appropriate indicators: PK for Primary Keys, FK for Foreign Keys linking to other products, PII for Personally Identifiable Information, and Nullable for fields that can be empty.
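The indicators above can be sanity-checked before you enter them in the catalog. The following is a minimal sketch using a hypothetical FieldDoc structure. The rule that a primary key should not also be marked Nullable is an assumption based on standard SQL primary key semantics, not a documented catalog rule.

```python
from dataclasses import dataclass

# Indicator names taken from the text above.
ALLOWED_INDICATORS = {"PK", "FK", "PII", "Nullable"}


@dataclass
class FieldDoc:
    """Hypothetical record of one documented schema field."""
    name: str
    description: str = ""
    indicators: frozenset = frozenset()


def indicator_problems(fields):
    """Return human-readable problems found in the field documentation."""
    problems = []
    for f in fields:
        unknown = set(f.indicators) - ALLOWED_INDICATORS
        if unknown:
            problems.append(f"{f.name}: unknown indicators {sorted(unknown)}")
        # Assumed rule: a primary key column should never be empty.
        if {"PK", "Nullable"} <= set(f.indicators):
            problems.append(f"{f.name}: primary key marked Nullable")
        if not f.description.strip():
            problems.append(f"{f.name}: missing description")
    return problems
```

An empty result means every field has a description and a consistent set of indicators.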
Step 4: Assign Governance Roles
Role Assignments
Navigate to the Governance tab to assign responsibilities. The Owner (typically a Product Manager) makes strategic decisions. The Steward (often a Data Analyst) handles day-to-day curation and access approvals. The Custodian (usually a Data Engineer) is responsible for technical maintenance and quality checks.
Every product should have at least one Steward so that access requests can be approved.
Step 5: Set Up Quality Checks
Essential Checks
Establish baseline checks to ensure data health. Common checks include Freshness to verify timely updates, Row Count to monitor volume, Uniqueness to prevent duplicates in primary keys, and Not Null checks for required fields.
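To make the four baseline checks concrete, here is a minimal sketch that expresses each one as a SQL query and runs it through Python's built-in sqlite3 module. The function name, its parameters, and the use of SQLite are assumptions for illustration only; in practice the checks are configured in the Quality tab and run against your own source system, whose SQL dialect may differ.

```python
import sqlite3


def baseline_checks(conn, table, key_column, required_column, ts_column,
                    min_rows, max_age_days, as_of="now"):
    """Return pass/fail results for four baseline checks on one table.

    Identifiers are interpolated into the SQL, so pass only trusted names.
    """
    def scalar(sql, params=()):
        return conn.execute(sql, params).fetchone()[0]

    # Row Count: how many rows the table currently holds.
    row_count = scalar(f"SELECT COUNT(*) FROM {table}")
    # Uniqueness: non-null keys minus distinct keys gives the duplicate count.
    duplicate_keys = scalar(
        f"SELECT COUNT({key_column}) - COUNT(DISTINCT {key_column}) FROM {table}"
    )
    # Not Null: rows where the required column is empty.
    null_required = scalar(
        f"SELECT COUNT(*) FROM {table} WHERE {required_column} IS NULL"
    )
    # Freshness: age in days of the newest timestamp (None for an empty table).
    age_days = scalar(
        f"SELECT julianday(?) - julianday(MAX({ts_column})) FROM {table}",
        (as_of,),
    )

    return {
        "freshness": age_days is not None and age_days <= max_age_days,
        "row_count": row_count >= min_rows,
        "uniqueness": duplicate_keys == 0,
        "not_null": null_required == 0,
    }
```

The as_of parameter exists only to make the freshness check reproducible in tests; leaving it at its default compares against the current time.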
Create a Check
To add a check, go to the Quality tab and click Add Check. Select the check type and configure parameters such as the Name, the Threshold for pass/fail criteria, and the Schedule for execution.
Custom SQL Checks
For complex validations, you can write custom SQL, such as verifying that no dates are in the future:
-- Example: Verify no future dates
SELECT COUNT(*) AS failures
FROM my_table
WHERE event_date > CURRENT_DATE;
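One way to read such a query is that the check passes when the failure count stays at or below the configured threshold. The sketch below illustrates that interpretation with a hypothetical CheckConfig structure that mirrors the Name, Threshold, and Schedule parameters, run against an in-memory SQLite table. The structure, the function, and the pass rule are all assumptions; the platform's actual evaluation logic is not described here.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CheckConfig:
    """Hypothetical check definition; field names are illustrative."""
    name: str
    sql: str                  # query returning a single failure count
    threshold: int = 0        # maximum number of failures tolerated
    schedule: str = "daily"   # free-form label; real schedule format unknown


def run_check(conn, check):
    """Run the check query and compare its failure count to the threshold."""
    failures = conn.execute(check.sql).fetchone()[0]
    return {
        "name": check.name,
        "failures": failures,
        "passed": failures <= check.threshold,
    }


NO_FUTURE_DATES = CheckConfig(
    name="no_future_event_dates",
    sql="SELECT COUNT(*) AS failures FROM my_table WHERE event_date > CURRENT_DATE",
)
```

This version assumes event_date is stored as an ISO 8601 date string, which is what makes the comparison with CURRENT_DATE meaningful in SQLite.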
Step 6: Configure Access
Default Access Levels
Go to the Access tab to configure default visibility and access permissions. You can specify who can access the product by default and set up any role requirements.
Link to Source System Roles
If access is managed via the source system, link the Qarion product to the relevant source system roles and document which role grants which access rights.
Step 7: Set Up Lineage
Automatic Lineage
If you are using connected source systems, lineage may be auto-detected from transformation logs. Always verify that the detected connections match the actual data flows.
Manual Lineage
For relationships not automatically discovered, go to the Lineage tab. Click Add Upstream or Add Downstream to search for and select related products to manually build the lineage graph.
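Conceptually, each Add Upstream or Add Downstream action adds one edge to a directed graph. The sketch below is a hypothetical in-memory model of that graph, not the catalog's implementation, and shows how a breadth-first traversal can list every product affected by a change to one source.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage graph keyed by product name."""

    def __init__(self):
        self.upstream = defaultdict(set)     # product -> its sources
        self.downstream = defaultdict(set)   # product -> its consumers

    def add_upstream(self, product, source):
        """Record that `product` reads from `source`."""
        self.upstream[product].add(source)
        self.downstream[source].add(product)

    def add_downstream(self, product, consumer):
        """Record that `consumer` reads from `product`."""
        self.add_upstream(consumer, product)

    def impacted(self, product):
        """Return all direct and indirect consumers of `product`, sorted."""
        seen = set()
        queue = deque([product])
        while queue:
            current = queue.popleft()
            for nxt in self.downstream.get(current, ()):
                if nxt not in seen and nxt != product:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)
```

Comparing the output of impacted for a source against the consumers you expect is a quick way to find links that auto-detection missed.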
Step 8: Review and Publish
Quality Checklist
Before completing registration, verify that the description is clear, schema fields are documented, governance roles are assigned, quality checks are configured, lineage is captured, and appropriate tags are applied.
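The checklist above can also be expressed as a small pre-publish function. This is a sketch over a plain dictionary with hypothetical keys. It can only test that items are present, so judging whether a description is actually clear still requires a human review.

```python
def publish_blockers(product):
    """Return the checklist items that are not yet satisfied.

    `product` is a plain dict; every key used here is hypothetical.
    """
    fields = product.get("fields", [])
    roles = product.get("roles", {})
    checks = {
        "description is present": bool(product.get("description", "").strip()),
        "schema fields are documented": bool(fields)
        and all(f.get("description") for f in fields),
        "governance roles are assigned": all(
            roles.get(r) for r in ("Owner", "Steward", "Custodian")
        ),
        "quality checks are configured": bool(product.get("quality_checks")),
        "lineage is captured": bool(
            product.get("upstream") or product.get("downstream")
        ),
        "tags are applied": bool(product.get("tags")),
    }
    # Dicts preserve insertion order, so blockers come back in checklist order.
    return [item for item, ok in checks.items() if not ok]
```

An empty list means every item on the checklist is at least filled in.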
Notify Stakeholders
Once published, consider notifying downstream consumers, related product stewards, and your space team to ensure everyone is aware of the new data product.
Post-Onboarding
First Week
Monitor initial quality check runs to address any failures early, and collect feedback on the documentation to ensure it meets user needs.
Ongoing
Keep descriptions current as business needs evolve, update checks as data changes, and review access requests promptly to maintain smooth operations.
Key Takeaways
Complete metadata enables discovery and trust, while governance roles ensure accountability. Quality checks provide early warning of issues, and lineage helps you understand the impact of changes across the data ecosystem.