BigQuery Integration
Connect Qarion to Google BigQuery to sync metadata, run quality checks, and discover lineage.
Prerequisites
Before connecting, ensure you have a Google Cloud project with BigQuery enabled, a service account with the required permissions (see Creating a Service Account below) and a JSON key file, and network access from Qarion to Google Cloud.
Creating the Connection
Step 1: Add Source System
Navigate to Source Systems, click Add Source System, and select BigQuery as the platform type.
Step 2: Connection Details
Enter your project details, including the Project ID (e.g., my-company-data) and optionally the default Dataset and Location.
Step 3: Authentication
Upload your service account JSON key. This key, downloaded from the GCP Console, allows Qarion to authenticate and perform operations securely.
Creating a Service Account
Step 1: Create Account
In the GCP Console (IAM → Service Accounts), create a new account with a descriptive name like qarion-integration and a description indicating its purpose for metadata sync.
Step 2: Assign Roles
Grant the necessary roles at the project level: BigQuery Data Viewer for reading data and metadata, BigQuery Metadata Viewer for schema access, and BigQuery Job User to run quality check queries. Alternatively, create a custom role limited to the specific permissions Qarion needs, such as the following definition:
title: "Qarion Integration"
description: "Minimum permissions for Qarion"
includedPermissions:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.list
- bigquery.jobs.create
- bigquery.jobs.get
Step 3: Create Key
Generate a new key for the service account, select JSON format, and download it to a secure location for upload to Qarion.
Syncing Metadata
Initial Sync
After authentication, click Sync Now, select the datasets you want to include, and wait for the process to complete.
What Gets Synced
Qarion synchronizes Datasets, Tables (both native and external), and Views (standard and materialized). It captures Columns with their types, modes, and descriptions, as well as Partitioning and Clustering information.
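For reference, this is the same metadata BigQuery itself exposes through its INFORMATION_SCHEMA views. The sketch below is illustrative only (project, dataset, and table names are placeholders, not anything Qarion generates); it shows the kind of column, partitioning, and clustering information involved:
-- Illustrative query: column names, types, and partitioning/clustering flags
-- for a single table (names are placeholders)
SELECT
  column_name,
  data_type,
  is_nullable,
  is_partitioning_column,
  clustering_ordinal_position
FROM `my-project.analytics`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'events'
ORDER BY ordinal_position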
Dataset Filtering
You can configure filters to include specific datasets (e.g., analytics_*) or exclude others (e.g., *_staging, *_temp).
Include: analytics_*, reporting
Exclude: *_staging, *_temp
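The patterns are simple wildcards matched against dataset names. As a rough illustration only (the project name is a placeholder, and Qarion applies the filters internally rather than via SQL), the include/exclude rules above behave like this kind of match:
-- Illustrative: wildcard-style filtering of dataset names
SELECT schema_name
FROM `my-project`.INFORMATION_SCHEMA.SCHEMATA
WHERE (STARTS_WITH(schema_name, 'analytics_') OR schema_name = 'reporting')
  AND NOT ENDS_WITH(schema_name, '_staging')
  AND NOT ENDS_WITH(schema_name, '_temp')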
Quality Checks
Supported Check Types
Supported checks include Freshness (using MAX(_PARTITIONTIME)), Row Count, Custom SQL, and Schema validation against expected structures.
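As a sketch of what these checks execute (table and dataset names are illustrative, not the SQL Qarion generates):
-- Example: row count check
SELECT COUNT(*) AS row_count
FROM `project.dataset.orders`

-- Example: schema validation, comparing actual columns against an expected list
SELECT column_name, data_type
FROM `project.dataset`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'orders'
ORDER BY ordinal_position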
Cost Optimization
Since BigQuery charges for data scanned, optimize checks by filtering on partition columns, selecting only the columns you need, sampling large tables, and setting maximum bytes billed limits on check queries.
-- Example: Efficient freshness check on partitioned table
SELECT MAX(event_date)
FROM `project.dataset.events`
WHERE _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
Lineage Discovery
Automatic Lineage
Lineage is discovered by parsing View definitions, analyzing Scheduled queries, and reviewing INFORMATION_SCHEMA job history.
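For example, view-based lineage comes from the SQL text of each view, which BigQuery exposes in INFORMATION_SCHEMA (the dataset name below is a placeholder):
-- Illustrative: a view's definition names the upstream tables it reads from
SELECT table_name, view_definition
FROM `my-project.analytics`.INFORMATION_SCHEMA.VIEWS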
Enabling Job History Access
To enable full lineage discovery from job history, grant the roles/bigquery.resourceViewer role on the project to your service account. Project-level roles are assigned through IAM (GCP Console or the gcloud CLI) rather than SQL, for example:
# Grant access to job information at the project level
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:qarion@project.iam.gserviceaccount.com" \
  --role="roles/bigquery.resourceViewer"
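Once that role is in place, job history becomes readable from INFORMATION_SCHEMA. The query below is only a sketch of the kind of information job-history lineage relies on (the project ID and region are placeholders):
-- Illustrative: recent query jobs with their source and destination tables
SELECT
  creation_time,
  destination_table.dataset_id || '.' || destination_table.table_id AS written_table,
  ARRAY(
    SELECT ref.dataset_id || '.' || ref.table_id
    FROM UNNEST(referenced_tables) AS ref
  ) AS read_tables
FROM `my-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND destination_table.table_id IS NOT NULL
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)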
Multi-Project Setup
Cross-Project Access
To sync metadata across projects, create a service account in one project and grant it roles in each target project. You can then configure multiple source systems or use a single connection that lists all accessible projects.
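Once the service account holds roles in the other projects, checks and queries can reference their tables directly by fully qualified name. A minimal example (project, dataset, and table names are placeholders):
-- Example: querying a table in another project from the same connection
SELECT COUNT(*) AS row_count
FROM `other-project.shared_data.orders`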
Shared Datasets
For datasets shared with your project, simply grant BigQuery Data Viewer on the shared dataset, and Qarion will discover it during sync.
Troubleshooting
Connection Errors
"Permission denied" usually indicates missing roles or incorrect project IDs. "Service account key invalid" suggests a corrupted key file or expiration; try re-downloading the JSON key.
Sync Issues
Missing datasets may stem from lack of bigquery.datasets.get permission or dataset-level restrictions. Slow syncs are often caused by large dataset sizes; try filtering or scheduling during off-peak times.
Query Errors
"Query exceeded resource limits" can be resolved by adding LIMIT clauses, filtering with WHERE to reduce scanning, or using sampling for large tables.
Best Practices
Security
Use a dedicated service account, rotate keys annually, and monitor audit logs regularly.
Cost Control
Leverage partition filters, set byte billing limits, and schedule checks during off-peak hours to manage costs.
Performance
Sync only necessary datasets, use incremental syncs where possible, and filter out large, irrelevant datasets.
Related Documentation
- Source Systems — General management
- Quality Checks — Check configuration
- Data Lineage — Lineage features