BigQuery Integration
Connect Qarion to Google BigQuery to sync metadata, run quality checks, and discover lineage.
Prerequisites
Before connecting, ensure you have a Google Cloud project with BigQuery enabled, a service account with the required permissions (see Creating a Service Account below) and a JSON key file, and network access from Qarion to Google Cloud.
Creating the Connection
Step 1: Add Source System
Navigate to Source Systems, click Add Source System, and select BigQuery as the platform type.
Step 2: Connection Details
Enter your project details, including the Project ID (e.g., my-company-data) and optionally the default Dataset and Location.
Step 3: Authentication
Upload your service account JSON key. This key, downloaded from the GCP Console, allows Qarion to authenticate and perform operations securely.
Creating a Service Account
Step 1: Create Account
In the GCP Console (IAM → Service Accounts), create a new account with a descriptive name like qarion-integration and a description indicating its purpose for metadata sync.
Step 2: Assign Roles
Grant the necessary roles at the project level: BigQuery Data Viewer for reading data and metadata, BigQuery Metadata Viewer for schema access, and BigQuery Job User to run quality check queries. Alternatively, create a custom role limited to the specific permissions Qarion needs, such as the following definition:
title: "Qarion Integration"
description: "Minimum permissions for Qarion"
includedPermissions:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.list
- bigquery.jobs.create
- bigquery.jobs.get
Step 3: Create Key
Generate a new key for the service account, select JSON format, and download it to a secure location for upload to Qarion.
Syncing Metadata
Initial Sync
After authentication, click Sync Now, select the datasets you want to include, and wait for the process to complete.
What Gets Synced
Qarion synchronizes Datasets, Tables (both native and external), and Views (standard and materialized). It captures Columns with their types, modes, and descriptions, as well as Partitioning and Clustering information.
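For reference, this is the same metadata BigQuery itself exposes through its INFORMATION_SCHEMA views. The sketch below is illustrative only (project, dataset, and table names are placeholders, not anything Qarion generates); it shows the kind of column, partitioning, and clustering information involved:
-- Illustrative query: column names, types, and partitioning/clustering flags
-- for a single table (names are placeholders)
SELECT
  column_name,
  data_type,
  is_nullable,
  is_partitioning_column,
  clustering_ordinal_position
FROM `my-project.analytics`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'events'
ORDER BY ordinal_position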
Dataset Filtering
You can configure filters to include specific datasets (e.g., analytics_*) or exclude others (e.g., *_staging, *_temp).
Include: analytics_*, reporting
Exclude: *_staging, *_temp
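The patterns are simple wildcards matched against dataset names. As a rough illustration only (the project name is a placeholder, and Qarion applies the filters internally rather than via SQL), the include/exclude rules above behave like this kind of match:
-- Illustrative: wildcard-style filtering of dataset names
SELECT schema_name
FROM `my-project`.INFORMATION_SCHEMA.SCHEMATA
WHERE (STARTS_WITH(schema_name, 'analytics_') OR schema_name = 'reporting')
  AND NOT ENDS_WITH(schema_name, '_staging')
  AND NOT ENDS_WITH(schema_name, '_temp')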
Quality Checks
Supported Check Types
Supported checks include Freshness (using MAX(_PARTITIONTIME)), Row Count, Custom SQL, and Schema validation against expected structures.
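As a sketch of what these checks execute (table and dataset names are illustrative, not the SQL Qarion generates):
-- Example: row count check
SELECT COUNT(*) AS row_count
FROM `project.dataset.orders`

-- Example: schema validation, comparing actual columns against an expected list
SELECT column_name, data_type
FROM `project.dataset`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'orders'
ORDER BY ordinal_position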
Cost Optimization
Since BigQuery charges for data scanned, optimize checks by filtering on partition columns, selecting only the columns you need, sampling large tables, and setting maximum bytes billed limits on check queries.
-- Example: Efficient freshness check on partitioned table
SELECT MAX(event_date)
FROM `project.dataset.events`
WHERE _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
Lineage Discovery
Automatic Lineage
Lineage is discovered by parsing View definitions, analyzing Scheduled queries, and reviewing INFORMATION_SCHEMA job history.
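For example, view-based lineage comes from the SQL text of each view, which BigQuery exposes in INFORMATION_SCHEMA (the dataset name below is a placeholder):
-- Illustrative: a view's definition names the upstream tables it reads from
SELECT table_name, view_definition
FROM `my-project.analytics`.INFORMATION_SCHEMA.VIEWS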
Enabling Job History Access
To enable full lineage discovery from job history, grant the roles/bigquery.resourceViewer role on the project to your service account. Project-level roles are assigned through IAM (GCP Console or the gcloud CLI) rather than SQL, for example:
# Grant access to job information at the project level
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:qarion@project.iam.gserviceaccount.com" \
  --role="roles/bigquery.resourceViewer"
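Once that role is in place, job history becomes readable from INFORMATION_SCHEMA. The query below is only a sketch of the kind of information job-history lineage relies on (the project ID and region are placeholders):
-- Illustrative: recent query jobs with their source and destination tables
SELECT
  creation_time,
  destination_table.dataset_id || '.' || destination_table.table_id AS written_table,
  ARRAY(
    SELECT ref.dataset_id || '.' || ref.table_id
    FROM UNNEST(referenced_tables) AS ref
  ) AS read_tables
FROM `my-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND destination_table.table_id IS NOT NULL
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)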
Multi-Project Setup
Cross-Project Access
To sync metadata across projects, create a service account in one project and grant it roles in each target project. You can then configure multiple source systems or use a single connection that lists all accessible projects.
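Once the service account holds roles in the other projects, checks and queries can reference their tables directly by fully qualified name. A minimal example (project, dataset, and table names are placeholders):
-- Example: querying a table in another project from the same connection
SELECT COUNT(*) AS row_count
FROM `other-project.shared_data.orders`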
Shared Datasets
For datasets shared with your project, simply grant BigQuery Data Viewer on the shared dataset, and Qarion will discover it during sync.
Troubleshooting
Connection Errors
"Permission denied" usually indicates missing roles or incorrect project IDs. "Service account key invalid" suggests a corrupted key file or expiration; try re-downloading the JSON key.
Sync Issues
Missing datasets may stem from lack of bigquery.datasets.get permission or dataset-level restrictions. Slow syncs are often caused by large dataset sizes; try filtering or scheduling during off-peak times.
Query Errors
"Query exceeded resource limits" can be resolved by adding LIMIT clauses, filtering with WHERE to reduce scanning, or using sampling for large tables.
Best Practices
Security
Use a dedicated service account, rotate keys annually, and monitor audit logs regularly.
Cost Control
Leverage partition filters, set byte billing limits, and schedule checks during off-peak hours to manage costs.
Performance
Sync only necessary datasets, use incremental syncs where possible, and filter out large, irrelevant datasets.
Related Documentation
- Source Systems — General management
- Quality Checks — Check configuration
- Data Lineage — Lineage features