dbt Integration Guide

Baselinr integrates with dbt (data build tool) so that profiling and drift detection can run against your dbt models as part of your existing dbt workflows.

Overview

Baselinr integrates with dbt by allowing you to use dbt model references and selectors directly in your baselinr configuration files. This enables you to profile dbt models without maintaining separate table lists.

Installation

The dbt integration is included in the main baselinr package:

pip install baselinr

No additional dbt package installation is required. The integration works by reading dbt's manifest.json file to resolve model references and selectors.

Using dbt Refs and Selectors in Baselinr Configs

Using dbt Refs

Reference dbt models directly in your baselinr configuration:

profiling:
  tables:
    - dbt_ref: customers
      dbt_project_path: ./dbt_project
    - dbt_ref: orders
      dbt_project_path: ./dbt_project

The dbt_ref field accepts:

Simple model name: "customers"
Package-qualified name: "package.model_name"
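As a rough illustration of how the two ref forms can be looked up, the sketch below resolves a ref against the `nodes` section of a parsed manifest.json. The helper `resolve_dbt_ref` and the "ambiguous ref" error are hypothetical and are not baselinr's actual API; only the manifest fields (`resource_type`, `name`, `package_name`) come from dbt's manifest format.

```python
import json
from pathlib import Path


def load_manifest(path: str) -> dict:
    """Load a dbt manifest.json from disk."""
    return json.loads(Path(path).read_text())


def resolve_dbt_ref(manifest: dict, ref: str) -> dict:
    """Return the manifest node for a model ref.

    Hypothetical helper for illustration only, not baselinr's implementation.
    Accepts "model_name" or "package.model_name".
    """
    # "package.model_name" splits into two parts; a bare name has no package.
    package, _, name = ref.rpartition(".")
    matches = [
        node
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
        and node.get("name") == name
        and (not package or node.get("package_name") == package)
    ]
    if not matches:
        raise KeyError(f"Could not resolve dbt ref: {ref}")
    if len(matches) > 1:
        # Assumed behavior: the same model name exists in several packages.
        raise KeyError(f"Ambiguous dbt ref: {ref}")
    return matches[0]
```

With a manifest loaded through `load_manifest("./dbt_project/target/manifest.json")`, a call such as `resolve_dbt_ref(manifest, "customers")` would return the node whose schema and name identify the table to profile.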

Using dbt Selectors

Use dbt selector syntax to select multiple models:

profiling:
  tables:
    - dbt_selector: tag:critical
      dbt_project_path: ./dbt_project
    - dbt_selector: config.materialized:table
      dbt_project_path: ./dbt_project
    - dbt_selector: tag:critical+tag:customer  # Union
      dbt_project_path: ./dbt_project
    - dbt_selector: tag:critical,config.materialized:table  # Intersection
      dbt_project_path: ./dbt_project

Supported selector syntax:

tag:tag_name - Models with specific tag
config.materialized:table - Models with specific materialization
name:model_name - Specific model name
path:models/staging - Models in specific path
package:package_name - Models in package
+ - Union (OR logic)
, - Intersection (AND logic)
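The selector rules above can be sketched as a small matcher over manifest nodes. This is an illustration under stated assumptions, not baselinr's implementation: the function names are hypothetical, `+` is assumed to bind more loosely than `,` (a union of intersections), and `path:` is treated as a simple prefix check on the node's `original_file_path`.

```python
from typing import List


def matches_term(node: dict, term: str) -> bool:
    """Check a single 'method:value' term against one manifest node."""
    method, _, value = term.partition(":")
    if method == "tag":
        return value in node.get("tags", [])
    if method == "config.materialized":
        return node.get("config", {}).get("materialized") == value
    if method == "name":
        return node.get("name") == value
    if method == "path":
        # Simplification: plain prefix match on the model's file path.
        return node.get("original_file_path", "").startswith(value)
    if method == "package":
        return node.get("package_name") == value
    raise ValueError(f"Unsupported selector term: {term}")


def select_models(manifest: dict, selector: str) -> List[str]:
    """Return the sorted names of models matching the selector.

    Assumption: '+' (union) is split first, then ',' (intersection).
    """
    selected = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        # any() across '+' groups is OR; all() across ',' terms is AND.
        if any(
            all(matches_term(node, term.strip()) for term in group.split(","))
            for group in selector.split("+")
        ):
            selected.append(node["name"])
    return sorted(selected)
```

Under these assumptions, `tag:critical,config.materialized:table` keeps only models that carry the tag and are materialized as tables, while `tag:customer+tag:order` keeps models that carry either tag.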

Configuration Options

profiling:
  tables:
    - dbt_ref: customers
      dbt_project_path: ./dbt_project  # Path to dbt project root
      dbt_manifest_path: ./target/manifest.json  # Optional: explicit manifest path
      schema: analytics  # Optional: override schema
      partition:  # Optional: partition config
        key: date
        strategy: latest
      sampling:  # Optional: sampling config
        enabled: true
        fraction: 0.1

Manifest Path Resolution

Baselinr locates the manifest.json file using the first of the following rules that applies:

If dbt_manifest_path is provided, use it
If dbt_project_path is provided, look for target/manifest.json
Otherwise, raise an error

Note: You must run dbt compile or dbt run first to generate the manifest.json file.
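The resolution order described above can be expressed in a few lines. The function below is a minimal sketch, assuming the two config fields are passed in as optional strings; `resolve_manifest_path` is a hypothetical name, and the exact exception types baselinr raises are not specified in this guide.

```python
from pathlib import Path
from typing import Optional


def resolve_manifest_path(
    dbt_manifest_path: Optional[str] = None,
    dbt_project_path: Optional[str] = None,
) -> Path:
    """Pick the manifest location: explicit path first, then <project>/target."""
    if dbt_manifest_path:
        candidate = Path(dbt_manifest_path)
    elif dbt_project_path:
        candidate = Path(dbt_project_path) / "target" / "manifest.json"
    else:
        # Neither option was given, so there is nowhere to look.
        raise ValueError("Provide dbt_manifest_path or dbt_project_path")
    if not candidate.is_file():
        # The file only exists after `dbt compile` or `dbt run`.
        raise FileNotFoundError(f"dbt manifest not found: {candidate}")
    return candidate
```

Note that an explicit `dbt_manifest_path` takes precedence even when `dbt_project_path` is also set.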

Running Profiling After dbt

Since dbt hooks can only execute SQL (not Python scripts), you cannot run baselinr profiling directly from dbt hooks. Instead, use one of these approaches:

Option 1: Orchestrator (Recommended for Production)

Use your orchestrator (Airflow, Dagster, Prefect, etc.) to run profiling after dbt run:

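# Example: Airflow DAG (Airflow 2.x import paths)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id='dbt_with_baselinr',  # illustrative DAG name
    start_date=datetime(2024, 1, 1),  # placeholder start date
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id='dbt_run',
        bash_command='dbt run'
    )

    baselinr_profile = BashOperator(
        task_id='baselinr_profile',
        bash_command='baselinr profile --config baselinr_config.yml'
    )

    # Profile only after the dbt run has finished successfully
    dbt_run >> baselinr_profile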

Option 2: Script After dbt Run

Run a script after dbt run that reads run_results.json to find materialized models:

dbt run
python scripts/baselinr_run_profiling_for_models.py

This script automatically:

Reads target/run_results.json to find successfully materialized models
Reads target/manifest.json to get per-model configurations
Runs profiling for each model
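To make the first two steps concrete, here is a sketch of how successfully built models might be extracted from dbt's artifacts. It is not the contents of scripts/baselinr_run_profiling_for_models.py; the function names are hypothetical, and only the artifact fields (`results`, `unique_id`, `status`, `nodes`, `schema`, `alias`) are taken from dbt's run_results.json and manifest.json formats.

```python
import json
from pathlib import Path
from typing import List


def load_json(path: str) -> dict:
    """Read a dbt artifact such as target/run_results.json."""
    return json.loads(Path(path).read_text())


def successful_models(run_results: dict, manifest: dict) -> List[dict]:
    """Return manifest nodes for models that dbt reported as built successfully."""
    nodes = manifest.get("nodes", {})
    models = []
    for result in run_results.get("results", []):
        node = nodes.get(result.get("unique_id", ""))
        # Skip failed or skipped results and non-model resources (seeds, tests).
        if (
            result.get("status") == "success"
            and node is not None
            and node.get("resource_type") == "model"
        ):
            models.append(node)
    return models


def qualified_table_name(node: dict) -> str:
    """Build 'schema.table' for a node, preferring the alias when one is set."""
    return f"{node['schema']}.{node.get('alias') or node['name']}"
```

Each returned node could then be mapped to a table name with `qualified_table_name` and handed to whatever profiling entry point the script uses.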

Option 3: CI/CD Pipeline

Add profiling as a step in your CI/CD pipeline:

# Example: GitHub Actions
- name: Run dbt models
  run: dbt run

- name: Profile models
  run: baselinr profile --config baselinr_config.yml

Examples

Example 1: Profile All Critical Models

# baselinr_config.yml
profiling:
  tables:
    - dbt_selector: tag:critical
      dbt_project_path: ./dbt_project

Example 2: Profile Specific Models with Partitioning

profiling:
  tables:
    - dbt_ref: daily_events
      dbt_project_path: ./dbt_project
      partition:
        key: event_date
        strategy: latest
    - dbt_ref: hourly_metrics
      dbt_project_path: ./dbt_project
      partition:
        key: metric_hour
        strategy: recent_n
        recent_n: 24

Example 3: Profile dbt Models with Partitioning and Selected Metrics

# baselinr_config.yml
profiling:
  tables:
    - dbt_ref: customers
      dbt_project_path: ./dbt_project
      partition:
        key: created_date
        strategy: latest
      metrics:
        - count
        - mean
        - stddev

Example 4: Using dbt Selectors for Scalable Profiling

profiling:
  tables:
    # Profile all models tagged as critical
    - dbt_selector: tag:critical
      dbt_project_path: ./dbt_project
    
    # Profile all table-materialized models in staging
    - dbt_selector: config.materialized:table,path:models/staging
      dbt_project_path: ./dbt_project
    
    # Profile models with either tag
    - dbt_selector: tag:customer+tag:order
      dbt_project_path: ./dbt_project

Best Practices

Generate Manifest First: Always run dbt compile or dbt run before using dbt patterns in baselinr configs
Use Tags Strategically: Tag your dbt models to enable scalable profiling (e.g., tag:critical, tag:profile)
Combine Selectors with Baselinr Options: Use dbt selectors to choose the models, then apply baselinr settings such as partitioning and sampling to them
Use Orchestrators: Run profiling after dbt run using your orchestrator (Airflow, Dagster, etc.)
Profile Critical Models: Focus profiling on models tagged as critical or high-value

Troubleshooting

Manifest Not Found

Error: dbt manifest not found: ...

Solution: Run dbt compile or dbt run to generate the manifest.json file in the target/ directory.

dbt Ref Not Resolved

Error: Could not resolve dbt ref: model_name

Solution:

Ensure the model exists in your dbt project
Check that the manifest.json is up to date
Verify the model name matches exactly (case-sensitive)
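When a ref fails to resolve, it can help to list model names in the manifest that are close to the one you typed. The helper below is a hypothetical debugging aid, not part of baselinr; it uses Python's standard `difflib` and compares names case-insensitively to surface casing mistakes.

```python
import difflib
from typing import List


def suggest_model_names(manifest: dict, ref: str, limit: int = 3) -> List[str]:
    """Suggest model names from the manifest that resemble the given ref."""
    names = {
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    }
    # Map lowercase names back to their original spelling.
    lowered = {name.lower(): name for name in sorted(names)}
    close = difflib.get_close_matches(ref.lower(), list(lowered), n=limit, cutoff=0.6)
    return [lowered[key] for key in close]
```

An empty result suggests the model is missing from the manifest altogether, which usually means the manifest is stale or the model lives in a different project.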

Selector Matches No Models

Warning: dbt selector '...' matched no models

Solution:

Verify the selector syntax is correct
Check that models have the specified tags/configs
Use dbt list --select <selector> to test your selector (note that dbt's own CLI writes a union as a space-separated list rather than with +)

Profiling Not Running After dbt

Issue: Profiling doesn't run after dbt run

Solution:

Ensure baselinr Python package is installed
Run profiling manually after dbt run: baselinr profile --config baselinr_config.yml
Use an orchestrator to automate the workflow
Check that target/run_results.json exists (generated by dbt run)

Advanced Usage

Per-Model Configuration

You can apply different profiling settings to different groups of models by pairing dbt selectors with separate baselinr config files:

# baselinr_config_critical.yml
profiling:
  tables:
    - dbt_selector: tag:critical
      dbt_project_path: ./dbt_project
      metrics:
        - count
        - mean
        - stddev
        - histogram

# baselinr_config_staging.yml
profiling:
  tables:
    - dbt_selector: path:models/staging
      dbt_project_path: ./dbt_project
      sampling:
        enabled: true
        fraction: 0.1

Then run with different configs:

baselinr profile --config baselinr_config_critical.yml
baselinr profile --config baselinr_config_staging.yml
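If you maintain several config files, the two commands above can be driven from one small script. This is a sketch under the assumption that the baselinr CLI is on the PATH; the function names are hypothetical, and only the `baselinr profile --config` invocation comes from this guide.

```python
import subprocess
from typing import List, Sequence


def build_profile_commands(config_paths: Sequence[str]) -> List[List[str]]:
    """Build one `baselinr profile --config <path>` command per config file."""
    return [["baselinr", "profile", "--config", path] for path in config_paths]


def run_profiles(config_paths: Sequence[str], dry_run: bool = False) -> None:
    """Run each profiling command in order, stopping at the first failure."""
    for command in build_profile_commands(config_paths):
        if dry_run:
            # Show what would be executed without calling the CLI.
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_profiles(
        ["baselinr_config_critical.yml", "baselinr_config_staging.yml"],
        dry_run=True,
    )
```

Running the configs sequentially keeps failures easy to attribute to a single config file.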
