Dataset Configuration Guide
This guide explains how to organize dataset configuration in Baselinr using the file-based datasets/ directory and how those configs interact with global settings.
Directory Structure
- config.yml: global defaults only (profiling, drift, validation, anomaly)
- datasets/: dataset-specific files (per table, schema, or database)
# config.yml
datasets:
  datasets_dir: ./datasets
  auto_discover: true
  recursive: true
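To make the effect of auto_discover and recursive concrete, here is a minimal Python sketch of how a directory scan of this kind could work. It is not Baselinr's implementation: the function name discover_dataset_files is hypothetical, and the sketch assumes dataset files use the .yml extension shown in this guide.

```python
from pathlib import Path


def discover_dataset_files(datasets_dir: str, recursive: bool = True) -> list[Path]:
    """Return the .yml files under datasets_dir, sorted for a stable order.

    Hypothetical helper for illustration; not part of the Baselinr API.
    """
    root = Path(datasets_dir)
    if not root.is_dir():
        # A missing directory simply yields no dataset files.
        return []
    # rglob descends into subdirectories; glob only inspects the top level.
    matcher = root.rglob if recursive else root.glob
    return sorted(p for p in matcher("*.yml") if p.is_file())
```

With recursive set to false in this sketch, files in subdirectories such as datasets/sales/orders.yml would be skipped.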
File naming conventions
- {table}.yml: table-level config
- {schema}_schema.yml: schema-level config
- {database}_database.yml: database-level config
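The naming convention can be read as a simple suffix rule. The sketch below infers a scope from a file name alone; classify_dataset_file is a hypothetical helper, and it assumes the suffix is the only signal. Whether Baselinr also uses the database, schema, and table keys inside the file to decide scope is not covered here.

```python
from pathlib import Path


def classify_dataset_file(path: str) -> tuple[str, str]:
    """Return (scope, name) inferred from a dataset file name.

    Hypothetical helper for illustration; not part of the Baselinr API.
    """
    stem = Path(path).stem  # "analytics_schema.yml" -> "analytics_schema"
    for suffix, scope in (("_schema", "schema"), ("_database", "database")):
        # Require a non-empty name in front of the suffix.
        if stem.endswith(suffix) and len(stem) > len(suffix):
            return scope, stem[: -len(suffix)]
    # Anything else is treated as a table-level file.
    return "table", stem
```

One consequence of a pure suffix rule is that a table literally named event_schema would be classified as a schema-level file, so names ending in _schema or _database are best avoided for tables under this assumption.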
Configuration Precedence
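Sources are listed from highest to lowest precedence. When several define the same setting, the highest one wins:
- Table file
- Schema file
- Database file
- Global defaults in config.yml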
Column configs inside a table file override column defaults in broader scopes.
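The precedence order can be pictured as a layered merge in which broader scopes are applied first and narrower scopes overwrite them. The following Python sketch illustrates that idea under stated assumptions: deep_merge and resolve_config are hypothetical names, nested mappings are merged key by key, and lists are replaced wholesale. How Baselinr actually merges list-valued blocks such as columns is not specified in this guide.

```python
from copy import deepcopy
from typing import Any, Optional


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from override win over base."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced outright (an assumption of this sketch).
            merged[key] = deepcopy(value)
    return merged


def resolve_config(
    global_defaults: Optional[dict[str, Any]],
    database_cfg: Optional[dict[str, Any]],
    schema_cfg: Optional[dict[str, Any]],
    table_cfg: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Apply layers from lowest to highest precedence; later layers win."""
    resolved: dict[str, Any] = {}
    for layer in (global_defaults, database_cfg, schema_cfg, table_cfg):
        resolved = deep_merge(resolved, layer or {})
    return resolved
```

For example, if a database file sets drift.strategy and a table file sets only profiling.sampling, the resolved config in this sketch keeps both, because the table layer overrides only the keys it defines.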
Dataset File Examples
# datasets/customers.yml (table-level)
database: warehouse
schema: public
table: customers
profiling:
  sampling:
    enabled: true
    fraction: 0.1
columns:
  - name: email
    drift:
      enabled: true
validation:
  rules:
    - type: format
      column: email
      pattern: '^[^@]+@[^@]+\.[^@]+$'
# datasets/analytics_schema.yml (schema-level)
database: warehouse
schema: analytics
profiling:
  default_sample_ratio: 0.05
# datasets/warehouse_database.yml (database-level)
database: warehouse
drift:
  strategy: absolute_threshold
  absolute_threshold:
    low_threshold: 5.0
Inline vs File-Based
- Recommended: file-based configs (under datasets/) for clarity, version control, and scaling to many datasets.
- Inline support: inline datasets remain supported, but new work should prefer files. Inline configs do not take precedence over table files; when both define the same setting, the file wins.
Common Tasks
- Add a new dataset: create datasets/{table}.yml with database, schema, and table, plus any feature blocks you need (profiling, drift, validation, anomaly, columns).
- Schema-wide defaults: use {schema}_schema.yml to apply settings to all tables in that schema.
- Database-wide defaults: use {database}_database.yml for coarse defaults.
- Preview & validation: use the dashboard Datasets page to preview the merged config, view precedence, and validate before saving.
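When many datasets are added at once, the first task above can be scripted. The sketch below writes a minimal table-level file containing only the three identifying keys, leaving feature blocks to be added by hand. write_table_skeleton is a hypothetical helper, not a Baselinr command, and it assumes the names are plain identifiers that need no YAML quoting.

```python
from pathlib import Path


def write_table_skeleton(datasets_dir: str, database: str, schema: str, table: str) -> Path:
    """Create {datasets_dir}/{table}.yml with the identifying keys only.

    Hypothetical helper for illustration; not part of the Baselinr API.
    """
    target = Path(datasets_dir) / f"{table}.yml"
    if target.exists():
        # Never overwrite a config that someone has already edited.
        raise FileExistsError(f"{target} already exists")
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumes plain identifiers; values with special characters would need quoting.
    target.write_text(
        f"database: {database}\nschema: {schema}\ntable: {table}\n",
        encoding="utf-8",
    )
    return target
```

Refusing to overwrite is a deliberate choice in this sketch, since dataset files are meant to be edited and kept under version control.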
Testing Your Configs
- Run baselinr validate-config --config config.yml to validate schema and YAML.
- Use the dashboard Datasets page to preview merged configs and see source precedence.
Migration Notes
- The CLI migrate-config command can convert inline datasets to files and create backups.
- See docs/guides/DATASET_MIGRATION.md or docs/migration/DATASET_CONFIG_MIGRATION.md for step-by-step instructions and edge cases.