Baselinr Quick Start Guide

Get Baselinr up and running in 5 minutes!

Prerequisites

  • Python 3.10 or higher
  • Docker and Docker Compose (for the full example)
  • OR a PostgreSQL database

Option 1: Docker Environment

This is the easiest way to get started with Baselinr.

Step 1: Start the Docker Environment

cd docker
docker-compose up -d

This will start:

  • PostgreSQL with sample data (customers, products, orders tables)
  • Dagster daemon
  • Dagster web UI

Wait about 30 seconds for everything to initialize.
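Instead of waiting a fixed 30 seconds, you can poll until the services accept connections. The following is a minimal sketch using only the Python standard library. The helper name `wait_for_port` is hypothetical, and it assumes the containers publish PostgreSQL on localhost port 5432 and the Dagster web UI on port 3000; check the compose file if your ports differ.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that the port is open,
            # not that PostgreSQL has finished loading the sample data.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 5432)` followed by `wait_for_port("localhost", 3000)` would block until both ports are reachable or a minute has passed for each.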

Step 2: Install Baselinr

# From the profile_mesh directory
pip install -e ".[dagster]"

Step 3: Run Your First Profile

baselinr profile --config examples/config.yml

You should see output like:

[1/4] Loading configuration...
✓ Configuration loaded (environment: development)

[2/4] Profiling tables...
✓ Profiled 3 tables:
- customers: 10 columns, 10 rows
- products: 8 columns, 10 rows
- orders: 7 columns, 10 rows

[3/4] Writing results to storage...
✓ Results written to storage

[4/4] Checking for drift...
ℹ Drift detection skipped: Need at least 2 runs for drift detection

Step 4: Run a Second Profile to See Drift Detection

# Run profiling again
baselinr profile --config examples/config.yml

# Now detect drift
baselinr drift --config examples/config.yml --dataset customers

Step 5: Explore Dagster UI

Open http://localhost:3000 in your browser to see:

  • Profiling assets for each table
  • Job runs and schedules
  • Asset lineage graph

Option 2: Your Own PostgreSQL Database

Step 1: Install Baselinr

pip install -e .

Step 2: Create Configuration

Create a my_config.yml file:

environment: development

source:
  type: postgres
  host: your-db-host
  port: 5432
  database: your-database
  username: your-username
  password: your-password
  schema: public

storage:
  connection:
    type: postgres
    host: your-db-host
    port: 5432
    database: your-database
    username: your-username
    password: your-password
  results_table: baselinr_results
  runs_table: baselinr_runs
  create_tables: true

profiling:
  tables:
    - table: your_table_name
      sample_ratio: 1.0
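Before running against this file, it can help to confirm that no template placeholders such as your-db-host or your_table_name were left in place. The sketch below is a plain text scan with the standard library, not part of Baselinr; the function name `find_placeholders` is hypothetical.

```python
import re

# Matches the placeholder values used in the template, e.g. your-db-host, your_table_name.
PLACEHOLDER = re.compile(r"\byour[-_][A-Za-z0-9_-]+")


def find_placeholders(config_text: str) -> list:
    """Return (line_number, placeholder) pairs for values not yet filled in."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

Reading my_config.yml with `open("my_config.yml").read()` and passing the text to `find_placeholders` returns an empty list once every value has been replaced.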

Step 3: Run Profiling

baselinr profile --config my_config.yml

Option 3: SQLite (Minimal Setup)

For the absolute quickest test without any external dependencies:

Step 1: Create a SQLite Database

# create_sample.py
import sqlite3

conn = sqlite3.connect('sample.db')
cursor = conn.cursor()

cursor.execute('''
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT,
        age INTEGER,
        email TEXT
    )
''')

cursor.executemany('INSERT INTO users VALUES (?,?,?,?)', [
    (1, 'Alice', 30, 'alice@example.com'),
    (2, 'Bob', 25, 'bob@example.com'),
    (3, 'Charlie', 35, 'charlie@example.com'),
])

conn.commit()
conn.close()

Run it:

python create_sample.py
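To confirm the database was created as expected before profiling it, a short check with the standard `sqlite3` module is enough. This is a sketch; the helper name `summarize_table` is hypothetical and not part of Baselinr.

```python
import sqlite3


def summarize_table(db_path: str, table: str) -> dict:
    """Return the column names and row count of one table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        # LIMIT 0 fetches no rows but still populates cursor.description.
        cursor = conn.execute(f"SELECT * FROM {quoted} LIMIT 0")
        columns = [description[0] for description in cursor.description]
        row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    finally:
        conn.close()
    return {"columns": columns, "rows": row_count}
```

For the script above, `summarize_table("sample.db", "users")` should report the four columns id, name, age, and email, and three rows.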

Step 2: Create Configuration

# config_sqlite.yml
environment: development

source:
  type: sqlite
  filepath: ./sample.db

storage:
  connection:
    type: sqlite
    filepath: ./sample.db
  results_table: baselinr_results
  runs_table: baselinr_runs
  create_tables: true

profiling:
  tables:
    - table: users
      sample_ratio: 1.0

Step 3: Run Profiling

pip install -e .
baselinr profile --config config_sqlite.yml

Next Steps

1. Explore the Results

Query the results in your database:

-- View all profiling runs
SELECT * FROM baselinr_runs ORDER BY profiled_at DESC;

-- View metrics for a specific table
SELECT column_name, metric_name, metric_value
FROM baselinr_results
WHERE dataset_name = 'customers'
  AND run_id = '<latest-run-id>'
ORDER BY column_name, metric_name;
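If results are stored in SQLite (Option 3), the two queries can be combined so the latest run ID does not have to be copied by hand. This is a sketch with the standard `sqlite3` module. It assumes `baselinr_runs` has a `run_id` column that matches `baselinr_results.run_id`, which the queries above imply but do not show, and it picks the newest run overall rather than the newest run for the given dataset. The function name `latest_run_metrics` is hypothetical.

```python
import sqlite3


def latest_run_metrics(db_path: str, dataset: str) -> list:
    """Return (column_name, metric_name, metric_value) rows from the newest run."""
    conn = sqlite3.connect(db_path)
    try:
        # Assumption: baselinr_runs.run_id matches baselinr_results.run_id.
        latest = conn.execute(
            "SELECT run_id FROM baselinr_runs ORDER BY profiled_at DESC LIMIT 1"
        ).fetchone()
        if latest is None:
            return []  # no profiling runs recorded yet
        return conn.execute(
            """
            SELECT column_name, metric_name, metric_value
            FROM baselinr_results
            WHERE dataset_name = ? AND run_id = ?
            ORDER BY column_name, metric_name
            """,
            (dataset, latest[0]),
        ).fetchall()
    finally:
        conn.close()
```

With the SQLite setup, `latest_run_metrics("sample.db", "users")` would return the metrics written by the most recent `baselinr profile` run.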

2. Set Up Drift Monitoring

Run profiling multiple times and compare:

# Profile now
baselinr profile --config examples/config.yml

# Make some changes to your data...

# Profile again
baselinr profile --config examples/config.yml

# Detect drift
baselinr drift --config examples/config.yml --dataset customers
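For the "make some changes to your data" step, a quick way to experiment is to alter the SQLite sample from Option 3 between two profiling runs. The sketch below shifts every value in the `users.age` column. The function name `shift_ages` is hypothetical, and whether the change is reported as drift depends on the thresholds in your configuration.

```python
import sqlite3


def shift_ages(db_path: str, delta: int) -> int:
    """Add delta to every users.age value and return the number of rows changed."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute("UPDATE users SET age = age + ?", (delta,))
        conn.commit()
        return cursor.rowcount
    finally:
        conn.close()
```

Calling `shift_ages("sample.db", 10)` between two runs of `baselinr profile --config config_sqlite.yml` changes the age statistics that the second run records. The drift command would then presumably take `--config config_sqlite.yml --dataset users`, following the pattern of the commands above.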

3. Integrate with Dagster

See the Dagster UI at http://localhost:3000 to:

  • Schedule regular profiling jobs
  • View profiling history
  • Set up alerts on drift detection

4. Customize Your Configuration

Edit examples/config.yml to:

  • Add more tables to profile
  • Adjust sampling ratios for large tables
  • Configure which metrics to compute
  • Change drift detection thresholds

Common Issues

"Connection refused" Error

If you get a connection error with Docker:

  • Make sure Docker is running
  • Wait 30 seconds after docker-compose up -d
  • Check logs: docker-compose logs postgres

"Module not found" Error

Make sure you've installed Baselinr:

pip install -e .
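If the error persists, a common cause is that the package was installed into a different Python environment than the one you are running. The following sketch checks the active interpreter using only the standard library. It assumes the importable module is named `baselinr`, matching the CLI command; the function name `check_install` is hypothetical.

```python
import importlib.util
import shutil


def check_install(module_name: str = "baselinr", command: str = "baselinr") -> dict:
    """Report whether a module is importable and a command is on PATH."""
    return {
        # find_spec returns None when a top-level module cannot be found.
        "module_importable": importlib.util.find_spec(module_name) is not None,
        # shutil.which returns None when no executable with that name is on PATH.
        "command_on_path": shutil.which(command) is not None,
    }
```

If `check_install()` reports the command on PATH but the module as not importable, the `baselinr` executable likely belongs to another environment; activate the intended one and reinstall.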

Docker Not Available

Use Option 2 (your own PostgreSQL) or Option 3 (SQLite).

Get Help

Happy profiling! 🧩