Troubleshooting Guide
Common issues and solutions for Baselinr.
Table of Contents
- Installation Issues
- Configuration Issues
- Connection Issues
- Profiling Issues
- Drift Detection Issues
- Performance Issues
- Storage Issues
- CLI Issues
- SDK Issues
- Getting Help
Installation Issues
"Command not found: baselinr"
The CLI command is not in your PATH.
Solutions:
- Reinstall Baselinr:
    pip install --force-reinstall -e .
- Use the Python module directly:
    python -m baselinr.cli profile --config config.yml
- Check which Python and which Baselinr installation are in use:
    which python
    pip show baselinr
- Activate the virtual environment:
    source venv/bin/activate       # Linux/Mac
    .\venv\Scripts\Activate.ps1    # Windows
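The PATH check above can also be done from Python, which helps when it is unclear which interpreter or environment is active. The following is a minimal sketch using only the standard library; `find_cli` is a hypothetical helper name and not part of Baselinr.

```python
import shutil
import sys


def find_cli(name: str = "baselinr"):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    # Show which interpreter is running, then look for the CLI on PATH.
    print("Python interpreter:", sys.executable)
    location = find_cli()
    if location is None:
        print("baselinr is not on PATH; activate your virtual environment "
              "or use 'python -m baselinr.cli'")
    else:
        print("baselinr found at:", location)
```

If the interpreter path printed here is not inside your virtual environment, the environment is probably not activated.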
"ModuleNotFoundError: No module named 'baselinr'"
The package is not installed or not in your Python path.
Solutions:
- Install Baselinr:
    pip install -e .
- Verify the installation:
    python -c "import baselinr; print(baselinr.__version__)"
- Check the Python environment:
    python --version
    which python    # Linux/Mac
    where python    # Windows
Permission Errors (Windows)
If you get permission errors during installation:
Solutions:
- Run PowerShell as Administrator
- Use a user installation:
    pip install --user -e .
- Use a virtual environment:
    python -m venv venv
    .\venv\Scripts\Activate.ps1
    pip install -e .
Missing Dependencies
If you get import errors for optional dependencies:
Solutions:
- Install with optional dependencies:
    pip install -e ".[snowflake]"    # For Snowflake
    pip install -e ".[dagster]"      # For Dagster
    pip install -e ".[all]"          # For everything
- Install a specific package:
    pip install snowflake-connector-python    # For Snowflake
    pip install dagster                       # For Dagster
Configuration Issues
"pydantic.errors.ValidationError"
Your configuration file has errors.
Solutions:
- Check YAML syntax (indentation matters):
    # Correct
    source:
      type: postgres
      host: localhost
    # Incorrect (wrong indentation)
    source:
    type: postgres
    host: localhost
- Validate the configuration:
    python -c "import logging; logging.basicConfig(level=logging.DEBUG); from baselinr.config.loader import ConfigLoader; ConfigLoader.load_from_file('config.yml')"
- Check required fields:
  - environment: Must be set
  - source: Must include type and database
  - storage: Must include connection
- Verify the database type is valid:
  - Valid types: postgres, snowflake, sqlite, mysql, bigquery, redshift
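The required-field rules above can be checked on an already parsed configuration before handing it to Baselinr. This is a rough sketch, not Baselinr's own validation: `check_required_fields` is a hypothetical helper, and it assumes the config has been loaded into a plain dict (for example with a YAML parser).

```python
VALID_TYPES = {"postgres", "snowflake", "sqlite", "mysql", "bigquery", "redshift"}


def check_required_fields(config: dict) -> list:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if not config.get("environment"):
        problems.append("environment: must be set")

    # A badly indented file can leave 'source' empty or a non-mapping value.
    source = config.get("source")
    if not isinstance(source, dict):
        source = {}
    for key in ("type", "database"):
        if not source.get(key):
            problems.append(f"source.{key}: missing")
    if source.get("type") and source["type"] not in VALID_TYPES:
        problems.append(f"source.type: '{source['type']}' is not a valid type")

    storage = config.get("storage")
    if not isinstance(storage, dict) or not storage.get("connection"):
        problems.append("storage.connection: missing")
    return problems


example = {
    "environment": "development",
    "source": {"type": "postgres", "database": "mydb"},
    "storage": {"connection": {"type": "postgres"}},
}
print(check_required_fields(example))  # prints []
```

The incorrectly indented YAML shown earlier parses with `source` set to nothing, so this check would report both `source.type` and `source.database` as missing.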
Configuration File Not Found
The configuration file path is incorrect.
Solutions:
- Use an absolute path:
    baselinr profile --config /full/path/to/config.yml
- Use a relative path correctly:
    baselinr profile --config ./config.yml
    baselinr profile --config examples/config.yml
- Check the current directory:
    pwd    # Linux/Mac
    cd     # Windows
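Relative paths are resolved against the current working directory, which is the usual cause of this error. A short sketch that shows what a given path actually resolves to; `describe_config_path` is a hypothetical helper, not a Baselinr function.

```python
from pathlib import Path


def describe_config_path(path_str: str) -> str:
    """Resolve a config path against the current directory and report whether it exists."""
    resolved = Path(path_str).expanduser().resolve()
    status = "exists" if resolved.is_file() else "NOT FOUND"
    return f"{path_str} -> {resolved} ({status})"


if __name__ == "__main__":
    print("Current directory:", Path.cwd())
    print(describe_config_path("./config.yml"))
    print(describe_config_path("examples/config.yml"))
```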
Connection Issues
"Connection refused" or "Connection timeout"
Unable to connect to the database.
Solutions:
- Check that the database is running:
    # PostgreSQL
    psql -h localhost -U user -d database
    # Docker
    docker ps
    docker-compose logs postgres
- Verify connection parameters:
  - Host: Check that localhost or the IP address is correct
  - Port: Verify the port number (5432 for PostgreSQL, 5439 for Redshift)
  - Database: Ensure the database exists
  - Username/Password: Verify credentials
- Check firewall/network:
    # Test connection
    telnet hostname port
    nc -zv hostname port                      # Linux/Mac
    Test-NetConnection hostname -Port port    # Windows
- Use a connection string directly:
    # Test with psql (PostgreSQL)
    psql "postgresql://user:password@host:port/database"
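When telnet or nc is not installed, the same reachability test can be run from Python. This is a minimal sketch with the standard library only; `can_connect` is a hypothetical helper. It only confirms that the TCP port accepts connections, not that the credentials or database name are valid.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 5432 is the default PostgreSQL port mentioned above.
    print("PostgreSQL reachable:", can_connect("localhost", 5432))
```

A result of False with a correct host and port usually points to a firewall rule or to the database not listening on that interface.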
Snowflake Connection Issues
Specific issues with Snowflake connections.
Solutions:
- Install the Snowflake connector:
    pip install -e ".[snowflake]"
- Verify required fields:
  - account: Snowflake account identifier
  - warehouse: Warehouse name
  - database: Database name
  - username and password: Credentials
- Check optional fields:
  - role: Role name (recommended)
  - schema: Schema name
- Test the connection:
    from baselinr.connectors.snowflake import SnowflakeConnector
    from baselinr.config.schema import ConnectionConfig

    # Replace the placeholder values with your own account details
    config = ConnectionConfig(
        type="snowflake",
        account="myaccount",
        warehouse="compute_wh",
        database="my_database",
        username="user",
        password="pass",
    )
    connector = SnowflakeConnector(config)
    engine = connector.get_engine()
"SSL connection required"
Database requires SSL connection.
Solutions:
- Enable SSL in the connection config:
    source:
      type: postgres
      host: hostname
      # Add SSL parameters in extra_params
      extra_params:
        sslmode: require
- For Snowflake, SSL is automatic
- For Redshift, connect on port 5439 (the default Redshift port) with SSL enabled
BigQuery Connection Issues
Issues connecting to BigQuery.
Solutions:
- Set up credentials:
    source:
      type: bigquery
      database: project.dataset
      extra_params:
        credentials_path: /path/to/key.json
- Set the environment variable:
    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json"
- Verify that the credentials file exists and is valid
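The last step can be partly automated. The sketch below assumes the credentials file is a Google service account key in JSON format, which normally contains the fields listed in `EXPECTED_KEYS`; other credential types use different fields. `check_credentials_file` is a hypothetical helper and does not contact Google, so it cannot detect revoked or expired keys.

```python
import json
from pathlib import Path

# Fields normally present in a service account key file (assumed credential type).
EXPECTED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_credentials_file(path_str: str) -> list:
    """Return a list of problems found in the key file; empty means it looks plausible."""
    path = Path(path_str)
    if not path.is_file():
        return [f"{path_str}: file does not exist"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path_str}: not valid JSON ({exc})"]
    if not isinstance(data, dict):
        return [f"{path_str}: expected a JSON object at the top level"]
    missing = sorted(EXPECTED_KEYS - data.keys())
    if missing:
        return [f"{path_str}: missing fields: {', '.join(missing)}"]
    return []
```

Calling `check_credentials_file("/path/to/key.json")` with the same path used in `credentials_path` or `GOOGLE_APPLICATION_CREDENTIALS` returns an empty list when the file looks structurally sound.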
Profiling Issues
"Table not found" or "Schema not found"
The specified table or schema doesn't exist.
Solutions:
- Verify that the table exists:
    -- PostgreSQL
    SELECT * FROM information_schema.tables
    WHERE table_schema = 'public' AND table_name = 'customers';
- Check the schema name:
    profiling:
      tables:
        - table: customers
          schema: public    # Make sure schema name is correct
- List available tables:
    from sqlalchemy import inspect

    from baselinr.connectors.factory import create_connector
    from baselinr.config.loader import ConfigLoader

    config = ConfigLoader.load_from_file("config.yml")
    connector = create_connector(config.source)
    engine = connector.get_engine()

    # PostgreSQL: list the tables in the 'public' schema
    inspector = inspect(engine)
    print(inspector.get_table_names(schema='public'))
Profiling Takes Too Long
Profiling is slow for large tables.
Solutions:
- Enable sampling:
    profiling:
      tables:
        - table: large_table
          sampling:
            enabled: true
            method: random
            fraction: 0.01       # Sample 1%
            max_rows: 1000000    # Cap at 1M rows
- Use partition-aware profiling:
    profiling:
      tables:
        - table: partitioned_table
          partition:
            strategy: latest    # Profile only latest partition
- Enable parallelism:
    execution:
      max_workers: 4    # Parallel profiling
- Reduce the metrics computed:
    profiling:
      metrics:
        - count
        - null_ratio
        # Remove expensive metrics like histograms for large tables
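To pick a sampling fraction, it helps to estimate how many rows a given setting would scan. The sketch below assumes the sample size is roughly `fraction` times the table size, capped at `max_rows`. That is an assumption about how the two settings combine, not a description of Baselinr's internals, and `estimated_sample_rows` is a hypothetical helper.

```python
def estimated_sample_rows(total_rows: int, fraction: float, max_rows=None) -> int:
    """Estimate rows scanned: fraction of the table, optionally capped at max_rows."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    sampled = round(total_rows * fraction)
    if max_rows is not None:
        sampled = min(sampled, max_rows)
    return sampled


# A 500M-row table at 1% would give 5M rows, so the 1M cap applies.
print(estimated_sample_rows(500_000_000, 0.01, max_rows=1_000_000))  # prints 1000000
# A 20M-row table at 1% stays under the cap.
print(estimated_sample_rows(20_000_000, 0.01, max_rows=1_000_000))   # prints 200000
```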
"Out of memory" or Memory Issues
Profiling uses too much memory.
Solutions:
- Enable sampling for large tables
- Reduce parallelism:
    execution:
      max_workers: 1    # Sequential processing
- Increase system memory or use smaller sample sizes
- Profile tables individually instead of all at once
No Results Stored
Profiling runs but no results appear in storage.
Solutions:
- Check the storage connection:
    storage:
      connection:
        type: postgres
        host: localhost
        # ... verify connection works
      create_tables: true
- Verify that the tables were created:
    SELECT * FROM baselinr_runs ORDER BY profiled_at DESC LIMIT 10;
    SELECT * FROM baselinr_results LIMIT 10;
- Check for errors in the logs:
    baselinr profile --config config.yml --verbose
- Ensure dry_run is False (the default)
Drift Detection Issues
"No baseline run found"
No baseline run is available for comparison.
Solutions:
- Ensure you have at least 2 profiling runs:
    # Run profiling twice
    baselinr profile --config config.yml
    # Wait a bit or make changes to data
    baselinr profile --config config.yml
    # Now detect drift
    baselinr drift --config config.yml --dataset customers
- Check that runs exist:
    baselinr query runs --config config.yml --table customers
- Specify the baseline explicitly:
    baselinr drift --config config.yml --dataset customers --baseline-run-id <run-id>
Too Many False Positives
Drift detection triggers too often.
Solutions:
- Adjust thresholds:
    drift_detection:
      absolute_threshold:
        low_threshold: 10.0       # Increase from 5.0
        medium_threshold: 20.0    # Increase from 15.0
        high_threshold: 40.0      # Increase from 30.0
- Enable type-specific thresholds:
    drift_detection:
      enable_type_specific_thresholds: true
      type_specific_thresholds:
        numeric:
          mean:
            low: 15.0    # More lenient for numeric means
- Use the statistical strategy instead:
    drift_detection:
      strategy: statistical
      statistical:
        sensitivity: low    # Less sensitive
- Change the baseline strategy:
    drift_detection:
      baselines:
        strategy: moving_average    # Use average instead of last run
        windows:
          moving_average: 7    # Average over 7 runs
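To get a feel for how raising thresholds or switching to a moving-average baseline reduces false positives, the sketch below works through the arithmetic. It assumes drift is measured as the absolute percent change of a metric relative to its baseline and that the thresholds are percentages. That interpretation, and the helper names, are assumptions for illustration and not Baselinr's actual implementation.

```python
from statistics import mean


def percent_change(baseline: float, current: float) -> float:
    """Absolute percent change of current relative to baseline."""
    if baseline == 0:
        return 0.0 if current == 0 else float("inf")
    return abs(current - baseline) * 100.0 / abs(baseline)


def classify(change_pct: float, low: float = 5.0, medium: float = 15.0,
             high: float = 30.0) -> str:
    """Map a percent change to a severity label using three thresholds."""
    if change_pct >= high:
        return "high"
    if change_pct >= medium:
        return "medium"
    if change_pct >= low:
        return "low"
    return "none"


def moving_average_baseline(history: list, window: int = 7) -> float:
    """Average the most recent `window` values instead of using only the last run."""
    return mean(history[-window:])


change = percent_change(100.0, 108.0)                          # an 8% change
print(classify(change))                                        # prints low
print(classify(change, low=10.0, medium=20.0, high=40.0))      # prints none
```

With the default thresholds an 8% change is flagged as low severity; with the raised thresholds from the first solution it is not flagged at all. A moving-average baseline has a similar damping effect because a single unusual run moves the baseline only slightly.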
No Drift Detected When Expected
Drift detection doesn't catch changes.
Solutions:
- Lower thresholds:
    drift_detection:
      absolute_threshold:
        low_threshold: 2.0       # Lower from 5.0
        medium_threshold: 5.0    # Lower from 15.0
- Verify that the data actually changed:
    # Query metrics directly
    baselinr query run-details --config config.yml --run-id <run-id>
- Check that the correct baseline is being used:
    baselinr drift --config config.yml --dataset customers --verbose
Performance Issues
Slow Profiling
Profiling is taking longer than expected.
Solutions:
- Enable parallelism:
    execution:
      max_workers: 4
- Use sampling for large tables
- Profile fewer tables per run
- Enable incremental profiling:
    incremental:
      enabled: true
      change_detection:
        enabled: true
- Check database connection performance and network latency
High Database Load
Profiling causes database performance issues.
Solutions:
- Reduce parallelism to limit concurrent queries
- Profile during off-peak hours
- Use sampling to reduce the data scanned
- Enable incremental profiling to skip unchanged tables
- Use read replicas if available
Storage Issues
Tables Not Created
Storage tables are not automatically created.
Solutions:
- Ensure create_tables is enabled:
    storage:
      create_tables: true
- Create the tables manually:
    baselinr migrate apply --config config.yml
- Check database permissions (CREATE TABLE permission is required)
Migration Errors
Schema migrations fail.
Solutions:
- Check migration status:
    baselinr migrate status --config config.yml
- Validate the schema:
    baselinr migrate validate --config config.yml
- Apply migrations:
    baselinr migrate apply --config config.yml
- Check for conflicts with the existing schema
CLI Issues
Command Hangs or Freezes
CLI command appears to hang.
Solutions:
- Check whether profiling is actually running (large tables take time)
- Enable verbose output:
    baselinr profile --config config.yml --verbose
- Check that the database connection is active
- Kill and restart if necessary
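In scheduled or unattended runs, the "kill and restart" step can be enforced with a timeout so that a hung command does not block indefinitely. This is a sketch using the standard library; `run_with_timeout` is a hypothetical wrapper, and a suitable timeout depends on your table sizes.

```python
import subprocess


def run_with_timeout(cmd: list, timeout_seconds: float):
    """Run a command and return (finished, returncode).

    finished is False when the timeout expired; subprocess.run kills the
    child process in that case before raising TimeoutExpired.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return False, None
    return True, completed.returncode
```

For example, `run_with_timeout(["baselinr", "profile", "--config", "config.yml", "--verbose"], 3600)` would stop the profiling command after one hour. If `baselinr` is not on PATH, `subprocess.run` raises `FileNotFoundError` instead.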
Verbose Output Not Showing
Verbose flag doesn't show expected output.
Solutions:
- Check the command syntax:
    baselinr plan --config config.yml --verbose
- Some commands may not support the verbose flag yet
- Check logs in the database or in files, if configured
SDK Issues
Client Initialization Fails
BaselinrClient fails to initialize.
Solutions:
- Verify that the config file exists and is valid:
    from baselinr.config.loader import ConfigLoader
    config = ConfigLoader.load_from_file("config.yml")
    print(config.environment)
- Check the config parameter format:
    # Correct: pass either a path or a config object, not both
    client = BaselinrClient(config_path="config.yml")
    client = BaselinrClient(config=config_dict)
    # Incorrect - don't provide both
    client = BaselinrClient(config_path="config.yml", config=config_dict)
- Verify that the configuration is a valid BaselinrConfig or dict
Query Methods Return Empty Results
Query methods don't return expected data.
Solutions:
- Verify that profiling has been run:
    runs = client.query_runs(days=7)
    print(f"Found {len(runs)} runs")
- Check that filters aren't too restrictive:
    # Too restrictive
    runs = client.query_runs(table="nonexistent", days=1)
    # Better
    runs = client.query_runs(days=30)
- Ensure the storage connection is correct and the tables exist
Getting Help
If you're still experiencing issues:
- Check the documentation (see Related Documentation below)
- Review the examples:
  - Check the examples/ directory for working configurations
  - Review examples/config.yml for reference
- Enable debug logging:
    import logging
    logging.basicConfig(level=logging.DEBUG)
- Open an issue:
  - GitHub: https://github.com/baselinrhq/baselinr/issues
  - Include:
    - Error message and traceback
    - Configuration file (redact sensitive info)
    - Python version
    - Database type and version
    - Steps to reproduce
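The environment details requested above can be collected with a few standard-library calls. The following is a sketch; `environment_report` is a hypothetical helper, and the database type and version still need to be added by hand.

```python
import platform
import sys
from importlib import metadata


def environment_report(package: str = "baselinr") -> dict:
    """Collect Python, platform, and installed package version for a bug report."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "package": package,
        "package_version": version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```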
Related Documentation
- Installation Guide - Installation troubleshooting
- Configuration Reference - Complete configuration reference
- Best Practices Guide - Recommended patterns
- Performance Tuning Guide - Performance optimization