Troubleshooting Guide

Common issues and solutions for Baselinr.

Installation Issues

"Command not found: baselinr"

The CLI command is not in your PATH.

Solutions:

  1. Reinstall Baselinr:

    pip install --force-reinstall -e .
  2. Use Python module directly:

    python -m baselinr.cli profile --config config.yml
  3. Check Python path:

    which python
    pip show baselinr
  4. Activate virtual environment:

    source venv/bin/activate  # Linux/Mac
    .\venv\Scripts\Activate.ps1 # Windows
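
The checks above can also be run from Python. The following is a minimal sketch using only the standard library; the helper name `diagnose_cli` is hypothetical and not part of Baselinr. It reports which interpreter is running, whether a `baselinr` executable is on PATH, and whether the package is importable from that interpreter.

```python
import importlib.util
import shutil
import sys


def diagnose_cli(command: str = "baselinr", module: str = "baselinr") -> dict:
    """Report where the CLI and the package resolve in this environment."""
    return {
        # Interpreter that is actually running; compare with `which python`.
        "python": sys.executable,
        # Absolute path of the executable on PATH, or None if it is not found.
        "cli_path": shutil.which(command),
        # True if this interpreter can locate the top-level package.
        "module_importable": importlib.util.find_spec(module) is not None,
    }


if __name__ == "__main__":
    for key, value in diagnose_cli().items():
        print(f"{key}: {value}")
```

If `module_importable` is true but `cli_path` is empty, the package is installed but its scripts directory is probably not on PATH, which is the case where `python -m baselinr.cli` is the quickest workaround.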

"ModuleNotFoundError: No module named 'baselinr'"

The package is not installed or not in your Python path.

Solutions:

  1. Install Baselinr:

    pip install -e .
  2. Verify installation:

    python -c "import baselinr; print(baselinr.__version__)"
  3. Check Python environment:

    python --version
    which python # Linux/Mac
    where python # Windows

Permission Errors (Windows)

Installation fails with permission errors.

Solutions:

  1. Run PowerShell as Administrator

  2. Use user installation:

    pip install --user -e .
  3. Use virtual environment:

    python -m venv venv
    .\venv\Scripts\Activate.ps1
    pip install -e .

Missing Dependencies

Imports fail for optional dependencies that are not installed.

Solutions:

  1. Install with optional dependencies:

    pip install -e ".[snowflake]"  # For Snowflake
    pip install -e ".[dagster]" # For Dagster
    pip install -e ".[all]" # For everything
  2. Install specific package:

    pip install snowflake-connector-python  # For Snowflake
    pip install dagster # For Dagster

Configuration Issues

"pydantic.errors.ValidationError"

Your configuration file has errors.

Solutions:

  1. Check YAML syntax (indentation matters):

    # Correct
    source:
      type: postgres
      host: localhost

    # Incorrect (wrong indentation)
    source:
    type: postgres
    host: localhost
  2. Validate configuration:

    python -c "
    import logging
    logging.basicConfig(level=logging.DEBUG)
    from baselinr.config.loader import ConfigLoader
    ConfigLoader.load_from_file('config.yml')
    "
  3. Check required fields:

    • environment: Must be set
    • source: Must include type and database
    • storage: Must include connection
  4. Verify database type is valid:

    • Valid types: postgres, snowflake, sqlite, mysql, bigquery, redshift
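
The required-field checks listed above can be approximated before Baselinr loads the file. This is a rough sketch, not Baselinr's own validation: the function `find_config_problems` is hypothetical, and it operates on a plain dictionary such as the one a YAML parser would return for config.yml.

```python
VALID_SOURCE_TYPES = {"postgres", "snowflake", "sqlite", "mysql", "bigquery", "redshift"}


def find_config_problems(config: dict) -> list:
    """Return human-readable problems for the required fields listed above."""
    problems = []
    if not config.get("environment"):
        problems.append("environment: must be set")

    source = config.get("source")
    if not isinstance(source, dict):
        # A flat (unindented) YAML block parses `source:` as null.
        problems.append("source: must be a mapping (check YAML indentation)")
    else:
        for field in ("type", "database"):
            if not source.get(field):
                problems.append(f"source.{field}: must be set")
        source_type = source.get("type")
        if source_type and source_type not in VALID_SOURCE_TYPES:
            problems.append(f"source.type: {source_type!r} is not a valid type")

    storage = config.get("storage")
    if not isinstance(storage, dict) or not storage.get("connection"):
        problems.append("storage.connection: must be set")
    return problems
```

With the wrongly indented YAML, `source` parses as an empty value and `type` and `host` become top-level keys, so the sketch reports the `source` mapping problem rather than a missing `type`.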

Configuration File Not Found

The configuration file path is incorrect.

Solutions:

  1. Use absolute path:

    baselinr profile --config /full/path/to/config.yml
  2. Use relative path correctly:

    baselinr profile --config ./config.yml
    baselinr profile --config examples/config.yml
  3. Check current directory:

    pwd  # Linux/Mac
    cd # Windows
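
When a relative path is the suspect, it can help to see what it resolves to from the current directory. A small sketch with a hypothetical helper `resolve_config_path`, using only `pathlib`:

```python
from pathlib import Path


def resolve_config_path(path: str) -> Path:
    """Resolve a config path against the current directory and verify it exists."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Config not found: {resolved} (current directory: {Path.cwd()})"
        )
    return resolved
```

The returned absolute path can be passed to `--config` to rule out working-directory confusion.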

Connection Issues

"Connection refused" or "Connection timeout"

Unable to connect to the database.

Solutions:

  1. Check database is running:

    # PostgreSQL
    psql -h localhost -U user -d database

    # Docker
    docker ps
    docker-compose logs postgres
  2. Verify connection parameters:

    • Host: Check if localhost or IP is correct
    • Port: Verify port number (5432 for PostgreSQL, 5439 for Redshift)
    • Database: Ensure database exists
    • Username/Password: Verify credentials
  3. Check firewall/network:

    # Test connection
    telnet hostname port
    nc -zv hostname port # Linux/Mac
    Test-NetConnection hostname -Port port # Windows
  4. Use connection string directly:

    # Test with psql (PostgreSQL)
    psql "postgresql://user:password@host:port/database"
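
If `telnet` or `nc` is not available, the same reachability test can be sketched in Python. The helper `can_connect` below is hypothetical and only checks that a TCP connection can be opened; it says nothing about credentials or whether the database exists.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_connect("localhost", 5432)` returning False points to a stopped server, a wrong port, or a firewall rather than to a Baselinr configuration problem.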

Snowflake Connection Issues

Specific issues with Snowflake connections.

Solutions:

  1. Install Snowflake connector:

    pip install -e ".[snowflake]"
  2. Verify required fields:

    • account: Snowflake account identifier
    • warehouse: Warehouse name
    • database: Database name
    • username and password: Credentials
  3. Check optional fields:

    • role: Role name (recommended)
    • schema: Schema name
  4. Test connection:

    from baselinr.connectors.snowflake import SnowflakeConnector
    from baselinr.config.schema import ConnectionConfig

    config = ConnectionConfig(
        type="snowflake",
        account="myaccount",
        warehouse="compute_wh",
        database="my_database",
        username="user",
        password="pass",
    )
    connector = SnowflakeConnector(config)
    engine = connector.get_engine()

"SSL connection required"

The database requires an SSL connection.

Solutions:

  1. Enable SSL in connection config:

    source:
      type: postgres
      host: hostname
      # Add SSL parameters in extra_params
      extra_params:
        sslmode: require
  2. For Snowflake, SSL is automatic

  3. For Redshift, connect on port 5439 (the default) with SSL enabled
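
To confirm the server's SSL requirement independently of Baselinr, one option is to build a connection URI that carries `sslmode` and pass it to `psql`. The sketch below assumes a PostgreSQL-compatible server; `postgres_url` is a hypothetical helper, and it percent-encodes credentials so that characters such as `@` do not break the URI.

```python
from urllib.parse import quote, urlencode


def postgres_url(user, password, host, port, database, **params) -> str:
    """Build a libpq-style connection URI, percent-encoding credentials."""
    query = f"?{urlencode(params)}" if params else ""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}{query}"
    )


if __name__ == "__main__":
    print(postgres_url("user", "p@ss", "hostname", 5432, "database", sslmode="require"))
```

If `psql` connects with the printed URI but Baselinr does not, the `sslmode` setting in `extra_params` is the first thing to compare.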

BigQuery Connection Issues

Issues connecting to BigQuery.

Solutions:

  1. Set up credentials:

    source:
      type: bigquery
      database: project.dataset
      extra_params:
        credentials_path: /path/to/key.json
  2. Set environment variable:

    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json"
  3. Verify credentials file exists and is valid
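
The last step can be partly automated. This sketch assumes the credentials file is a Google service-account key, which is a JSON object containing fields such as `type`, `project_id`, `private_key`, and `client_email`; other credential types use different fields. The helper `check_service_account_key` is hypothetical and does not contact Google.

```python
import json
from pathlib import Path

# Fields expected in a service-account key file (assumption: this key type is used).
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path: str) -> list:
    """Return a list of problems found in a service-account key file."""
    key_file = Path(path)
    if not key_file.is_file():
        return [f"file not found: {key_file}"]
    try:
        data = json.loads(key_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    return [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
```

An empty list means the file is present and structurally plausible; it does not prove the key is still active or has access to the dataset.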

Profiling Issues

"Table not found" or "Schema not found"

The specified table or schema doesn't exist.

Solutions:

  1. Verify table exists:

    -- PostgreSQL
    SELECT * FROM information_schema.tables
    WHERE table_schema = 'public' AND table_name = 'customers';
  2. Check schema name:

    profiling:
      tables:
        - table: customers
          schema: public  # Make sure schema name is correct
  3. List available tables:

    from baselinr.connectors.factory import create_connector
    from baselinr.config.loader import ConfigLoader

    config = ConfigLoader.load_from_file("config.yml")
    connector = create_connector(config.source)
    engine = connector.get_engine()

    # PostgreSQL
    from sqlalchemy import inspect
    inspector = inspect(engine)
    print(inspector.get_table_names(schema='public'))

Profiling Takes Too Long

Profiling is slow for large tables.

Solutions:

  1. Enable sampling:

    profiling:
      tables:
        - table: large_table
          sampling:
            enabled: true
            method: random
            fraction: 0.01  # Sample 1%
            max_rows: 1000000  # Cap at 1M rows
  2. Use partition-aware profiling:

    profiling:
      tables:
        - table: partitioned_table
          partition:
            strategy: latest  # Profile only latest partition
  3. Enable parallelism:

    execution:
      max_workers: 4  # Parallel profiling
  4. Reduce metrics computed:

    profiling:
      metrics:
        - count
        - null_ratio
        # Remove expensive metrics like histograms for large tables
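
To choose sampling values, it helps to estimate how many rows a given `fraction` and `max_rows` would leave to scan. The sketch below assumes the two settings combine as "take the fraction, then cap at max_rows"; that interaction is an assumption for illustration, not confirmed Baselinr behavior, and `estimated_sample_rows` is a hypothetical helper.

```python
import math


def estimated_sample_rows(total_rows: int, fraction: float, max_rows=None) -> int:
    """Estimate sampled rows, assuming the fraction is applied before the cap."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    sampled = math.ceil(total_rows * fraction)
    return sampled if max_rows is None else min(sampled, max_rows)
```

Under this assumption, a 500-million-row table at `fraction: 0.01` would nominally sample about 5 million rows, so the `max_rows: 1000000` cap is what actually bounds the work.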

"Out of memory" or Memory Issues

Profiling uses too much memory.

Solutions:

  1. Enable sampling for large tables

  2. Reduce parallelism:

    execution:
      max_workers: 1  # Sequential processing
  3. Increase system memory or use smaller sample sizes

  4. Profile tables individually instead of all at once

No Results Stored

Profiling runs but no results appear in storage.

Solutions:

  1. Check storage connection:

    storage:
      connection:
        type: postgres
        host: localhost
        # ... verify connection works
      create_tables: true
  2. Verify tables were created:

    SELECT * FROM baselinr_runs ORDER BY profiled_at DESC LIMIT 10;
    SELECT * FROM baselinr_results LIMIT 10;
  3. Check for errors in logs:

    baselinr profile --config config.yml --verbose
  4. Ensure dry_run is False (default)
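
For SQLite-backed storage, the table check can be scripted with the standard library. This sketch assumes the storage tables are named `baselinr_runs` and `baselinr_results`, as in the queries above; the helper `missing_tables` is hypothetical, and other databases would need their own catalog query.

```python
import sqlite3

EXPECTED_TABLES = ("baselinr_runs", "baselinr_results")


def missing_tables(conn: sqlite3.Connection, expected=EXPECTED_TABLES) -> list:
    """Return the expected storage tables that do not exist in this SQLite database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    existing = {name for (name,) in rows}
    return [table for table in expected if table not in existing]
```

An empty list means both tables exist; if results are still absent, the verbose log from the profiling run is the next place to look.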

Drift Detection Issues

"No baseline run found"

No baseline run is available for comparison.

Solutions:

  1. Ensure you have at least 2 profiling runs:

    # Run profiling twice
    baselinr profile --config config.yml
    # Wait a bit or make changes to data
    baselinr profile --config config.yml

    # Now detect drift
    baselinr drift --config config.yml --dataset customers
  2. Check runs exist:

    baselinr query runs --config config.yml --table customers
  3. Specify baseline explicitly:

    baselinr drift --config config.yml --dataset customers --baseline-run-id <run-id>

Too Many False Positives

Drift detection triggers too often.

Solutions:

  1. Adjust thresholds:

    drift_detection:
      absolute_threshold:
        low_threshold: 10.0  # Increase from 5.0
        medium_threshold: 20.0  # Increase from 15.0
        high_threshold: 40.0  # Increase from 30.0
  2. Enable type-specific thresholds:

    drift_detection:
      enable_type_specific_thresholds: true
      type_specific_thresholds:
        numeric:
          mean:
            low: 15.0  # More lenient for numeric means
  3. Use statistical strategy instead:

    drift_detection:
      strategy: statistical
      statistical:
        sensitivity: low  # Less sensitive
  4. Change baseline strategy:

    drift_detection:
      baselines:
        strategy: moving_average  # Use average instead of last run
        windows:
          moving_average: 7  # Average over 7 runs
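
To see why raising thresholds or averaging the baseline reduces alerts, the following sketch models threshold-based drift in a simplified way. It assumes thresholds are percent changes relative to the baseline and that severities are checked from highest to lowest; these are illustrative assumptions, not Baselinr's confirmed algorithm, and all function names are hypothetical. The default values 5.0, 15.0, and 30.0 are taken from the comments in the configuration above.

```python
from statistics import mean


def percent_change(baseline: float, current: float) -> float:
    """Absolute percent change of current relative to baseline."""
    if baseline == 0:
        return 0.0 if current == 0 else float("inf")
    return abs(current - baseline) / abs(baseline) * 100.0


def classify(change_pct: float, low: float = 5.0, medium: float = 15.0, high: float = 30.0):
    """Map a percent change to a severity label, or None if below every threshold."""
    if change_pct >= high:
        return "high"
    if change_pct >= medium:
        return "medium"
    if change_pct >= low:
        return "low"
    return None


def moving_average_baseline(history: list, window: int = 7) -> float:
    """Average the most recent `window` values instead of using only the last run."""
    return mean(history[-window:])
```

In this model, a metric moving from 100 to 108 is flagged as low severity with the default thresholds but not once the low threshold is raised to 10.0. A moving-average baseline dampens the effect of a single unusual run in the history.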

No Drift Detected When Expected

Drift detection doesn't catch changes.

Solutions:

  1. Lower thresholds:

    drift_detection:
      absolute_threshold:
        low_threshold: 2.0  # Lower from 5.0
        medium_threshold: 5.0  # Lower from 15.0
  2. Verify data actually changed:

    # Query metrics directly
    baselinr query run-details --config config.yml --run-id <run-id>
  3. Check correct baseline is being used:

    baselinr drift --config config.yml --dataset customers --verbose

Performance Issues

Slow Profiling

Profiling is taking longer than expected.

Solutions:

  1. Enable parallelism:

    execution:
      max_workers: 4
  2. Use sampling for large tables

  3. Profile fewer tables per run

  4. Enable incremental profiling:

    incremental:
      enabled: true
      change_detection:
        enabled: true
  5. Check database connection performance and network latency

High Database Load

Profiling causes database performance issues.

Solutions:

  1. Reduce parallelism to limit concurrent queries

  2. Profile during off-peak hours

  3. Use sampling to reduce data scanned

  4. Enable incremental profiling to skip unchanged tables

  5. Use read replicas if available

Storage Issues

Tables Not Created

Storage tables are not automatically created.

Solutions:

  1. Ensure create_tables is enabled:

    storage:
      create_tables: true
  2. Create tables manually:

    baselinr migrate apply --config config.yml
  3. Check database permissions (CREATE TABLE permission required)

Migration Errors

Schema migrations fail.

Solutions:

  1. Check migration status:

    baselinr migrate status --config config.yml
  2. Validate schema:

    baselinr migrate validate --config config.yml
  3. Apply migrations:

    baselinr migrate apply --config config.yml
  4. Check for conflicts with existing schema

CLI Issues

Command Hangs or Freezes

A CLI command appears to hang.

Solutions:

  1. Check if profiling is actually running (large tables take time)

  2. Enable verbose output:

    baselinr profile --config config.yml --verbose
  3. Check database connection is active

  4. Kill and restart if necessary
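
In scheduled or scripted runs, the "kill and restart" step can be automated with a time limit. This is a minimal sketch using `subprocess`; `run_with_timeout` is a hypothetical wrapper, and the timeout value is something to tune to your table sizes.

```python
import subprocess
import sys


def run_with_timeout(cmd: list, timeout_seconds: float):
    """Run cmd; return its exit code, or None if it was killed on timeout."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in command; replace with the Baselinr command line you normally run.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], 30))
```

For example, `run_with_timeout(["baselinr", "profile", "--config", "config.yml", "--verbose"], 600)` would stop a profiling run that exceeds ten minutes and return None so the caller can retry or alert.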

Verbose Output Not Showing

The --verbose flag doesn't produce the expected output.

Solutions:

  1. Check command syntax:

    baselinr plan --config config.yml --verbose
  2. Some commands may not support verbose flag yet

  3. Check logs in database or files if configured

SDK Issues

Client Initialization Fails

BaselinrClient fails to initialize.

Solutions:

  1. Verify config file exists and is valid:

    from baselinr.config.loader import ConfigLoader
    config = ConfigLoader.load_from_file("config.yml")
    print(config.environment)
  2. Check config parameter format:

    # Correct
    client = BaselinrClient(config_path="config.yml")
    client = BaselinrClient(config=config_dict)

    # Incorrect - don't provide both
    client = BaselinrClient(config_path="config.yml", config=config_dict)
  3. Verify configuration is valid BaselinrConfig or dict

Query Methods Return Empty Results

Query methods don't return expected data.

Solutions:

  1. Verify profiling has been run:

    runs = client.query_runs(days=7)
    print(f"Found {len(runs)} runs")
  2. Check filters aren't too restrictive:

    # Too restrictive
    runs = client.query_runs(table="nonexistent", days=1)

    # Better
    runs = client.query_runs(days=30)
  3. Ensure storage connection is correct and tables exist

Getting Help

If you're still experiencing issues:

  1. Check Documentation:

  2. Review Examples:

    • Check examples/ directory for working configurations
    • Review examples/config.yml for reference
  3. Enable Debug Logging:

    import logging
    logging.basicConfig(level=logging.DEBUG)
  4. Open an Issue: