Schema Change Detection
Baselinr includes comprehensive schema change detection that automatically tracks and alerts on schema modifications to your data tables.
Overview
Schema change detection maintains an internal registry of table schemas and compares them across profiling runs to detect:
- New columns: Columns that have been added to a table
- Dropped columns: Columns that have been removed from a table
- Renamed columns: Columns that have been renamed (detected heuristically)
- Type changes: Changes to column data types
- Partition changes: Changes to table partitioning (Snowflake-specific)
How It Works
Schema Registry
Baselinr maintains a baselinr_schema_registry table that stores schema snapshots for each profiling run. Each column is tracked with:
- Column name and type
- Column hash (based on name, type, and nullable status)
- First seen and last seen timestamps
- Run ID association
Change Detection Process
- Schema Registration: After profiling completes, the current schema is registered in the schema registry
- Comparison: The current schema is compared to the previous schema snapshot
- Change Detection: Various algorithms detect different types of changes:
- Direct comparison for added/removed columns
- Heuristic matching for renamed columns (name similarity + type matching)
- Type comparison for type changes
- Warehouse-specific queries for partition changes
- Event Emission: Detected changes are emitted as
SchemaChangeDetectedevents
Configuration
Basic Configuration
Enable schema change detection in your config.yml:
schema_change:
enabled: true
similarity_threshold: 0.7 # For rename detection (0.0-1.0)
Suppression Rules
Suppress schema change events for specific tables or change types:
schema_change:
enabled: true
similarity_threshold: 0.7
suppression:
# Suppress all changes for a specific table
- table: "staging_table"
# Suppress only column additions globally
- change_type: "column_added"
# Suppress column additions for a specific table
- table: "orders"
change_type: "column_added"
# Suppress all changes for a table in a specific schema
- table: "customers"
schema: "public"
Suppression Rule Matching
Suppression rules match when:
- Table name matches (if specified)
- Schema name matches (if specified)
- Change type matches (if specified)
All specified conditions must match for suppression to apply.
Change Types
Column Added
Detected when a new column appears in the current schema that wasn't in the previous schema.
Event Fields:
change_type:"column_added"column: New column namenew_type: Column data typechange_severity:"low"
Column Removed
Detected when a column from the previous schema is missing in the current schema.
Event Fields:
change_type:"column_removed"column: Removed column nameold_type: Previous column data typechange_severity:"high"
Column Renamed
Detected heuristically when:
- A column is removed and a new column is added
- The names are similar (similarity > threshold, default 0.7)
- The types are compatible
Event Fields:
change_type:"column_renamed"column: New column nameold_column_name: Previous column nameold_type: Previous column typenew_type: New column typechange_severity:"medium"
Similarity Calculation:
- Uses Levenshtein distance (edit distance)
- Includes prefix/suffix matching bonuses
- Default threshold: 0.7 (configurable)
Type Changed
Detected when a column exists in both schemas but has a different type.
Event Fields:
change_type:"type_changed"column: Column nameold_type: Previous typenew_type: New typechange_severity: Determined automatically:"low": Compatible changes (int→bigint, varchar→text)"medium": Other type changes"breaking": Incompatible changes (numeric→string, string→numeric)
Partition Changed
Detected when table partitioning changes (Snowflake-specific).
Event Fields:
change_type:"partition_changed"partition_info: Dict with partition metadatachange_severity:"high"
Note: Partition change detection requires appropriate Snowflake permissions to query INFORMATION_SCHEMA.TABLE_STORAGE_METRICS.
Event Handling
Schema change events are emitted via the event bus and can be handled by any registered hooks:
from baselinr.events import SchemaChangeDetected
# Events are automatically emitted during profiling
# Handle them via event hooks (see EVENTS_AND_HOOKS.md)
Event Structure
SchemaChangeDetected(
event_type="SchemaChangeDetected",
timestamp=datetime.utcnow(),
table="customers",
change_type="column_added",
column="new_field",
new_type="VARCHAR(255)",
change_severity="low",
metadata={...}
)
Examples
Example 1: Basic Detection
# config.yml
schema_change:
enabled: true
Run profiling:
baselinr profile --config config.yml
If a column is added, you'll see a SchemaChangeDetected event with change_type="column_added".
Example 2: Suppress Column Additions
# config.yml
schema_change:
enabled: true
suppression:
# Suppress all column additions
- change_type: "column_added"
Example 3: Table-Specific Suppression
# config.yml
schema_change:
enabled: true
suppression:
# Suppress all changes for staging tables
- table: "staging_orders"
- table: "staging_customers"
# But still alert on type changes for production tables
- table: "production_orders"
change_type: "column_added" # Only suppress additions
Example 4: Custom Similarity Threshold
# config.yml
schema_change:
enabled: true
similarity_threshold: 0.8 # Stricter rename detection
Higher thresholds require more similarity to detect renames (fewer false positives, more false negatives).
Database Migration
The schema registry table is created automatically via migration v2. To manually apply:
baselinr migrate apply --config config.yml --target 2
Best Practices
- Enable for Production: Always enable schema change detection in production environments
- Configure Suppression: Suppress noisy tables (e.g., staging tables that change frequently)
- Monitor High Severity: Pay special attention to
"high"and"breaking"severity changes - Review Renames: Heuristic rename detection may have false positives/negatives - review carefully
- Partition Monitoring: For Snowflake, ensure appropriate permissions for partition change detection
Troubleshooting
No Events Emitted
- Check that
schema_change.enabled: truein config - Verify
profiling.enable_schema_tracking: true(default: true) - Ensure event hooks are configured and enabled
False Positive Renames
- Increase
similarity_thresholdto require more similarity - Review the similarity calculation logic if needed
Missing Partition Changes
- For Snowflake, verify permissions to query
INFORMATION_SCHEMA.TABLE_STORAGE_METRICS - Check logs for permission errors
Migration Issues
- Ensure storage database is accessible
- Check migration status:
baselinr migrate status --config config.yml - Apply migrations if needed:
baselinr migrate apply --config config.yml
Related Documentation
- Events and Hooks - Event system overview
- Profiling Enrichment - Related enrichment features
- Schema Reference - Database schema details