# Statistical Drift Detection
Baselinr provides advanced statistical drift detection using multiple tests that can be selected and combined based on column type and metric characteristics.
## Overview
The statistical drift detection strategy uses rigorous statistical methods to detect changes in data distributions, not just simple threshold-based comparisons. It automatically selects appropriate tests based on whether your data is numeric or categorical.
## When to Use Statistical Tests
Statistical tests are ideal when you need:
- Distribution-aware detection: Detect changes in data shape, not just mean shifts
- Categorical data analysis: Track changes in category distributions
- Reduced false positives: Statistical significance testing reduces noise
- Multiple perspectives: Combine multiple tests for comprehensive coverage
- Histogram data: Leverage histogram information when available
## Configuration

### Basic Configuration

```yaml
drift_detection:
  strategy: statistical
  statistical:
    tests:
      - ks_test
      - psi
      - chi_square
    sensitivity: medium
```
### Full Configuration with Test Parameters

```yaml
drift_detection:
  strategy: statistical
  statistical:
    tests:
      - ks_test      # Kolmogorov-Smirnov test
      - psi          # Population Stability Index
      - z_score      # Z-score test
      - chi_square   # Chi-square test
      - entropy      # Entropy change
      - top_k        # Top-K stability
    sensitivity: medium  # low, medium, or high
    test_params:
      ks_test:
        alpha: 0.05  # Significance level
      psi:
        buckets: 10  # Number of distribution buckets
        threshold: 0.2  # PSI threshold for drift
      z_score:
        z_threshold: 2.0  # Z-score threshold (std devs)
      chi_square:
        alpha: 0.05  # Significance level
      entropy:
        entropy_threshold: 0.1  # Entropy change threshold
      top_k:
        k: 10  # Number of top categories
        similarity_threshold: 0.7  # Similarity threshold
```
## Available Statistical Tests

### Numeric Column Tests

#### 1. Kolmogorov-Smirnov (KS) Test

Test Name: `ks_test`
Description: Compares the distribution of baseline vs current data. Good for detecting shape changes (skew, multimodality, heavy tails).
Parameters:
- `alpha`: Significance level (default: 0.05)
How it works:
- Compares empirical cumulative distribution functions (CDFs)
- Returns KS statistic (maximum difference between CDFs)
- Calculates p-value for statistical significance
- Works best with histogram data, but can approximate from summary statistics
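To make the CDF comparison concrete, here is a minimal sketch of a KS statistic computed from two histograms that share the same bins. The helper is hypothetical, not Baselinr's internal API:

```python
def ks_statistic_from_histograms(baseline_counts, current_counts):
    """Max absolute difference between the two empirical CDFs (the KS statistic)."""
    base_total, cur_total = sum(baseline_counts), sum(current_counts)
    base_cdf = cur_cdf = max_diff = 0.0
    for b, c in zip(baseline_counts, current_counts):
        base_cdf += b / base_total
        cur_cdf += c / cur_total
        max_diff = max(max_diff, abs(base_cdf - cur_cdf))
    return max_diff

# Two 5-bin histograms: the current data shifts mass toward the upper bins
print(ks_statistic_from_histograms([10, 20, 40, 20, 10], [5, 10, 30, 35, 20]))  # 0.25
```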
Example:

```yaml
test_params:
  ks_test:
    alpha: 0.05
```
Best for: Detecting distribution shape changes in numeric columns
#### 2. Population Stability Index (PSI)

Test Name: `psi`
Description: Bucket-based drift detection. Good for monitoring slow drifts over long periods.
Parameters:
- `buckets`: Number of buckets for the distribution (default: 10)
- `threshold`: PSI threshold for drift detection (default: 0.2)
PSI Score Interpretation:
- < 0.1: No significant drift
- 0.1-0.2: Minor drift
- 0.2-0.5: Moderate drift
- > 0.5: Significant drift
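For reference, here is a minimal sketch of the standard PSI formula over shared buckets (a hypothetical helper, not Baselinr's internal API; the epsilon guard for empty buckets is an assumption, not something the docs specify):

```python
import math

def psi_score(baseline_counts, current_counts, eps=1e-6):
    """PSI = sum((p_cur - p_base) * ln(p_cur / p_base)) over buckets."""
    base_total, cur_total = sum(baseline_counts), sum(current_counts)
    score = 0.0
    for b, c in zip(baseline_counts, current_counts):
        p_base = max(b / base_total, eps)  # guard against empty buckets (assumption)
        p_cur = max(c / cur_total, eps)
        score += (p_cur - p_base) * math.log(p_cur / p_base)
    return score

print(psi_score([10, 20, 40, 20, 10], [5, 10, 30, 35, 20]))  # ≈ 0.29 → moderate drift
```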
Example:

```yaml
test_params:
  psi:
    buckets: 20
    threshold: 0.15
```
Best for: Long-term drift monitoring, especially with histogram data
#### 3. Z-Score / Variance Test

Test Name: `z_score`
Description: Detects shifts in mean/variance using standard deviation.
Parameters:
- `z_threshold`: Z-score threshold in standard deviations (default: 2.0)
How it works:
- Calculates `z = |(current_mean - baseline_mean) / baseline_stddev|`
- Flags drift if the z-score exceeds the threshold
- Severity is based on the z-score magnitude
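The calculation is simple enough to sketch directly (a hypothetical helper, not Baselinr's internal API):

```python
def mean_shift_z(baseline_mean, baseline_stddev, current_mean, z_threshold=2.0):
    """Return the z-score of the mean shift and whether it exceeds the threshold."""
    z = abs((current_mean - baseline_mean) / baseline_stddev)
    return z, z > z_threshold

# A 40-unit shift against a baseline stddev of 15 is a 2.67-sigma move
z, drifted = mean_shift_z(baseline_mean=100.0, baseline_stddev=15.0, current_mean=140.0)
print(f"z={z:.2f}, drift={drifted}")  # z=2.67, drift=True
```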
Example:

```yaml
test_params:
  z_score:
    z_threshold: 2.5  # Less sensitive (requires a 2.5 std dev shift)
```
Best for: Detecting mean shifts when you have stddev information
### Categorical Column Tests

#### 4. Chi-Square Test

Test Name: `chi_square`
Description: Tests whether the distribution of categories has changed significantly.
Parameters:
- `alpha`: Significance level (default: 0.05)
How it works:
- Compares observed vs expected category frequencies
- Calculates chi-square statistic
- Uses p-value for statistical significance
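A minimal sketch of this comparison using scipy (a hypothetical helper, not Baselinr's internal API). Baseline proportions are scaled to the current row count so observed and expected frequencies sum to the same total, which `scipy.stats.chisquare` requires:

```python
from scipy.stats import chisquare

def category_chi_square(baseline_counts, current_counts, alpha=0.05):
    """Chi-square test of current category frequencies against baseline proportions."""
    cur_total, base_total = sum(current_counts), sum(baseline_counts)
    expected = [b / base_total * cur_total for b in baseline_counts]
    stat, p_value = chisquare(f_obs=current_counts, f_exp=expected)
    return stat, p_value, p_value < alpha

# Three categories; the third grows sharply in the current window
print(category_chi_square([500, 300, 200], [450, 280, 370]))  # tiny p-value → drift
```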
Example:

```yaml
test_params:
  chi_square:
    alpha: 0.01  # More strict (1% significance)
```
Best for: Detecting changes in category distributions
#### 5. Entropy Change Test

Test Name: `entropy`
Description: Detects changes in Shannon entropy (randomness/uniformity) of category distributions.
Parameters:
- `entropy_threshold`: Threshold for entropy change (default: 0.1)
How it works:
- Calculates Shannon entropy: `H = -Σ(p * log2(p))`
- Compares baseline vs current entropy
- Flags drift if the entropy change exceeds the threshold
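A minimal sketch of the entropy comparison (hypothetical helpers, not Baselinr's internal API), operating on category frequency counts:

```python
import math

def shannon_entropy(counts):
    """H = -sum(p * log2(p)) over category proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_drift(baseline_counts, current_counts, entropy_threshold=0.1):
    change = abs(shannon_entropy(current_counts) - shannon_entropy(baseline_counts))
    return change, change > entropy_threshold

# The current distribution is far more concentrated (lower entropy)
print(entropy_drift([400, 300, 300], [800, 100, 100]))  # (≈0.65, True)
```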
Example:

```yaml
test_params:
  entropy:
    entropy_threshold: 0.15
```
Best for: Detecting changes in data uniformity/randomness
#### 6. Top-K Stability Test

Test Name: `top_k`
Description: Tracks the top-K most frequent categories and detects changes.
Parameters:
- `k`: Number of top categories to track (default: 10)
- `similarity_threshold`: Similarity threshold for stability (default: 0.7)
How it works:
- Extracts top-K categories from baseline and current
- Calculates Jaccard similarity (intersection / union)
- Flags drift if similarity drops below threshold
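A minimal sketch of the Jaccard comparison (a hypothetical helper, not Baselinr's internal API), taking category-to-frequency mappings:

```python
from collections import Counter

def top_k_similarity(baseline_freqs, current_freqs, k=10):
    """Jaccard similarity between the top-K category sets of two windows."""
    top_base = {cat for cat, _ in Counter(baseline_freqs).most_common(k)}
    top_cur = {cat for cat, _ in Counter(current_freqs).most_common(k)}
    return len(top_base & top_cur) / len(top_base | top_cur)

# "d" displaces "b" among the top-2 categories
sim = top_k_similarity({"a": 90, "b": 80, "c": 5}, {"a": 85, "d": 70, "b": 60}, k=2)
print(sim, sim < 0.7)  # 0.33..., True → drift
```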
Example:

```yaml
test_params:
  top_k:
    k: 20
    similarity_threshold: 0.8  # More strict
```
Best for: Monitoring stability of most common categories
## Sensitivity Levels

The `sensitivity` parameter adjusts thresholds across all tests:

- `low`: Less sensitive (higher thresholds) - reduces false positives
- `medium`: Balanced (default thresholds) - recommended starting point
- `high`: More sensitive (lower thresholds) - catches more drift, may have more false positives
How it works:
- Low sensitivity: thresholds × 1.5
- Medium sensitivity: thresholds × 1.0 (default)
- High sensitivity: thresholds × 0.5
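In other words, each test's effective threshold is its configured threshold times a sensitivity multiplier, roughly like this (a sketch of the rule above, not Baselinr's internal API):

```python
SENSITIVITY_MULTIPLIERS = {"low": 1.5, "medium": 1.0, "high": 0.5}

def effective_threshold(base_threshold, sensitivity="medium"):
    return base_threshold * SENSITIVITY_MULTIPLIERS[sensitivity]

print(effective_threshold(0.2, "low"))   # 0.3 -> harder to flag drift
print(effective_threshold(0.2, "high"))  # 0.1 -> easier to flag drift
```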
## Test Selection
The statistical strategy automatically selects applicable tests based on:
- Column Type: Numeric tests for numeric columns, categorical tests for categorical columns
- Metric Type: Tests check if they support the specific metric being compared
- Data Availability: Tests that can't run (insufficient data) are skipped gracefully
### Automatic Test Selection

```text
# Numeric column with mean metric
# → Runs: ks_test, psi, z_score (if data available)

# Categorical column with distinct_count metric
# → Runs: chi_square, entropy, top_k (if data available)
```
## Data Requirements

### Optimal Data
Statistical tests work best with:
- Histogram data: For KS test and PSI (enables distribution comparison)
- Category distributions: For categorical tests (top values, frequencies)
- Summary statistics: Mean, stddev, min, max (for approximations)
### Fallback Behavior
If optimal data isn't available:
- Tests use approximations from summary statistics
- Some tests may skip with a warning
- System falls back to threshold-based detection if no tests can run
### Enabling Histogram Data
To get the best results from statistical tests, enable histograms in your profiling config:
```yaml
profiling:
  compute_histograms: true
  histogram_bins: 10  # More bins = more granular distribution
```
## Usage Examples

### Example 1: Numeric Columns with Histograms
```yaml
drift_detection:
  strategy: statistical
  statistical:
    tests:
      - ks_test
      - psi
      - z_score
    sensitivity: medium
    test_params:
      ks_test:
        alpha: 0.05
      psi:
        buckets: 15
        threshold: 0.2
```
What it detects:
- Distribution shape changes (KS test)
- Bucket-level shifts (PSI)
- Mean shifts (Z-score)
### Example 2: Categorical Columns
```yaml
drift_detection:
  strategy: statistical
  statistical:
    tests:
      - chi_square
      - entropy
      - top_k
    sensitivity: high
    test_params:
      chi_square:
        alpha: 0.01
      top_k:
        k: 15
        similarity_threshold: 0.8
```
What it detects:
- Category distribution changes (Chi-square)
- Entropy/uniformity changes (Entropy)
- Top category stability (Top-K)
### Example 3: Comprehensive Coverage
```yaml
drift_detection:
  strategy: statistical
  statistical:
    tests:
      - ks_test
      - psi
      - z_score
      - chi_square
      - entropy
      - top_k
    sensitivity: medium
```
What it detects: All types of drift for both numeric and categorical columns
## Understanding Results

### Test Result Aggregation
When multiple tests run, results are aggregated:
- Drift Detection: Any test detecting drift → overall drift detected
- Severity: Maximum severity across all tests
- Score: Average score across all tests
- Metadata: Detailed results from each test included
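A minimal sketch of these aggregation rules (a hypothetical helper, not Baselinr's internal API; the severity ordering is an assumption). Each test result is a dict shaped like those in the metadata structure shown further below:

```python
SEVERITY_ORDER = ["none", "low", "medium", "high"]  # assumed ordering

def aggregate(test_results):
    drift = any(r["drift_detected"] for r in test_results)       # any test → drift
    severity = max((r["severity"] for r in test_results),
                   key=SEVERITY_ORDER.index)                     # maximum severity
    score = sum(r["score"] for r in test_results) / len(test_results)  # average score
    return drift, severity, score

results = [
    {"test": "ks_test", "score": 0.25, "severity": "high", "drift_detected": True},
    {"test": "psi", "score": 0.11, "severity": "low", "drift_detected": False},
]
print(aggregate(results))  # (True, 'high', 0.18)
```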
### Example Output

```python
report = detector.detect_drift("customers")

for drift in report.column_drifts:
    if drift.drift_detected:
        print(f"{drift.column_name}.{drift.metric_name}")
        print(f"  Severity: {drift.drift_severity}")
        print(f"  Tests run: {drift.metadata['tests_run']}")

        # Individual test results
        for test_result in drift.metadata['test_results']:
            print(f"  {test_result['test']}: score={test_result['score']}, "
                  f"p_value={test_result['p_value']}, "
                  f"drift={test_result['drift_detected']}")
```
### Metadata Structure

```python
drift.metadata = {
    'strategy': 'statistical',
    'tests_run': ['ks_test', 'psi', 'z_score'],
    'test_results': [
        {
            'test': 'ks_test',
            'score': 0.25,
            'p_value': 0.001,
            'drift_detected': True,
            'severity': 'high',
            'metadata': {
                'alpha': 0.05,
                'statistic': 0.25,
                'p_value': 0.001
            }
        },
        # ... more test results
    ],
    'aggregated_score': 0.18,
    'sensitivity': 'medium'
}
```
## Best Practices

### 1. Start with Default Configuration

```yaml
drift_detection:
  strategy: statistical
  statistical:
    tests:
      - ks_test
      - psi
      - chi_square
    sensitivity: medium
```
### 2. Enable Histograms

For best results with KS test and PSI:

```yaml
profiling:
  compute_histograms: true
  histogram_bins: 10
```
### 3. Adjust Sensitivity Based on Your Needs

```yaml
# Production: lower sensitivity (fewer false positives)
sensitivity: low

# Development: higher sensitivity (catch more issues)
sensitivity: high
```
### 4. Select Tests Based on Your Data

```yaml
# Numeric-heavy dataset
tests:
  - ks_test
  - psi
  - z_score

# Categorical-heavy dataset
tests:
  - chi_square
  - entropy
  - top_k

# Mixed dataset
tests:
  - ks_test
  - psi
  - chi_square
  - top_k
```
### 5. Tune Test-Specific Parameters

```yaml
test_params:
  # More strict KS test
  ks_test:
    alpha: 0.01
  # More buckets for finer PSI analysis
  psi:
    buckets: 20
    threshold: 0.15
  # Track more top categories
  top_k:
    k: 20
    similarity_threshold: 0.8
```
## Performance Considerations
- Multiple tests: Running more tests takes slightly longer, but tests run in parallel where possible
- Histogram data: Requires more storage but enables better detection
- Large datasets: Statistical tests are efficient and scale well
## Troubleshooting

### "No statistical tests could run"

Problem: None of the configured tests support the column type or metric, or there is insufficient data.
Solutions:
- Check that the column type is numeric or categorical
- Enable histograms: `compute_histograms: true`
- Ensure you have summary statistics (mean, stddev, etc.)
- The system will fall back to threshold-based detection if no tests can run
"All tests fail"
Problem: Data format issues or missing dependencies.
Solutions:
- Check data is in expected format (histograms, distributions)
- Install scipy for better test accuracy:
pip install scipy - Check logs for specific error messages
"Too many false positives"
Problem: Sensitivity too high or thresholds too low.
Solutions:
- Lower sensitivity:
sensitivity: low - Increase test thresholds in
test_params - Remove more sensitive tests (e.g., remove
entropyif too noisy)
"Not detecting obvious drift"
Problem: Sensitivity too low or thresholds too high.
Solutions:
- Increase sensitivity:
sensitivity: high - Lower test thresholds in
test_params - Add more tests to the list
## Comparison with Other Strategies
| Feature | Absolute Threshold | Standard Deviation | Statistical Tests |
|---|---|---|---|
| Ease of Use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Statistical Rigor | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Distribution Awareness | ❌ | ❌ | ✅ |
| Categorical Support | ❌ | ❌ | ✅ |
| Data Requirements | Minimal | Summary stats | Histograms preferred |
| False Positives | Medium | Low | Very Low |
| Setup Complexity | Low | Medium | Medium |
## See Also
- Drift Detection Guide - General drift detection documentation
- Configuration Examples - Example configurations
- Profiling Metrics - Available metrics