Quality Score Cards Guide
Overview
Quality Score Cards provide a unified, easy-to-understand view of data quality across tables, columns, and the entire system. Each score card combines multiple quality dimensions into a single actionable score (0-100) that helps you quickly identify and prioritize data quality issues.
Understanding Scores
Overall Score
The overall quality score is a weighted combination of six quality dimensions:
- Completeness (25%): Based on null ratios across columns
- Validity (25%): Based on validation rule pass rates
- Consistency (20%): Based on drift detection and schema stability
- Freshness (15%): Based on data recency and update frequency
- Uniqueness (10%): Based on duplicate detection and unique constraints
- Accuracy (5%): Based on anomaly detection and statistical outliers
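The weighting can be illustrated with a short calculation. This sketch assumes the overall score is a plain weighted average of the six 0-100 component scores using the default weights. Baselinr's actual calculation may apply further adjustments, and `overall_score` is a hypothetical helper, not a Baselinr API.

```python
# Default dimension weights, in percent (from the list above).
DEFAULT_WEIGHTS = {
    "completeness": 25,
    "validity": 25,
    "consistency": 20,
    "freshness": 15,
    "uniqueness": 10,
    "accuracy": 5,
}


def overall_score(components: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Hypothetical helper: weighted average of 0-100 component scores."""
    total_weight = sum(weights.values())
    weighted = sum(components[name] * weight for name, weight in weights.items())
    return round(weighted / total_weight, 2)


# Illustrative component scores (same values as the API example later in this guide).
components = {
    "completeness": 90.0,
    "validity": 88.0,
    "consistency": 82.0,
    "freshness": 95.0,
    "uniqueness": 85.0,
    "accuracy": 78.0,
}
print(overall_score(components))
```

Dividing by the total weight keeps the result on the 0-100 scale even if custom weights are supplied.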
Score Levels
- Table-level: Score for individual tables
- Column-level: Score for individual columns (optional)
- Schema-level: Aggregated score for all tables in a schema
- System-level: Overall score across all monitored tables
Score Status
Scores are classified into three status levels:
- Healthy (≥80): Data quality is good, no immediate action needed
- Warning (≥60 and <80): Data quality issues detected, review recommended
- Critical (<60): Significant data quality problems, immediate attention required
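The boundaries can be expressed as a small function. This sketch assumes the comparisons shown in the threshold configuration below (a score `>= healthy` is healthy, `>= warning` is a warning, anything lower is critical), which also settles fractional scores such as 79.5. `classify_status` is a hypothetical name.

```python
def classify_status(score: float, healthy: float = 80, warning: float = 60) -> str:
    """Hypothetical helper mapping a 0-100 score to a status label."""
    if score >= healthy:
        return "healthy"
    if score >= warning:
        return "warning"
    return "critical"


for s in (92.0, 79.5, 60.0, 42.0):
    print(s, classify_status(s))
```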
Configuration
Component Weights
You can customize the weights assigned to each quality dimension. Weights are given as percentages and must sum to 100.
Default weights:
- Completeness: 25%
- Validity: 25%
- Consistency: 20%
- Freshness: 15%
- Uniqueness: 10%
- Accuracy: 5%
Example configuration:
quality_scoring:
  enabled: true
  weights:
    completeness: 30
    validity: 30
    consistency: 20
    freshness: 10
    uniqueness: 5
    accuracy: 5
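Because the weights must add up to 100, it can be worth checking a custom configuration before deploying it. The sketch below is a hypothetical pre-flight check, not part of Baselinr; the `weights` mapping is written as a Python dict rather than parsed from YAML so the example stays dependency-free.

```python
EXPECTED_DIMENSIONS = {
    "completeness", "validity", "consistency",
    "freshness", "uniqueness", "accuracy",
}


def validate_weights(weights: dict) -> None:
    """Raise ValueError if dimensions are missing/unknown or weights don't sum to 100."""
    missing = EXPECTED_DIMENSIONS - set(weights)
    unknown = set(weights) - EXPECTED_DIMENSIONS
    if missing or unknown:
        raise ValueError(f"missing: {sorted(missing)}, unknown: {sorted(unknown)}")
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights sum to {total}, expected 100")


# The example configuration above passes the check.
validate_weights({
    "completeness": 30, "validity": 30, "consistency": 20,
    "freshness": 10, "uniqueness": 5, "accuracy": 5,
})
```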
Thresholds
Configure the score thresholds for status classification:
quality_scoring:
  thresholds:
    healthy: 80   # Scores >= 80 are healthy
    warning: 60   # Scores >= 60 and < 80 are warnings
    critical: 0   # Scores below the warning threshold (< 60) are critical
Freshness Settings
Configure freshness thresholds in hours:
quality_scoring:
  freshness:
    excellent: 24    # ≤ 24 hours = 100 points
    good: 48         # ≤ 48 hours = 80 points
    acceptable: 168  # ≤ 1 week = 60 points
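The tiers amount to a step function of data age. The sketch below assumes scores jump between tiers rather than interpolating, and that data older than `acceptable` gets a fixed fallback score. This guide does not state that fallback value, so `stale_score` (default 0) is a hypothetical parameter, as are both function names.

```python
from datetime import datetime, timezone
from typing import Optional


def hours_since(ts: datetime, now: Optional[datetime] = None) -> float:
    """Age of a timezone-aware timestamp in hours."""
    now = now or datetime.now(timezone.utc)
    return (now - ts).total_seconds() / 3600


def freshness_score(age_hours: float, excellent: float = 24, good: float = 48,
                    acceptable: float = 168, stale_score: float = 0.0) -> float:
    """Map data age to the tiered freshness points described above."""
    if age_hours <= excellent:
        return 100.0
    if age_hours <= good:
        return 80.0
    if age_hours <= acceptable:
        return 60.0
    return stale_score  # assumption: not specified in this guide
```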
History Settings
Enable historical tracking of scores:
quality_scoring:
  store_history: true
  history_retention_days: 90  # Keep 90 days of history
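To make the retention window concrete: with `history_retention_days: 90`, a stored score is kept while its `calculated_at` timestamp is no older than 90 days. This is an illustrative sketch of that check, not Baselinr's pruning code; `within_retention` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def within_retention(calculated_at: str, retention_days: int = 90,
                     now: Optional[datetime] = None) -> bool:
    """True if an ISO-8601 UTC timestamp falls inside the retention window."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    ts = datetime.fromisoformat(calculated_at.replace("Z", "+00:00"))
    return ts >= now - timedelta(days=retention_days)
```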
CLI Usage
Basic Score Calculation
Calculate a quality score for a specific table:
baselinr score --config config.yaml --table customers
Output Formats
Table format (default):
baselinr score --config config.yaml --table customers --format table
JSON format:
baselinr score --config config.yaml --table customers --format json
Export Scores
Export single score to CSV:
baselinr score --config config.yaml --table customers --export csv --output scores.csv
Export single score to JSON:
baselinr score --config config.yaml --table customers --export json --output score.json
Export score history:
baselinr score --config config.yaml --table customers --history --export csv --output history.csv
Schema Filtering
Calculate the score for a table in a specific schema:
baselinr score --config config.yaml --table customers --schema public
API Usage
Get Table Score
GET /api/quality/scores/customers
Response:
{
  "table_name": "customers",
  "schema_name": "public",
  "overall_score": 85.5,
  "status": "healthy",
  "trend": "improving",
  "trend_percentage": 2.3,
  "components": {
    "completeness": 90.0,
    "validity": 88.0,
    "consistency": 82.0,
    "freshness": 95.0,
    "uniqueness": 85.0,
    "accuracy": 78.0
  },
  "issues": {
    "total": 3,
    "critical": 1,
    "warnings": 2
  },
  "calculated_at": "2024-01-15T10:30:00Z"
}
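To turn a response into a prioritized list of what to fix first, a client can sort the `components` map. A minimal sketch, assuming the response shape shown above; `weakest_components` is a hypothetical helper.

```python
import json

# Abbreviated copy of the example response above.
body = (
    '{"table_name": "customers", "overall_score": 85.5, "status": "healthy", '
    '"components": {"completeness": 90.0, "validity": 88.0, "consistency": 82.0, '
    '"freshness": 95.0, "uniqueness": 85.0, "accuracy": 78.0}}'
)


def weakest_components(score: dict, n: int = 2) -> list:
    """Return the n lowest-scoring dimensions as (name, score) pairs, lowest first."""
    return sorted(score["components"].items(), key=lambda item: item[1])[:n]


score = json.loads(body)
print(weakest_components(score))  # [('accuracy', 78.0), ('consistency', 82.0)]
```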
Get Score History
GET /api/quality/scores/customers/history?days=30
Get Schema-Level Scores
GET /api/quality/scores/schema/public
Get System-Level Score
GET /api/quality/scores/system
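The endpoints can be called from any HTTP client. The sketch below uses only the Python standard library and assumes the API is served at a base URL you supply (`http://localhost:8000` is a placeholder) and needs no authentication. The URL helpers and `fetch_json` are hypothetical names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Placeholder: replace with the host where your Baselinr API is served.
BASE_URL = "http://localhost:8000"


def table_score_url(table: str) -> str:
    return f"{BASE_URL}/api/quality/scores/{quote(table)}"


def table_history_url(table: str, days: int = 30) -> str:
    query = urlencode({"days": days})
    return f"{table_score_url(table)}/history?{query}"


def schema_score_url(schema: str) -> str:
    return f"{BASE_URL}/api/quality/scores/schema/{quote(schema)}"


def system_score_url() -> str:
    return f"{BASE_URL}/api/quality/scores/system"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (assumes no authentication is required)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(table_score_url("customers"))` would return the table score as a dict, provided the server is reachable.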
Alerting
Quality score alerts are automatically emitted when:
- Score Degradation: Score drops by more than 5 points
- Threshold Breach: Score crosses warning or critical thresholds
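The two alert conditions can be sketched as a comparison of the previous and current scores. This assumes the default thresholds (80 and 60) and treats a threshold breach as the status getting worse (for example healthy to warning); Baselinr's exact rule may differ, and the function names are hypothetical.

```python
DEGRADATION_POINTS = 5  # "Score drops by more than 5 points"
STATUS_RANK = {"healthy": 0, "warning": 1, "critical": 2}


def status_for(score: float, healthy: float = 80, warning: float = 60) -> str:
    if score >= healthy:
        return "healthy"
    return "warning" if score >= warning else "critical"


def detect_alerts(previous: float, current: float) -> list:
    """Return the names of alert events a score change would trigger (sketch)."""
    events = []
    if previous - current > DEGRADATION_POINTS:
        events.append("QualityScoreDegraded")
    # Assumption: a breach means the status moved to a worse level.
    if STATUS_RANK[status_for(current)] > STATUS_RANK[status_for(previous)]:
        events.append("QualityScoreThresholdBreached")
    return events


print(detect_alerts(85.5, 79.0))  # ['QualityScoreDegraded', 'QualityScoreThresholdBreached']
```

A drop of exactly 5 points does not count as degradation here, matching "more than 5 points".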
Alert Configuration
Alerts are integrated with Baselinr's event system. Configure alert hooks in your configuration:
hooks:
  enabled: true
  hooks:
    - type: slack
      webhook_url: ${SLACK_WEBHOOK_URL}
      channel: "#data-alerts"
      min_severity: medium
      alert_on_drift: true
      alert_on_schema_change: true
Alert Events
QualityScoreDegraded: Emitted when the score drops by more than 5 points
- table: Table name
- current_score: Current score
- previous_score: Previous score
- score_change: Change in score
- threshold_type: 'warning' or 'critical'
QualityScoreThresholdBreached: Emitted when the score crosses a threshold
- table: Table name
- current_score: Current score
- threshold_type: 'warning' or 'critical'
- threshold_value: Threshold value that was breached
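The documented payload fields are enough to build a readable notification. The sketch below is a hypothetical formatter that takes the event name and its fields as a plain dict; how a payload reaches your code depends on your hook setup and is not shown here. Since the sign convention of `score_change` is not stated, the sketch prints its absolute value.

```python
def format_alert(event_type: str, payload: dict) -> str:
    """Hypothetical formatter for the two quality score events."""
    table = payload["table"]
    current = payload["current_score"]
    if event_type == "QualityScoreDegraded":
        return (
            f"{table}: quality score fell {abs(payload['score_change']):.1f} points "
            f"({payload['previous_score']:.1f} -> {current:.1f}, {payload['threshold_type']})"
        )
    if event_type == "QualityScoreThresholdBreached":
        return (
            f"{table}: quality score {current:.1f} breached the "
            f"{payload['threshold_type']} threshold ({payload['threshold_value']})"
        )
    raise ValueError(f"unexpected event type: {event_type}")
```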
Exporting Scores
CSV Export Format
CSV exports include the following columns:
- table_name
- schema_name
- overall_score
- completeness_score
- validity_score
- consistency_score
- freshness_score
- uniqueness_score
- accuracy_score
- status
- total_issues
- critical_issues
- warnings
- calculated_at
- period_start
- period_end
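A CSV export can be loaded back for further analysis with the standard `csv` module. This sketch assumes the file has a header row using the column names above; `read_score_rows` is a hypothetical helper, and empty numeric cells become `None`.

```python
import csv
import io

FLOAT_COLUMNS = [
    "overall_score", "completeness_score", "validity_score", "consistency_score",
    "freshness_score", "uniqueness_score", "accuracy_score",
]
INT_COLUMNS = ["total_issues", "critical_issues", "warnings"]
HEADER = ["table_name", "schema_name", *FLOAT_COLUMNS, "status", *INT_COLUMNS,
          "calculated_at", "period_start", "period_end"]


def read_score_rows(fileobj) -> list:
    """Parse a score CSV export, converting numeric columns."""
    rows = []
    for row in csv.DictReader(fileobj):
        for col in FLOAT_COLUMNS:
            row[col] = float(row[col]) if row.get(col) else None
        for col in INT_COLUMNS:
            row[col] = int(float(row[col])) if row.get(col) else None
        rows.append(row)
    return rows


# Illustrative in-memory export with one row.
sample = io.StringIO(
    ",".join(HEADER) + "\n"
    "customers,public,85.5,90.0,88.0,82.0,95.0,85.0,78.0,healthy,3,1,2,"
    "2024-01-15T10:30:00Z,2024-01-08T10:30:00Z,2024-01-15T10:30:00Z\n"
)
rows = read_score_rows(sample)
```

For a real file, open it with `newline=""` as the `csv` module recommends: `with open("scores.csv", newline="") as f: rows = read_score_rows(f)`.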
JSON Export Format
JSON exports include full score objects with all metadata:
[
  {
    "overall_score": 85.5,
    "completeness_score": 90.0,
    "validity_score": 88.0,
    "consistency_score": 82.0,
    "freshness_score": 95.0,
    "uniqueness_score": 85.0,
    "accuracy_score": 78.0,
    "status": "healthy",
    "total_issues": 3,
    "critical_issues": 1,
    "warnings": 2,
    "table_name": "customers",
    "schema_name": "public",
    "calculated_at": "2024-01-15T10:30:00Z",
    "period_start": "2024-01-08T10:30:00Z",
    "period_end": "2024-01-15T10:30:00Z"
  }
]
Best Practices
1. Regular Monitoring
Calculate scores regularly as part of your data pipeline:
# Add to your cron job or workflow
baselinr score --config config.yaml --table customers
2. Set Appropriate Thresholds
Adjust thresholds based on your data quality requirements:
- Strict environments: Set healthy threshold to 90
- Development environments: Set healthy threshold to 70
3. Monitor Trends
Track score trends over time to identify gradual degradation:
baselinr score --config config.yaml --table customers --history --export json --output trends.json
4. Configure Alerts
Set up alert hooks to be notified of score degradation:
hooks:
  enabled: true
  hooks:
    - type: slack
      webhook_url: ${SLACK_WEBHOOK_URL}
      channel: "#data-quality"
5. Customize Weights
Adjust component weights based on your priorities:
- Data completeness critical: Increase completeness weight
- Validation important: Increase validity weight
- Freshness matters: Increase freshness weight
Troubleshooting
Score Not Calculating
Problem: Score command returns no results or errors
Solutions:
- Verify quality scoring is enabled in config:
  quality_scoring:
    enabled: true
- Check that required tables exist:
  - baselinr_results (profiling results)
  - baselinr_validation_results (validation results)
  - baselinr_events (drift/anomaly events)
- Verify table has been profiled:
  baselinr query runs --table customers
Scores Seem Incorrect
Problem: Scores don't match expectations
Solutions:
- Check component scores individually:
  baselinr score --config config.yaml --table customers --format json
- Verify data exists for all components:
  - Profiling results for completeness/uniqueness
  - Validation results for validity
  - Events for consistency/accuracy
- Check the freshness calculation:
  - Verify the baselinr_runs table has recent entries
  - Check the profiled_at timestamps
Export Fails
Problem: Export command fails or produces empty files
Solutions:
- Verify output path is writable
- Check file permissions
- Ensure the table has scores:
  baselinr score --config config.yaml --table customers
Alerts Not Firing
Problem: Score degradation doesn't trigger alerts
Solutions:
- Verify hooks are enabled:
  hooks:
    enabled: true
- Check that the event bus is initialized when the score command runs
- Verify alert thresholds are configured correctly
- Check hook logs for errors
Related Documentation
- Data Validation Guide
- Drift Detection Guide
- Anomaly Detection Guide
- Event & Alert Hooks
- Configuration Reference
Examples
Example 1: Daily Score Check
#!/bin/bash
# Daily quality score check
TABLES=("customers" "orders" "products")
for table in "${TABLES[@]}"; do
  baselinr score --config config.yaml --table "$table" --export csv --output "scores_${table}_$(date +%Y%m%d).csv"
done
Example 2: Score Monitoring Script
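import json
import subprocess


def check_quality_scores(tables):
    """Score each table via the baselinr CLI and flag critical results."""
    results = {}
    for table in tables:
        result = subprocess.run(
            ["baselinr", "score", "--config", "config.yaml",
             "--table", table, "--format", "json"],
            capture_output=True,
            text=True,
        )
        # Report failed runs instead of crashing on empty or non-JSON output
        if result.returncode != 0:
            print(f"ERROR: scoring {table} failed: {result.stderr.strip()}")
            continue
        score = json.loads(result.stdout)
        results[table] = score
        if score["status"] == "critical":
            print(f"ALERT: {table} has critical quality score: {score['overall_score']}")
    return results


if __name__ == "__main__":
    check_quality_scores(["customers", "orders", "products"])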
Example 3: Trend Analysis
# Export the stored score history (kept for history_retention_days, e.g. 90)
baselinr score --config config.yaml --table customers --history --export json --output history.json
# Analyze trends (using jq)
jq '.[] | {date: .calculated_at, score: .overall_score}' history.json
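The same history file can be summarized in Python instead of jq. This sketch assumes `history.json` is a list of score objects in the JSON export format shown earlier; `score_trend` is a hypothetical helper, and the inline history is illustrative data only.

```python
def score_trend(history: list) -> tuple:
    """Return (point change, percent change) between the oldest and newest entries."""
    if not history:
        raise ValueError("history is empty")
    # ISO-8601 timestamps in the same format sort chronologically as strings.
    points = sorted(history, key=lambda entry: entry["calculated_at"])
    first = points[0]["overall_score"]
    last = points[-1]["overall_score"]
    change = last - first
    percent = change * 100 / first if first else 0.0
    return change, percent


# Illustrative history (hypothetical values), deliberately out of order.
history = [
    {"calculated_at": "2024-01-15T10:30:00Z", "overall_score": 85.5},
    {"calculated_at": "2024-01-08T10:30:00Z", "overall_score": 80.0},
]
print(score_trend(history))  # (5.5, 6.875)
```

To use the exported file, load it with `json.load(open("history.json"))` (or a `with` block) and pass the resulting list to `score_trend`.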