Quality Score Cards Guide

Overview

Quality Score Cards provide a unified, easy-to-understand view of data quality across tables, columns, and the entire system. Each score card combines multiple quality dimensions into a single actionable score (0-100) that helps you quickly identify and prioritize data quality issues.

Understanding Scores

Overall Score

The overall quality score is a weighted combination of six quality dimensions:

Completeness (25%): Based on null ratios across columns
Validity (25%): Based on validation rule pass rates
Consistency (20%): Based on drift detection and schema stability
Freshness (15%): Based on data recency and update frequency
Uniqueness (10%): Based on duplicate detection and unique constraints
Accuracy (5%): Based on anomaly detection and statistical outliers
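As a rough illustration of how the six dimensions could combine, the sketch below computes a weighted average using the default weights listed above. The function name `overall_score` and the exact formula are assumptions made for illustration; the guide only states that the score is a weighted combination.

```python
# Default weights from this guide, expressed as percentages.
DEFAULT_WEIGHTS = {
    "completeness": 25,
    "validity": 25,
    "consistency": 20,
    "freshness": 15,
    "uniqueness": 10,
    "accuracy": 5,
}


def overall_score(components, weights=DEFAULT_WEIGHTS):
    """Weighted average of component scores, each on a 0-100 scale.

    Assumption: a plain weighted average. The real calculation may differ.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(components[name] * weight for name, weight in weights.items())
    return weighted_sum / total_weight


# Made-up component scores, used only to exercise the function.
example = {
    "completeness": 100,
    "validity": 80,
    "consistency": 80,
    "freshness": 60,
    "uniqueness": 100,
    "accuracy": 40,
}
print(overall_score(example))  # 82.0
```

Dividing by the sum of the weights keeps the result on the 0-100 scale even if the weights are given as fractions instead of percentages.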

Score Levels

Table-level: Score for individual tables
Column-level: Score for individual columns (optional)
Schema-level: Aggregated score for all tables in a schema
System-level: Overall score across all monitored tables

Score Status

Scores are classified into three status levels:

Healthy (≥80): Data quality is good, no immediate action needed
Warning (60-79): Data quality issues detected, review recommended
Critical (<60): Significant data quality problems, immediate attention required
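The classification above can be sketched as a small function. The name `classify_score` is hypothetical; the boundaries are the ones listed in this section, exposed as parameters so they can follow a custom threshold configuration.

```python
def classify_score(score, healthy=80, warning=60):
    """Map a 0-100 score to a status using the boundaries from this guide."""
    if score >= healthy:
        return "healthy"
    if score >= warning:
        return "warning"
    return "critical"


for value in (85.5, 79.9, 42):
    print(value, classify_score(value))
```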

Configuration

Component Weights

You can customize the weights assigned to each quality dimension. Weights must sum to 100%.

Default weights:

Completeness: 25%
Validity: 25%
Consistency: 20%
Freshness: 15%
Uniqueness: 10%
Accuracy: 5%

Example configuration:

quality_scoring:
  enabled: true
  weights:
    completeness: 30
    validity: 30
    consistency: 20
    freshness: 10
    uniqueness: 5
    accuracy: 5
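Because the weights must sum to 100%, it can help to check a weights mapping before deploying it. The following is a minimal sketch of such a check; `validate_weights` is a hypothetical helper, not part of Baselinr, and it assumes the weights have already been loaded into a dictionary.

```python
EXPECTED_DIMENSIONS = {
    "completeness",
    "validity",
    "consistency",
    "freshness",
    "uniqueness",
    "accuracy",
}


def validate_weights(weights):
    """Return a list of problems found; an empty list means the weights look valid."""
    problems = []
    missing = EXPECTED_DIMENSIONS - weights.keys()
    unknown = weights.keys() - EXPECTED_DIMENSIONS
    if missing:
        problems.append(f"missing dimensions: {sorted(missing)}")
    if unknown:
        problems.append(f"unknown dimensions: {sorted(unknown)}")
    if any(value < 0 for value in weights.values()):
        problems.append("weights must not be negative")
    total = sum(weights.values())
    if abs(total - 100) > 1e-9:
        problems.append(f"weights sum to {total}, expected 100")
    return problems


# The example configuration from this section.
custom = {
    "completeness": 30,
    "validity": 30,
    "consistency": 20,
    "freshness": 10,
    "uniqueness": 5,
    "accuracy": 5,
}
print(validate_weights(custom))  # []
```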

Thresholds

Configure the score thresholds for status classification:

quality_scoring:
  thresholds:
    healthy: 80    # Scores >= 80 are healthy
    warning: 60    # Scores >= 60 and < 80 are warnings
    critical: 0    # Scores < 60 are critical

Freshness Settings

Configure freshness thresholds in hours:

quality_scoring:
  freshness:
    excellent: 24      # ≤ 24 hours = 100 points
    good: 48          # ≤ 48 hours = 80 points
    acceptable: 168   # ≤ 1 week = 60 points
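The tiers in this configuration can be read as a step function from data age to points. The sketch below illustrates that reading. The function name `freshness_score` is hypothetical, and the value returned for data older than the `acceptable` limit (`stale_points`, defaulting to 0) is an assumption, since this guide does not specify it.

```python
def freshness_score(age_hours, excellent=24, good=48, acceptable=168, stale_points=0):
    """Translate data age in hours into freshness points using the tiers above."""
    if age_hours <= excellent:
        return 100
    if age_hours <= good:
        return 80
    if age_hours <= acceptable:
        return 60
    # Assumption: the guide does not say what older data scores.
    return stale_points


for hours in (6, 36, 100, 400):
    print(hours, freshness_score(hours))
```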

History Settings

Enable historical tracking of scores:

quality_scoring:
  store_history: true
  history_retention_days: 90  # Keep 90 days of history

CLI Usage

Basic Score Calculation

Calculate a quality score for a specific table:

baselinr score --config config.yaml --table customers

Output Formats

Table format (default):

baselinr score --config config.yaml --table customers --format table

JSON format:

baselinr score --config config.yaml --table customers --format json

Export Scores

Export single score to CSV:

baselinr score --config config.yaml --table customers --export csv --output scores.csv

Export single score to JSON:

baselinr score --config config.yaml --table customers --export json --output score.json

Export score history:

baselinr score --config config.yaml --table customers --history --export csv --output history.csv

Schema Filtering

Calculate the score for a table in a specific schema:

baselinr score --config config.yaml --table customers --schema public

API Usage

Get Table Score

GET /api/quality/scores/customers

Response:

{
  "table_name": "customers",
  "schema_name": "public",
  "overall_score": 85.5,
  "status": "healthy",
  "trend": "improving",
  "trend_percentage": 2.3,
  "components": {
    "completeness": 90.0,
    "validity": 88.0,
    "consistency": 82.0,
    "freshness": 95.0,
    "uniqueness": 85.0,
    "accuracy": 78.0
  },
  "issues": {
    "total": 3,
    "critical": 1,
    "warnings": 2
  },
  "calculated_at": "2024-01-15T10:30:00Z"
}
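A response like the one above can be fetched and inspected with the Python standard library. This is a minimal sketch: the base URL passed to `fetch_table_score` is an assumption about where the API is served, and both function names are hypothetical helpers.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_table_score(base_url, table, timeout=10):
    """GET /api/quality/scores/<table> and return the decoded JSON body."""
    url = f"{base_url.rstrip('/')}/api/quality/scores/{quote(table, safe='')}"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def weakest_component(payload):
    """Return (name, score) for the lowest-scoring dimension in a response."""
    components = payload["components"]
    name = min(components, key=components.get)
    return name, components[name]


# Component values copied from the sample response above.
sample = {
    "components": {
        "completeness": 90.0,
        "validity": 88.0,
        "consistency": 82.0,
        "freshness": 95.0,
        "uniqueness": 85.0,
        "accuracy": 78.0,
    }
}
print(weakest_component(sample))  # ('accuracy', 78.0)
```

With a running server, `weakest_component(fetch_table_score("http://localhost:8000", "customers"))` would point at the dimension to investigate first; the host and port here are placeholders.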

Get Score History

GET /api/quality/scores/customers/history?days=30

Get Schema-Level Scores

GET /api/quality/scores/schema/public

Get System-Level Score

GET /api/quality/scores/system

Alerting

Quality score alerts are automatically emitted when:

Score Degradation: Score drops by more than 5 points
Threshold Breach: Score crosses warning or critical thresholds
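The two conditions can be sketched as a comparison between the previous and current score. This is an illustration only: `detect_alerts` and `status_for` are hypothetical names, and treating a threshold breach as "the status got worse" is an assumed interpretation of the rule above.

```python
SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}


def status_for(score, healthy=80, warning=60):
    """Classify a score with the default thresholds from this guide."""
    if score >= healthy:
        return "healthy"
    if score >= warning:
        return "warning"
    return "critical"


def detect_alerts(previous_score, current_score, max_drop=5):
    """Return the names of the alert events the two scores would trigger."""
    events = []
    # Score degradation: a drop of more than max_drop points.
    if previous_score - current_score > max_drop:
        events.append("QualityScoreDegraded")
    # Threshold breach (assumed meaning): the status moved to a worse level.
    if SEVERITY[status_for(current_score)] > SEVERITY[status_for(previous_score)]:
        events.append("QualityScoreThresholdBreached")
    return events


print(detect_alerts(85, 78))  # both events
print(detect_alerts(82, 79))  # threshold breach only
```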

Alert Configuration

Alerts are integrated with Baselinr's event system. Configure alert hooks in your configuration:

hooks:
  enabled: true
  hooks:
    - type: slack
      webhook_url: ${SLACK_WEBHOOK_URL}
      channel: "#data-alerts"
      min_severity: medium
      alert_on_drift: true
      alert_on_schema_change: true

Alert Events

QualityScoreDegraded: Emitted when the score drops by more than 5 points

table: Table name
current_score: Current score
previous_score: Previous score
score_change: Change in score
threshold_type: 'warning' or 'critical'

QualityScoreThresholdBreached: Emitted when the score crosses the warning or critical threshold

table: Table name
current_score: Current score
threshold_type: 'warning' or 'critical'
threshold_value: Threshold value that was breached

Exporting Scores

CSV Export Format

CSV exports include the following columns:

table_name
schema_name
overall_score
completeness_score
validity_score
consistency_score
freshness_score
uniqueness_score
accuracy_score
status
total_issues
critical_issues
warnings
calculated_at
period_start
period_end
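An exported file with these columns can be loaded using Python's `csv` module. The sketch below assumes the export starts with a header row containing exactly the column names listed above; `load_scores` is a hypothetical helper.

```python
import csv
import io

SCORE_COLUMNS = [
    "overall_score",
    "completeness_score",
    "validity_score",
    "consistency_score",
    "freshness_score",
    "uniqueness_score",
    "accuracy_score",
]
COUNT_COLUMNS = ["total_issues", "critical_issues", "warnings"]


def load_scores(handle):
    """Read exported rows and convert score and count columns to numbers."""
    rows = []
    for row in csv.DictReader(handle):
        for column in SCORE_COLUMNS:
            row[column] = float(row[column])
        for column in COUNT_COLUMNS:
            row[column] = int(row[column])
        rows.append(row)
    return rows


# In-memory stand-in for scores.csv, built from the sample values in this guide.
demo = io.StringIO(
    "table_name,schema_name,overall_score,completeness_score,validity_score,"
    "consistency_score,freshness_score,uniqueness_score,accuracy_score,status,"
    "total_issues,critical_issues,warnings,calculated_at,period_start,period_end\n"
    "customers,public,85.5,90.0,88.0,82.0,95.0,85.0,78.0,healthy,"
    "3,1,2,2024-01-15T10:30:00Z,2024-01-08T10:30:00Z,2024-01-15T10:30:00Z\n"
)
for row in load_scores(demo):
    print(row["table_name"], row["overall_score"], row["status"])
```

For a real export, pass an open file instead, for example `open("scores.csv", newline="")`.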

JSON Export Format

JSON exports include full score objects with all metadata:

[
  {
    "overall_score": 85.5,
    "completeness_score": 90.0,
    "validity_score": 88.0,
    "consistency_score": 82.0,
    "freshness_score": 95.0,
    "uniqueness_score": 85.0,
    "accuracy_score": 78.0,
    "status": "healthy",
    "total_issues": 3,
    "critical_issues": 1,
    "warnings": 2,
    "table_name": "customers",
    "schema_name": "public",
    "calculated_at": "2024-01-15T10:30:00Z",
    "period_start": "2024-01-08T10:30:00Z",
    "period_end": "2024-01-15T10:30:00Z"
  }
]

Best Practices

1. Regular Monitoring

Calculate scores regularly as part of your data pipeline:

# Add to your cron job or workflow
baselinr score --config config.yaml --table customers

2. Set Appropriate Thresholds

Adjust thresholds based on your data quality requirements:

Strict environments: Set healthy threshold to 90
Development environments: Set healthy threshold to 70

3. Monitor Trends

Track score trends over time to identify gradual degradation:

baselinr score --config config.yaml --table customers --history --export json --output trends.json

4. Configure Alerts

Set up alert hooks to be notified of score degradation:

hooks:
  enabled: true
  hooks:
    - type: slack
      webhook_url: ${SLACK_WEBHOOK_URL}
      channel: "#data-quality"

5. Customize Weights

Adjust component weights based on your priorities:

Data completeness critical: Increase completeness weight
Validation important: Increase validity weight
Freshness matters: Increase freshness weight

Troubleshooting

Score Not Calculating

Problem: Score command returns no results or errors

Solutions:

Verify quality scoring is enabled in config:
```
quality_scoring:
  enabled: true
```
Check that required tables exist:
- baselinr_results (profiling results)
- baselinr_validation_results (validation results)
- baselinr_events (drift/anomaly events)
Verify table has been profiled:
```
baselinr query runs --table customers
```

Scores Seem Incorrect

Problem: Scores don't match expectations

Solutions:

Check component scores individually:

baselinr score --config config.yaml --table customers --format json

Verify data exists for all components:
- Profiling results for completeness/uniqueness
- Validation results for validity
- Events for consistency/accuracy
Check freshness calculation:
- Verify baselinr_runs table has recent entries
- Check profiled_at timestamps

Export Fails

Problem: Export command fails or produces empty files

Solutions:

Verify output path is writable
Check file permissions
Ensure table has scores:
```
baselinr score --config config.yaml --table customers
```

Alerts Not Firing

Problem: Score degradation doesn't trigger alerts

Solutions:

Verify hooks are enabled:
```
hooks:
  enabled: true
```
Check that the event bus is initialized when the score command runs
Verify alert thresholds are configured correctly
Check hook logs for errors

Examples

Example 1: Daily Score Check

#!/bin/bash
# Daily quality score check

TABLES=("customers" "orders" "products")

for table in "${TABLES[@]}"; do
  baselinr score --config config.yaml --table "$table" --export csv --output "scores_${table}_$(date +%Y%m%d).csv"
done

Example 2: Score Monitoring Script

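import json
import subprocess
import sys

def check_quality_scores(tables, config="config.yaml"):
    """Run `baselinr score` for each table and return the parsed scores by table name."""
    results = {}
    for table in tables:
        result = subprocess.run(
            ["baselinr", "score", "--config", config,
             "--table", table, "--format", "json"],
            capture_output=True,
            text=True,
        )
        # Skip tables whose command failed instead of crashing on empty output.
        if result.returncode != 0:
            print(f"ERROR: scoring {table} failed: {result.stderr.strip()}", file=sys.stderr)
            continue
        score = json.loads(result.stdout)
        results[table] = score

        if score["status"] == "critical":
            print(f"ALERT: {table} has critical quality score: {score['overall_score']}")

    return results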

Example 3: Trend Analysis

# Export 90 days of history
baselinr score --config config.yaml --table customers --history --export json --output history.json

# Analyze trends (using jq)
jq '.[] | {date: .calculated_at, score: .overall_score}' history.json
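The same history file can be summarized in Python. This is a minimal sketch that assumes the export is a JSON array of score objects with `calculated_at` and `overall_score` fields, as in the JSON export format shown earlier. The helper names are hypothetical, and the direction labels other than "improving" are assumptions.

```python
import json
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; older Python versions reject a trailing 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def score_trend(history):
    """Compare the oldest and newest entries; return None for an empty history."""
    if not history:
        return None
    ordered = sorted(history, key=lambda entry: parse_timestamp(entry["calculated_at"]))
    change = ordered[-1]["overall_score"] - ordered[0]["overall_score"]
    if change > 0:
        direction = "improving"
    elif change < 0:
        direction = "declining"
    else:
        direction = "stable"
    return {"change": change, "direction": direction, "points": len(ordered)}


def load_history(path):
    """Load an exported history file such as history.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Made-up entries, listed out of order to show that sorting is handled.
demo = [
    {"calculated_at": "2024-01-15T10:30:00Z", "overall_score": 85.5},
    {"calculated_at": "2024-01-08T10:30:00Z", "overall_score": 80.0},
]
print(score_trend(demo))
```

Calling `score_trend(load_history("history.json"))` applies the same summary to the file exported above.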
