Demo Data for Baselinr Quality Studio
This directory contains pre-generated demo data for the Cloudflare Pages deployment of the Quality Studio.
Generated Files
Data Files
- runs.json (43.3 KB) - 120 profiling runs across multiple warehouses and schemas, spanning 60 days
- metrics.json (172.7 KB) - 582 column-level metrics for successful runs
- drift_events.json (20.4 KB) - 54 drift detection events with varying severity levels
- tables.json (33.9 KB) - Metadata for 95 unique tables
- validation_results.json (61.1 KB) - 156 validation results with a ~80% pass rate
- lineage.json (8.3 KB) - Lineage graph with 16 nodes and 14 edges
Metadata
- metadata.json (0.6 KB) - Generation metadata, statistics, and date ranges
Data Structure
runs.json
```json
{
  "run_id": "run_xxx",
  "dataset_name": "customers",
  "schema_name": "raw",
  "warehouse_type": "snowflake",
  "profiled_at": "2025-12-16T00:00:00+00:00",
  "status": "completed",
  "row_count": 1000000,
  "column_count": 8,
  "duration_seconds": 45.2,
  "environment": "production",
  "has_drift": false
}
```
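Because profiled_at is an ISO 8601 string with an explicit UTC offset, it can be parsed into a timezone-aware datetime for time-window queries. A minimal sketch, assuming runs.json holds a list of records shaped like the example above; the Run type and runs_in_window helper are hypothetical and cover only the fields used here:

```python
from datetime import datetime, timedelta, timezone
from typing import TypedDict


class Run(TypedDict):
    """Hypothetical subset of the runs.json record fields used below."""
    run_id: str
    dataset_name: str
    status: str
    profiled_at: str  # ISO 8601 with offset, e.g. "2025-12-16T00:00:00+00:00"
    has_drift: bool


def runs_in_window(runs: list[Run], end: datetime, days: int) -> list[Run]:
    """Return runs whose profiled_at falls in the `days`-day window ending at `end`.

    `end` must be timezone-aware, because the parsed timestamps are aware.
    """
    start = end - timedelta(days=days)
    selected = []
    for run in runs:
        # fromisoformat understands the "+00:00" suffix and returns an aware datetime
        profiled_at = datetime.fromisoformat(run["profiled_at"])
        if start <= profiled_at <= end:
            selected.append(run)
    return selected


# Hypothetical sample records; only run_a falls within the last 7 days
runs: list[Run] = [
    {"run_id": "run_a", "dataset_name": "customers", "status": "completed",
     "profiled_at": "2025-12-16T00:00:00+00:00", "has_drift": False},
    {"run_id": "run_b", "dataset_name": "orders", "status": "failed",
     "profiled_at": "2025-10-20T00:00:00+00:00", "has_drift": False},
]
recent = runs_in_window(runs, datetime(2025, 12, 16, tzinfo=timezone.utc), days=7)
print([r["run_id"] for r in recent])  # ['run_a']
```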
metrics.json
```json
{
  "run_id": "run_xxx",
  "column_name": "email",
  "column_type": "VARCHAR",
  "null_count": 150,
  "null_percent": 0.015,
  "distinct_count": 999000,
  "distinct_percent": 99.9,
  "min_value": "[email protected]",
  "max_value": "[email protected]"
}
```
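The example's distinct_percent of 99.9 for 999,000 distinct values suggests that the *_percent fields are on a 0-100 scale relative to the run's row count. A minimal sketch under that assumption. A metrics record carries no row_count, so the sketch assumes it is taken from the parent run in runs.json; percent_of_rows is a hypothetical helper:

```python
def percent_of_rows(count: int, row_count: int) -> float:
    """Express a count as a percentage (0-100 scale) of row_count.

    Assumes row_count comes from the parent run record in runs.json.
    """
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    # Multiply before dividing so exact integer inputs give a correctly rounded result
    return 100 * count / row_count


# distinct_count from the metrics example, row_count from the runs.json example
distinct_percent = percent_of_rows(999_000, 1_000_000)
print(distinct_percent)  # 99.9
```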
drift_events.json
```json
{
  "event_id": "evt_xxx",
  "run_id": "run_xxx",
  "table_name": "customers",
  "column_name": "email",
  "metric_name": "null_percent",
  "baseline_value": 0.15,
  "current_value": 0.45,
  "change_percent": 200.0,
  "severity": "high",
  "timestamp": "2025-12-16T00:00:00+00:00",
  "warehouse_type": "snowflake"
}
```
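The change_percent field in the example matches the relative change from baseline_value to current_value: (0.45 - 0.15) / 0.15 × 100 = 200. A minimal sketch of that calculation; returning None for a zero baseline is an assumption, since the relative change is undefined there. The thresholds that map change_percent to a severity are not documented here, so the sketch does not attempt them:

```python
from typing import Optional


def change_percent(baseline: float, current: float) -> Optional[float]:
    """Relative change from baseline to current, in percent.

    Returns None when the baseline is zero (assumed handling; undefined ratio).
    """
    if baseline == 0:
        return None
    return (current - baseline) / baseline * 100


# Values from the drift event example: null_percent moved from 0.15 to 0.45
delta = change_percent(0.15, 0.45)
print(f"{delta:.1f}")  # 200.0
```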
tables.json
```json
{
  "table_name": "customers",
  "schema_name": "raw",
  "warehouse_type": "snowflake",
  "last_profiled": "2025-12-16T00:00:00+00:00",
  "row_count": 1000000,
  "column_count": 8,
  "total_runs": 5,
  "drift_count": 2,
  "validation_pass_rate": 0.85,
  "has_recent_drift": true,
  "has_failed_validations": false
}
```
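tables.json stores validation_pass_rate as a fraction (0.85 in the example). A minimal sketch of how such a rate could be derived, assuming it is the share of passing records in validation_results.json for the same schema and table. The generator may compute it differently (for example over a recent window only), and validation_pass_rate here is a hypothetical helper:

```python
from typing import Optional


def validation_pass_rate(
    results: list[dict], schema_name: str, table_name: str
) -> Optional[float]:
    """Share (0-1 scale) of one table's validation results that passed.

    Returns None when the table has no validation results.
    """
    matching = [
        r for r in results
        if r["schema_name"] == schema_name and r["table_name"] == table_name
    ]
    if not matching:
        return None
    passed = sum(1 for r in matching if r["passed"])
    return passed / len(matching)


# Hypothetical results: three passing rules and one failing rule on raw.customers
results = [
    {"schema_name": "raw", "table_name": "customers", "passed": True},
    {"schema_name": "raw", "table_name": "customers", "passed": True},
    {"schema_name": "raw", "table_name": "customers", "passed": True},
    {"schema_name": "raw", "table_name": "customers", "passed": False},
]
print(validation_pass_rate(results, "raw", "customers"))  # 0.75
```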
validation_results.json
```json
{
  "id": 1,
  "run_id": "run_xxx",
  "table_name": "customers",
  "schema_name": "raw",
  "column_name": "email",
  "rule_type": "format",
  "passed": false,
  "failure_reason": "Found 50 values not matching expected format in email",
  "total_rows": 1000000,
  "failed_rows": 50,
  "failure_rate": 0.00005,
  "severity": "medium",
  "validated_at": "2025-12-16T00:00:00+00:00"
}
```
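Unlike the *_percent fields in metrics.json, failure_rate in the example is a plain fraction: 50 failed rows out of 1,000,000 gives 0.00005 (0.005%). A minimal sketch of that relationship; failure_rate here is a hypothetical helper, not part of the demo data service:

```python
def failure_rate(failed_rows: int, total_rows: int) -> float:
    """Fraction (0-1 scale) of rows that failed a validation rule."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return failed_rows / total_rows


# Values from the validation result example
rate = failure_rate(50, 1_000_000)
print(rate)  # 5e-05
```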
lineage.json
```json
{
  "nodes": [
    {
      "id": "raw.customers",
      "type": "table",
      "label": "raw.customers",
      "schema": "raw",
      "table": "customers",
      "database": null,
      "metadata": {
        "warehouse_type": "snowflake",
        "row_count": 1000000,
        "column_count": 8
      }
    }
  ],
  "edges": [
    {
      "source": "raw.customers",
      "target": "staging.customers",
      "relationship_type": "derives_from",
      "confidence": 0.95,
      "transformation": null,
      "provider": "manual",
      "metadata": {}
    }
  ],
  "root_id": null,
  "direction": "both"
}
```
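Since nodes are keyed by "schema.table" ids and edges reference them through source and target, the downstream impact of a table can be found with a breadth-first traversal. A minimal sketch, assuming edges point in data-flow direction (source upstream of target, matching the raw → staging → analytics pipeline). The min_confidence filter and the analytics.customer_summary node are hypothetical:

```python
from collections import deque


def downstream(lineage: dict, start_id: str, min_confidence: float = 0.0) -> list[str]:
    """Breadth-first list of node ids reachable from start_id via source -> target edges.

    Edges below min_confidence are ignored; edges without a confidence count as 1.0.
    """
    children: dict[str, list[str]] = {}
    for edge in lineage["edges"]:
        if edge.get("confidence", 1.0) >= min_confidence:
            children.setdefault(edge["source"], []).append(edge["target"])

    seen = {start_id}
    order: list[str] = []
    queue = deque([start_id])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # skip nodes already visited (cycles, diamonds)
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical graph: the first edge mirrors the example, the second is invented
lineage = {
    "nodes": [],
    "edges": [
        {"source": "raw.customers", "target": "staging.customers", "confidence": 0.95},
        {"source": "staging.customers", "target": "analytics.customer_summary"},
    ],
}
print(downstream(lineage, "raw.customers"))
# ['staging.customers', 'analytics.customer_summary']
```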
Statistics
- Date Range: 60 days (Oct 17, 2025 - Dec 16, 2025)
- Warehouses: Snowflake, BigQuery, PostgreSQL, Redshift
- Schemas: raw, staging, analytics, production
- Total Data Size: ~340 KB (all JSON files combined)
Regenerating Data
To regenerate the demo data:
```shell
cd dashboard/backend
python generate_demo_data.py
```
This will:
- Generate new profiling runs, metrics, and events
- Validate data consistency
- Export to JSON files in this directory
- Update metadata.json with generation timestamp
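One kind of consistency check that fits this data is referential integrity: every run_id referenced in metrics.json, drift_events.json, and validation_results.json should exist in runs.json. A minimal sketch of such a check; it is hypothetical and not necessarily what generate_demo_data.py does:

```python
def find_orphan_run_ids(runs: list[dict], *children: list[dict]) -> set[str]:
    """Return run_id values referenced by child records but missing from runs."""
    known = {run["run_id"] for run in runs}
    orphans: set[str] = set()
    for records in children:
        for record in records:
            if record["run_id"] not in known:
                orphans.add(record["run_id"])
    return orphans


# Hypothetical records: run_3 is referenced by a metric but has no run entry
runs = [{"run_id": "run_1"}, {"run_id": "run_2"}]
metrics = [{"run_id": "run_1"}, {"run_id": "run_3"}]
drift_events = [{"run_id": "run_2"}]
print(find_orphan_run_ids(runs, metrics, drift_events))  # {'run_3'}
```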
Data Characteristics
- Runs: Mix of completed (65%), success (20%), failed (10%), and running (5%) statuses
- Drift Events: ~30% of successful runs have drift, with severities split roughly 50% low, 35% medium, and 15% high
- Validations: ~80% pass rate across 6 rule types (not_null, unique, range, format, enum, referential)
- Lineage: Realistic data pipeline (raw → staging → analytics/production)
- Metrics: Realistic distributions based on column types (IDs have 0% nulls, emails have <3% nulls, etc.)
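Proportions like the status and severity mixes above can be reproduced with weighted random sampling. A minimal sketch using the standard library's random.Random.choices with the stated percentages as weights; this is an illustration and is not necessarily how generate_demo_data.py assigns them. The seed value is arbitrary:

```python
import random
from collections import Counter

STATUS_WEIGHTS = {"completed": 65, "success": 20, "failed": 10, "running": 5}
SEVERITY_WEIGHTS = {"low": 50, "medium": 35, "high": 15}


def sample(weights: dict[str, int], k: int, rng: random.Random) -> list[str]:
    """Draw k labels, each with probability proportional to its weight."""
    return rng.choices(list(weights), weights=list(weights.values()), k=k)


rng = random.Random(42)  # fixed seed so repeated generation gives the same draws
statuses = sample(STATUS_WEIGHTS, k=120, rng=rng)     # one status per profiling run
severities = sample(SEVERITY_WEIGHTS, k=54, rng=rng)  # one severity per drift event
print(Counter(statuses))
print(Counter(severities))
```

With only 120 runs the observed counts will only approximate the target percentages.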
Usage in Demo
These files are loaded by the demo data service (demo_data.py) and served through Cloudflare Pages Functions to provide a fully functional demo of the Quality Studio without requiring a database backend.
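The internals of demo_data.py are not shown here. As a minimal standalone sketch of the loading step, the seven JSON files can be read once into a dictionary keyed by file stem. The DEMO_FILES tuple and load_demo_data function are hypothetical:

```python
import json
from pathlib import Path
from typing import Any, Union

DEMO_FILES = (
    "runs.json",
    "metrics.json",
    "drift_events.json",
    "tables.json",
    "validation_results.json",
    "lineage.json",
    "metadata.json",
)


def load_demo_data(directory: Union[str, Path]) -> dict[str, Any]:
    """Load every demo JSON file into a dict keyed by file stem (e.g. "runs").

    Raises FileNotFoundError if any expected file is missing.
    """
    base = Path(directory)
    data: dict[str, Any] = {}
    for name in DEMO_FILES:
        path = base / name
        with path.open(encoding="utf-8") as f:
            data[path.stem] = json.load(f)
    return data


# Usage (path is a placeholder for this directory):
# data = load_demo_data("path/to/demo-data")
# print(len(data["runs"]))
```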