# Baselinr Dashboard Backend

FastAPI backend that provides REST API endpoints for the Baselinr Dashboard frontend.
## Features
- RESTful API with FastAPI
- Connects to Baselinr storage database
- Pydantic models for request/response validation
- CORS enabled for frontend integration
- Async database operations
- Export functionality (JSON/CSV)
## Installation

```bash
pip install -r requirements.txt
```
## Configuration

Create a `.env` file:

```
BASELINR_DB_URL=postgresql://baselinr:baselinr@localhost:5433/baselinr
API_HOST=0.0.0.0
API_PORT=8000
CORS_ORIGINS=http://localhost:3000
```
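These values might be loaded at startup roughly like this (a minimal sketch assuming `python-dotenv`; the module layout is illustrative, not the backend's actual code):

```python
# settings.py -- illustrative sketch of reading the .env values above
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # pull variables from .env into the environment

DB_URL = os.environ["BASELINR_DB_URL"]
API_HOST = os.getenv("API_HOST", "0.0.0.0")
API_PORT = int(os.getenv("API_PORT", "8000"))
# CORS_ORIGINS may hold several comma-separated origins
CORS_ORIGINS = os.getenv("CORS_ORIGINS", "http://localhost:3000").split(",")
```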
## Running

### Development

```bash
python main.py
# or
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

### Production

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```
## API Documentation

Once running, visit:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
## Endpoints

### Health Check

- `GET /` - Health check

### Dashboard

- `GET /api/dashboard/metrics` - Aggregate metrics

### Runs

- `GET /api/runs` - List profiling runs
- `GET /api/runs/{run_id}` - Get run details

### Drift

- `GET /api/drift` - List drift alerts

### Tables

- `GET /api/tables/{table_name}/metrics` - Get table metrics

### Warehouses

- `GET /api/warehouses` - List warehouses

### Export

- `GET /api/export/runs` - Export runs data
- `GET /api/export/drift` - Export drift data
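As a quick smoke test, the endpoints can be exercised with any HTTP client; for example, with `requests` (response field names and the export `format` parameter are assumptions, not documented contracts):

```python
import requests

BASE = "http://localhost:8000"

# Health check
print(requests.get(f"{BASE}/").json())

# Aggregate dashboard metrics
metrics = requests.get(f"{BASE}/api/dashboard/metrics").json()

# List runs, then fetch details for the first one
runs = requests.get(f"{BASE}/api/runs").json()
if runs:
    run_id = runs[0]["run_id"]  # "run_id" field name is an assumption
    detail = requests.get(f"{BASE}/api/runs/{run_id}").json()

# Export drift data (CSV selection via query parameter is an assumption)
csv_bytes = requests.get(f"{BASE}/api/export/drift", params={"format": "csv"}).content
```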
## Sample Data Generator

Generate sample data for testing:

```bash
python sample_data_generator.py
```
This creates:
- 100 profiling runs
- Column metrics
- Drift events
## Database Schema

Expects these tables from Baselinr Phase 1:

- `baselinr_runs`
- `baselinr_results`
- `baselinr_events`
- `baselinr_table_state` (new incremental metadata cache)

`baselinr_table_state` stores the last snapshot ID, decision, and staleness score per table. The incremental planner and Dagster sensors consult this table to decide whether to skip, partially profile, or fully scan each dataset.
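Based on that description, one row of `baselinr_table_state` could be modeled roughly as follows (field names are illustrative; consult the Baselinr schema for the authoritative definition):

```python
from datetime import datetime
from typing import Literal, Optional

from pydantic import BaseModel


class TableState(BaseModel):
    """Illustrative shape of a baselinr_table_state row; names are assumptions."""

    table_name: str
    last_snapshot_id: Optional[str] = None  # snapshot ID from the last profiling run
    last_decision: Literal["skip", "partial", "full", "sample", "defer"]
    staleness_score: float
    updated_at: datetime
```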
## Incremental Profiling Configuration

Enable incremental profiling in your main Baselinr config:

```yaml
incremental:
  enabled: true
  change_detection:
    metadata_table: baselinr_table_state
  partial_profiling:
    allow_partition_pruning: true
    max_partitions_per_run: 64
  cost_controls:
    enabled: true
    max_rows_scanned: 100000000
    fallback_strategy: sample  # sample | defer | full
    sample_fraction: 0.05
```
With this block enabled, the CLI, sensors, and dashboard metrics automatically:

- Compare warehouse metadata (row counts, partition manifests, snapshot IDs) with `baselinr_table_state`.
- Skip runs when nothing changed (`profile_skipped_no_change` events are emitted for observability).
- Issue partial runs when detectors pinpoint specific partitions/batches.
- Downgrade to sampling or defer when cost guardrails would be exceeded.
- Persist the final decision, snapshot ID, and cost estimate back into `baselinr_table_state` so subsequent scheduler ticks stay in sync with the dashboard.
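Put together, the decision flow looks roughly like this (a sketch only; the helper names and config access are assumptions, not Baselinr's internal API):

```python
def plan_run(meta, state, cfg) -> str:
    """Sketch of the skip/partial/sample/defer/full decision described above.

    meta  -- current warehouse metadata (row counts, partition manifest, snapshot ID)
    state -- this table's row from baselinr_table_state
    cfg   -- the `incremental` config block shown earlier
    """
    # Nothing changed since the last snapshot: skip and record why.
    if meta.snapshot_id == state.last_snapshot_id:
        emit_event("profile_skipped_no_change")  # hypothetical event helper
        return "skip"

    # Detectors pinpointed specific partitions: profile only those.
    changed = detect_changed_partitions(meta, state)  # hypothetical detector
    if changed and len(changed) <= cfg["partial_profiling"]["max_partitions_per_run"]:
        return "partial"

    # Cost guardrails would be exceeded: downgrade per fallback_strategy.
    cost = cfg["cost_controls"]
    if cost["enabled"] and meta.estimated_rows > cost["max_rows_scanned"]:
        return cost["fallback_strategy"]  # sample | defer | full

    return "full"
```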
## Development

### Adding New Endpoints

- Define a Pydantic model in `models.py`
- Add a database query method in `database.py`
- Create the endpoint in `main.py`
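A condensed sketch of those three steps, using a made-up `/api/warehouses/summary` endpoint (the model, query, and SQL are all illustrative, and the code assumes an `asyncpg` pool stored on `app.state`):

```python
# models.py -- step 1: define the response model (illustrative)
from pydantic import BaseModel

class WarehouseSummary(BaseModel):
    name: str
    run_count: int

# database.py -- step 2: add the query method (column names are assumptions)
async def fetch_warehouse_summaries(pool):
    rows = await pool.fetch(
        "SELECT warehouse AS name, COUNT(*) AS run_count "
        "FROM baselinr_runs GROUP BY warehouse"
    )
    return [dict(row) for row in rows]

# main.py -- step 3: expose the endpoint
@app.get("/api/warehouses/summary", response_model=list[WarehouseSummary])
async def warehouse_summaries():
    return await fetch_warehouse_summaries(app.state.pool)
```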
### Testing

```bash
# TODO: Add pytest tests
pytest
```
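Until real tests exist, a starting point with FastAPI's `TestClient` could look like this (only the health-check status code is assumed):

```python
# test_main.py -- minimal smoke test sketch
from fastapi.testclient import TestClient

from main import app

client = TestClient(app)


def test_health_check():
    response = client.get("/")
    assert response.status_code == 200
```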
## Troubleshooting

### Database Connection

```bash
# Test connection
psql "postgresql://baselinr:baselinr@localhost:5433/baselinr"
```

### Check Tables

```
\dt baselinr*
```
## Orchestration Integrations

Looking to run profiling plans from Dagster? See `docs/dashboard/backend/DAGSTER.md` for a full guide on the new `baselinr.integrations.dagster` package (assets, sensors, and helper definitions).