Baselinr API Reference
Complete API documentation for all public classes and methods in Baselinr.
Table of Contents
Python SDK Client
BaselinrClient
High-level Python SDK client providing unified access to all Baselinr functionality.
Location: baselinr.client.BaselinrClient
Constructor
BaselinrClient(
config_path: Optional[str] = None,
config: Optional[BaselinrConfig | Dict[str, Any]] = None
)
Initialize the Baselinr client.
Parameters:
config_path(Optional[str]): Path to configuration file (YAML or JSON)config(Optional[BaselinrConfig | Dict[str, Any]]): Configuration object or dictionary
Raises:
ValueError: If neither or bothconfig_pathandconfigare providedFileNotFoundError: Ifconfig_pathdoesn't exist
Example:
from baselinr import BaselinrClient
# Using config file
client = BaselinrClient(config_path="config.yml")
# Using config dictionary
client = BaselinrClient(config={"environment": "development", ...})
Properties
config: BaselinrConfig (read-only)
Access the cached configuration object.
Example:
environment = client.config.environment
source_type = client.config.source.type
Methods
plan(table_patterns=None, verbose=False) -> ProfilingPlan
Build execution plan without running profiling.
Parameters:
table_patterns(Optional[List[TablePattern]]): Optional list of tables to plan for (uses config if not provided)verbose(bool): Whether to include verbose details in plan (default: False)
Returns: ProfilingPlan object with execution details
Example:
plan = client.plan()
print(f"Will profile {plan.total_tables} tables")
print(f"Estimated metrics: {plan.estimated_metrics}")
# Access plan details
for table_plan in plan.tables:
print(f"{table_plan.full_name}: {table_plan.status}")
profile(table_patterns=None, dry_run=False, progress_callback=None) -> List[ProfilingResult]
Profile tables and write results to storage.
Parameters:
table_patterns(Optional[List[TablePattern]]): Optional list of tables to profile (uses config if not provided)dry_run(bool): If True, profile but don't write to storage (default: False)progress_callback(Optional[Callable]): Optional callback function(current, total, table_name) called when starting each table
Returns: List of ProfilingResult objects
Example:
# Simple profiling
results = client.profile()
# With progress callback
def progress(current, total, table_name):
print(f"Profiling {table_name} ({current}/{total})...")
results = client.profile(progress_callback=progress)
# Dry run (test without writing)
results = client.profile(dry_run=True)
# Access results
for result in results:
print(f"{result.dataset_name}:")
print(f" Run ID: {result.run_id}")
print(f" Columns: {len(result.columns)}")
print(f" Row count: {result.metadata.get('row_count')}")
detect_drift(dataset_name, baseline_run_id=None, current_run_id=None, schema_name=None) -> DriftReport
Detect drift between profiling runs.
Parameters:
dataset_name(str): Name of the dataset/tablebaseline_run_id(Optional[str]): Run ID to use as baseline (default: auto-selected based on strategy)current_run_id(Optional[str]): Run ID to compare (default: latest run)schema_name(Optional[str]): Optional schema name
Returns: DriftReport object with detected drift details
Example:
# Automatic baseline selection
report = client.detect_drift("customers")
# Specify runs explicitly
report = client.detect_drift(
dataset_name="customers",
baseline_run_id="abc-123",
current_run_id="def-456"
)
# Access drift information
print(f"Total drifts: {report.summary['total_drifts']}")
print(f"Schema changes: {len(report.schema_changes)}")
for drift in report.column_drifts:
if drift.drift_detected:
print(f"{drift.column_name}.{drift.metric_name}:")
print(f" Severity: {drift.drift_severity}")
print(f" Change: {drift.change_percent:.2f}%")
query_runs(schema=None, table=None, status=None, environment=None, days=None, limit=100, offset=0) -> List[RunSummary]
Query profiling runs with filters.
Parameters:
schema(Optional[str]): Filter by schema nametable(Optional[str]): Filter by table namestatus(Optional[str]): Filter by statusenvironment(Optional[str]): Filter by environmentdays(Optional[int]): Number of days to look backlimit(int): Maximum results to return (default: 100)offset(int): Pagination offset (default: 0)
Returns: List of RunSummary objects
Example:
# Recent runs
runs = client.query_runs(days=7, limit=10)
# Filter by table
runs = client.query_runs(table="customers", days=30)
# Pagination
page1 = client.query_runs(limit=10, offset=0)
page2 = client.query_runs(limit=10, offset=10)
# Access run information
for run in runs:
print(f"{run.dataset_name}: {run.profiled_at}")
print(f" Run ID: {run.run_id}")
print(f" Rows: {run.row_count}")
query_drift(table=None, schema=None, severity=None, days=7, limit=100, offset=0) -> List[DriftEvent]
Query drift events.
Parameters:
table(Optional[str]): Filter by table nameschema(Optional[str]): Filter by schema nameseverity(Optional[str]): Filter by severity (low/medium/high)days(int): Number of days to look back (default: 7)limit(int): Maximum results to return (default: 100)offset(int): Pagination offset (default: 0)
Returns: List of DriftEvent objects
Example:
# All drift events
drift_events = client.query_drift(days=7)
# High severity only
high_severity = client.query_drift(severity="high", days=7)
# Specific table
table_drift = client.query_drift(table="customers", days=30)
query_anomalies(table=None, schema=None, days=7, limit=100, offset=0) -> List[AnomalyEvent]
Query anomaly events.
Parameters:
table(Optional[str]): Filter by table nameschema(Optional[str]): Filter by schema namedays(int): Number of days to look back (default: 7)limit(int): Maximum results to return (default: 100)offset(int): Pagination offset (default: 0)
Returns: List of AnomalyEvent objects
query_run_details(run_id) -> Dict[str, Any]
Get detailed information about a specific profiling run.
Parameters:
run_id(str): Run ID to query
Returns: Dictionary with run details
query_table_history(table_name, schema_name=None, days=30, limit=100) -> List[Dict[str, Any]]
Get profiling history for a table over time.
Parameters:
table_name(str): Name of the tableschema_name(Optional[str]): Optional schema namedays(int): Number of days to look back (default: 30)limit(int): Maximum results to return (default: 100)
Returns: List of dictionaries with historical profiling data
get_status(drift_only=False) -> Dict[str, Any]
Get comprehensive status summary.
Parameters:
drift_only(bool): If True, only include drift summary (default: False)
Returns: Dictionary with status summary including runs and drift information
migrate_status() -> Dict[str, Any]
Get migration status.
Returns: Dictionary with current schema version and pending migrations
migrate_apply(dry_run=False) -> List[str]
Apply pending migrations.
Parameters:
dry_run(bool): If True, show what would be applied without applying (default: False)
Returns: List of applied migration IDs
migrate_validate() -> Dict[str, Any]
Validate migration state.
Returns: Dictionary with validation results
Configuration Classes
BaselinrConfig
Main Baselinr configuration class.
Location: baselinr.config.schema.BaselinrConfig
Fields:
environment(str): Environment name (development/test/production)source(ConnectionConfig): Source database connectionstorage(StorageConfig): Storage configurationprofiling(ProfilingConfig): Profiling configurationdrift_detection(DriftDetectionConfig): Drift detection configurationhooks(HooksConfig): Event hooks configurationmonitoring(MonitoringConfig): Monitoring configurationretry(RetryConfig): Retry configurationexecution(ExecutionConfig): Execution and parallelism configurationincremental(IncrementalConfig): Incremental profiling configurationschema_change(SchemaChangeConfig): Schema change detection configuration
For detailed configuration reference, see CONFIG_REFERENCE.md.
Data Classes
ProfilingPlan
Execution plan for profiling operations.
Location: baselinr.planner.ProfilingPlan
Fields:
run_id(str): Unique run identifiertimestamp(datetime): Plan timestampenvironment(str): Environment nametables(List[TablePlan]): List of tables to profilesource_type(str): Source database typesource_database(str): Source database namedrift_strategy(str): Drift detection strategytotal_tables(int): Total number of tablesestimated_metrics(int): Estimated number of metrics
Methods:
to_dict() -> Dict[str, Any]: Convert plan to dictionary
TablePlan
Plan for profiling a single table.
Location: baselinr.planner.TablePlan
Fields:
name(str): Table nameschema(Optional[str]): Schema namestatus(str): Table status (ready/pending/skipped)partition_config(Optional[Dict[str, Any]]): Partition configurationsampling_config(Optional[Dict[str, Any]]): Sampling configurationmetrics(List[str]): List of metrics to computemetadata(Dict[str, Any]): Additional metadata
Properties:
full_name(str): Fully qualified table name (schema.table)
ProfilingResult
Container for profiling results.
Location: baselinr.profiling.core.ProfilingResult
Fields:
run_id(str): Unique run identifierdataset_name(str): Dataset/table nameschema_name(Optional[str]): Schema nameprofiled_at(datetime): Profiling timestampcolumns(List[Dict[str, Any]]): Column metricsmetadata(Dict[str, Any]): Additional metadata
Methods:
add_column_metrics(column_name, column_type, metrics): Add metrics for a columnto_dict() -> Dict[str, Any]: Convert result to dictionary
DriftReport
Complete drift detection report.
Location: baselinr.drift.detector.DriftReport
Fields:
dataset_name(str): Dataset/table nameschema_name(Optional[str]): Schema namebaseline_run_id(str): Baseline run IDcurrent_run_id(str): Current run IDbaseline_timestamp(datetime): Baseline timestampcurrent_timestamp(datetime): Current timestampcolumn_drifts(List[ColumnDrift]): List of column driftsschema_changes(List[str]): List of schema changessummary(Dict[str, Any]): Summary statistics
Methods:
to_dict() -> Dict[str, Any]: Convert report to dictionary
ColumnDrift
Represents drift detected in a single column.
Location: baselinr.drift.detector.ColumnDrift
Fields:
column_name(str): Column namemetric_name(str): Metric namebaseline_value(Any): Baseline valuecurrent_value(Any): Current valuechange_percent(Optional[float]): Percentage changechange_absolute(Optional[float]): Absolute changedrift_detected(bool): Whether drift was detecteddrift_severity(str): Severity level (none/low/medium/high)metadata(Optional[Dict[str, Any]]): Additional metadata
Core Classes
PlanBuilder
Builds profiling execution plans from configuration.
Location: baselinr.planner.PlanBuilder
Methods:
build_plan() -> ProfilingPlan: Build execution plan from configuration
ProfileEngine
Main profiling engine for Baselinr.
Location: baselinr.profiling.core.ProfileEngine
Methods:
profile(table_patterns=None, progress_callback=None) -> List[ProfilingResult]: Profile tables
DriftDetector
Detects drift between profiling runs.
Location: baselinr.drift.detector.DriftDetector
Methods:
detect_drift(dataset_name, baseline_run_id=None, current_run_id=None, schema_name=None) -> DriftReport: Detect drift
MetadataQueryClient
Client for querying Baselinr metadata from storage.
Location: baselinr.query.client.MetadataQueryClient
Methods:
query_runs(...) -> List[RunSummary]: Query profiling runsquery_drift(...) -> List[DriftEvent]: Query drift eventsquery_anomalies(...) -> List[AnomalyEvent]: Query anomaly events
CLI Commands
baselinr plan
Build execution plan without running profiling.
Usage:
baselinr plan --config config.yml [--verbose]
baselinr profile
Profile tables and store results.
Usage:
baselinr profile --config config.yml [--dry-run] [--table TABLE]
baselinr drift
Detect drift between profiling runs.
Usage:
baselinr drift --config config.yml --dataset DATASET [--baseline-run-id ID] [--current-run-id ID]
baselinr query
Query profiling runs, drift events, or anomalies.
Usage:
baselinr query runs --config config.yml [--table TABLE] [--days DAYS] [--limit LIMIT]
baselinr query drift --config config.yml [--table TABLE] [--severity SEVERITY] [--days DAYS]
baselinr query anomalies --config config.yml [--table TABLE] [--days DAYS]
baselinr status
Get comprehensive status summary.
Usage:
baselinr status --config config.yml [--drift-only]
baselinr migrate
Manage schema migrations.
Usage:
baselinr migrate status --config config.yml
baselinr migrate apply --config config.yml [--dry-run]
baselinr migrate validate --config config.yml
baselinr ui
Start local web dashboard.
Usage:
baselinr ui --config config.yml [--host HOST] [--port PORT]
Related Documentation
- Configuration Reference - Complete configuration schema documentation
- Python SDK Guide - Comprehensive SDK usage guide
- CLI Documentation - CLI command reference