ODCS Data Contracts Guide
This guide explains how to use Open Data Contract Standard (ODCS) contracts with Baselinr.
Overview
Baselinr supports the Open Data Contract Standard (ODCS) v3.1.0 for defining data contracts. This integration provides a standardized way to define:
- Dataset schemas and column definitions
- Data quality rules and validation
- Service Level Agreements (SLAs)
- Data ownership and stakeholders
- Access roles and permissions
Architecture
Baselinr uses a hybrid architecture where:
- Global config (baselinr.yml) - Tool-specific settings (connections, execution, hooks, monitoring)
- ODCS Contracts (*.odcs.yaml) - Dataset definitions, quality rules, and SLAs
This separation provides:
- Portability: Contracts can be shared across tools
- Clarity: Clear distinction between "what the data is" vs "how the tool runs"
- Standards compliance: Industry-standard contract format
Directory Structure
project/
├── baselinr.yml              # Tool configuration
└── contracts/                # ODCS data contracts
    ├── customers.odcs.yaml
    ├── orders.odcs.yaml
    └── analytics/
        └── metrics.odcs.yaml
Configuration
Enable Contracts
Add the contracts section to your baselinr.yml:
contracts:
  directory: ./contracts       # Path to contracts directory
  file_patterns:
    - "*.odcs.yaml"
    - "*.odcs.yml"
  recursive: true              # Search subdirectories
  validate_on_load: true       # Validate contracts on load
  strict_validation: false     # Treat warnings as errors
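To make the `file_patterns` and `recursive` options concrete, the sketch below shows one way such discovery could work. It is not Baselinr's loader: `discover_contracts` is a hypothetical helper built on the standard library, and it assumes patterns are matched against file names only.

```python
from fnmatch import fnmatch
from pathlib import Path


def discover_contracts(directory, file_patterns, recursive=True):
    """Return sorted paths of files under `directory` whose name matches a pattern.

    Hypothetical helper that mirrors the `directory`, `file_patterns`, and
    `recursive` settings; it is not Baselinr's actual loader.
    """
    root = Path(directory)
    # rglob walks subdirectories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        path
        for path in candidates
        if path.is_file() and any(fnmatch(path.name, pat) for pat in file_patterns)
    )
```

With `recursive=False`, a file such as `contracts/analytics/metrics.odcs.yaml` would be skipped in this sketch, because only the top level of the directory is scanned.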
Writing Contracts
Basic Contract Structure
kind: DataContract
apiVersion: v3.1.0
id: customers-contract
version: 1.0.0
status: active

info:
  title: Customers Dataset
  description: Core customer data
  owner: data-[email protected]
  domain: sales

servers:
  production:
    type: postgres
    host: prod-db.company.com
    database: production
    schema: public
  development:
    type: postgres
    host: localhost
    database: development

dataset:
  - name: customers
    physicalName: public.customers
    type: table
    columns:
      - name: customer_id
        logicalType: integer
        isPrimaryKey: true
        isNullable: false
      - name: email
        logicalType: string
        isNullable: false
        classification: pii

quality:
  - type: validity
    dimension: completeness
    specification:
      column: email
      rule: not_null
    severity: error

servicelevels:
  - property: freshness
    value: 24
    unit: hours

stakeholders:
  - username: data-team
    role: Data Owner
    email: data-[email protected]
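As a rough illustration of what validating this structure might involve, the sketch below checks a contract that has already been parsed into a Python dict. The function `check_contract`, the choice of required fields, and the specific warnings are assumptions made for the example; they do not describe Baselinr's validator. The `strict` flag mirrors the idea behind `strict_validation`, promoting warnings to errors.

```python
# Assumed for illustration; the guide does not state which fields are required.
REQUIRED_TOP_LEVEL = ("kind", "apiVersion", "id", "version")


def check_contract(contract, strict=False):
    """Return (errors, warnings) for a contract parsed into a dict."""
    errors, warnings = [], []

    for key in REQUIRED_TOP_LEVEL:
        if not contract.get(key):
            errors.append(f"missing required field: {key}")

    if contract.get("kind") not in (None, "DataContract"):
        errors.append(f"unexpected kind: {contract['kind']}")

    if not contract.get("info", {}).get("owner"):
        warnings.append("info.owner is not set")

    for dataset in contract.get("dataset", []):
        if not dataset.get("columns"):
            warnings.append(f"dataset {dataset.get('name')!r} has no columns")

    if strict:
        # Strict mode treats every warning as an error.
        errors, warnings = errors + warnings, []
    return errors, warnings
```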
Column Types
Supported logical types:
- string, text, integer, bigint, smallint, tinyint
- float, double, decimal, numeric
- boolean, date, time, timestamp, timestamptz
- binary, array, map, struct, json, uuid
- geography, geometry, variant, object
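The list of supported logical types lends itself to a simple membership check. The sketch below is a hypothetical helper, not part of Baselinr; it assumes exact, case-sensitive matching against the names listed here and reports any column whose `logicalType` falls outside the set.

```python
SUPPORTED_LOGICAL_TYPES = frozenset({
    "string", "text", "integer", "bigint", "smallint", "tinyint",
    "float", "double", "decimal", "numeric",
    "boolean", "date", "time", "timestamp", "timestamptz",
    "binary", "array", "map", "struct", "json", "uuid",
    "geography", "geometry", "variant", "object",
})


def unsupported_columns(columns):
    """Return (name, logicalType) pairs whose type is not in the supported set."""
    return [
        (column.get("name"), column.get("logicalType"))
        for column in columns
        if column.get("logicalType") not in SUPPORTED_LOGICAL_TYPES
    ]
```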
Quality Rules
Define quality rules at contract, dataset, or column level:
quality:
  # Not null check
  - type: validity
    dimension: completeness
    specification:
      column: customer_id
      rule: not_null
    severity: error

  # Unique check
  - type: validity
    dimension: uniqueness
    specification:
      column: email
      rule: unique
    severity: error

  # Format validation
  - type: validity
    dimension: validity
    specification:
      column: email
      rule: format
      pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    severity: error

  # Range check
  - type: validity
    dimension: validity
    specification:
      column: amount
      rule: range
      minValue: 0
      maxValue: 1000000
    severity: warning

  # Enum validation
  - type: validity
    dimension: validity
    specification:
      column: status
      rule: enum
      values: [active, inactive, suspended]
    severity: error

  # Referential integrity
  - type: validity
    dimension: consistency
    specification:
      column: customer_id
      rule: referential
      referenceTable: customers
      referenceColumn: id
    severity: error
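To show what these rules mean in practice, the sketch below evaluates a single rule `specification` against in-memory rows. It is an illustrative evaluator rather than Baselinr's engine: `failing_rows` is a hypothetical name, nulls are assumed to pass every rule except `not_null`, and the `referential` rule is left out because it needs a second table.

```python
import re
from collections import Counter


def failing_rows(rows, spec):
    """Return the indexes of rows that violate one rule specification."""
    column, rule = spec["column"], spec["rule"]
    values = [row.get(column) for row in rows]

    if rule == "not_null":
        passed = [v is not None for v in values]
    elif rule == "unique":
        counts = Counter(v for v in values if v is not None)
        passed = [v is None or counts[v] == 1 for v in values]
    elif rule == "format":
        pattern = re.compile(spec["pattern"])
        passed = [v is None or pattern.search(str(v)) is not None for v in values]
    elif rule == "range":
        low, high = spec.get("minValue"), spec.get("maxValue")
        passed = [
            v is None
            or ((low is None or v >= low) and (high is None or v <= high))
            for v in values
        ]
    elif rule == "enum":
        allowed = set(spec["values"])
        passed = [v is None or v in allowed for v in values]
    else:
        raise ValueError(f"unsupported rule: {rule}")

    return [index for index, ok in enumerate(passed) if not ok]
```

Note that under `unique`, every row sharing a duplicated value is reported, not only the later occurrences. Whether a real engine counts duplicates this way is a design choice this sketch does not settle.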
Service Level Agreements
Define SLAs for your data:
servicelevels:
  - property: freshness
    value: 24
    unit: hours
    description: Data should be updated daily

  - property: availability
    value: 99.9
    unit: percent
    description: Target availability

  - property: latency
    value: 5
    unit: minutes
    description: Max processing delay

  - property: retention
    value: 7
    unit: years
    description: Data retention requirement
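A freshness SLA is ultimately a comparison between a table's last update time and a time window. The sketch below shows that arithmetic under stated assumptions: `sla_window` and `is_fresh` are hypothetical helpers, a year is approximated as 365 days, and non-time units such as `percent` are not handled (they raise `KeyError`).

```python
from datetime import datetime, timedelta, timezone

# Only the time-based units used in the SLA examples are mapped.
_UNIT_TO_TIMEDELTA = {
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "years": timedelta(days=365),  # approximation: ignores leap years
}


def sla_window(sla):
    """Convert an SLA entry with a time unit into a timedelta."""
    return sla["value"] * _UNIT_TO_TIMEDELTA[sla["unit"]]


def is_fresh(last_updated, sla, now=None):
    """Return True if `last_updated` is within the SLA window before `now`."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= sla_window(sla)
```

Passing `now` explicitly keeps the check deterministic in tests; `last_updated` is expected to be timezone-aware so that it can be compared with the UTC default.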
Data Classification
Classify columns with PII or sensitive data:
columns:
  - name: email
    classification: pii
  - name: ssn
    classification: restricted
  - name: credit_card
    classification: pci
Supported classifications: public, internal, confidential, restricted, pii, phi, pci
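Classification labels become useful once something reads them, for example to build an inventory of sensitive columns. The sketch below walks a contract parsed into a dict and collects those columns. `sensitive_columns` is a hypothetical helper, and treating `public` and `internal` as non-sensitive is an assumption made for this example, not a rule stated by Baselinr.

```python
SUPPORTED_CLASSIFICATIONS = frozenset(
    {"public", "internal", "confidential", "restricted", "pii", "phi", "pci"}
)
# Assumption for illustration: these two labels are treated as non-sensitive.
NON_SENSITIVE = frozenset({"public", "internal"})


def sensitive_columns(contract):
    """Return (dataset, column, classification) for each sensitive column."""
    found = []
    for dataset in contract.get("dataset", []):
        for column in dataset.get("columns", []):
            label = column.get("classification")
            if label is None:
                continue  # unclassified columns are skipped, not flagged
            if label not in SUPPORTED_CLASSIFICATIONS:
                raise ValueError(f"unknown classification: {label}")
            if label not in NON_SENSITIVE:
                found.append((dataset.get("name"), column.get("name"), label))
    return found
```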
CLI Commands
List Contracts
baselinr contracts list --config baselinr.yml
Validate Contracts
baselinr contracts validate --config baselinr.yml
baselinr contracts validate --config baselinr.yml --strict
Show Contract Details
baselinr contracts show --config baselinr.yml --contract customers-contract
baselinr contracts show --config baselinr.yml --contract customers-contract --format yaml
List Validation Rules
baselinr contracts rules --config baselinr.yml
baselinr contracts rules --config baselinr.yml --contract customers-contract
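These commands can also be driven from a script, for example as a pre-merge check. The sketch below only assembles and runs the `validate` command using the flags shown above. The helper names are hypothetical, and the idea that a non-zero exit code signals a failed validation is an assumption that should be confirmed against the CLI's actual behavior.

```python
import subprocess


def validate_command(config_path, strict=False):
    """Build the argument list for `baselinr contracts validate`."""
    command = ["baselinr", "contracts", "validate", "--config", config_path]
    if strict:
        command.append("--strict")
    return command


def run_validation(config_path, strict=False):
    """Run the command and return its exit code.

    Assumption: a non-zero code means validation failed.
    """
    completed = subprocess.run(validate_command(config_path, strict), check=False)
    return completed.returncode
```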
Python SDK
Load Contracts
from baselinr import BaselinrClient
client = BaselinrClient(config_path="baselinr.yml")
# Get all contracts
contracts = client.contracts
print(f"Loaded {len(contracts)} contracts")
# Get specific contract
contract = client.get_contract("customers-contract")
print(f"Contract: {contract.info.title}")
# Get dataset names
datasets = client.get_contract_datasets()
print(f"Datasets: {datasets}")
Validate Contracts
# Validate all contracts
result = client.validate_contracts()
if result['valid']:
    print("All contracts valid!")
else:
    for error in result['errors']:
        print(f"Error: [{error['contract']}] {error['message']}")
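When many contracts fail at once, grouping the messages by contract can make the output easier to scan. The sketch below assumes only the result shape used in the snippet above (a dict with `valid` and an `errors` list whose items carry `contract` and `message`); `errors_by_contract` is a hypothetical helper and needs no Baselinr import.

```python
from collections import defaultdict


def errors_by_contract(result):
    """Group validation error messages under the contract they belong to."""
    grouped = defaultdict(list)
    for error in result.get("errors", []):
        grouped[error["contract"]].append(error["message"])
    return dict(grouped)
```

A script could then print one block per contract, or exit with a failure status when the returned dict is non-empty.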
Get Validation Rules
# Get rules from all contracts
rules = client.get_validation_rules_from_contracts()
for rule in rules:
    print(f"{rule.type} on {rule.table}.{rule.column}")
Get Dataset Metadata
# Get metadata from contracts
metadata = client.get_dataset_metadata_from_contracts()
for ds in metadata:
    print(f"{ds.name}: {len(ds.columns)} columns, owner: {ds.owner}")
Dashboard UI
Access the contracts UI at /config/contracts in the dashboard:
- View all loaded contracts
- Validate contracts
- See quality rules and SLAs
- View stakeholders and ownership
Best Practices
- One contract per domain - Group related tables in a single contract
- Use meaningful IDs - customers-contract not contract-1
- Document ownership - Always specify info.owner
- Define SLAs - Set clear expectations for data freshness
- Classify sensitive data - Mark PII columns with classification
- Version your contracts - Use semantic versioning
- Store contracts in version control - Track changes over time
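Some of these practices can be checked mechanically, for instance in a pre-commit hook. The sketch below is a hypothetical linter over a contract parsed into a dict. The regular expressions are assumptions about what "semantic" and "meaningful" mean here: only plain `MAJOR.MINOR.PATCH` versions pass, and only IDs of the form `contract-1` are flagged as generic.

```python
import re

# Plain MAJOR.MINOR.PATCH only; pre-release and build suffixes are not handled.
SEMVER = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")
# Flags placeholder-style IDs such as "contract-1" or "contract2".
GENERIC_ID = re.compile(r"^contract-?\d+$")


def lint_contract(contract):
    """Return a list of best-practice findings for one contract dict."""
    findings = []
    if not SEMVER.match(str(contract.get("version", ""))):
        findings.append("version is not MAJOR.MINOR.PATCH")
    if GENERIC_ID.match(str(contract.get("id", ""))):
        findings.append("id is generic; prefer a descriptive name")
    properties = {sla.get("property") for sla in contract.get("servicelevels", [])}
    if "freshness" not in properties:
        findings.append("no freshness SLA defined")
    return findings
```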