Prompt Engineering Guide
This guide explains how prompts are constructed for LLM explanations and how they can be customized.
Prompt Structure
Each alert type (drift, anomaly, schema change) has a dedicated prompt construction function that formats technical details into a clear prompt for the LLM.
System Prompt
All explanations use a consistent system prompt that establishes the LLM's role:
You are a data quality analyst explaining anomalies to data engineers and business users.
Your goal is to:
1. Clearly explain what happened in plain English
2. Provide context about why this might have occurred
3. Suggest potential next steps or areas to investigate
4. Be concise (2-4 sentences maximum)
5. Avoid jargon unless necessary
Format your response as a clear, actionable explanation.
Drift Detection Prompts
Drift prompts include:
- Table and column names
- Metric name and values (baseline vs current)
- Change percentage or absolute change
- Severity level
- Timestamps
- Statistical test results (if available)
Example Prompt:
A data drift alert was detected:
Table: orders
Column: order_amount
Alert Type: Statistical Drift
Metric: mean
Severity: HIGH
Current value: 127.50
Baseline value: 98.20
Change: +30.00%
Baseline time: 2025-01-14T14:30:00
Current time: 2025-01-15T14:30:00
Test: Kolmogorov-Smirnov test
p-value: 0.003
Explain this drift in 2-4 clear sentences for a data engineer.
Anomaly Detection Prompts
Anomaly prompts include:
- Table and column names
- Expected vs actual values
- Deviation score
- Anomaly type and detection method
- Severity level
- Method-specific context (control limits, IQR, etc.)
Example Prompt:
An anomaly was detected:
Table: orders
Column: order_amount
Metric: mean
Anomaly Type: control_limit_breach
Detection Method: control_limits
Severity: HIGH
Expected value: 100.0
Actual value: 150.0
Deviation: 2.50 standard deviations from expected
Control limits: [80.00, 120.00]
Explain this anomaly in 2-4 clear sentences for a data engineer.
Schema Change Prompts
Schema change prompts include:
- Table name
- Change type (column_added, column_removed, type_changed, etc.)
- Column name (if applicable)
- Type changes (old → new)
- Severity level
Example Prompt:
A schema change was detected:
Table: orders
Change Type: column_added
Severity: MEDIUM
Column: new_column
Type change: None → varchar(255)
A new column was added to the table.
Explain the impact of this schema change in 2-4 clear sentences for a data engineer.
Customization (Future)
Currently, prompts are fixed. Future versions may support:
- Custom system prompts
- Prompt templates per alert type
- User-defined prompt variables
Prompt Best Practices
- Include context - More context leads to better explanations
- Be specific - Include exact values, not just "changed"
- Include timestamps - Helps LLM understand temporal context
- Include severity - Guides LLM on explanation tone
- Keep it concise - LLM is instructed to be brief (2-4 sentences)
Token Usage
Typical prompt sizes:
- Drift prompts: ~200-300 tokens
- Anomaly prompts: ~150-250 tokens
- Schema change prompts: ~100-150 tokens
Response sizes:
- Explanations: ~50-150 tokens (target: 2-4 sentences)
Total per explanation: ~250-450 tokens