# Data Integrity in dbt: A Migration Guide from Traditional SQL

**Target audience:** Data professionals transitioning from traditional RDBMS systems (SQL Server, Oracle, PostgreSQL) to a modern data stack with dbt, particularly those familiar with database constraints and seeking equivalent data integrity patterns.
## Introduction: The Paradigm Shift

Moving from traditional database systems like Microsoft SQL Server to a modern data stack with dbt represents a fundamental shift in how we approach data integrity. Instead of relying on database-enforced constraints that reject bad rows at write time, dbt validates data at transformation time, trading immediate enforcement for flexibility and transparency.
## Traditional SQL Constraints vs. dbt Approach
| Traditional SQL Constraint | Purpose | dbt Equivalent | Implementation Method |
| --- | --- | --- | --- |
| PRIMARY KEY | Unique identifier, not null | Data tests + model contracts | `unique` + `not_null` tests |
| FOREIGN KEY | Referential integrity | Relationships test | `relationships` test |
| UNIQUE | No duplicate values | Unique data test | `unique` test |
| NOT NULL | Required values | Not-null data test | `not_null` test |
| CHECK | Value validation | Accepted-values test | `accepted_values` test |
## Core Concepts in dbt Data Integrity

### 1. Data Tests: The Foundation
In traditional SQL, constraints are enforced at insert/update time. In dbt, data tests are SQL queries that validate your assumptions about the data; they run as part of your transformation workflow, typically via `dbt test` or `dbt build`.
```sql
-- Traditional SQL Server approach
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    status VARCHAR(20) CHECK (status IN ('active', 'inactive', 'suspended')),
    created_at DATETIME NOT NULL
);
```
```yaml
# dbt approach: models/schema.yml
version: 2

models:
  - name: customers
    description: "Customer master table"
    columns:
      - name: customer_id
        description: "Unique customer identifier"
        data_tests:
          - unique
          - not_null
      - name: email
        description: "Customer email address"
        data_tests:
          - unique
          - not_null
      - name: status
        description: "Customer account status"
        data_tests:
          - accepted_values:
              values: ['active', 'inactive', 'suspended']
      - name: created_at
        description: "Account creation timestamp"
        data_tests:
          - not_null
```
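Each generic test compiles to a plain SELECT that returns failing rows; the test passes when the query returns zero rows. As a rough sketch, the `unique` test on `customer_id` compiles to something like the following (simplified from dbt's actual template, with `analytics.customers` standing in for wherever the model is built):

```sql
-- Simplified sketch of the compiled unique test; dbt's real template
-- differs slightly across versions. analytics.customers is an assumed
-- build location for ref('customers').
SELECT
    customer_id,
    COUNT(*) AS n_records
FROM analytics.customers
WHERE customer_id IS NOT NULL
GROUP BY customer_id
HAVING COUNT(*) > 1
```

Running `dbt test --select customers` executes every test attached to the model and reports pass, warn, or fail per test.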
### 2. Model Contracts: Schema Enforcement

Model contracts in dbt provide upfront guarantees about a model's structure, similar to table schemas in traditional databases: if the model's output does not match the declared columns and data types, the build fails before anything is materialized.
```sql
-- Traditional SQL Server: the schema is enforced at the table level.
-- dbt: the schema is enforced through a model contract, declared either
-- in the model's config block or in schema.yml.

-- models/customers.sql
{{
    config(
        contract={"enforced": true}
    )
}}
```
```yaml
# models/schema.yml (the contract can equivalently be declared here)
version: 2

models:
  - name: customers
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: not_null
          - type: primary_key  # included in DDL, but enforcement varies by warehouse
      - name: email
        data_type: varchar(255)
        constraints:
          - type: not_null
          - type: unique       # likewise platform-dependent
```
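With the contract enforced, `dbt run` verifies the model's result set against the declared columns and data types before building it, and fails fast on a mismatch. A minimal sketch of the corresponding model body (the `stg_customers` upstream name is an assumption for illustration):

```sql
-- models/customers.sql (sketch; upstream model name is illustrative)
-- The SELECT must produce exactly the contracted columns and types.
SELECT
    CAST(customer_id AS int)         AS customer_id,
    CAST(email AS varchar(255))      AS email
FROM {{ ref('stg_customers') }}
```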
### 3. Referential Integrity with Relationships Tests

Foreign key relationships in dbt are validated through relationships tests. Unlike a FOREIGN KEY constraint, which rejects an orphaned row at write time, a relationships test detects orphaned keys after the data is built.
```sql
-- Traditional SQL Server
ALTER TABLE orders
ADD CONSTRAINT FK_orders_customers
FOREIGN KEY (customer_id) REFERENCES customers(customer_id);
```
```yaml
# dbt equivalent: models/schema.yml
version: 2

models:
  - name: orders
    columns:
      - name: customer_id
        description: "Reference to customers table"
        data_tests:
          - relationships:
              to: ref('customers')
              field: customer_id
```
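Conceptually, the compiled test looks for child rows with no matching parent, much like the query you might write to audit a foreign key by hand. A simplified sketch (table locations are assumptions):

```sql
-- Simplified sketch of the compiled relationships test: any non-null
-- customer_id in orders without a match in customers is a failure.
SELECT child.customer_id
FROM analytics.orders AS child
LEFT JOIN analytics.customers AS parent
    ON child.customer_id = parent.customer_id
WHERE child.customer_id IS NOT NULL
  AND parent.customer_id IS NULL
```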
## Migration Strategies

### Phase 1: Assessment and Planning

Key steps:

- Inventory existing constraints in your traditional database (a starter query follows this list)
- Identify critical business rules enforced by constraints
- Map constraint types to dbt test equivalents
- Plan the testing strategy and validation workflows
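For the constraint inventory, the `INFORMATION_SCHEMA` views give a portable starting point on SQL Server and PostgreSQL. A sketch; extend it for CHECK expressions and foreign key targets as needed:

```sql
-- Inventory of declared constraints via INFORMATION_SCHEMA (sketch).
-- Covers PRIMARY KEY, FOREIGN KEY, UNIQUE, and CHECK constraint names;
-- CHECK expressions live in INFORMATION_SCHEMA.CHECK_CONSTRAINTS.
SELECT
    tc.TABLE_SCHEMA,
    tc.TABLE_NAME,
    tc.CONSTRAINT_NAME,
    tc.CONSTRAINT_TYPE,
    ccu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc
LEFT JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS ccu
    ON  tc.CONSTRAINT_SCHEMA = ccu.CONSTRAINT_SCHEMA
    AND tc.CONSTRAINT_NAME   = ccu.CONSTRAINT_NAME
ORDER BY tc.TABLE_SCHEMA, tc.TABLE_NAME, tc.CONSTRAINT_TYPE;
```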
### Phase 2: Implementing dbt Data Tests
```sql
-- Example migration from SQL Server to dbt

-- SQL Server table definition
/*
CREATE TABLE products (
    product_id INT IDENTITY(1,1) PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    category_id INT FOREIGN KEY REFERENCES categories(category_id),
    price DECIMAL(10,2) CHECK (price > 0),
    status VARCHAR(20) DEFAULT 'active' CHECK (status IN ('active', 'discontinued'))
);
*/
```
```yaml
# dbt schema.yml equivalent
version: 2

models:
  - name: products
    description: "Product master data"
    columns:
      - name: product_id
        description: "Auto-generated product identifier"
        data_tests:
          - unique
          - not_null
      - name: sku
        description: "Stock keeping unit"
        data_tests:
          - unique
          - not_null
      - name: category_id
        description: "Product category reference"
        data_tests:
          - not_null
          - relationships:
              to: ref('categories')
              field: category_id
      - name: price
        description: "Product price"
        data_tests:
          - not_null
          # Requires the dbt_utils package (see packages.yml below);
          # the column name is prepended to the expression.
          - dbt_utils.expression_is_true:
              expression: "> 0"
      - name: status
        description: "Product status"
        data_tests:
          - accepted_values:
              values: ['active', 'discontinued']
```
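As before, each of these compiles to a failing-rows query. For instance, the `accepted_values` test on `status` compiles to roughly the following (simplified; `analytics.products` is an assumed build location):

```sql
-- Simplified sketch of the compiled accepted_values test on products.status
SELECT
    status,
    COUNT(*) AS n_records
FROM analytics.products
WHERE status NOT IN ('active', 'discontinued')
GROUP BY status
```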
### Phase 3: Advanced Patterns

#### Custom Tests for Complex Business Rules

A singular test is simply a SELECT that returns the rows violating a rule; the test passes when the query returns nothing.

```sql
-- tests/assert_price_consistency.sql
-- Custom test equivalent to a complex CHECK constraint:
-- fails for any row where the discount price is invalid.
SELECT
    product_id,
    sku,
    price,
    discount_price
FROM {{ ref('products') }}
WHERE discount_price >= price
   OR discount_price < 0
```
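If the same rule applies across several models, the singular test can be promoted to a reusable generic test. A sketch, with hypothetical macro and argument names:

```sql
-- macros/test_discount_below_price.sql (hypothetical generic test)
-- Fails for rows where the tested column is negative or not below price_column.
{% test discount_below_price(model, column_name, price_column) %}

SELECT *
FROM {{ model }}
WHERE {{ column_name }} >= {{ price_column }}
   OR {{ column_name }} < 0

{% endtest %}
```

Once defined, it attaches in schema.yml like any built-in test, for example under the `discount_price` column with `price_column: price`.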
#### Multi-Column Constraints

```sql
-- Traditional SQL: composite primary key
-- ALTER TABLE order_items ADD CONSTRAINT PK_order_items
--     PRIMARY KEY (order_id, product_id);
```

```yaml
# dbt equivalent: a model-level test in schema.yml
version: 2

models:
  - name: order_items
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - product_id
```
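A common companion pattern is deriving a single-column surrogate key from the composite key in the model itself, so downstream joins and tests can treat it like a conventional primary key. A sketch using dbt_utils (the `stg_order_items` upstream name is illustrative):

```sql
-- models/order_items.sql (sketch)
-- generate_surrogate_key hashes the composite business key into one column,
-- which can then carry plain unique and not_null tests.
SELECT
    {{ dbt_utils.generate_surrogate_key(['order_id', 'product_id']) }} AS order_item_key,
    order_id,
    product_id
FROM {{ ref('stg_order_items') }}
```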
## Pros and Cons Comparison

### ✅ dbt Advantages

- **Flexibility:** Tests can be modified without table locks
- **Transparency:** All validation logic is version-controlled
- **CI/CD integration:** Tests run as part of the deployment pipeline
- **Custom logic:** Complex business rules are easily implemented
- **Data quality insights:** Failed tests produce actionable data quality reports
- **Documentation:** Tests serve as living documentation

### ❌ Trade-offs

- **Runtime enforcement:** No immediate rejection of invalid data
- **Resource usage:** Tests consume compute resources
- **Platform differences:** Constraint enforcement varies by data warehouse
- **Learning curve:** A new testing paradigm to master
- **Timing:** Issues are discovered after data is loaded
## Best Practices for Migration

### 1. Start with Critical Constraints

Priority order:

1. Primary key uniqueness and not-null checks
2. Foreign key relationships
3. Business-critical value validations
4. Data type and format validations
### 2. Implement Progressive Validation

```yaml
# dbt_project.yml - configure test severity and thresholds
data_tests:
  my_project:
    # Project-wide thresholds: warn on any failure, error at 10+ failing rows
    +warn_if: ">= 1"
    +error_if: ">= 10"
    # Gradual rollout: warn-only in staging, hard errors in marts
    staging:
      +severity: warn
    marts:
      +severity: error
```
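The same knobs are available on an individual test via its config block, which is useful when one rule should warn long before it blocks a deployment. A sketch with an illustrative file name and rule:

```sql
-- tests/assert_no_negative_order_totals.sql (hypothetical singular test)
-- Warn on the first failing row, but only fail the run at 100+ rows.
{{ config(severity='error', warn_if='>= 1', error_if='>= 100') }}

SELECT order_id, order_total  -- order_total is an illustrative column
FROM {{ ref('orders') }}
WHERE order_total < 0
```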
### 3. Leverage dbt Packages

```yaml
# packages.yml (run `dbt deps` after editing)
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - package: calogica/dbt_expectations
    version: 0.10.1
```

```yaml
# models/schema.yml - enhanced testing capabilities
version: 2

models:
  - name: customers
    columns:
      - name: email
        data_tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
```
## Advanced Migration Patterns

### Hybrid Approach: Database + dbt Constraints

For organizations requiring both immediate enforcement and comprehensive testing:
```sql
-- Stage 1: Load with minimal database constraints
CREATE TABLE raw_customers (
    customer_id INT NOT NULL, -- keep only the essential constraints
    email VARCHAR(255)
    -- other columns without constraints
);
```

```sql
-- Stage 2: dbt transformation with comprehensive testing
-- models/staging/stg_customers.sql
{{ config(materialized='table') }}

SELECT
    customer_id,
    LOWER(TRIM(email)) AS email -- data cleaning and normalization
FROM {{ source('raw', 'raw_customers') }}

-- Stage 3: comprehensive dbt tests on the staging model ensure data quality
```
### Data Quality Monitoring

```sql
-- tests/data_quality/customer_email_format.sql
{{ config(severity='warn', store_failures=true) }}

SELECT
    customer_id,
    email,
    'Invalid email format' AS failure_reason
FROM {{ ref('customers') }}
-- Note: regex syntax varies by warehouse (REGEXP_LIKE on Snowflake,
-- ~ on PostgreSQL, and so on); adjust the predicate to your platform.
WHERE email NOT REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
```
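With `store_failures` enabled, dbt writes the failing rows to an audit table in the warehouse, which can feed dashboards or alerting. The default location is version- and config-dependent; the schema name below is an assumption to check against your own project:

```sql
-- Failing rows land in an audit schema, by default something like
-- <target_schema>_dbt_test__audit; adjust to your project's configuration.
SELECT customer_id, email, failure_reason
FROM analytics_dbt_test__audit.customer_email_format;
```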
## Implementation Checklist

- [ ] **Audit existing constraints** - document all current database constraints
- [ ] **Map to dbt tests** - identify equivalent dbt testing patterns
- [ ] **Implement core tests** - start with `unique`, `not_null`, and `relationships`
- [ ] **Add custom validations** - implement business rule tests
- [ ] **Configure CI/CD** - integrate tests into the deployment pipeline
- [ ] **Monitor and iterate** - establish data quality monitoring
- [ ] **Documentation** - document the testing strategy and patterns
- [ ] **Training** - educate the team on the new testing paradigm
## Conclusion
Migrating from traditional SQL constraints to dbt's data testing approach represents a shift from prevention-based to detection-based data integrity. While this requires adjusting your mental model, it offers significant advantages in flexibility, maintainability, and integration with modern data workflows.
The key to successful migration is understanding that dbt tests provide equivalent or superior data quality guarantees while offering greater visibility and control over your data validation processes.