Data Integrity in dbt: A Migration Guide from Traditional SQL

Target Audience: Data professionals transitioning from traditional RDBMS platforms (SQL Server, Oracle, PostgreSQL) to a modern data stack built on dbt, particularly those familiar with database constraints and seeking equivalent data integrity patterns.

Introduction: The Paradigm Shift

Moving from traditional database systems like Microsoft SQL Server to a modern data stack with dbt represents a fundamental shift in how we approach data integrity. Instead of relying on database-enforced constraints at the table level, dbt introduces a transformation-time validation approach that provides greater flexibility and transparency.

Traditional SQL Constraints vs. dbt Approach

| Traditional SQL Constraint | Purpose | dbt Equivalent | Implementation Method |
|---|---|---|---|
| PRIMARY KEY | Unique identifier, not null | Data tests + model contracts | unique + not_null tests |
| FOREIGN KEY | Referential integrity | Relationship test | relationships test |
| UNIQUE | No duplicate values | Unique data test | unique test |
| NOT NULL | Required values | Not-null data test | not_null test |
| CHECK | Value validation | Accepted values test | accepted_values test |

Core Concepts in dbt Data Integrity

1. Data Tests: The Foundation

In traditional SQL, constraints are enforced at insert/update time. In dbt, data tests are SQL queries that validate assumptions about your data: a test passes when its query returns zero rows, and tests run as part of your transformation workflow.

```sql
-- Traditional SQL Server approach
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    status VARCHAR(20) CHECK (status IN ('active', 'inactive', 'suspended')),
    created_at DATETIME NOT NULL
);
```

```yaml
# dbt approach: schema.yml
version: 2

models:
  - name: customers
    description: "Customer master table"
    columns:
      - name: customer_id
        description: "Unique customer identifier"
        data_tests:
          - unique
          - not_null
      - name: email
        description: "Customer email address"
        data_tests:
          - unique
          - not_null
      - name: status
        description: "Customer account status"
        data_tests:
          - accepted_values:
              values: ['active', 'inactive', 'suspended']
      - name: created_at
        description: "Account creation timestamp"
        data_tests:
          - not_null
```
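Each of these generic tests compiles to a SQL query that fails if it returns any rows. As a rough sketch (dbt's actual generated SQL differs in structure but not in effect), the unique test on customer_id behaves like this:

```sql
-- Approximate logic of the built-in unique test:
-- any non-null customer_id appearing more than once fails the test.
SELECT
    customer_id,
    COUNT(*) AS occurrences
FROM {{ ref('customers') }}
WHERE customer_id IS NOT NULL
GROUP BY customer_id
HAVING COUNT(*) > 1
```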

2. Model Contracts: Schema Enforcement

Model contracts in dbt provide upfront guarantees about a model's structure, similar to table schemas in traditional databases: dbt verifies that the model returns the declared column names and data types before the model is built.

```yaml
# Traditional SQL Server: schema is enforced at the table level.
# dbt: schema is enforced through a model contract, declared in the
# model's YAML file alongside the column definitions.
version: 2

models:
  - name: customers
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: not_null
          - type: primary_key
      - name: email
        data_type: varchar(255)
        constraints:
          - type: not_null
          - type: unique
```
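When the contract is enforced, dbt compares the model's output columns and data types against the declaration before building the table and fails the run on any mismatch. A minimal model body that satisfies the contract above might look like the following sketch, where stg_customers is an assumed upstream staging model:

```sql
-- models/customers.sql
-- Column names and data types must match the contract exactly.
SELECT
    CAST(customer_id AS int) AS customer_id,
    CAST(email AS varchar(255)) AS email
FROM {{ ref('stg_customers') }}
```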

3. Referential Integrity with Relationships Tests

Foreign key relationships in dbt are validated through relationships tests. These provide the same referential checks as traditional FOREIGN KEY constraints, with one important difference: violations are detected after the data is transformed rather than prevented at write time.

```sql
-- Traditional SQL Server
ALTER TABLE orders
ADD CONSTRAINT FK_orders_customers
FOREIGN KEY (customer_id) REFERENCES customers(customer_id);
```

```yaml
# dbt equivalent
version: 2

models:
  - name: orders
    columns:
      - name: customer_id
        description: "Reference to customers table"
        data_tests:
          - relationships:
              to: ref('customers')
              field: customer_id
```
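Conceptually, the relationships test searches for orphaned child records: values of orders.customer_id with no matching row in customers. A simplified sketch of the check (dbt's generated SQL is equivalent in effect):

```sql
-- Orphaned foreign keys: order rows whose customer_id
-- has no matching row in the customers model.
SELECT o.customer_id
FROM {{ ref('orders') }} AS o
LEFT JOIN {{ ref('customers') }} AS c
    ON o.customer_id = c.customer_id
WHERE o.customer_id IS NOT NULL
  AND c.customer_id IS NULL
```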

Migration Strategies

Phase 1: Assessment and Planning

Key Steps:
  1. Inventory every constraint in the source database: primary keys, foreign keys, unique, not-null, and CHECK constraints (a query sketch follows this list)
  2. Map each constraint to its dbt equivalent using the table above
  3. Prioritize business-critical constraints for the first migration wave
  4. Flag complex CHECK constraints that will need custom tests
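The constraint inventory can usually be pulled straight from the source system's catalog views. The query below is a starting point for SQL Server (CHECK constraint definitions live in sys.check_constraints and can be joined in as needed):

```sql
-- Inventory of declared constraints in the source SQL Server database
SELECT
    tc.TABLE_SCHEMA,
    tc.TABLE_NAME,
    tc.CONSTRAINT_NAME,
    tc.CONSTRAINT_TYPE,      -- PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK
    ccu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc
LEFT JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS ccu
    ON tc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME
   AND tc.CONSTRAINT_SCHEMA = ccu.CONSTRAINT_SCHEMA
ORDER BY tc.TABLE_NAME, tc.CONSTRAINT_TYPE;
```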

Phase 2: Implementing dbt Data Tests

```sql
-- Example migration from SQL Server to dbt
-- SQL Server table definition
/*
CREATE TABLE products (
    product_id INT IDENTITY(1,1) PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    category_id INT FOREIGN KEY REFERENCES categories(category_id),
    price DECIMAL(10,2) CHECK (price > 0),
    status VARCHAR(20) DEFAULT 'active' CHECK (status IN ('active', 'discontinued'))
);
*/
```

```yaml
# dbt schema.yml equivalent
version: 2

models:
  - name: products
    description: "Product master data"
    columns:
      - name: product_id
        description: "Auto-generated product identifier"
        data_tests:
          - unique
          - not_null
      - name: sku
        description: "Stock keeping unit"
        data_tests:
          - unique
          - not_null
      - name: category_id
        description: "Product category reference"
        data_tests:
          - not_null
          - relationships:
              to: ref('categories')
              field: category_id
      - name: price
        description: "Product price"
        data_tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "> 0"
      - name: status
        description: "Product status"
        data_tests:
          - accepted_values:
              values: ['active', 'discontinued']
```

Phase 3: Advanced Patterns

Custom Tests for Complex Business Rules

```sql
-- tests/assert_price_consistency.sql
-- Custom test equivalent to a complex CHECK constraint:
-- a discount price must be non-negative and below the regular price.
SELECT
    product_id,
    sku,
    price,
    discount_price
FROM {{ ref('products') }}
WHERE discount_price >= price
   OR discount_price < 0
```
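A singular test like this fails whenever its query returns rows, and it can be run on its own with dbt test --select assert_price_consistency. If the same kind of rule applies to several models, it can be promoted to a reusable generic test. A minimal sketch, where the name expression_holds is hypothetical:

```sql
-- macros/expression_holds.sql (hypothetical generic test)
-- Fails for every row where the given expression is not true.
{% test expression_holds(model, expression) %}

SELECT *
FROM {{ model }}
WHERE NOT ({{ expression }})

{% endtest %}
```

It can then be attached in schema.yml like any built-in test, passing an expression such as discount_price >= 0 AND discount_price < price.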

Multi-Column Constraints

```yaml
# Traditional SQL: composite primary key
#   ALTER TABLE order_items ADD CONSTRAINT PK_order_items
#   PRIMARY KEY (order_id, product_id);

# dbt equivalent
version: 2

models:
  - name: order_items
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - product_id
```
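Under the hood, the dbt_utils test reduces to a grouped duplicate check over the column combination, roughly:

```sql
-- Simplified sketch of what unique_combination_of_columns verifies:
-- no (order_id, product_id) pair may appear more than once.
SELECT
    order_id,
    product_id,
    COUNT(*) AS occurrences
FROM {{ ref('order_items') }}
GROUP BY order_id, product_id
HAVING COUNT(*) > 1
```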

Pros and Cons Comparison

✅ dbt Advantages

  - Validation logic is version-controlled, documented, and transparent
  - Tests run as part of the transformation workflow and the CI/CD pipeline
  - Custom SQL tests can express business rules that go beyond what CHECK constraints allow
  - Failing rows can be stored and inspected (store_failures), which aids debugging
  - The same tests work across supported warehouses

❌ Trade-offs

  - Detection-based rather than prevention-based: invalid data can land before a test catches it
  - No enforcement at insert/update time; enforcement of contract constraints varies by warehouse
  - Test runs consume warehouse compute and add time to the pipeline
  - Teams must adjust their mental model and build discipline around triaging test failures

Best Practices for Migration

1. Start with Critical Constraints

Priority Order:
  1. Primary key uniqueness and not-null checks
  2. Foreign key relationships
  3. Business-critical value validations
  4. Data type and format validations

2. Implement Progressive Validation

```yaml
# dbt_project.yml - configure test severity thresholds
data_tests:
  my_project:
    +warn_if: ">= 1"    # warn on any failing rows
    +error_if: ">= 10"  # escalate to an error at 10 or more failing rows

    # Gradual rollout: warn-only for staging models, hard errors for marts
    staging:
      +severity: warn
    marts:
      +severity: error
```

3. Leverage dbt Packages

```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - package: calogica/dbt_expectations
    version: 0.10.1
```

```yaml
# Enhanced testing capabilities from dbt_expectations,
# e.g. regex validation of email format:
models:
  - name: customers
    columns:
      - name: email
        data_tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
```

Advanced Migration Patterns

Hybrid Approach: Database + dbt Constraints

For organizations requiring both immediate enforcement and comprehensive testing:

```sql
-- Stage 1: Load with minimal database constraints
CREATE TABLE raw_customers (
    customer_id INT NOT NULL,   -- keep essential constraints
    email VARCHAR(255)          -- other columns without constraints
);
```

```sql
-- Stage 2: dbt transformation with comprehensive testing
-- models/staging/stg_customers.sql
{{ config(materialized='table') }}

SELECT
    customer_id,
    LOWER(TRIM(email)) AS email   -- data cleaning and normalization
FROM {{ source('raw', 'raw_customers') }}

-- Stage 3: comprehensive dbt tests on the cleaned model ensure data quality
```

Data Quality Monitoring

```sql
-- tests/data_quality/customer_email_format.sql
{{ config(severity='warn', store_failures=true) }}

SELECT
    customer_id,
    email,
    'Invalid email format' AS failure_reason
FROM {{ ref('customers') }}
WHERE email NOT REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
-- Note: regex syntax varies by warehouse
-- (e.g. !~ in PostgreSQL, NOT REGEXP_LIKE(...) in Snowflake and Oracle)
```
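Because store_failures=true persists the failing rows, downstream monitoring can query them directly. By default dbt writes them to a schema suffixed with dbt_test__audit, in a table named after the test; the names below are illustrative, so check your target configuration:

```sql
-- Track data quality from the stored test failures
-- (schema and table names are illustrative)
SELECT
    COUNT(*) AS failing_rows,
    CURRENT_TIMESTAMP AS checked_at
FROM analytics_dbt_test__audit.customer_email_format
```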

Implementation Checklist

Migration Checklist:
  1. Audit existing constraints - Document all current database constraints
  2. Map to dbt tests - Identify equivalent dbt testing patterns
  3. Implement core tests - Start with unique, not_null, relationships
  4. Add custom validations - Implement business rule tests
  5. Configure CI/CD - Integrate tests into deployment pipeline
  6. Monitor and iterate - Establish data quality monitoring
  7. Documentation - Document testing strategy and patterns
  8. Training - Educate team on new testing paradigms

Conclusion

Migrating from traditional SQL constraints to dbt's data testing approach represents a shift from prevention-based to detection-based data integrity. While this requires adjusting your mental model, it offers significant advantages in flexibility, maintainability, and integration with modern data workflows.

The key to a successful migration is understanding that dbt tests can provide comparable data quality assurance while offering far greater visibility into, and control over, your data validation processes.
