# packages.yml
packages:
- package: metaplane/dbt_expectations
version: 0.10.9
# dbt_project.yml
vars:
'dbt_date:time_zone': 'America/Los_Angeles'
# Install
dbt deps
Tests that validate the structure, schema, and overall shape of tables including row counts, column existence, and metadata validation.
column_name
(required): Name of the column to checkcolumn_index
(optional): Expected 1-based position of the columntransform
(optional, default='upper'): Case transformation for comparison
# Basic column existence check
models:
- name: customers
tests:
- dbt_expectations.expect_column_to_exist:
column_name: customer_id
# Validate column exists at specific position
- dbt_expectations.expect_column_to_exist:
column_name: email
column_index: 3
min_value
(optional): Minimum expected row countmax_value
(optional): Maximum expected row countstrictly
(optional, default=false): Use strict inequalities
models:
- name: daily_sales
tests:
# Expect at least 100 orders per day
- dbt_expectations.expect_table_row_count_to_be_between:
min_value: 100
max_value: 10000
# Strict bounds (exclusive)
- dbt_expectations.expect_table_row_count_to_be_between:
min_value: 0
max_value: 1000000
strictly: true
column_list
(required): List of expected column namestransform
(optional, default='upper'): Case transformation
models:
- name: user_profile
tests:
- dbt_expectations.expect_table_columns_to_match_set:
column_list: ['user_id', 'email', 'first_name', 'last_name', 'created_at']
# Case-sensitive matching
- dbt_expectations.expect_table_columns_to_match_set:
column_list: ['UserId', 'Email', 'CreatedAt']
transform: 'none'
compare_model
(required): Reference to comparison tablefactor
(optional): Multiplication factor for comparisoncolumn_name
(required): Timestamp column to checkdatepart
(required): Time unit (day, hour, etc.)interval
(required): Number of dateparts for recency check
models:
- name: user_events
tests:
# Check for data within last 24 hours
- dbt_expectations.expect_row_values_to_have_recent_data:
column_name: event_timestamp
datepart: hour
interval: 24
# Daily freshness check
- dbt_expectations.expect_row_values_to_have_recent_data:
column_name: order_date
datepart: day
interval: 1
Fundamental tests for column data validation including null checks, uniqueness, data types, and basic value constraints.
min_value
(optional): Minimum allowed valuemax_value
(optional): Maximum allowed valuestrictly
(optional, default=false): Use strict inequalitiesrow_condition
(optional): Filter condition
models:
- name: products
columns:
- name: price
tests:
# Price must be positive
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: true
- name: discount_percent
tests:
# Discount between 0-100%
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 100
- name: rating
tests:
# Only check active products
- dbt_expectations.expect_column_values_to_be_between:
min_value: 1.0
max_value: 5.0
row_condition: "status = 'active'"
value_set
(required): List of acceptable valuesquote_values
(optional, default=true): Quote string valuesrow_condition
(optional): Filter condition
models:
- name: orders
columns:
- name: status
tests:
- dbt_expectations.expect_column_values_to_be_in_set:
value_set: ['pending', 'processing', 'shipped', 'delivered', 'cancelled']
- name: priority
tests:
- dbt_expectations.expect_column_values_to_be_in_set:
value_set: [1, 2, 3]
quote_values: false
- name: payment_method
tests:
# Only check completed orders
- dbt_expectations.expect_column_values_to_be_in_set:
value_set: ['credit_card', 'paypal', 'bank_transfer']
row_condition: "status = 'completed'"
row_condition
(optional): Filter condition for subset testingrow_condition
(optional): Filter conditiontype_
(required): Expected data typerow_condition
(optional): Filter conditionsort_column
(optional): Column to sort by before checking orderstrictly
(optional, default=true): Enforce strict increasingdisplay_inconsistent_columns
(optional, default=false): Show problematic valuesStatistical and aggregate function tests for analyzing column distributions, counts, sums, and other mathematical properties.
min_value
(optional): Minimum expected meanmax_value
(optional): Maximum expected meangroup_by
(optional): Group by columns for per-group validationrow_condition
(optional): Filter conditionstrictly
(optional, default=false): Use strict inequalities
models:
- name: product_ratings
tests:
# Overall average rating should be reasonable
- dbt_expectations.expect_column_mean_to_be_between:
column_name: rating
min_value: 2.0
max_value: 4.5
# Per-category average ratings
- dbt_expectations.expect_column_mean_to_be_between:
column_name: rating
min_value: 3.0
max_value: 5.0
group_by: [category]
row_condition: "status = 'published'"
- name: order_amounts
tests:
# Average order value by customer tier
- dbt_expectations.expect_column_mean_to_be_between:
column_name: order_total
min_value: 50.0
group_by: [customer_tier]
value
(required): Expected distinct countgroup_by
(optional): Group by columnsrow_condition
(optional): Filter condition
models:
- name: employee_data
tests:
# Expect exactly 5 departments
- dbt_expectations.expect_column_distinct_count_to_equal:
column_name: department
value: 5
# Each department should have exactly 3 job levels
- dbt_expectations.expect_column_distinct_count_to_equal:
column_name: job_level
value: 3
group_by: [department]
- name: product_catalog
tests:
# Each category should have 10-20 distinct brands
- dbt_expectations.expect_column_distinct_count_to_be_between:
column_name: brand
min_value: 10
max_value: 20
group_by: [category]
min_value
(optional): Minimum expected summax_value
(optional): Maximum expected sumgroup_by
(optional): Group by columnsrow_condition
(optional): Filter conditionquantile
(required): Quantile to check (0.0 to 1.0)min_value
(optional): Minimum expected quantile valuemax_value
(optional): Maximum expected quantile valuevalue_set
(required): Set of acceptable most common valuestop_n
(optional, default=1): Number of top values to checkPattern matching and string validation tests using regular expressions, SQL LIKE patterns, and length constraints.
regex
(required): Regular expression patternis_raw
(optional, default=false): Treat regex as raw stringrow_condition
(optional): Filter condition
models:
- name: customers
columns:
- name: email
tests:
# Email format validation
- dbt_expectations.expect_column_values_to_match_regex:
regex: '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
- name: phone_number
tests:
# US phone number format: (XXX) XXX-XXXX
- dbt_expectations.expect_column_values_to_match_regex:
regex: '^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$'
- name: postal_code
tests:
# US ZIP code (5 digits or ZIP+4)
- dbt_expectations.expect_column_values_to_match_regex:
regex: '^[0-9]{5}(-[0-9]{4})?$'
row_condition: "country = 'US'"
- name: product_sku
tests:
# SKU format: 3 letters + 4 digits + 2 letters
- dbt_expectations.expect_column_values_to_match_regex:
regex: '^[A-Z]{3}[0-9]{4}[A-Z]{2}$'
like_pattern
(required): SQL LIKE pattern with % and _ wildcardsrow_condition
(optional): Filter condition
models:
- name: products
columns:
- name: product_code
tests:
# Product codes start with 'PROD-'
- dbt_expectations.expect_column_values_to_match_like_pattern:
like_pattern: 'PROD-%'
- name: batch_number
tests:
# Batch format: B + 4 digits + L + 2 digits
- dbt_expectations.expect_column_values_to_match_like_pattern:
like_pattern: 'B____L__'
- name: serial_number
tests:
# Check only active products
- dbt_expectations.expect_column_values_to_match_like_pattern:
like_pattern: 'SN-%-%'
row_condition: "status = 'active'"
min_value
(optional): Minimum string lengthmax_value
(optional): Maximum string lengthstrictly
(optional, default=false): Use strict inequalities
models:
- name: user_profiles
columns:
- name: username
tests:
# Username length 3-20 characters
- dbt_expectations.expect_column_value_lengths_to_be_between:
min_value: 3
max_value: 20
- name: password_hash
tests:
# Password hash must be exactly 64 characters
- dbt_expectations.expect_column_value_lengths_to_equal:
value: 64
- name: description
tests:
# Description at least 10 chars, max 500
- dbt_expectations.expect_column_value_lengths_to_be_between:
min_value: 10
max_value: 500
regex_list
(required): List of regex patternsmatch_on
(optional, default='any'): 'any' or 'all'Tests that validate relationships and constraints across multiple columns within the same row or across different rows.
column_A
(required): First column for comparisoncolumn_B
(required): Second column for comparisonor_equal
(optional, default=false): Allow equal valuesrow_condition
(optional): Filter condition
models:
- name: events
tests:
# End time must be after start time
- dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
column_A: end_timestamp
column_B: start_timestamp
# Sale price >= cost (allow equal for break-even)
- dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
column_A: sale_price
column_B: cost_price
or_equal: true
- name: financial_data
tests:
# Max credit limit > current balance for active accounts
- dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
column_A: credit_limit
column_B: current_balance
row_condition: "account_status = 'active'"
column_A
(required): First columncolumn_B
(required): Second columnignore_row_if
(optional): Condition to ignore rows
models:
- name: order_calculations
tests:
# Calculated total should equal manual total
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: calculated_total
column_B: manual_total
# Backup email should match primary for verified users
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: primary_email
column_B: backup_email
ignore_row_if: "email_verified = false"
column_list
(required): List of columns to check for uniquenessignore_row_if
(optional): Condition to ignore rows
models:
- name: user_permissions
tests:
# Each user-role combination should be unique
- dbt_expectations.expect_compound_columns_to_be_unique:
column_list: ['user_id', 'role_id']
- name: product_pricing
tests:
# Product-region-date combination should be unique
- dbt_expectations.expect_compound_columns_to_be_unique:
column_list: ['product_id', 'region', 'effective_date']
- name: event_tracking
tests:
# User-session-event_type should be unique for completed events
- dbt_expectations.expect_compound_columns_to_be_unique:
column_list: ['user_id', 'session_id', 'event_type']
ignore_row_if: "event_status != 'completed'"
column_list
(required): List of columns to sumsum_total
(required): Expected sum valueignore_row_if
(optional): Condition to ignore rows
models:
- name: budget_allocation
tests:
# Budget percentages should sum to 100%
- dbt_expectations.expect_multicolumn_sum_to_equal:
column_list: ['marketing_pct', 'sales_pct', 'operations_pct', 'other_pct']
sum_total: 100
- name: financial_breakdown
tests:
# Revenue streams should equal total revenue
- dbt_expectations.expect_multicolumn_sum_to_equal:
column_list: ['product_revenue', 'service_revenue', 'other_revenue']
sum_total: total_revenue
- name: survey_responses
tests:
# Rating scores should sum to 20 for complete responses
- dbt_expectations.expect_multicolumn_sum_to_equal:
column_list: ['q1_score', 'q2_score', 'q3_score', 'q4_score']
sum_total: 20
ignore_row_if: "response_status != 'complete'"
column_list
(required): List of columns to check for within-row uniquenessignore_row_if
(optional): Condition to ignore rowsAdvanced statistical tests for detecting outliers, validating data distributions, and ensuring data completeness over time.
sigma_threshold
(required): Number of standard deviationstake_diff
(optional, default=false): Calculate differences from previous valuesgroup_by
(optional): Group by columns for per-group analysis
models:
- name: daily_sales
tests:
# Detect sales outliers (beyond 3 standard deviations)
- dbt_expectations.expect_column_values_to_be_within_n_stdevs:
column_name: daily_revenue
sigma_threshold: 3
# Check day-over-day changes for anomalies
- dbt_expectations.expect_column_values_to_be_within_n_stdevs:
column_name: daily_revenue
sigma_threshold: 2
take_diff: true
# Per-store outlier detection
- dbt_expectations.expect_column_values_to_be_within_n_stdevs:
column_name: daily_revenue
sigma_threshold: 2.5
group_by: ['store_id']
- name: user_behavior
tests:
# Detect unusual session durations
- dbt_expectations.expect_column_values_to_be_within_n_stdevs:
column_name: session_duration_minutes
sigma_threshold: 3
sigma_threshold
(required): Number of standard deviationstake_diff
(optional, default=false): Calculate differenceslookback_periods
(required): Number of periods for moving windowtrend_periods
(optional): Periods for trend calculation
models:
- name: website_traffic
tests:
# Detect traffic anomalies using 30-day moving window
- dbt_expectations.expect_column_values_to_be_within_n_moving_stdevs:
column_name: daily_visitors
sigma_threshold: 2
lookback_periods: 30
# Detect unusual day-over-day changes
- dbt_expectations.expect_column_values_to_be_within_n_moving_stdevs:
column_name: daily_visitors
sigma_threshold: 3
take_diff: true
lookback_periods: 14
- name: stock_prices
tests:
# Price anomaly detection with trend adjustment
- dbt_expectations.expect_column_values_to_be_within_n_moving_stdevs:
column_name: closing_price
sigma_threshold: 2.5
lookback_periods: 60
trend_periods: 20
column_name
(required): Date/timestamp columndatepart
(required): Date part (day, hour, week, etc.)interval
(optional, default=1): Interval between expected recordsgroup_by
(optional): Group by columns
models:
- name: daily_metrics
tests:
# Ensure data exists for every day
- dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
column_name: metric_date
datepart: day
# Check for data every week by region
- dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
column_name: metric_date
datepart: week
group_by: ['region']
- name: hourly_sensors
tests:
# Sensor data should be recorded every hour
- dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
column_name: reading_timestamp
datepart: hour
group_by: ['sensor_id']
# Check for data every 4 hours
- dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
column_name: reading_timestamp
datepart: hour
interval: 4
Category | Test Count | Primary Use Cases | Key Examples |
---|---|---|---|
Table Shape | 14 | Schema validation, row count checks, column existence, data freshness | expect_column_to_exist, expect_table_row_count_to_be_between |
Column Values | 11 | Basic data validation, null checks, uniqueness, data types, ranges | expect_column_values_to_be_between, expect_column_values_to_be_in_set |
Aggregate | 16 | Statistical analysis, distribution validation, summary statistics | expect_column_mean_to_be_between, expect_column_distinct_count_to_equal |
String Matching | 10 | Pattern validation, format checking, length constraints | expect_column_values_to_match_regex, expect_column_value_lengths_to_be_between |
Multi-column | 6 | Cross-column relationships, compound keys, data integrity | expect_column_pair_values_A_to_be_greater_than_B, expect_compound_columns_to_be_unique |
Distributional | 3 | Outlier detection, time series analysis, data completeness | expect_column_values_to_be_within_n_stdevs, expect_row_values_to_have_data_for_every_n_datepart |
row_condition
parameter to test subsets of data for more targeted validationgroup_by
parameters for per-category or per-segment validation
# Complex test with multiple parameters
- dbt_expectations.expect_column_values_to_be_between:
name: "validate_revenue_by_region"
column_name: monthly_revenue
min_value: 10000
max_value: 1000000
row_condition: "region in ('US', 'EU') and status = 'active'"
config:
severity: warn
error_if: ">5"
warn_if: ">0"
# Statistical test with grouping
- dbt_expectations.expect_column_mean_to_be_between:
name: "monitor_average_session_duration"
column_name: session_duration_minutes
min_value: 2.0
max_value: 30.0
group_by: ['device_type', 'user_segment']
config:
store_failures: true
For comprehensive information about dbt testing and data quality, refer to the official documentation:
These resources provide the authoritative source for dbt testing capabilities and are regularly updated.