dbt Modeling Guidelines and Best Practices

Overview: This guide provides comprehensive dbt modeling guidelines and best practices for building scalable, maintainable data transformation pipelines. Follow these patterns to create robust data models that serve your analytics needs effectively.

dbt Model Layers & Structure

1. Staging Layer (stg_)

🔧 Actions Performed in Staging:

📋 Staging Characteristics:

2. Intermediate Layer (int_)

⚙️ Actions Performed in Intermediate:

🔄 Intermediate Characteristics:

3. Marts Layer

🎯 Actions Performed in Marts:

📊 Marts Characteristics:

Why Layer Separation is Critical

🎯 The Philosophy Behind dbt Layers

The layered approach in dbt follows the principle of separation of concerns, where each layer has a specific responsibility and builds upon the previous one. This creates a maintainable, testable, and scalable data transformation pipeline.

Layer Primary Focus Key Benefits Reduces Risk Of
Staging Data Standardization • Consistent data types across sources
• Centralized source connection point
• Easy to debug data issues
• Source schema changes breaking downstream models
• Inconsistent data formatting
• Complex debugging
Intermediate Complex Logic Modularization • Reusable transformation components
• Easier testing of complex logic
• Faster development iteration
• Duplicated transformation logic
• Overly complex mart models
• Difficult maintenance
Marts Business Value Delivery • Optimized for end-user queries
• Business-friendly interface
• Performance-tuned storage
• Poor query performance
• Business logic mixed with technical details
• Difficulty in business rule changes

🔄 Data Flow and Transformation Logic

Raw Data → Staging → Intermediate → Marts → Analytics

Each layer can only reference models from the same layer or layers before it, creating a clear dependency flow.

Materialization Strategy

Layer Default Materialization Alternative Reasoning
Sources N/A N/A Raw data
Staging view ephemeral Lightweight, 1:1 with sources
Intermediate ephemeral view Temporary, reusable logic
Marts table incremental Performance for end users

Directory Structure (Recommended)

models/ ├── staging/ # Layer 1: Data Standardization │ ├── salesforce/ │ │ ├── _salesforce__sources.yml │ │ ├── _salesforce__models.yml │ │ ├── stg_salesforce__accounts.sql │ │ ├── stg_salesforce__contacts.sql │ │ └── stg_salesforce__opportunities.sql │ ├── stripe/ │ │ ├── _stripe__sources.yml │ │ ├── _stripe__models.yml │ │ ├── stg_stripe__customers.sql │ │ ├── stg_stripe__payments.sql │ │ └── stg_stripe__subscriptions.sql │ └── google_analytics/ │ ├── _ga__sources.yml │ ├── _ga__models.yml │ └── stg_ga__sessions.sql ├── intermediate/ # Layer 2: Complex Transformations │ ├── int_customer_segments.sql │ ├── int_orders_enriched.sql │ ├── int_revenue_calculations.sql │ └── int_user_journey_pivoted.sql ├── marts/ # Layer 3: Business-Ready Models │ ├── core/ # Company-wide metrics │ │ ├── dim_customers.sql │ │ ├── dim_products.sql │ │ ├── fct_orders.sql │ │ └── fct_payments.sql │ ├── marketing/ # Marketing-specific models │ │ ├── dim_campaigns.sql │ │ ├── fct_website_sessions.sql │ │ └── mart_marketing_attribution.sql │ ├── finance/ # Finance-specific models │ │ ├── fct_revenue.sql │ │ ├── mart_monthly_recurring_revenue.sql │ │ └── mart_customer_lifetime_value.sql │ └── product/ # Product analytics models │ ├── fct_user_events.sql │ └── mart_feature_adoption.sql └── utilities/ # Helper models and macros ├── all_dates.sql └── fiscal_calendar.sql

Implementation Guidelines

1. Set up your models directory structure:

models/ ├── staging/ # Layer 1: Data Standardization │ └── [source_system]/ │ ├── _[source]__sources.yml # Source definitions │ └── stg_[source]__[table].sql # Staging models ├── intermediate/ # Layer 2: Complex Transformations │ └── int_[description].sql # Reusable transformation logic └── marts/ # Layer 3: Business-Ready Models └── [business_area]/ ├── dim_[entity].sql # Dimension tables └── fct_[process].sql # Fact tables

2. Configure dbt_project.yml for layered approach:

models: your_project_name: staging: +materialized: view intermediate: +materialized: ephemeral marts: +materialized: table

3. Follow consistent naming conventions:

Practical Examples of Layer Transformations

📝 Staging Layer Examples

Example 1: Salesforce Accounts Staging

-- stg_salesforce__accounts.sql select id as account_id, name as account_name, type as account_type, industry, cast(annual_revenue as decimal(15,2)) as annual_revenue, cast(created_date as timestamp) as created_at, cast(last_modified_date as timestamp) as updated_at, upper(trim(billing_country)) as billing_country, case when is_deleted = 'true' then true else false end as is_deleted, md5(concat(id, last_modified_date)) as row_hash from {{ source('salesforce', 'accounts') }}

Example 2: Stripe Payments Staging

-- stg_stripe__payments.sql select id as payment_id, customer as customer_id, cast(amount as decimal(10,2)) / 100 as amount_usd, currency, status, cast(created as timestamp) as created_at, payment_method_types[0] as payment_method, lower(trim(description)) as description from {{ source('stripe', 'charges') }} where status != 'failed'

⚙️ Intermediate Layer Examples

Example 1: Customer Segmentation Logic

-- int_customer_segments.sql with customer_metrics as ( select customer_id, sum(amount_usd) as total_revenue, count(*) as order_count, max(created_at) as last_order_date, min(created_at) as first_order_date from {{ ref('stg_stripe__payments') }} group by customer_id ), customer_segments as ( select *, case when total_revenue >= 10000 then 'Enterprise' when total_revenue >= 1000 then 'Growth' when total_revenue >= 100 then 'Standard' else 'Basic' end as revenue_segment, case when date_diff(current_date, last_order_date, day) <= 30 then 'Active' when date_diff(current_date, last_order_date, day) <= 90 then 'At Risk' else 'Churned' end as activity_segment from customer_metrics ) select * from customer_segments

Example 2: Orders with Enriched Data

-- int_orders_enriched.sql select p.payment_id, p.customer_id, p.amount_usd, p.created_at as order_date, a.account_name, a.account_type, a.industry, a.billing_country, cs.revenue_segment, cs.activity_segment, row_number() over ( partition by p.customer_id order by p.created_at ) as customer_order_sequence from {{ ref('stg_stripe__payments') }} p left join {{ ref('stg_salesforce__accounts') }} a on p.customer_id = a.account_id left join {{ ref('int_customer_segments') }} cs on p.customer_id = cs.customer_id

🎯 Marts Layer Examples

Example 1: Customer Dimension Table

-- dim_customers.sql select customer_id, account_name as customer_name, account_type as customer_type, industry, billing_country as country, revenue_segment, activity_segment, first_order_date, last_order_date, total_revenue, order_count, round(total_revenue / order_count, 2) as avg_order_value, case when activity_segment = 'Active' then 'Healthy' when activity_segment = 'At Risk' and revenue_segment in ('Enterprise', 'Growth') then 'High Priority' when activity_segment = 'Churned' then 'Lost' else 'Monitor' end as customer_health_status, current_timestamp() as last_updated_at from {{ ref('int_customer_segments') }} cs left join {{ ref('stg_salesforce__accounts') }} a on cs.customer_id = a.account_id

Example 2: Monthly Revenue Fact Table

-- fct_monthly_revenue.sql select date_trunc('month', order_date) as revenue_month, billing_country as country, industry, revenue_segment, count(distinct customer_id) as active_customers, count(*) as total_orders, sum(amount_usd) as total_revenue, round(avg(amount_usd), 2) as avg_order_value, sum(case when customer_order_sequence = 1 then amount_usd else 0 end) as new_customer_revenue, sum(case when customer_order_sequence > 1 then amount_usd else 0 end) as repeat_customer_revenue from {{ ref('int_orders_enriched') }} where order_date >= '2023-01-01' group by 1, 2, 3, 4 order by 1 desc, 5 desc

Additional Best Practices

Testing Strategy

Documentation

Performance Considerations

Code Organization

📚 Official dbt Documentation

For the most up-to-date information and comprehensive details, refer to the official dbt documentation:

These resources provide the authoritative source for dbt best practices and are regularly updated by the dbt Labs team.