The dbt Semantic Layer represents a paradigm shift in analytics engineering, providing a unified interface for defining, managing, and consuming metrics across your organization. Built on MetricFlow, it enables data teams to create a single source of truth for business metrics while maintaining the flexibility and power of dbt's transformation ecosystem.
The dbt Semantic Layer is a centralized framework that lets teams define metrics once, in version-controlled YAML, and serve consistent results to every downstream consumer: BI dashboards, notebooks, and custom applications.
```text
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Data Sources   │    │  dbt Transform  │    │ Semantic Layer  │
│                 │    │                 │    │                 │
│ • Raw Tables    │───▶│ • Models        │───▶│ • Semantic      │
│ • External APIs │    │ • Tests         │    │   Models        │
│ • Files         │    │ • Documentation │    │ • Metrics       │
└─────────────────┘    └─────────────────┘    │ • Dimensions    │
                                              └────────┬────────┘
                                                       │
                                              ┌────────▼────────┐
                                              │   Consumption   │
                                              │                 │
                                              │ • BI Tools      │
                                              │ • APIs          │
                                              │ • Applications  │
                                              └─────────────────┘
```
MetricFlow is the query engine that powers the dbt Semantic Layer. It provides:

- Declarative metric and dimension resolution
- Automatic join inference across semantic models, based on their entities
- Time-spine-based temporal calculations (grains, rolling windows, period offsets)
- Generation of optimized SQL for the target warehouse
Semantic models define the business entities and their relationships. They serve as the foundation for metric calculation and include:

- Entities: the primary and foreign keys that define the model's grain and its join paths
- Dimensions: categorical and time attributes used to group and filter
- Measures: aggregations (sum, count, count_distinct, average, and so on) that metrics are built from
Metrics are business KPIs defined on top of semantic models. Types include:

- Simple: a direct aggregation of a single measure
- Ratio: one input metric divided by another
- Cumulative: a measure accumulated over time, optionally within a rolling window
- Derived: an expression combining other metrics
```shell
# Install the MetricFlow CLI (the dbt-metricflow package bundles it for dbt projects)
pip install dbt-metricflow

# dbt Cloud users: MetricFlow is available without a separate install

# Verify the installation
mf --help
```
```yaml
# dbt_project.yml
name: 'my_analytics_project'
version: '1.0.0'

model-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["data"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

# Note: semantic models and metrics are NOT declared in dbt_project.yml.
# They live in YAML files under model-paths, for example
# models/semantic_models/orders.yml.

vars:
  # Time spine bounds used when generating the date spine model
  start_date: '2020-01-01'
  end_date: '2025-12-31'
```
The time spine, a model with one row per date, is essential for MetricFlow's temporal calculations (cumulative metrics, offsets, and grain conversion). By convention it is a model named metricflow_time_spine with a date_day column:
```sql
-- models/metricflow_time_spine.sql
with days as (
    {{
        dbt_utils.date_spine(
            datepart="day",
            start_date="cast('2020-01-01' as date)",
            end_date="cast('2025-12-31' as date)"
        )
    }}
)

select cast(date_day as date) as date_day
from days
```
A semantic model YAML file defines the business entities and their relationships:
```yaml
# models/semantic_models/orders.yml
semantic_models:
  - name: orders
    description: "Order transactions with customer and product details"
    model: ref('fact_orders')

    defaults:
      agg_time_dimension: order_date

    # Primary entity - the grain of this semantic model
    entities:
      - name: order_id
        type: primary
        expr: order_id
      - name: customer_id
        type: foreign
        expr: customer_id
      - name: product_id
        type: foreign
        expr: product_id

    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
        expr: order_date
      - name: order_status
        type: categorical
        expr: order_status
      - name: sales_channel
        type: categorical
        expr: sales_channel
      - name: region
        type: categorical
        expr: customer_region

    measures:
      - name: order_total
        description: "Total order amount including tax"
        agg: sum
        expr: order_total_amount
      - name: order_count
        description: "Count of orders"
        agg: count
        expr: order_id
      - name: unique_customers
        description: "Count of unique customers"
        agg: count_distinct
        expr: customer_id
```
```yaml
# Advanced semantic model with calculated measures
semantic_models:
  - name: revenue_analytics
    model: ref('fact_revenue')

    measures:
      - name: gross_revenue
        agg: sum
        expr: gross_amount

      - name: net_revenue
        agg: sum
        expr: gross_amount - discount_amount - tax_amount

      - name: average_order_value
        description: "Average order value with null handling"
        agg: average
        expr: |
          case
            when order_total_amount > 0 then order_total_amount
            else null
          end

      - name: revenue_per_customer
        agg: sum
        expr: gross_amount
        create_metric: true
```
```yaml
# Complex entity relationships
semantic_models:
  - name: customer_orders
    model: ref('customer_order_facts')

    entities:
      - name: customer_id
        type: primary
      - name: order_id
        type: foreign
      - name: product_id
        type: foreign
      - name: store_id
        type: foreign
        expr: |
          case
            when sales_channel = 'online' then 'ONLINE_STORE'
            else physical_store_id
          end
```
Simple metrics perform direct aggregations on measures:
```yaml
# models/metrics/revenue_metrics.yml
metrics:
  - name: total_revenue
    description: "Total revenue across all orders"
    type: simple
    label: "Total Revenue"
    type_params:
      measure: order_total

  - name: order_count
    description: "Total number of orders placed"
    type: simple
    label: "Order Count"
    type_params:
      measure: order_count

  - name: unique_customers
    description: "Number of unique customers who placed orders"
    type: simple
    label: "Unique Customers"
    type_params:
      measure: unique_customers
```
Ratio metrics calculate the relationship between two input metrics, a numerator and a denominator:
```yaml
metrics:
  - name: average_order_value
    description: "Average revenue per order"
    type: ratio
    label: "Average Order Value"
    type_params:
      numerator: total_revenue
      denominator: order_count

  - name: customer_conversion_rate
    description: "Percentage of visitors who become customers"
    type: ratio
    label: "Customer Conversion Rate (%)"
    type_params:
      numerator: unique_customers
      denominator: total_visitors
```
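Conceptually, a ratio metric is a null-safe division of the two input metrics' values at the requested grain. The sketch below shows those semantics in plain Python; it is illustrative only, since MetricFlow generates the equivalent warehouse SQL for you.

```python
from typing import Optional


def ratio_metric(numerator: Optional[float],
                 denominator: Optional[float]) -> Optional[float]:
    """Null-safe division, mirroring the NULLIF-style guard a warehouse
    ratio query typically uses (a sketch, not MetricFlow's generated SQL)."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


# average_order_value = total_revenue / order_count
print(ratio_metric(125000.0, 500))  # 250.0
print(ratio_metric(125000.0, 0))    # None (avoids divide-by-zero)
```

Guarding the denominator matters in practice: a grain with zero orders should yield a null average order value, not a query failure.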
Cumulative metrics aggregate values over time windows:
```yaml
metrics:
  - name: cumulative_revenue
    description: "Running total of revenue since business inception"
    type: cumulative
    label: "Cumulative Revenue"
    type_params:
      measure: order_total

  - name: revenue_l30d
    description: "Rolling 30-day revenue"
    type: cumulative
    label: "Revenue (Last 30 Days)"
    type_params:
      measure: order_total
      window: 30 days

  - name: monthly_active_customers
    description: "Customers who made purchases in the last 30 days"
    type: cumulative
    label: "Monthly Active Customers"
    type_params:
      measure: unique_customers
      window: 30 days
```
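A cumulative metric with a 30-day window sums the measure over a trailing window ending at each queried date. A minimal Python sketch of that windowing logic (illustrative; MetricFlow implements it in SQL against the time spine):

```python
from datetime import date, timedelta
from typing import Dict


def rolling_window_sum(daily: Dict[date, float],
                       as_of: date, window_days: int) -> float:
    """Sum values in the half-open window (as_of - window_days, as_of],
    i.e. the last `window_days` days ending on `as_of`."""
    start = as_of - timedelta(days=window_days)
    return sum(v for d, v in daily.items() if start < d <= as_of)


daily = {
    date(2024, 1, 5): 100.0,
    date(2024, 1, 15): 50.0,
    date(2024, 2, 20): 25.0,
}
print(rolling_window_sum(daily, date(2024, 1, 31), 30))  # 150.0
print(rolling_window_sum(daily, date(2024, 2, 29), 30))  # 25.0
```

Note the window is evaluated per queried date: the same order contributes to every window that covers its date, which is why rolling metrics must be computed at query time rather than pre-aggregated naively.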
Derived metrics combine existing metrics with mathematical operations:
```yaml
metrics:
  - name: revenue_growth_rate
    description: "Month-over-month revenue growth rate"
    type: derived
    label: "Revenue Growth Rate (%)"
    type_params:
      expr: (current_revenue - previous_revenue) / previous_revenue * 100
      metrics:
        - name: total_revenue
          alias: current_revenue
        - name: total_revenue
          offset_window: 1 month
          alias: previous_revenue

  - name: customer_lifetime_value
    description: "Estimated customer lifetime value"
    type: derived
    label: "Customer Lifetime Value"
    type_params:
      expr: average_order_value * average_orders_per_customer * customer_lifespan_months
      metrics:
        - name: average_order_value
        - name: average_orders_per_customer
        - name: customer_lifespan_months
```
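The month-over-month arithmetic in the growth-rate metric can be sketched in plain Python. This is illustrative only; in the semantic layer the prior-period value comes from a period offset computed in SQL, and the first period has no prior value to compare against.

```python
from typing import List, Optional


def mom_growth(monthly_revenue: List[float]) -> List[Optional[float]]:
    """Month-over-month growth in percent: (current - previous) / previous * 100.
    The first month has no prior period, so its growth is None."""
    rates: List[Optional[float]] = [None]
    for prev, cur in zip(monthly_revenue, monthly_revenue[1:]):
        rates.append(None if prev == 0 else (cur - prev) / prev * 100)
    return rates


print(mom_growth([100.0, 110.0, 99.0]))
```

The zero-guard on the previous month mirrors the null handling you would want in the SQL expression, so a month following zero revenue reports no growth rate rather than a division error.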
MetricFlow provides sophisticated time-based calculations:
```yaml
# Complex time-based metrics
metrics:
  - name: quarterly_revenue_trend
    description: "Revenue trend analysis by quarter"
    type: simple
    label: "Quarterly Revenue"
    type_params:
      measure: order_total
    filter: |
      {{ TimeDimension('orders__order_date', 'quarter') }} >= '2023-01-01'

  - name: weekend_vs_weekday_sales
    description: "Sales comparison between weekends and weekdays"
    type: derived
    label: "Weekend vs Weekday Sales Ratio"
    type_params:
      expr: weekend_sales / weekday_sales
      metrics:
        # Assumes a day_of_week dimension is defined on the orders
        # semantic model (1 = Sunday, 7 = Saturday)
        - name: total_revenue
          alias: weekend_sales
          filter: |
            {{ Dimension('orders__day_of_week') }} in (1, 7)
        - name: total_revenue
          alias: weekday_sales
          filter: |
            {{ Dimension('orders__day_of_week') }} in (2, 3, 4, 5, 6)
```
Define metrics that span multiple semantic models:
```yaml
# Metrics whose inputs come from different semantic models.
# MetricFlow resolves joins automatically through shared entities, so the
# metric definition does not name the semantic models explicitly; the
# comments note where each input metric is defined.
metrics:
  - name: customer_acquisition_cost
    description: "Cost to acquire a new customer"
    type: ratio
    label: "Customer Acquisition Cost"
    type_params:
      numerator: total_marketing_cost   # metric built on the marketing_spend model
      denominator:
        name: new_customers             # metric built on the customers model
        filter: |
          {{ Dimension('customer__customer_type') }} = 'NEW'

  - name: product_profitability
    description: "Profit margin by product category"
    type: derived
    label: "Product Profitability (%)"
    type_params:
      expr: (product_revenue - product_costs) / product_revenue * 100
      metrics:
        - name: product_revenue   # defined on the sales model
        - name: product_costs     # defined on the inventory_costs model
```
```yaml
metrics:
  - name: high_value_customer_revenue
    description: "Revenue from customers with LTV > $1000"
    type: simple
    label: "High-Value Customer Revenue"
    type_params:
      measure: order_total
    # Note: metric filters render into the generated WHERE clause. A
    # subquery like this is often better precomputed as a boolean
    # dimension (e.g. is_high_value) on the semantic model.
    filter: |
      {{ Entity('customer_id') }} in (
        select customer_id
        from customer_ltv
        where lifetime_value > 1000
      )

  - name: seasonal_adjusted_sales
    description: "Sales adjusted for seasonal patterns"
    type: derived
    label: "Seasonally Adjusted Sales"
    type_params:
      # Derived-metric expressions operate on the input metrics below;
      # referencing a raw column such as order_date here is illustrative
      # and may need to move into the underlying model instead.
      expr: |
        case
          when extract(month from order_date) in (11, 12)
            then raw_sales / 1.25  -- adjust for holiday boost
          when extract(month from order_date) in (1, 2)
            then raw_sales / 0.85  -- adjust for post-holiday dip
          else raw_sales
        end
      metrics:
        - name: raw_sales
```
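The seasonal adjustment is just a per-month deflation factor. The sketch below restates the CASE expression's arithmetic in Python; the factor values (1.25 for the holiday boost, 0.85 for the post-holiday dip) are the hypothetical ones used above, not derived constants.

```python
# Hypothetical factors matching the CASE expression: holiday months are
# divided by 1.25, post-holiday months by 0.85, all others left as-is.
SEASONAL_FACTORS = {11: 1.25, 12: 1.25, 1: 0.85, 2: 0.85}


def seasonally_adjust(raw_sales: float, month: int) -> float:
    """Deflate or inflate raw sales by the month's seasonal factor."""
    return raw_sales / SEASONAL_FACTORS.get(month, 1.0)


print(seasonally_adjust(125.0, 11))  # 100.0 (holiday boost removed)
print(seasonally_adjust(50.0, 6))    # 50.0 (no adjustment in June)
```

In production these factors would come from a seasonality analysis rather than being hard-coded, but the division structure is the same.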
Connect Tableau to the dbt Semantic Layer:
```text
# Tableau connects to the dbt Semantic Layer through the official
# dbt Semantic Layer connector (dbt Cloud).
#
# 1. Install the dbt Semantic Layer connector in Tableau
#    (available for Tableau Desktop and Tableau Cloud/Server).
#
# 2. In the connection dialog, provide:
#    - Host:           the Semantic Layer access URL for your dbt Cloud account
#    - Environment ID: the dbt Cloud environment with the Semantic Layer enabled
#    - Service token:  a dbt Cloud service token with Semantic Layer permissions
#
# 3. Tableau then exposes your metrics and dimensions as a live data
#    source; drag metrics onto worksheets like ordinary fields.
```
The dbt Semantic Layer exposes GraphQL and JDBC APIs (in dbt Cloud). A common Looker pattern is to materialize metric results into warehouse tables and model them in LookML; the view below is an illustrative sketch of that approach, assuming a `daily_revenue_summary` table like the one built later in this guide.

```lookml
view: orders_metrics {
  sql_table_name: analytics.daily_revenue_summary ;;

  dimension: order_date {
    type: date
    sql: ${TABLE}.order_date ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.total_revenue ;;
    value_format_name: usd
  }

  measure: order_count {
    type: sum
    sql: ${TABLE}.order_count ;;
  }
}
```
Power BI can call a GraphQL endpoint from Power Query M. The snippet below is illustrative: the endpoint URL, service-token header, and query shape are assumptions to adapt to your deployment (check your dbt Cloud account for the exact Semantic Layer access URL and API schema).

```text
// Power BI integration via a Semantic Layer GraphQL endpoint (Power Query M)
let
    BaseUrl = "https://semantic-layer.cloud.getdbt.com/api/graphql",
    Query = "{ metrics(names: [""total_revenue"", ""order_count""]) { name values(dimensions: [""order_date"", ""customer_region""], timeRange: {start: ""2023-01-01"", end: ""2023-12-31""}) { order_date customer_region total_revenue order_count } } }",
    Response = Web.Contents(BaseUrl, [
        Headers = [
            #"Content-Type" = "application/json",
            #"Authorization" = "Bearer YOUR_SERVICE_TOKEN"
        ],
        Content = Text.ToBinary(Json.FromValue([query = Query]))
    ]),
    JsonData = Json.Document(Response)
in
    JsonData
```
Build custom integrations using MetricFlow's GraphQL API:
```python
# Python example for a custom integration. The endpoint URL and GraphQL
# schema are illustrative assumptions; adapt them to your deployment.
import json

import requests


class SemanticLayerClient:
    def __init__(self, base_url="http://localhost:8080"):
        self.graphql_endpoint = f"{base_url}/api/graphql"

    def query_metrics(self, metrics, dimensions=None, filters=None,
                      time_range=None, granularity="day"):
        """Query metrics from the semantic layer."""
        time_range = time_range or {"start": "2023-01-01", "end": "2023-12-31"}
        fields = " ".join((dimensions or []) + metrics + ["__timestamp"])
        filter_clause = f"filters: {json.dumps(filters)}" if filters else ""
        query = f"""
        query {{
          metrics(names: {json.dumps(metrics)}) {{
            name
            values(
              dimensions: {json.dumps(dimensions or [])}
              timeRange: {{start: "{time_range['start']}", end: "{time_range['end']}"}}
              granularity: "{granularity}"
              {filter_clause}
            ) {{
              {fields}
            }}
          }}
        }}
        """
        response = requests.post(
            self.graphql_endpoint,
            json={"query": query},
            headers={"Content-Type": "application/json"},
        )
        response.raise_for_status()
        return response.json()


# Usage example
client = SemanticLayerClient()
result = client.query_metrics(
    metrics=["total_revenue", "order_count"],
    dimensions=["customer_region", "sales_channel"],
    time_range={"start": "2023-01-01", "end": "2023-12-31"},
    filters=["customer_region != 'Unknown'"],
)
```
```yaml
# Comprehensive metric documentation
metrics:
  - name: monthly_recurring_revenue
    description: |
      Monthly Recurring Revenue (MRR) represents predictable revenue that
      a company can expect to receive every month.

      Calculation Logic:
      - Includes only active subscription revenue
      - Excludes one-time fees and setup costs
      - Normalizes annual contracts to monthly values
      - Accounts for mid-month upgrades/downgrades

      Business Context:
      - Primary SaaS growth metric
      - Used for financial planning and investor reporting
      - Tracked monthly for board presentations

      Data Quality Notes:
      - Refreshed daily at 6 AM UTC
      - 24-hour delay for payment processing
      - Manual adjustments logged in audit table
    type: simple
    label: "Monthly Recurring Revenue"
    type_params:
      measure: mrr_amount

    # Governance metadata. The meta block is free-form; the keys below are
    # team conventions, not dbt-enforced fields.
    meta:
      owner: "finance-team@company.com"
      domain: "subscription_analytics"
      tier: "tier_1"  # Critical business metric
      review_frequency: "monthly"
      last_reviewed: "2024-01-15"
      approved_by: "cfo@company.com"

      # Quality expectations (illustrative; enforce these with dbt tests,
      # since metrics have no native constraints key)
      constraints:
        - type: not_null
        - type: positive
        - type: reasonable_range
          params:
            min: 1000000   # $1M minimum
            max: 50000000  # $50M maximum
```
```yaml
# Role-based access control for semantic models.
# Note: the access_control block is an illustrative design sketch, not a
# native dbt feature; enforce access in your warehouse or BI layer.
semantic_models:
  - name: sensitive_customer_data
    model: ref('customer_pii')

    access_control:
      enabled: true
      default_access: deny

      # Role-based permissions
      roles:
        - name: finance_team
          permissions: [read, query]
          filters:
            - "customer_tier in ('enterprise', 'premium')"
        - name: marketing_team
          permissions: [read]
          filters:
            - "opt_in_marketing = true"
            - "customer_region != 'EU'"  # GDPR compliance
        - name: executives
          permissions: [read, query, export]
          # No filters - full access

    # Data masking for sensitive fields. current_user_role() stands in for
    # your warehouse's session-role function (e.g. CURRENT_ROLE in Snowflake).
    dimensions:
      - name: customer_email
        type: categorical
        expr: |
          case
            when current_user_role() in ('finance_team', 'executives')
              then customer_email
            else concat(left(customer_email, 3), '***@***')
          end
```
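The masking rule is simple enough to unit-test outside the warehouse. This Python sketch mirrors the SQL CASE expression above (same role names, same three-character prefix), which is useful for checking edge cases before deploying the SQL:

```python
def mask_email(email: str, role: str) -> str:
    """Return the raw address for privileged roles; otherwise keep the
    first three characters and redact the rest (mirrors the SQL CASE)."""
    if role in ("finance_team", "executives"):
        return email
    return email[:3] + "***@***"


print(mask_email("alice@example.com", "marketing_team"))  # ali***@***
print(mask_email("alice@example.com", "executives"))      # alice@example.com
```

One caveat the test surfaces immediately: addresses shorter than three characters before the @ still leak their whole local part, so a production rule may want a stricter prefix policy.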
```yaml
# Metric versioning and deprecation strategy
metrics:
  - name: legacy_revenue_v1
    description: "DEPRECATED: Use total_revenue instead"
    type: simple
    label: "Revenue (Legacy)"
    type_params:
      measure: order_total

    # Deprecation metadata. These meta keys are team conventions that a CI
    # check can read; dbt does not act on them automatically.
    meta:
      status: "deprecated"
      deprecated_date: "2024-01-01"
      replacement_metric: "total_revenue"
      sunset_date: "2024-06-01"
      deprecation_reason: "Inconsistent tax handling in calculation"
      migration_guide: "https://docs.company.com/metrics/migration/revenue-v2"
      warn_on_use: true
      error_after: "2024-06-01"

  - name: total_revenue
    description: "Current version of revenue metric with consistent tax handling"
    type: simple
    label: "Total Revenue"
    type_params:
      measure: order_total_with_tax
    meta:
      version: "2.0"
      created_date: "2023-12-01"
      change_log: |
        v2.0 (2023-12-01): Fixed tax calculation inconsistencies
        v1.1 (2023-06-01): Added currency conversion
        v1.0 (2023-01-01): Initial implementation
```
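Because the deprecation fields live in free-form meta, enforcement has to come from your own tooling. A hypothetical CI helper could classify each metric from its meta block, warn while the metric is deprecated, and fail builds once the sunset date passes:

```python
from datetime import date


def metric_status(meta: dict, today: date) -> str:
    """Classify a metric from its deprecation metadata: 'ok' if active,
    'warn' if deprecated but before the sunset date, 'error' on or after it.
    (Hypothetical helper; field names follow the meta block above.)"""
    if meta.get("status") != "deprecated":
        return "ok"
    sunset = date.fromisoformat(meta["sunset_date"])
    return "error" if today >= sunset else "warn"


meta = {"status": "deprecated", "sunset_date": "2024-06-01"}
print(metric_status(meta, date(2024, 3, 1)))  # warn
print(metric_status(meta, date(2024, 7, 1)))  # error
```

Run against every metric's parsed YAML, this turns the soft deprecation convention into a hard gate without any dbt-side support.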
```sql
-- Create materialized tables for expensive metrics
-- models/marts/metrics/daily_revenue_summary.sql
{{
    config(
        materialized='incremental',
        unique_key=['date_day', 'customer_region'],
        on_schema_change='fail',
        partition_by=['date_day'],
        cluster_by=['customer_region']
    )
}}

with daily_metrics as (

    select
        date_trunc('day', order_date) as date_day,
        customer_region,
        sum(order_total_amount) as total_revenue,
        count(distinct order_id) as order_count,
        count(distinct customer_id) as unique_customers,
        avg(order_total_amount) as average_order_value
    from {{ ref('fact_orders') }}

    {% if is_incremental() %}
    where order_date >= (select max(date_day) from {{ this }})
    {% endif %}

    group by 1, 2

)

select * from daily_metrics
```
```yaml
# MetricFlow caching configuration
# metricflow_config.yml
# (Illustrative schema: open-source MetricFlow does not ship this exact
# config; treat it as a design sketch for a caching layer.)
cache:
  enabled: true

  # Redis configuration for distributed caching
  redis:
    host: "redis.company.com"
    port: 6379
    db: 0
    password: "{{ env_var('REDIS_PASSWORD') }}"

  # Cache TTL by metric type
  ttl_settings:
    simple_metrics: 3600       # 1 hour
    ratio_metrics: 1800        # 30 minutes
    cumulative_metrics: 7200   # 2 hours
    derived_metrics: 1800      # 30 minutes

  # Cache warming for critical metrics
  warm_cache:
    enabled: true
    schedule: "0 6 * * *"  # Daily at 6 AM
    metrics:
      - total_revenue
      - monthly_recurring_revenue
      - customer_acquisition_cost

    # Pre-compute common dimension combinations
    dimension_combinations:
      - [customer_region]
      - [customer_region, sales_channel]
      - [product_category]
```
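The TTL-by-metric-type idea can be sketched as a small in-process cache keyed by the metric name and its requested dimensions. This is illustrative only (not MetricFlow's implementation, which would sit in front of the warehouse, not in application memory):

```python
import time


class MetricCache:
    """In-process TTL cache keyed by (metric, sorted dimensions). Stale
    entries are evicted lazily on read."""

    # Per-metric-type TTLs in seconds, matching the config sketch above
    TTL_BY_TYPE = {"simple": 3600, "ratio": 1800,
                   "cumulative": 7200, "derived": 1800}

    def __init__(self):
        self._store = {}

    def _key(self, metric, dimensions):
        # Sorting makes the key order-insensitive: ["a", "b"] == ["b", "a"]
        return (metric, tuple(sorted(dimensions)))

    def put(self, metric, dimensions, metric_type, value):
        ttl = self.TTL_BY_TYPE.get(metric_type, 1800)
        self._store[self._key(metric, dimensions)] = (
            value, time.monotonic() + ttl)

    def get(self, metric, dimensions):
        entry = self._store.get(self._key(metric, dimensions))
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:  # stale: evict and miss
            del self._store[self._key(metric, dimensions)]
            return None
        return value


cache = MetricCache()
cache.put("total_revenue", ["customer_region"], "simple", 1_250_000)
print(cache.get("total_revenue", ["customer_region"]))  # 1250000
```

Cumulative metrics get the longest TTL here because they are the most expensive to recompute; the trade-off is staler rolling-window numbers between refreshes.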
```sql
-- Performance monitoring for the semantic layer
-- models/monitoring/semantic_layer_performance.sql
with query_performance as (

    select
        metric_name,
        query_date,
        execution_time_seconds,
        rows_returned,
        cache_hit,
        query_complexity_score,

        -- Performance categorization
        case
            when execution_time_seconds < 5 then 'fast'
            when execution_time_seconds < 30 then 'medium'
            else 'slow'
        end as performance_tier
    from {{ ref('metricflow_query_log') }}
    where query_date >= current_date - 7

),

performance_alerts as (

    select
        metric_name,
        avg(execution_time_seconds) as avg_execution_time,
        max(execution_time_seconds) as max_execution_time,
        count(*) as total_queries,
        -- Multiply by 1.0 to avoid integer division
        sum(case when cache_hit then 1 else 0 end) * 1.0 / count(*) as cache_hit_rate,

        -- Alert conditions
        case
            when avg(execution_time_seconds) > 60 then 'CRITICAL'
            when avg(execution_time_seconds) > 30 then 'WARNING'
            else 'OK'
        end as performance_status
    from query_performance
    group by metric_name

)

select * from performance_alerts
where performance_status != 'OK'
```
```yaml
# Debug metric calculation with explicit null handling
measures:
  - name: revenue_with_null_check
    description: "Revenue with explicit null value debugging"
    agg: sum
    expr: |
      case
        when order_total_amount is null then 0
        when order_total_amount < 0 then 0  -- Handle negative values
        else order_total_amount
      end

    # Illustrative validation intent; enforce with dbt tests on the
    # underlying model, since measures have no native constraints key
    constraints:
      - type: not_null
        error_message: "Revenue metric should never be null"
      - type: positive
        error_message: "Revenue metric should never be negative"
```
```yaml
# Debugging time dimension problems
semantic_models:
  - name: orders_debug
    model: ref('fact_orders')

    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
        expr: |
          -- Explicit date conversion with error handling
          case
            when order_date is null then '1900-01-01'::date
            when order_date < '2020-01-01' then '2020-01-01'::date
            when order_date > current_date then current_date
            else order_date::date
          end

      # Debug dimension to identify data quality issues
      - name: date_quality_flag
        type: categorical
        expr: |
          case
            when order_date is null then 'NULL_DATE'
            when order_date < '2020-01-01' then 'TOO_OLD'
            when order_date > current_date then 'FUTURE_DATE'
            else 'VALID'
          end
```
```shell
# MetricFlow query debugging
# --explain prints the generated SQL instead of executing the query
mf query --metrics total_revenue \
         --group-by customer_region \
         --start-time 2024-01-01 \
         --end-time 2024-01-31 \
         --explain

# Validate semantic model and metric configurations
mf validate-configs

# Inspect what is available to query
mf list metrics
mf list dimensions --metrics total_revenue
```
```sql
-- Comprehensive data quality tests for semantic models
-- tests/semantic_layer/test_metric_quality.sql

-- Test 1: Metric consistency across different aggregation levels
with daily_revenue as (

    select sum(order_total_amount) as daily_total
    from {{ ref('fact_orders') }}
    where order_date = '2024-01-15'

),

metric_revenue as (

    -- This would be replaced with the actual MetricFlow query result
    select 125000 as metric_total  -- Example value

)

select
    daily_total,
    metric_total,
    abs(daily_total - metric_total) as difference,
    case
        when abs(daily_total - metric_total) > 100 then 'FAIL'
        else 'PASS'
    end as test_result
from daily_revenue
cross join metric_revenue
```

```sql
-- Test 2: Metric trend validation
-- tests/semantic_layer/test_metric_trends.sql
with revenue_trend as (

    select
        date_trunc('month', order_date) as month,
        sum(order_total_amount) as monthly_revenue
    from {{ ref('fact_orders') }}
    where order_date >= '2024-01-01'
    group by 1

),

trend_analysis as (

    select
        month,
        monthly_revenue,
        lag(monthly_revenue) over (order by month) as prev_month_revenue,
        (monthly_revenue - lag(monthly_revenue) over (order by month))
            / nullif(lag(monthly_revenue) over (order by month), 0) as growth_rate
    from revenue_trend

)

select * from trend_analysis
where abs(growth_rate) > 0.5  -- Flag months with >50% change
```
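The pass/fail logic in Test 1 reduces to a tolerance comparison, which is worth unit-testing on its own so the threshold is applied consistently across reconciliation checks. A minimal sketch (the $100 default tolerance is the example value used above):

```python
def reconcile(warehouse_total: float, metric_total: float,
              tolerance: float = 100.0) -> str:
    """PASS/FAIL check mirroring Test 1: the semantic-layer figure must
    agree with a direct warehouse aggregate within `tolerance`."""
    return "PASS" if abs(warehouse_total - metric_total) <= tolerance else "FAIL"


print(reconcile(125000.0, 125050.0))  # PASS (difference of 50 is within tolerance)
print(reconcile(125000.0, 130000.0))  # FAIL
```

An absolute tolerance suits a single day's total; for figures that vary widely in magnitude, a relative tolerance (a percentage of the warehouse total) is usually the better choice.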
The following code examples demonstrate real-world implementation patterns for the dbt Semantic Layer. Each example includes complete semantic model definitions, metric configurations, and integration code.
To implement the dbt Semantic Layer in your project:

1. Upgrade to dbt v1.6 or later and install the MetricFlow CLI.
2. Create a time spine model (metricflow_time_spine).
3. Define semantic models (entities, dimensions, measures) for your core marts.
4. Layer metrics (simple, ratio, cumulative, derived) on top of them.
5. Validate with mf validate-configs and spot-check results with mf query.
6. Connect downstream BI tools and applications through the Semantic Layer APIs.
For detailed implementation guidance, see the Code Examples README which provides step-by-step setup instructions and best practices.
This guide demonstrates cutting-edge analytics engineering practices with the dbt Semantic Layer. For the latest features and updates, always refer to the official dbt documentation.