📋 Navigation
- Quick comparison of all strategies
- The default strategy
- Database-native MERGE
- Insert-only strategy
- Partition-level replacement
- Time-based processing
- Which strategy to choose
🎯 Strategy Overview
delete+insert (Default)
Deletes existing records matching the unique_key, then inserts all new data. Most reliable and widely supported.
merge
Uses database-native MERGE statements. Best performance for large datasets with many updates.
append
Simply appends new rows. Perfect for event logs and time-series data. Creates duplicates if used incorrectly.
insert_overwrite
Overwrites entire partitions. Excellent for time-partitioned data with complete daily/hourly loads.
microbatch
Processes data in time-based batches. Ideal for large datasets with consistent time-based partitioning.
Complete Comparison Table
Strategy | Inserts | Updates | Deletes | Performance | Best For | Warehouse Support |
---|---|---|---|---|---|---|
delete+insert | ✅ Yes | ✅ Yes | ❌ No | Medium | General purpose, reliable | All warehouses |
merge | ✅ Yes | ✅ Yes | ⚠️ Optional | High | Large datasets, many updates | BigQuery, Snowflake, Databricks |
append | ✅ Yes | ❌ No | ❌ No | Highest | Event logs, time-series | All warehouses |
insert_overwrite | ✅ Yes | ✅ Partition-level | ✅ Partition-level | High | Time-partitioned data | BigQuery, Spark/Databricks |
microbatch | ✅ Yes | ✅ Yes | ❌ No | Optimized | Large time-series datasets | All warehouses (dbt 1.9+) |
🔄 Delete + Insert Strategy (Default)
The delete+insert strategy is the default incremental approach on several adapters (defaults vary by warehouse) and the most reliable, universally supported method.
How It Works
📋 Code Examples: Basic Implementation | Advanced Configuration
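Here is a minimal sketch of a delete+insert model. The model, table, and column names (stg_orders, order_id, updated_at) are illustrative; adapt them to your own project.

```sql
-- models/orders_incremental.sql (illustrative names)
{{
    config(
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key='order_id'
    )
}}

select
    order_id,
    customer_id,
    order_status,
    order_total,
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- only pick up rows that changed since the last successful run
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On each incremental run, dbt deletes target rows whose order_id appears in this result set, then inserts the new rows.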
✅ Advantages
- Universal support: Works on all data warehouses
- Reliable: Simple logic, easy to debug
- Handles updates: Automatically updates existing records
- Transactional safety: Atomic operation in most warehouses
- No special features required: Uses basic SQL
❌ Limitations
- Two operations: DELETE + INSERT (can be slower)
- No deletion handling: Doesn't remove records deleted in source
- Lock contention: May cause brief table locks
- Inefficient for sparse updates: Rewrites entire rows
Advanced Configuration
⚠️ Important Considerations
- unique_key is mandatory for updates to work properly
- NULL values in unique_key will cause issues - always validate
- Composite keys should be provided as a list: unique_key=['col1', 'col2']
- Performance impact: DELETE operations scan the entire table
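Putting the considerations above together, here is a sketch with a composite key and an explicit NULL guard (column and model names are illustrative):

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key=['order_id', 'order_line_number']
    )
}}

select
    order_id,
    order_line_number,
    product_id,
    quantity,
    updated_at
from {{ ref('stg_order_lines') }}
where order_id is not null
  and order_line_number is not null   -- guard against NULLs in the composite key

{% if is_incremental() %}
  and updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```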
🔀 Merge Strategy
The merge strategy uses database-native MERGE statements for optimal performance, especially with large datasets containing many updates.
Basic Implementation
📋 Code Examples: Basic Merge | Advanced with Deletes
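A basic merge model looks almost identical to a delete+insert one; only the strategy changes. The merge_exclude_columns config (supported on adapters with a native MERGE) keeps columns like created_at from being overwritten on updates. Names below are illustrative.

```sql
-- models/dim_customers.sql (illustrative names)
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='customer_id',
        merge_exclude_columns=['created_at']  -- keep the original insert timestamp on updates
    )
}}

select
    customer_id,
    customer_name,
    email,
    created_at,
    updated_at
from {{ ref('stg_customers') }}

{% if is_incremental() %}
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```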
Advanced Merge with Deletion Support
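dbt's built-in merge strategy does not emit a DELETE clause by itself; physically removing rows usually requires a custom incremental strategy macro. A simpler, widely used approach is a soft-delete flag that the merge keeps up to date, as in this sketch (the is_deleted column is assumed to exist in the staging model):

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='customer_id'
    )
}}

select
    customer_id,
    customer_name,
    email,
    is_deleted,     -- assumed soft-delete flag exposed by the source/staging model
    updated_at
from {{ ref('stg_customers') }}

{% if is_incremental() %}
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

Downstream models then simply filter out rows where is_deleted is true.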
Warehouse-Specific Features
BigQuery
✅ Full MERGE support
✅ DELETE in MERGE
✅ Complex conditions
Snowflake
✅ Full MERGE support
✅ DELETE in MERGE
✅ Multi-table operations
Databricks
✅ Delta Lake MERGE
✅ Time travel
✅ Optimized performance
Redshift
⚠️ Limited MERGE
❌ No DELETE in MERGE
✅ Basic operations
PostgreSQL
⚠️ Native MERGE only in PostgreSQL 15+
⚠️ Older versions fall back to delete+insert
✅ Works but slower
✅ Advantages
- Best performance: Single operation vs DELETE+INSERT
- Atomic operation: True ACID compliance
- Efficient updates: Only changes modified columns
- Deletion support: Can handle DELETEs in same operation
- Less locking: Reduced contention on target table
❌ Limitations
- Warehouse dependent: Not supported everywhere
- Complex syntax: Harder to debug when issues arise
- Feature variations: Different capabilities per warehouse
- Memory usage: Can require more memory for large merges
➕ Append Strategy
The append strategy simply adds new rows to the target table without any deduplication or updating. Perfect for event logs and time-series data.
Basic Implementation
📋 Code Examples: Basic Append | Event Logging | Safe Patterns
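A minimal append model: no unique_key, just a time filter so each run only loads new events. Model and column names are illustrative.

```sql
-- models/fct_page_views.sql (illustrative names)
{{
    config(
        materialized='incremental',
        incremental_strategy='append'
    )
}}

select
    event_id,
    user_id,
    page_url,
    event_timestamp
from {{ ref('stg_page_views') }}

{% if is_incremental() %}
  -- only load events newer than anything already in the table
  where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```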
Perfect Use Cases
- Event logs and clickstream data
- Application and audit logs
- IoT sensor readings that are never updated
- Immutable financial transactions with unique IDs
⚠️ Critical Warning: Duplicate Risk
The append strategy will create duplicates if you have:
- Records with the same business key but different timestamps
- Late-arriving data that overlaps with previous runs
- Source system that updates existing records
- Backfill scenarios where you reprocess historical data
Safe Append Patterns
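One safe pattern is an anti-join against the target on the event's natural key, so overlapping or reprocessed batches cannot insert the same event twice. This is a sketch with illustrative names; in practice you would also bound both sides with a time window to keep the join cheap.

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='append'
    )
}}

select
    e.event_id,
    e.user_id,
    e.event_type,
    e.event_timestamp
from {{ ref('stg_events') }} as e

{% if is_incremental() %}
-- anti-join on the event id so overlapping or reprocessed batches
-- cannot append the same event twice
left join {{ this }} as existing
    on e.event_id = existing.event_id
where existing.event_id is null
{% endif %}
```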
✅ Advantages
- Fastest performance: Single INSERT operation
- Simple logic: Easy to understand and debug
- No locks: Minimal contention on target table
- Perfect for events: Natural fit for immutable data
- Universal support: Works on all databases
❌ Limitations
- No updates: Cannot modify existing records
- Duplicate risk: Easy to accidentally create duplicates
- No deletion: Cannot remove records
- Limited use cases: Only suitable for append-only data
🔁 Insert Overwrite Strategy
The insert_overwrite strategy completely replaces entire partitions with new data. Perfect for time-partitioned data where you get complete datasets.
Basic Implementation
📋 Code Examples: Basic Implementation | Dynamic Partitions | Hourly Partitioning
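A basic insert_overwrite model on BigQuery, shown in "dynamic" mode: every partition that appears in the query result is replaced. Names and the three-day window are illustrative; Spark/Databricks configure partitioning differently.

```sql
-- models/daily_sales.sql (BigQuery syntax; illustrative names)
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={
            'field': 'sale_date',
            'data_type': 'date',
            'granularity': 'day'
        }
    )
}}

select
    sale_date,
    store_id,
    sum(amount) as total_sales,
    count(*)    as transaction_count
from {{ ref('stg_sales') }}

{% if is_incremental() %}
  -- dynamic mode: every partition present in this result set gets replaced
  where sale_date >= date_sub(current_date(), interval 3 day)
{% endif %}

group by sale_date, store_id
```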
Advanced Partition Management
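For tighter control, BigQuery also supports a static-partitions variant where you list exactly which partitions to replace. The following is a sketch of that pattern; keep the source filter aligned with the partitions config so you never overwrite a partition with partial data.

```sql
{% set partitions_to_replace = [
    'current_date()',
    'date_sub(current_date(), interval 1 day)'
] %}

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={'field': 'sale_date', 'data_type': 'date'},
        partitions=partitions_to_replace
    )
}}

select
    sale_date,
    store_id,
    sum(amount) as total_sales
from {{ ref('stg_sales') }}

{% if is_incremental() %}
  -- keep the source filter aligned with the partitions being replaced
  where sale_date in ({{ partitions_to_replace | join(', ') }})
{% endif %}

group by sale_date, store_id
```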
Perfect Use Cases
Daily Batch Loads
Complete daily files from external systems where you always get the full day's data
Data Reprocessing
When you need to recalculate derived metrics or fix data quality issues
Late-Arriving Data
When source systems frequently send updates for past dates
Warehouse Support
BigQuery
✅ Native support
✅ Date/timestamp partitioning
✅ Automatic partition pruning
Databricks
✅ Delta Lake partitions
✅ Dynamic partition overwrite
✅ OPTIMIZE commands
Spark
✅ Partition overwrite
⚠️ Configuration dependent
✅ Hive-style partitions
Snowflake
❌ No native partition overwrite
⚠️ Falls back to delete+insert
✅ Clustering keys help
Redshift
❌ Limited partition support
⚠️ Falls back to delete+insert
✅ Sort keys help
✅ Advantages
- Complete refresh: Handles all types of changes (inserts, updates, and deletes) within a partition
- Data consistency: Entire partition is always consistent
- Handles late data: Perfect for late-arriving updates
- Reprocessing friendly: Easy to recompute historical data
- High performance: Leverages partition pruning
❌ Limitations
- Requires partitioning: Only works with partitioned tables
- Limited warehouse support: Not available everywhere
- Coarse granularity: Updates entire partitions
- Resource intensive: Rewrites large amounts of data
⚡ Microbatch Strategy
The microbatch strategy (dbt 1.9+) processes data in time-based batches automatically, optimizing performance for large time-series datasets.
Basic Implementation
📋 Code Examples: Basic Microbatch | Advanced Configuration | Custom Logic
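A minimal microbatch model (dbt 1.9+). Note there is no is_incremental() filter: dbt slices the data into batches using the event_time column and filters upstream refs automatically when they also declare an event_time. Names are illustrative.

```sql
-- models/fct_sensor_readings.sql (illustrative names)
{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='reading_timestamp',   -- column dbt uses to slice batches
        batch_size='day',
        begin='2024-01-01',               -- earliest batch for the initial build
        lookback=3                        -- reprocess the 3 most recent batches for late data
    )
}}

select
    sensor_id,
    reading_value,
    reading_timestamp
from {{ ref('stg_sensor_readings') }}
```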
Advanced Microbatch Configuration
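For high-volume streams you can tighten the batch size and lookback window. The concurrent_batches flag is assumed here to be available in your dbt 1.9+ version for controlling parallel batch execution; check your version's docs before relying on it.

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='event_timestamp',
        batch_size='hour',
        begin='2024-06-01',
        lookback=6,                 -- reprocess the trailing 6 hourly batches
        concurrent_batches=true     -- assumed available; controls parallel batch execution
    )
}}

select
    event_id,
    user_id,
    event_type,
    event_timestamp
from {{ ref('stg_events') }}
```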
Microbatch Benefits
Automatic Parallelization
dbt automatically runs batches in parallel when possible, dramatically improving performance
Memory Optimization
Processes smaller chunks of data, reducing memory pressure and avoiding OOM errors
Late Data Handling
Built-in lookback windows automatically reprocess recent batches when late data arrives
Failure Recovery
If a batch fails, only that batch needs to be rerun, not the entire model
✅ Advantages
- Optimized performance: Automatic parallelization and memory management
- Built-in late data handling: Configurable lookback windows
- Failure recovery: Granular retry capabilities
- Memory efficient: Processes data in smaller chunks
- Universal support: Works on all warehouses
❌ Limitations
- dbt 1.9+ only: Requires latest dbt version
- Time-based data only: Requires event_time column
- Complex debugging: Multiple batches make troubleshooting harder
- No cross-batch joins: Limited for complex transformations
🧭 Decision Guide: Which Strategy to Choose
🎯 Strategy Selection Decision Tree
Step 1: What kind of data is it?
- Immutable events/logs? → Use append
- Time-partitioned data with complete loads? → Use insert_overwrite
- Large time-series with updates? → Use microbatch
- Traditional dimensional/fact tables? → Continue to Step 2

Step 2: Do existing records change?
- No updates needed (insert-only)? → Use append
- Frequent updates to existing records? → Continue to Step 3
- Occasional updates? → Use delete+insert (safe default)

Step 3: How large is the table?
- Small-medium tables (<10M rows)? → Use delete+insert
- Large tables (>10M rows) with many updates? → Use merge
- Warehouse doesn't support MERGE? → Use delete+insert

Step 4: Do you need to handle deletions?
- Need to handle source deletions? → Use merge with delete logic
- Soft deletes acceptable? → Use any strategy + a soft delete pattern
- Partition-level deletion? → Use insert_overwrite
Quick Reference by Use Case
Use Case | Recommended Strategy | Alternative | Key Configuration |
---|---|---|---|
Event logging / Clickstream | append | microbatch | Partition by date, no unique_key |
Customer dimension | merge | delete+insert | unique_key='customer_id' |
Daily sales aggregates | insert_overwrite | merge | Partition by sale_date |
IoT sensor data | microbatch | append | event_time='timestamp', batch_size='hour' |
Financial transactions | append | microbatch | Immutable with unique transaction_id |
User profile updates | merge | delete+insert | unique_key='user_id', exclude created_at |
Inventory snapshots | insert_overwrite | delete+insert | Partition by snapshot_date |
Application logs | append | microbatch | Partition by log_date |
Performance Optimization Tips
📋 Code Example: Performance Optimization
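The sketch below combines a few common optimizations in a Snowflake-flavoured example: cluster_by and incremental_predicates prune the scan of the existing table during the merge, and the incremental filter bounds the source scan with a small overlap for late-arriving updates. Both configs are adapter-specific, and all names are illustrative.

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='order_id',
        cluster_by=['order_date'],   -- Snowflake-specific; helps prune the target scan
        incremental_predicates=[
            "DBT_INTERNAL_DEST.order_date >= dateadd(day, -7, current_date)"
        ]
    )
}}

select
    order_id,
    customer_id,
    order_date,
    order_total,
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- bound the source scan too, with a small overlap for late-arriving updates
  where updated_at >= (select dateadd(day, -3, max(updated_at)) from {{ this }})
{% endif %}
```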
🚨 Common Mistakes to Avoid
- Using append for dimensional data - leads to duplicates
- Forgetting unique_key - prevents updates from working
- NULL values in unique_key - causes unpredictable behavior
- Wrong partition granularity - too fine causes many small files
- No late data handling - causes data loss
- Ignoring warehouse capabilities - using unsupported strategies
- Not testing with real data patterns - surprises in production
🎯 Final Recommendations
- For beginners: Start with delete+insert - it's reliable and works everywhere
- For performance: Graduate to merge when you have large tables with frequent updates
- For event data: Use append, but be very careful about uniqueness
- For batch loads: insert_overwrite is perfect for complete daily/hourly datasets
- For modern stacks: microbatch provides the best balance of performance and reliability
🎓 Summary & Next Steps
Understanding incremental strategies is crucial for building efficient, scalable dbt models. Each strategy has its place:
- delete+insert: Your reliable default choice
- merge: The performance champion for updates
- append: The speed demon for immutable data
- insert_overwrite: The complete refresh for partitioned data
- microbatch: The modern solution for large time-series
Practice Path:
- Start with delete+insert to learn the concepts
- Experiment with append for event data
- Try merge when you need better performance
- Explore insert_overwrite for time-partitioned data
- Test microbatch for large-scale processing
Remember: Choose based on your data patterns, not just performance. The wrong strategy can cause data corruption, while the right strategy makes your pipelines both fast and reliable.
📚 Official dbt Documentation
For the most up-to-date information and comprehensive details, refer to the official dbt documentation:
- Incremental models - Official guide to dbt incremental materialization
- Materializations best practices - Guide to choosing and implementing dbt materializations
These resources provide the authoritative source for dbt incremental strategies and are regularly updated by the dbt Labs team.