dbt Modeling Guide: Building Reliable Data Transformations
A practical guide to structuring dbt models for maintainability and reliability, covering staging layers, intermediate models, mart design, and testing strategies used in production data platforms.
dbt (data build tool) has become the standard for managing SQL-based data transformations. It brings software engineering practices (version control, testing, documentation, and modularity) to the world of analytics engineering. But having dbt installed is not the same as using it well. Poorly structured dbt projects become unmaintainable within months: models reference each other in circular patterns, business logic is duplicated across dozens of files, and a single upstream change breaks half the pipeline. This article presents the modeling patterns we use in Klivvr's Data Platform: the layered architecture, naming conventions, materialization strategies, and testing approaches that keep our dbt project reliable as it scales.
The Three-Layer Architecture
Every dbt project benefits from a clear separation between raw data ingestion, business logic application, and consumption-ready output. We enforce this separation through a three-layer architecture: staging, intermediate, and marts.
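Because each layer depends only on the layer beneath it, dbt's node selection syntax can build or test one layer at a time. A sketch, assuming models live in folders named after the layers and using dbt's standard `--select` syntax:

```shell
# Build only the staging layer (cheap views over raw sources)
dbt run --select staging

# Build the intermediate layer plus everything it depends on
dbt run --select +intermediate

# Build and test the marts layer in one pass
dbt build --select marts
```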
Staging models are the first transformation layer. They perform light cleaning on raw source data: renaming columns, casting types, and filtering deleted records. Staging models have a one-to-one relationship with source tables and contain no business logic:
-- models/staging/stripe/stg_stripe__payments.sql
with source as (
    select * from {{ source('stripe', 'payments') }}
),

renamed as (
    select
        id as payment_id,
        customer_id,
        cast(amount as decimal(18, 2)) / 100 as amount,
        currency,
        lower(status) as status,
        cast(created as timestamp) as created_at,
        cast(metadata_order_id as varchar) as order_id,
        _sdc_extracted_at as loaded_at
    from source
    where _sdc_deleted_at is null
)

select * from renamed

Staging models always follow the naming convention stg_{source}__{entity}. The double underscore separates the source system from the entity name. This convention makes it immediately clear where the data originates, which is invaluable when debugging data quality issues.
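For illustration, the matching staging model for the in-house app source might look like the sketch below. The staged column names come from the intermediate model later in this article; the raw column names (`name`, `segment`, `country`, `deleted_at`) are assumptions:

```sql
-- models/staging/app/stg_app__customers.sql (illustrative sketch;
-- raw column names are assumed, only the staged names appear elsewhere)
with source as (
    select * from {{ source('app', 'customers') }}
),

renamed as (
    select
        id as customer_id,
        name as customer_name,
        segment as customer_segment,
        upper(country) as country_code,
        cast(created_at as timestamp) as created_at
    from source
    where deleted_at is null
)

select * from renamed
```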
Intermediate models contain the business logic that transforms staged data into meaningful entities. They handle joins across sources, business rule application, and deduplication:
-- models/intermediate/payments/int_payments__enriched.sql
with payments as (
    select * from {{ ref('stg_stripe__payments') }}
),

customers as (
    select * from {{ ref('stg_app__customers') }}
),

exchange_rates as (
    select * from {{ ref('stg_forex__exchange_rates') }}
),

enriched as (
    select
        p.payment_id,
        p.order_id,
        p.amount,
        p.currency,
        p.status,
        p.created_at,
        c.customer_id,
        c.customer_name,
        c.customer_segment,
        c.country_code,
        -- Normalize to USD for reporting
        case
            when p.currency = 'USD' then p.amount
            else p.amount * coalesce(er.rate_to_usd, 1)
        end as amount_usd,
        -- Business rule: payment is considered successful
        p.status in ('succeeded', 'paid') as is_successful
    from payments p
    left join customers c on p.customer_id = c.customer_id
    left join exchange_rates er
        on p.currency = er.currency_code
        and p.created_at::date = er.rate_date
)

select * from enriched

Mart models are the consumption layer, optimized for specific use cases like dashboards, reports, or downstream applications. They aggregate, pivot, and reshape data into the exact form that consumers need:
-- models/marts/finance/fct_daily_revenue.sql
{{ config(
    materialized='incremental',
    unique_key='revenue_date',
    on_schema_change='append_new_columns'
) }}

with payments as (
    select * from {{ ref('int_payments__enriched') }}
    where is_successful
    {% if is_incremental() %}
        -- Rebuild from the latest loaded day so late-arriving payments
        -- for that day are captured; unique_key replaces the existing row
        and created_at >= (select max(revenue_date) from {{ this }})
    {% endif %}
),

daily_revenue as (
    select
        created_at::date as revenue_date,
        count(distinct payment_id) as total_transactions,
        count(distinct customer_id) as unique_customers,
        sum(amount_usd) as gross_revenue_usd,
        sum(case when customer_segment = 'enterprise' then amount_usd else 0 end) as enterprise_revenue_usd,
        sum(case when customer_segment = 'consumer' then amount_usd else 0 end) as consumer_revenue_usd,
        avg(amount_usd) as avg_transaction_usd
    from payments
    group by 1
)

select * from daily_revenue

Naming Conventions and Project Organization
Consistent naming is the cheapest form of documentation. Our conventions encode information about each model's layer, source, and purpose directly in the filename:
# dbt_project.yml
name: klivvr_data_platform
version: '1.0.0'

model-paths: ["models"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]

models:
  klivvr_data_platform:
    staging:
      +materialized: view
      +schema: staging
      stripe:
        +tags: ["stripe", "payments"]
      app:
        +tags: ["app", "core"]
    intermediate:
      +materialized: view
      +schema: intermediate
    marts:
      +materialized: table
      finance:
        +schema: finance
        +tags: ["finance"]
      product:
        +schema: product
        +tags: ["product"]

The directory structure mirrors the logical layers:
models/
  staging/
    stripe/
      _stripe__sources.yml
      _stripe__models.yml
      stg_stripe__payments.sql
      stg_stripe__customers.sql
    app/
      _app__sources.yml
      _app__models.yml
      stg_app__customers.sql
      stg_app__transactions.sql
  intermediate/
    payments/
      _int_payments__models.yml
      int_payments__enriched.sql
      int_payments__deduplicated.sql
  marts/
    finance/
      _finance__models.yml
      fct_daily_revenue.sql
      fct_monthly_cohort.sql
      dim_customers.sql
YAML files prefixed with an underscore contain source definitions, model documentation, and test configurations. Keeping them adjacent to the models they describe makes maintenance straightforward.
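As an illustration, a source definition file like `_stripe__sources.yml` might look like the sketch below; the raw schema name and the freshness thresholds are assumptions, while `_sdc_extracted_at` comes from the staging model shown earlier:

```yaml
# models/staging/stripe/_stripe__sources.yml (sketch; schema name and
# freshness thresholds are illustrative)
version: 2

sources:
  - name: stripe
    schema: raw_stripe
    loaded_at_field: _sdc_extracted_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: payments
      - name: customers
```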
Testing Strategies
dbt's built-in testing framework is one of its strongest features. We test at every layer, with increasing strictness as data moves closer to consumers.
Staging models get schema tests that validate data contracts with source systems:
# models/staging/stripe/_stripe__models.yml
version: 2

models:
  - name: stg_stripe__payments
    description: "Staged Stripe payment events with standardized column names and types"
    columns:
      - name: payment_id
        description: "Unique identifier for the payment"
        tests:
          - unique
          - not_null
      - name: amount
        description: "Payment amount in the original currency (converted from cents)"
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1000000
      - name: status
        description: "Payment status"
        tests:
          - not_null
          - accepted_values:
              values: ['succeeded', 'failed', 'pending', 'refunded', 'disputed']
      - name: created_at
        description: "Timestamp when the payment was created"
        tests:
          - not_null
          - dbt_utils.recency:
              datepart: day
              field: created_at
              interval: 3

Mart models get custom data tests that validate business logic:
-- tests/finance/assert_daily_revenue_non_negative.sql
-- Ensures that daily revenue never goes negative, which would indicate
-- a data quality issue in the payment enrichment pipeline
select
    revenue_date,
    gross_revenue_usd
from {{ ref('fct_daily_revenue') }}
where gross_revenue_usd < 0

-- tests/finance/assert_customer_count_consistency.sql
-- Ensures unique_customers never exceeds total_transactions
-- (a customer can make multiple transactions, but not vice versa)
select
    revenue_date,
    unique_customers,
    total_transactions
from {{ ref('fct_daily_revenue') }}
where unique_customers > total_transactions

Macros for Reusable Logic
Business logic that appears in multiple models should be extracted into macros to ensure consistency and reduce maintenance burden:
-- macros/currency_conversion.sql
{% macro convert_to_usd(amount_column, currency_column, date_column) %}
    case
        when {{ currency_column }} = 'USD' then {{ amount_column }}
        else {{ amount_column }} * coalesce(
            (
                -- Correlated lookup of that day's rate; falls back to 1
                -- when no rate exists for the currency and date
                select rate_to_usd
                from {{ ref('stg_forex__exchange_rates') }}
                where currency_code = {{ currency_column }}
                  and rate_date = {{ date_column }}::date
                limit 1
            ),
            1
        )
    end
{% endmacro %}
-- Usage in models:
-- select {{ convert_to_usd('amount', 'currency', 'created_at') }} as amount_usd

Conclusion
A well-structured dbt project is one where any engineer can find a model by its name, understand its purpose from its layer, trace its lineage through references, and trust its output through tests. The three-layer architecture (staging, intermediate, and marts) provides this structure by separating concerns and enforcing directional data flow.
The patterns described in this article are not theoretical. They are the result of iterating on Klivvr's Data Platform over multiple quarters, learning from mistakes that included circular references, duplicated business logic, and untested assumptions about source data. Start with the three-layer architecture and naming conventions. Add tests aggressively, especially at the boundary between staging and intermediate layers. Extract macros when duplication appears. And treat your dbt project as a software product that deserves the same rigor as any production codebase, because that is exactly what it is.
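One concrete form of that boundary testing is dbt's built-in `relationships` test, which asserts referential integrity between staged entities before intermediate models join them. A sketch using the models from this article:

```yaml
# Sketch: guard the staging/intermediate boundary, so that
# int_payments__enriched never silently joins against unknown customers
models:
  - name: stg_stripe__payments
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('stg_app__customers')
              field: customer_id
```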