dbt Modeling Guide: Building Reliable Data Transformations

A practical guide to structuring dbt models for maintainability and reliability, covering staging layers, intermediate models, mart design, and testing strategies used in production data platforms.

technical · 6 min read · By Klivvr Engineering

dbt (data build tool) has become the standard for managing SQL-based data transformations. It brings software engineering practices (version control, testing, documentation, and modularity) to the world of analytics engineering. But having dbt installed is not the same as using it well. Poorly structured dbt projects become unmaintainable within months: models reference each other in circular patterns, business logic is duplicated across dozens of files, and a single upstream change breaks half the pipeline. This article presents the modeling patterns we use in Klivvr's Data Platform: the layered architecture, naming conventions, materialization strategies, and testing approaches that keep our dbt project reliable as it scales.

The Three-Layer Architecture

Every dbt project benefits from a clear separation between raw data ingestion, business logic application, and consumption-ready output. We enforce this separation through a three-layer architecture: staging, intermediate, and marts.

Staging models are the first transformation layer. They perform light cleaning on raw source data: renaming columns, casting types, and filtering deleted records. Staging models have a one-to-one relationship with source tables and contain no business logic:

-- models/staging/stripe/stg_stripe__payments.sql
with source as (
    select * from {{ source('stripe', 'payments') }}
),
 
renamed as (
    select
        id as payment_id,
        customer_id,
        cast(amount as decimal(18, 2)) / 100 as amount,
        currency,
        lower(status) as status,
        cast(created as timestamp) as created_at,
        cast(metadata_order_id as varchar) as order_id,
        _sdc_extracted_at as loaded_at
    from source
    where _sdc_deleted_at is null
)
 
select * from renamed

Staging models always follow the naming convention stg_{source}__{entity}. The double underscore separates the source system from the entity name. This convention makes it immediately clear where the data originates, which is invaluable when debugging data quality issues.
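The `{{ source('stripe', 'payments') }}` reference in the staging model resolves against a source definition in YAML. A minimal sketch of what `_stripe__sources.yml` might contain (the database and schema names are illustrative, not our actual configuration):

```yaml
# models/staging/stripe/_stripe__sources.yml
version: 2

sources:
  - name: stripe
    database: raw        # illustrative; depends on the ingestion setup
    schema: stripe
    loaded_at_field: _sdc_extracted_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: payments
      - name: customers
```

Declaring `loaded_at_field` and `freshness` here lets `dbt source freshness` flag stale ingestion before any downstream model runs.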

Intermediate models contain the business logic that transforms staged data into meaningful entities. They handle joins across sources, business rule application, and deduplication:

-- models/intermediate/payments/int_payments__enriched.sql
with payments as (
    select * from {{ ref('stg_stripe__payments') }}
),
 
customers as (
    select * from {{ ref('stg_app__customers') }}
),
 
exchange_rates as (
    select * from {{ ref('stg_forex__exchange_rates') }}
),
 
enriched as (
    select
        p.payment_id,
        p.order_id,
        p.amount,
        p.currency,
        p.status,
        p.created_at,
        c.customer_id,
        c.customer_name,
        c.customer_segment,
        c.country_code,
        -- Normalize to USD for reporting
        case
            when p.currency = 'USD' then p.amount
            else p.amount * coalesce(er.rate_to_usd, 1)
        end as amount_usd,
        -- Business rule: payment is considered successful
        p.status in ('succeeded', 'paid') as is_successful
    from payments p
    left join customers c on p.customer_id = c.customer_id
    left join exchange_rates er
        on p.currency = er.currency_code
        and p.created_at::date = er.rate_date
)
 
select * from enriched
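The intermediate layer is also where deduplication lives. A sketch of what a model like `int_payments__deduplicated` might look like; the dedup key and the ordering column are assumptions for illustration:

```sql
-- models/intermediate/payments/int_payments__deduplicated.sql
with payments as (
    select * from {{ ref('stg_stripe__payments') }}
),

ranked as (
    select
        *,
        row_number() over (
            partition by payment_id
            order by loaded_at desc
        ) as row_num
    from payments
)

-- Keep only the most recently loaded copy of each payment
select * from ranked
where row_num = 1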

Mart models are the consumption layer, optimized for specific use cases like dashboards, reports, or downstream applications. They aggregate, pivot, and reshape data into the exact form that consumers need:

-- models/marts/finance/fct_daily_revenue.sql
{{ config(
    materialized='incremental',
    unique_key='revenue_date',
    on_schema_change='append_new_columns'
) }}
 
with payments as (
    select * from {{ ref('int_payments__enriched') }}
    where is_successful
    {% if is_incremental() %}
        and created_at >= (select max(revenue_date) from {{ this }})
    {% endif %}
),
 
daily_revenue as (
    select
        created_at::date as revenue_date,
        count(distinct payment_id) as total_transactions,
        count(distinct customer_id) as unique_customers,
        sum(amount_usd) as gross_revenue_usd,
        sum(case when customer_segment = 'enterprise' then amount_usd else 0 end) as enterprise_revenue_usd,
        sum(case when customer_segment = 'consumer' then amount_usd else 0 end) as consumer_revenue_usd,
        avg(amount_usd) as avg_transaction_usd
    from payments
    group by 1
)
 
select * from daily_revenue
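One subtlety in the incremental filter above: events that arrive after their day has already been processed slip past the `max(revenue_date)` cutoff. A common variant reprocesses a trailing window and lets `unique_key` overwrite the affected days. This is a sketch: the three-day window is an assumption about how late payments can arrive, and `dateadd` syntax varies by warehouse:

```sql
-- Hypothetical widened filter for fct_daily_revenue.sql
{% if is_incremental() %}
    and created_at >= (
        select dateadd('day', -3, max(revenue_date)) from {{ this }}
    )
{% endif %}
```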

Naming Conventions and Project Organization

Consistent naming is the cheapest form of documentation. Our conventions encode information about each model's layer, source, and purpose directly in the filename:

# dbt_project.yml
name: klivvr_data_platform
version: '1.0.0'
 
model-paths: ["models"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
 
models:
  klivvr_data_platform:
    staging:
      +materialized: view
      +schema: staging
      stripe:
        +tags: ["stripe", "payments"]
      app:
        +tags: ["app", "core"]
    intermediate:
      +materialized: view
      +schema: intermediate
    marts:
      +materialized: table
      finance:
        +schema: finance
        +tags: ["finance"]
      product:
        +schema: product
        +tags: ["product"]

The directory structure mirrors the logical layers:

models/
  staging/
    stripe/
      _stripe__sources.yml
      _stripe__models.yml
      stg_stripe__payments.sql
      stg_stripe__customers.sql
    app/
      _app__sources.yml
      _app__models.yml
      stg_app__customers.sql
      stg_app__transactions.sql
  intermediate/
    payments/
      _int_payments__models.yml
      int_payments__enriched.sql
      int_payments__deduplicated.sql
  marts/
    finance/
      _finance__models.yml
      fct_daily_revenue.sql
      fct_monthly_cohort.sql
      dim_customers.sql

YAML files prefixed with underscore contain source definitions, model documentation, and test configurations. Keeping them adjacent to the models they describe makes maintenance straightforward.

Testing Strategies

dbt's built-in testing framework is one of its strongest features. We test at every layer, with increasing strictness as data moves closer to consumers.

Staging models get schema tests that validate data contracts with source systems:

# models/staging/stripe/_stripe__models.yml
version: 2
 
models:
  - name: stg_stripe__payments
    description: "Staged Stripe payment events with standardized column names and types"
    columns:
      - name: payment_id
        description: "Unique identifier for the payment"
        tests:
          - unique
          - not_null
      - name: amount
        description: "Payment amount in the original currency (converted from cents)"
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1000000
      - name: status
        description: "Payment status"
        tests:
          - not_null
          - accepted_values:
              values: ['succeeded', 'failed', 'pending', 'refunded', 'disputed']
      - name: created_at
        description: "Timestamp when the payment was created"
        tests:
          - not_null
          - dbt_utils.recency:
              datepart: day
              field: created_at
              interval: 3

Mart models get custom data tests that validate business logic:

-- tests/finance/assert_daily_revenue_non_negative.sql
-- Ensures that daily revenue never goes negative, which would indicate
-- a data quality issue in the payment enrichment pipeline
 
select
    revenue_date,
    gross_revenue_usd
from {{ ref('fct_daily_revenue') }}
where gross_revenue_usd < 0

-- tests/finance/assert_customer_count_consistency.sql
-- Ensures unique_customers never exceeds total_transactions
-- (a customer can make multiple transactions, but not vice versa)
 
select
    revenue_date,
    unique_customers,
    total_transactions
from {{ ref('fct_daily_revenue') }}
where unique_customers > total_transactions
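Simple row-level assertions like these can also be expressed as generic tests in YAML via the dbt-utils package. A sketch of equivalent checks, assuming they live in `_finance__models.yml`:

```yaml
# models/marts/finance/_finance__models.yml (excerpt)
models:
  - name: fct_daily_revenue
    tests:
      - dbt_utils.expression_is_true:
          expression: "gross_revenue_usd >= 0"
      - dbt_utils.expression_is_true:
          expression: "unique_customers <= total_transactions"
```

We keep the singular-test versions because their comments document why each invariant should hold, but the YAML form is a lighter-weight option for one-line assertions.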

Macros for Reusable Logic

Business logic that appears in multiple models should be extracted into macros to ensure consistency and reduce maintenance burden:

-- macros/currency_conversion.sql
{% macro convert_to_usd(amount_column, currency_column, date_column) %}
    case
        when {{ currency_column }} = 'USD' then {{ amount_column }}
        else {{ amount_column }} * coalesce(
            (
                select rate_to_usd
                from {{ ref('stg_forex__exchange_rates') }}
                where currency_code = {{ currency_column }}
                  and rate_date = {{ date_column }}::date
                limit 1
            ),
            1  -- fall back to 1:1 when no rate exists for that day
        )
    end
{% endmacro %}
 
-- Usage in models:
-- select {{ convert_to_usd('amount', 'currency', 'created_at') }} as amount_usd

Conclusion

A well-structured dbt project is one where any engineer can find a model by its name, understand its purpose from its layer, trace its lineage through references, and trust its output through tests. The three-layer architecture (staging, intermediate, and marts) provides this structure by separating concerns and enforcing directional data flow.

The patterns described in this article are not theoretical. They are the result of iterating on Klivvr's Data Platform over multiple quarters, learning from mistakes that included circular references, duplicated business logic, and untested assumptions about source data. Start with the three-layer architecture and naming conventions. Add tests aggressively, especially at the boundary between staging and intermediate layers. Extract macros when duplication appears. And treat your dbt project as a software product that deserves the same rigor as any production codebase, because that is exactly what it is.
