ETL vs ELT: Choosing the Right Data Integration Pattern
A practical comparison of ETL and ELT data integration patterns, covering when to use each approach, transformation strategies, and how Klivvr's Data Platform leverages ELT for scalable data processing.
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are two approaches to moving data from source systems into a data warehouse. The difference is not just the order of operations — it reflects fundamentally different assumptions about where compute happens, who owns the transformation logic, and how the pipeline evolves over time.
This article compares both patterns with practical code examples and explains why Klivvr's Data Platform uses an ELT-first approach for the majority of its data pipelines.
ETL: Transform Before Loading
In ETL, data is extracted from source systems, transformed in a processing layer (often a dedicated ETL server or cluster), and then loaded into the target warehouse. The warehouse receives clean, structured data.
```python
# ETL pattern: transform in Python before loading
import pandas as pd
from sqlalchemy import create_engine


def extract_payments(api_client) -> pd.DataFrame:
    """Extract raw payments from the Stripe API."""
    payments = api_client.get_payments(
        created_after="2025-01-01",
        limit=10000,
    )
    return pd.DataFrame(payments)


def transform_payments(df: pd.DataFrame) -> pd.DataFrame:
    """Apply business logic before loading."""
    # Type conversions (Stripe amounts are in cents, timestamps in epoch seconds)
    df["amount"] = df["amount"].astype(float) / 100
    df["created_at"] = pd.to_datetime(df["created"], unit="s")

    # Business rules
    df["status_clean"] = df["status"].str.lower()
    df["is_successful"] = df["status_clean"].isin(["succeeded", "paid"])

    # Currency normalization: join daily exchange rates on currency + date.
    # get_exchange_rates() returns columns: currency_code, rate_date, rate_to_usd
    df["rate_date"] = df["created_at"].dt.date
    df = df.merge(
        get_exchange_rates(),
        left_on=["currency", "rate_date"],
        right_on=["currency_code", "rate_date"],
        how="left",
    )
    # Missing rates come back as NaN after a left join, so default them to 1
    df["rate_to_usd"] = df["rate_to_usd"].fillna(1)
    df["amount_usd"] = df["amount"].where(
        df["currency"] == "USD",
        df["amount"] * df["rate_to_usd"],
    )

    # Select final columns
    return df[[
        "id", "customer_id", "amount", "amount_usd",
        "currency", "status_clean", "is_successful", "created_at",
    ]].rename(columns={"id": "payment_id", "status_clean": "status"})


def load_payments(df: pd.DataFrame, engine):
    """Load transformed data into the warehouse."""
    df.to_sql(
        "payments",
        engine,
        schema="analytics",
        if_exists="append",
        index=False,
        method="multi",
        chunksize=5000,
    )


# Pipeline execution
raw = extract_payments(stripe_client)
transformed = transform_payments(raw)
load_payments(transformed, warehouse_engine)
```

ETL works well when the transformation requires capabilities that SQL cannot provide: complex machine learning feature engineering, API enrichment calls, or binary file processing. Because the processing happens outside the warehouse, the warehouse is not burdened with heavy compute.
ELT: Transform After Loading
In ELT, raw data is extracted and loaded directly into the warehouse with minimal transformation. The warehouse's own compute engine handles all transformations using SQL.
```python
# ELT pattern: load raw, transform in warehouse
import json
from datetime import datetime, timezone

from google.cloud import bigquery


def extract_and_load_payments(api_client, bq_client):
    """Extract raw payments and load them directly into the warehouse."""
    payments = api_client.get_payments(
        created_after="2025-01-01",
        limit=10000,
    )

    # Load raw JSON into the warehouse, untouched except for ingestion metadata
    rows = [
        {
            "raw_data": json.dumps(payment),
            "source": "stripe",
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        }
        for payment in payments
    ]
    table_ref = bq_client.dataset("bronze").table("stripe_payments")
    errors = bq_client.insert_rows_json(table_ref, rows)
    if errors:
        raise Exception(f"Load failed: {errors}")
    return len(rows)
```

The transformation then happens inside the warehouse using SQL, typically orchestrated by dbt:
```sql
-- dbt model: transform raw payments in the warehouse
-- models/silver/payments.sql
{{ config(materialized='incremental', unique_key='payment_id') }}

WITH raw AS (
    SELECT
        JSON_EXTRACT_SCALAR(raw_data, '$.id') AS payment_id,
        JSON_EXTRACT_SCALAR(raw_data, '$.customer') AS customer_id,
        CAST(JSON_EXTRACT_SCALAR(raw_data, '$.amount') AS DECIMAL) / 100 AS amount,
        JSON_EXTRACT_SCALAR(raw_data, '$.currency') AS currency,
        LOWER(JSON_EXTRACT_SCALAR(raw_data, '$.status')) AS status,
        TIMESTAMP(JSON_EXTRACT_SCALAR(raw_data, '$.created')) AS created_at,
        ingested_at
    FROM {{ source('bronze', 'stripe_payments') }}
    {% if is_incremental() %}
    WHERE ingested_at > (SELECT MAX(ingested_at) FROM {{ this }})
    {% endif %}
),

enriched AS (
    SELECT
        r.*,
        CASE
            WHEN r.currency = 'USD' THEN r.amount
            ELSE r.amount * COALESCE(fx.rate_to_usd, 1)
        END AS amount_usd,
        r.status IN ('succeeded', 'paid') AS is_successful
    FROM raw r
    LEFT JOIN {{ ref('stg_exchange_rates') }} fx
        ON r.currency = fx.currency_code
        AND DATE(r.created_at) = fx.rate_date
)

SELECT * FROM enriched
```

When to Use Which Pattern
The choice between ETL and ELT depends on several factors.
Use ELT when the warehouse has sufficient compute capacity, transformations are expressible in SQL, multiple teams need to iterate on transformation logic independently, and you want full data lineage through a tool like dbt. Modern cloud warehouses like BigQuery, Snowflake, and Redshift are designed for this pattern.
Use ETL when transformations require non-SQL logic such as ML inference, NLP, or image processing; when source data volumes are massive and you need to filter or aggregate before loading to reduce storage costs; when the target system has limited compute capacity; or when real-time enrichment from external APIs is required during the transform step.
Use a hybrid approach when some data sources need ETL preprocessing while others can go straight to ELT. This is the reality for most organizations, including Klivvr.
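The rules of thumb above can be sketched as a small decision helper. This is purely illustrative: the class, field names, and defaults are assumptions for this article, not part of any real library.

```python
from dataclasses import dataclass


@dataclass
class PipelineProfile:
    """Characteristics of a candidate pipeline (hypothetical helper)."""
    sql_expressible: bool            # can the transform be written in SQL?
    needs_external_enrichment: bool  # ML inference, API calls, binary parsing
    warehouse_has_capacity: bool     # can the warehouse absorb the compute?
    prefilter_saves_storage: bool    # massive source worth reducing before load


def choose_pattern(p: PipelineProfile) -> str:
    """Encode the article's rules of thumb as a rough default choice."""
    if p.needs_external_enrichment or not p.sql_expressible:
        return "ETL"
    if p.prefilter_saves_storage or not p.warehouse_has_capacity:
        return "ETL"
    return "ELT"


# A typical structured SaaS-API source lands on ELT:
print(choose_pattern(PipelineProfile(True, False, True, False)))  # ELT
```

In practice the decision is rarely this mechanical, but forcing the criteria into code makes the default explicit: ELT unless something specific pushes you out of it.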
Klivvr's Hybrid Approach
Klivvr's Data Platform uses ELT as the default pattern for structured data from operational databases and SaaS APIs. Raw data is loaded into the bronze layer with minimal transformation, and dbt handles all downstream modeling.
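Because the bronze-layer landing step is the same for every source, it can be factored into one small helper. The sketch below assumes the same envelope schema as the ELT example earlier in the article (raw_data, source, ingested_at); it is not Klivvr's actual code.

```python
import json
from datetime import datetime, timezone


def to_bronze_rows(records: list[dict], source: str) -> list[dict]:
    """Wrap raw records in a bronze-layer envelope: the payload is kept
    verbatim as a JSON string, with only ingestion metadata added."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "raw_data": json.dumps(record, sort_keys=True),
            "source": source,
            "ingested_at": ingested_at,
        }
        for record in records
    ]


rows = to_bronze_rows([{"id": "pay_1", "amount": 1250}], source="stripe")
# Every row carries the untouched payload plus load metadata,
# so downstream dbt models can always re-derive columns from raw_data.
```

Keeping the payload verbatim is the point of the bronze layer: if a transformation turns out to be wrong, the fix is a dbt change and a rebuild, not a re-extraction from the source system.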
For specific use cases, we use ETL pipelines written in Python:
```python
# Hybrid: Python ETL for complex preprocessing,
# then dbt ELT for downstream modeling
class PaymentFraudEnricher:
    """ETL pipeline that enriches payments with ML fraud scores
    before loading to the warehouse."""

    def __init__(self, model_client, warehouse):
        self.model = model_client
        self.warehouse = warehouse

    def run(self, payments: list[dict]) -> int:
        # Extract: already have raw payments
        # Transform: ML enrichment that cannot run in SQL
        enriched = []
        for payment in payments:
            fraud_score = self.model.predict_fraud_probability(
                amount=payment["amount"],
                currency=payment["currency"],
                customer_history=payment.get("customer_history", {}),
                device_fingerprint=payment.get("device_fingerprint"),
            )
            payment["fraud_score"] = fraud_score
            payment["fraud_flag"] = fraud_score > 0.85
            enriched.append(payment)

        # Load: enriched data goes to the silver layer directly
        self.warehouse.bulk_insert("silver.payments_fraud_enriched", enriched)
        return len(enriched)
```

The fraud-enriched data then becomes an input to downstream dbt models that join it with other silver tables to produce gold-layer analytics.
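One way to chain the two halves is to trigger the downstream dbt models as soon as the ETL step lands its output. The sketch below is an assumption about orchestration, not Klivvr's actual scheduler: the model selector name is hypothetical, and it assumes the `dbt` CLI is on the path (`--select table_name+` rebuilds a model and everything downstream of it).

```python
import subprocess


def build_dbt_command(select: str) -> list[str]:
    """Build the dbt invocation that rebuilds the selected models.
    A trailing '+' on the selector includes all downstream dependents."""
    return ["dbt", "run", "--select", select]


def run_fraud_pipeline(enricher, payments: list[dict]) -> None:
    """ETL step first, then the ELT models that depend on its output."""
    loaded = enricher.run(payments)
    if loaded:
        subprocess.run(
            build_dbt_command("payments_fraud_enriched+"),
            check=True,
        )
```

In a production setup this hand-off would usually live in an orchestrator such as Airflow or Dagster rather than a bare `subprocess` call, but the dependency shape is the same: Python ETL upstream, dbt ELT downstream.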
Performance Considerations
ELT pushes compute to the warehouse, which means warehouse costs scale with transformation complexity. Understanding cost implications is important:
```sql
-- Expensive: full table scan with complex joins
-- This runs in the warehouse and consumes compute credits
SELECT
    p.payment_id,
    p.amount,
    c.segment,
    COUNT(*) OVER (
        PARTITION BY p.customer_id
        ORDER BY p.created_at
        ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
    ) AS rolling_txn_count  -- note: ROWS counts preceding transactions, not calendar days
FROM silver.payments p
JOIN silver.customers c ON p.customer_id = c.customer_id
WHERE p.created_at >= '2025-01-01';
```

```sql
-- Optimized: incremental processing reduces compute
-- Only process new data since last run
{{ config(materialized='incremental') }}

SELECT ...
FROM silver.payments p
{% if is_incremental() %}
WHERE p.created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
```

ETL compute costs are more predictable because they run on dedicated infrastructure, but ETL pipelines require more engineering effort to build, maintain, and scale.
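The saving from incremental models is easy to quantify with back-of-the-envelope arithmetic. The numbers below are illustrative, not measured Klivvr figures: a 10M-row table growing by 50k rows per day, rebuilt daily for a month.

```python
def rows_scanned(table_rows_start: int, daily_new_rows: int,
                 runs: int, incremental: bool) -> int:
    """Total rows scanned across `runs` daily runs.

    A full refresh re-reads the whole (growing) table every run;
    an incremental model reads only the rows added since the last run.
    """
    total = 0
    table_rows = table_rows_start
    for _ in range(runs):
        table_rows += daily_new_rows
        total += daily_new_rows if incremental else table_rows
    return total


full = rows_scanned(10_000_000, 50_000, runs=30, incremental=False)
incr = rows_scanned(10_000_000, 50_000, runs=30, incremental=True)
# incr is 1.5M rows versus over 300M for full refresh: a ~200x reduction
```

Since warehouses like BigQuery bill on-demand queries by bytes scanned, that ratio translates roughly into the cost difference between the two materialization strategies.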
Conclusion
ETL and ELT are not competing paradigms — they are complementary tools suited to different situations. ELT leverages the power of modern cloud warehouses to keep transformation logic in SQL, where it is versionable, testable, and accessible to analysts. ETL handles the cases where transformation requires capabilities beyond SQL. Klivvr's Data Platform defaults to ELT for simplicity and flexibility, falling back to Python ETL only when the use case demands it. The key is not choosing one pattern dogmatically but understanding the tradeoffs and applying each where it fits best.