ETL vs ELT: Choosing the Right Data Integration Pattern
A practical comparison of ETL and ELT data integration patterns, covering when to use each approach, transformation strategies, and how Klivvr's Data Platform leverages ELT for scalable data processing.
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are two approaches to moving data from source systems into a data warehouse. The difference is not just the order of operations — it reflects fundamentally different assumptions about where compute happens, who owns the transformation logic, and how the pipeline evolves over time.
This article compares both patterns with practical code examples and explains why Klivvr's Data Platform uses an ELT-first approach for the majority of its data pipelines.
ETL: Transform Before Loading
In ETL, data is extracted from source systems, transformed in a processing layer (often a dedicated ETL server or cluster), and then loaded into the target warehouse. The warehouse receives clean, structured data.
```python
# ETL pattern: transform in Python before loading
import pandas as pd
from sqlalchemy import create_engine


def extract_payments(api_client) -> pd.DataFrame:
    """Extract raw payments from the Stripe API."""
    payments = api_client.get_payments(
        created_after="2025-01-01",
        limit=10000,
    )
    return pd.DataFrame(payments)


def transform_payments(df: pd.DataFrame) -> pd.DataFrame:
    """Apply business logic before loading."""
    # Type conversions (Stripe amounts are in cents, timestamps in epoch seconds)
    df["amount"] = df["amount"].astype(float) / 100
    df["created_at"] = pd.to_datetime(df["created"], unit="s")

    # Business rules
    df["status_clean"] = df["status"].str.lower()
    df["is_successful"] = df["status_clean"].isin(["succeeded", "paid"])

    # Currency normalization: join daily exchange rates on currency + date.
    # get_exchange_rates() returns columns: currency_code, rate_date, rate_to_usd
    df["rate_date"] = df["created_at"].dt.date
    df = df.merge(
        get_exchange_rates(),
        left_on=["currency", "rate_date"],
        right_on=["currency_code", "rate_date"],
        how="left",
    )
    # Missing rates come back as NaN after a left join, so default them to 1
    df["rate_to_usd"] = df["rate_to_usd"].fillna(1)
    df["amount_usd"] = df["amount"].where(
        df["currency"] == "USD",
        df["amount"] * df["rate_to_usd"],
    )

    # Select final columns
    return df[[
        "id", "customer_id", "amount", "amount_usd",
        "currency", "status_clean", "is_successful", "created_at",
    ]].rename(columns={"id": "payment_id", "status_clean": "status"})


def load_payments(df: pd.DataFrame, engine):
    """Load transformed data into the warehouse."""
    df.to_sql(
        "payments",
        engine,
        schema="analytics",
        if_exists="append",
        index=False,
        method="multi",
        chunksize=5000,
    )


# Pipeline execution
raw = extract_payments(stripe_client)
transformed = transform_payments(raw)
load_payments(transformed, warehouse_engine)
```

ETL works well when the transformation requires capabilities that SQL cannot provide: complex machine learning feature engineering, API enrichment calls, or binary file processing. Because the processing happens outside the warehouse, the warehouse is not burdened with heavy compute.
ELT: Transform After Loading
In ELT, raw data is extracted and loaded directly into the warehouse with minimal transformation. The warehouse's own compute engine handles all transformations using SQL.
```python
# ELT pattern: load raw, transform in warehouse
import json
from datetime import datetime, timezone

from google.cloud import bigquery


def extract_and_load_payments(api_client, bq_client):
    """Extract raw payments and load them directly into the warehouse."""
    payments = api_client.get_payments(
        created_after="2025-01-01",
        limit=10000,
    )

    # Load raw JSON into the warehouse, untouched except for ingestion metadata
    rows = [
        {
            "raw_data": json.dumps(payment),
            "source": "stripe",
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        }
        for payment in payments
    ]
    table_ref = bq_client.dataset("bronze").table("stripe_payments")
    errors = bq_client.insert_rows_json(table_ref, rows)
    if errors:
        raise Exception(f"Load failed: {errors}")
    return len(rows)
```

The transformation then happens inside the warehouse using SQL, typically orchestrated by dbt:
```sql
-- dbt model: transform raw payments in the warehouse
-- models/silver/payments.sql
{{ config(materialized='incremental', unique_key='payment_id') }}

WITH raw AS (
    SELECT
        JSON_EXTRACT_SCALAR(raw_data, '$.id') AS payment_id,
        JSON_EXTRACT_SCALAR(raw_data, '$.customer') AS customer_id,
        CAST(JSON_EXTRACT_SCALAR(raw_data, '$.amount') AS DECIMAL) / 100 AS amount,
        JSON_EXTRACT_SCALAR(raw_data, '$.currency') AS currency,
        LOWER(JSON_EXTRACT_SCALAR(raw_data, '$.status')) AS status,
        TIMESTAMP(JSON_EXTRACT_SCALAR(raw_data, '$.created')) AS created_at,
        ingested_at
    FROM {{ source('bronze', 'stripe_payments') }}
    {% if is_incremental() %}
    WHERE ingested_at > (SELECT MAX(ingested_at) FROM {{ this }})
    {% endif %}
),

enriched AS (
    SELECT
        r.*,
        CASE
            WHEN r.currency = 'USD' THEN r.amount
            ELSE r.amount * COALESCE(fx.rate_to_usd, 1)
        END AS amount_usd,
        r.status IN ('succeeded', 'paid') AS is_successful
    FROM raw r
    LEFT JOIN {{ ref('stg_exchange_rates') }} fx
        ON r.currency = fx.currency_code
        AND DATE(r.created_at) = fx.rate_date
)

SELECT * FROM enriched
```

When to Use Which Pattern
The choice between ETL and ELT depends on several factors.
Use ELT when the warehouse has sufficient compute capacity, transformations are expressible in SQL, multiple teams need to iterate on transformation logic independently, and you want full data lineage through a tool like dbt. Modern cloud warehouses like BigQuery, Snowflake, and Redshift are designed for this pattern.
Use ETL when transformations require non-SQL logic such as ML inference, NLP, or image processing; when source data volumes are massive and you need to filter or aggregate before loading to reduce storage costs; when the target system has limited compute capacity; or when real-time enrichment from external APIs is required during the transform step.
Use a hybrid approach when some data sources need ETL preprocessing while others can go straight to ELT. This is the reality for most organizations, including Klivvr.
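The rules of thumb above can be sketched as a small decision helper. This is purely illustrative: the class, field names, and defaults are assumptions for this article, not part of any real library.

```python
from dataclasses import dataclass


@dataclass
class PipelineProfile:
    """Characteristics of a candidate pipeline (hypothetical helper)."""
    sql_expressible: bool            # can the transform be written in SQL?
    needs_external_enrichment: bool  # ML inference, API calls, binary parsing
    warehouse_has_capacity: bool     # can the warehouse absorb the compute?
    prefilter_saves_storage: bool    # massive source worth reducing before load


def choose_pattern(p: PipelineProfile) -> str:
    """Encode the article's rules of thumb as a rough default choice."""
    if p.needs_external_enrichment or not p.sql_expressible:
        return "ETL"
    if p.prefilter_saves_storage or not p.warehouse_has_capacity:
        return "ETL"
    return "ELT"


# A typical structured SaaS-API source lands on ELT:
print(choose_pattern(PipelineProfile(True, False, True, False)))  # ELT
```

In practice the decision is rarely this mechanical, but forcing the criteria into code makes the default explicit: ELT unless something specific pushes you out of it.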
Klivvr's Hybrid Approach
Klivvr's Data Platform uses ELT as the default pattern for structured data from operational databases and SaaS APIs. Raw data is loaded into the bronze layer with minimal transformation, and dbt handles all downstream modeling.
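Because the bronze-layer landing step is the same for every source, it can be factored into one small helper. The sketch below assumes the same envelope schema as the ELT example earlier in the article (raw_data, source, ingested_at); it is not Klivvr's actual code.

```python
import json
from datetime import datetime, timezone


def to_bronze_rows(records: list[dict], source: str) -> list[dict]:
    """Wrap raw records in a bronze-layer envelope: the payload is kept
    verbatim as a JSON string, with only ingestion metadata added."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "raw_data": json.dumps(record, sort_keys=True),
            "source": source,
            "ingested_at": ingested_at,
        }
        for record in records
    ]


rows = to_bronze_rows([{"id": "pay_1", "amount": 1250}], source="stripe")
# Every row carries the untouched payload plus load metadata,
# so downstream dbt models can always re-derive columns from raw_data.
```

Keeping the payload verbatim is the point of the bronze layer: if a transformation turns out to be wrong, the fix is a dbt change and a rebuild, not a re-extraction from the source system.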
For specific use cases, we use ETL pipelines written in Python:
```python
# Hybrid: Python ETL for complex preprocessing,
# then dbt ELT for downstream modeling
class PaymentFraudEnricher:
    """ETL pipeline that enriches payments with ML fraud scores
    before loading to the warehouse."""

    def __init__(self, model_client, warehouse):
        self.model = model_client
        self.warehouse = warehouse

    def run(self, payments: list[dict]) -> int:
        # Extract: already have raw payments
        # Transform: ML enrichment that cannot run in SQL
        enriched = []
        for payment in payments:
            fraud_score = self.model.predict_fraud_probability(
                amount=payment["amount"],
                currency=payment["currency"],
                customer_history=payment.get("customer_history", {}),
                device_fingerprint=payment.get("device_fingerprint"),
            )
            payment["fraud_score"] = fraud_score
            payment["fraud_flag"] = fraud_score > 0.85
            enriched.append(payment)

        # Load: enriched data goes to the silver layer directly
        self.warehouse.bulk_insert("silver.payments_fraud_enriched", enriched)
        return len(enriched)
```

The fraud-enriched data then becomes an input to downstream dbt models that join it with other silver tables to produce gold-layer analytics.
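One way to chain the two halves is to trigger the downstream dbt models as soon as the ETL step lands its output. The sketch below is an assumption about orchestration, not Klivvr's actual scheduler: the model selector name is hypothetical, and it assumes the `dbt` CLI is on the path (`--select table_name+` rebuilds a model and everything downstream of it).

```python
import subprocess


def build_dbt_command(select: str) -> list[str]:
    """Build the dbt invocation that rebuilds the selected models.
    A trailing '+' on the selector includes all downstream dependents."""
    return ["dbt", "run", "--select", select]


def run_fraud_pipeline(enricher, payments: list[dict]) -> None:
    """ETL step first, then the ELT models that depend on its output."""
    loaded = enricher.run(payments)
    if loaded:
        subprocess.run(
            build_dbt_command("payments_fraud_enriched+"),
            check=True,
        )
```

In a production setup this hand-off would usually live in an orchestrator such as Airflow or Dagster rather than a bare `subprocess` call, but the dependency shape is the same: Python ETL upstream, dbt ELT downstream.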
Performance Considerations
ELT pushes compute to the warehouse, which means warehouse costs scale with transformation complexity. Understanding cost implications is important:
```sql
-- Expensive: full table scan with complex joins
-- This runs in the warehouse and consumes compute credits
SELECT
    p.payment_id,
    p.amount,
    c.segment,
    COUNT(*) OVER (
        PARTITION BY p.customer_id
        ORDER BY p.created_at
        ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
    ) AS rolling_txn_count  -- note: ROWS counts preceding transactions, not calendar days
FROM silver.payments p
JOIN silver.customers c ON p.customer_id = c.customer_id
WHERE p.created_at >= '2025-01-01';
```

```sql
-- Optimized: incremental processing reduces compute
-- Only process new data since last run
{{ config(materialized='incremental') }}

SELECT ...
FROM silver.payments p
{% if is_incremental() %}
WHERE p.created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
```

ETL compute costs are more predictable because they run on dedicated infrastructure, but ETL pipelines require more engineering effort to build, maintain, and scale.
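The saving from incremental models is easy to quantify with back-of-the-envelope arithmetic. The numbers below are illustrative, not measured Klivvr figures: a 10M-row table growing by 50k rows per day, rebuilt daily for a month.

```python
def rows_scanned(table_rows_start: int, daily_new_rows: int,
                 runs: int, incremental: bool) -> int:
    """Total rows scanned across `runs` daily runs.

    A full refresh re-reads the whole (growing) table every run;
    an incremental model reads only the rows added since the last run.
    """
    total = 0
    table_rows = table_rows_start
    for _ in range(runs):
        table_rows += daily_new_rows
        total += daily_new_rows if incremental else table_rows
    return total


full = rows_scanned(10_000_000, 50_000, runs=30, incremental=False)
incr = rows_scanned(10_000_000, 50_000, runs=30, incremental=True)
# incr is 1.5M rows versus over 300M for full refresh: a ~200x reduction
```

Since warehouses like BigQuery bill on-demand queries by bytes scanned, that ratio translates roughly into the cost difference between the two materialization strategies.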
Conclusion
ETL and ELT are not competing paradigms — they are complementary tools suited to different situations. ELT leverages the power of modern cloud warehouses to keep transformation logic in SQL, where it is versionable, testable, and accessible to analysts. ETL handles the cases where transformation requires capabilities beyond SQL. Klivvr's Data Platform defaults to ELT for simplicity and flexibility, falling back to Python ETL only when the use case demands it. The key is not choosing one pattern dogmatically but understanding the tradeoffs and applying each where it fits best.