Modern Data Warehouse Architecture for Fintech
An exploration of modern data warehouse architecture patterns including the medallion architecture, dimensional modeling, and how Klivvr's Data Platform structures its warehouse for scalability and reliability.
The data warehouse is the backbone of any data-driven organization. It is where raw operational data is transformed into structured, queryable datasets that power dashboards, reports, and machine learning models. But not all warehouse architectures are equal. A poorly designed warehouse becomes a bottleneck — slow queries, duplicated logic, and brittle dependencies that break with every schema change upstream.
This article covers the architecture patterns we use in Klivvr's Data Platform, from the medallion architecture that organizes data by quality tier to the dimensional modeling techniques that make analytical queries fast and intuitive.
The Medallion Architecture
The medallion architecture organizes data into three tiers — bronze, silver, and gold — based on the level of refinement. Each tier has a clear purpose and contract with its consumers.
The bronze layer is the raw landing zone. Data arrives here exactly as it was extracted from source systems — no transformations, no deduplication, no type casting. This layer serves as the system of record and enables full reprocessing if upstream logic changes.
-- Bronze layer: raw ingestion with metadata
CREATE TABLE bronze.stripe_payments (
_raw_data JSONB NOT NULL,
_source_system VARCHAR(50) DEFAULT 'stripe',
_ingested_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
_batch_id VARCHAR(100),
_file_name VARCHAR(255)
);
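Because bronze keeps the untouched payload, ad-hoc inspection and replay are plain queries over the JSONB column. A sketch (the batch id shown is hypothetical):

```sql
-- Inspect raw payloads for one ingestion batch, e.g. while debugging
SELECT
    _raw_data->>'id'     AS payment_id,
    _raw_data->>'status' AS status,
    _ingested_at
FROM bronze.stripe_payments
WHERE _batch_id = '2025-01-15-nightly'   -- hypothetical batch id
ORDER BY _ingested_at DESC;
```

The same pattern drives full reprocessing: if silver logic changes, the silver build simply re-reads these rows.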
-- Bronze layer: append-only pattern
INSERT INTO bronze.stripe_payments (_raw_data, _batch_id, _file_name)
SELECT
raw_json,
'{{ batch_id }}',
'{{ file_name }}'
FROM staging_area.incoming_stripe_data;

The silver layer applies cleaning, deduplication, and standardization. This is where business entities take shape. Silver tables have defined schemas, enforced types, and referential integrity checks.
-- Silver layer: cleaned and typed
CREATE TABLE silver.payments AS
SELECT
(_raw_data->>'id')::VARCHAR AS payment_id,
(_raw_data->>'customer')::VARCHAR AS customer_id,
(_raw_data->>'amount')::DECIMAL(18,2) / 100 AS amount, -- Stripe amounts arrive in minor units (cents)
(_raw_data->>'currency')::VARCHAR(3) AS currency,
LOWER(_raw_data->>'status') AS status,
TO_TIMESTAMP((_raw_data->>'created')::BIGINT) AS created_at, -- Stripe sends Unix epoch seconds
(_raw_data->'metadata'->>'order_id')::VARCHAR AS order_id, -- -> keeps JSONB so ->> can read the nested field
_ingested_at AS loaded_at,
-- Deduplication: keep the latest version of each payment
ROW_NUMBER() OVER (
PARTITION BY (_raw_data->>'id')
ORDER BY _ingested_at DESC
) AS _row_num
FROM bronze.stripe_payments;
-- Only keep the latest version
DELETE FROM silver.payments WHERE _row_num > 1;
ALTER TABLE silver.payments DROP COLUMN _row_num;

The gold layer is optimized for consumption. Tables here are denormalized, pre-aggregated, and designed for specific use cases — dashboards, reports, or downstream applications.
-- Gold layer: business-ready aggregation
CREATE TABLE gold.daily_payment_summary AS
SELECT
p.created_at::DATE AS payment_date,
p.currency,
c.country_code,
c.customer_segment,
COUNT(*) AS total_transactions,
COUNT(DISTINCT p.customer_id) AS unique_customers,
SUM(CASE WHEN p.status = 'succeeded' THEN p.amount ELSE 0 END) AS gross_revenue,
SUM(CASE WHEN p.status = 'refunded' THEN p.amount ELSE 0 END) AS refund_amount,
AVG(CASE WHEN p.status = 'succeeded' THEN p.amount END) AS avg_transaction_value
FROM silver.payments p
JOIN silver.customers c ON p.customer_id = c.customer_id
GROUP BY 1, 2, 3, 4;

Dimensional Modeling
Within the gold layer, we use dimensional modeling to structure data for analytical queries. The star schema separates data into fact tables (measurable events) and dimension tables (descriptive context).
-- Fact table: transaction events
CREATE TABLE gold.fct_transactions (
transaction_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
transaction_id VARCHAR(100) NOT NULL,
customer_key BIGINT REFERENCES gold.dim_customers(customer_key),
product_key BIGINT REFERENCES gold.dim_products(product_key),
date_key INT REFERENCES gold.dim_date(date_key),
amount DECIMAL(18,2),
fee_amount DECIMAL(18,2),
net_amount DECIMAL(18,2),
currency VARCHAR(3),
status VARCHAR(20),
channel VARCHAR(50),
created_at TIMESTAMP
);
-- Dimension table: customer attributes
CREATE TABLE gold.dim_customers (
customer_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- surrogate key; FKs in the fact table require a PK here
customer_id VARCHAR(100) NOT NULL,
customer_name VARCHAR(255),
email VARCHAR(255),
country_code VARCHAR(2),
segment VARCHAR(50),
tier VARCHAR(20),
kyc_status VARCHAR(20),
created_at TIMESTAMP,
-- SCD Type 2 fields
effective_from TIMESTAMP NOT NULL,
effective_to TIMESTAMP DEFAULT '9999-12-31',
is_current BOOLEAN DEFAULT TRUE
);
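Fact rows point at dimensions through surrogate keys, so the fact load resolves the natural customer_id to the customer_key of the current dimension row. A sketch against the tables above (assumes dim_date is already populated; non-essential columns omitted):

```sql
-- Illustrative fact load: look up the surrogate key for each payment
INSERT INTO gold.fct_transactions (
    transaction_id, customer_key, date_key,
    amount, currency, status, created_at
)
SELECT
    p.payment_id,
    dc.customer_key,                         -- surrogate key, frozen at load time
    TO_CHAR(p.created_at, 'YYYYMMDD')::INT,  -- date_key in dim_date's format
    p.amount,
    p.currency,
    p.status,
    p.created_at
FROM silver.payments p
JOIN gold.dim_customers dc
  ON dc.customer_id = p.customer_id
 AND dc.is_current = TRUE;
```

Freezing the surrogate key at load time is what makes point-in-time analysis possible later: the fact row permanently references the dimension version that was current when the event happened.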
-- Dimension table: date for time-based analysis
CREATE TABLE gold.dim_date (
date_key INT PRIMARY KEY, -- YYYYMMDD format
full_date DATE NOT NULL,
day_of_week INT,
day_name VARCHAR(10),
week_of_year INT,
month_number INT,
month_name VARCHAR(10),
quarter INT,
year INT,
is_weekend BOOLEAN,
is_business_day BOOLEAN
);

Slowly Changing Dimensions (SCD Type 2) track historical changes. When a customer changes their segment from "standard" to "premium," the old record is closed and a new one is created. This allows analysts to correctly attribute historical transactions to the customer's segment at the time of the transaction.
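That attribution falls out of the surrogate key: each fact row keeps the customer_key of the dimension version that was current at load time, so one join recovers the historical segment while a second join on is_current returns today's view. An illustrative query against the tables above:

```sql
-- Segment at transaction time vs. the customer's current segment
SELECT
    ft.transaction_id,
    ft.amount,
    hist.segment AS segment_at_transaction,  -- version frozen into the fact row
    curr.segment AS segment_today            -- latest version of the same customer
FROM gold.fct_transactions ft
JOIN gold.dim_customers hist
  ON hist.customer_key = ft.customer_key
JOIN gold.dim_customers curr
  ON curr.customer_id = hist.customer_id
 AND curr.is_current = TRUE;
```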
-- SCD Type 2 merge for customer dimension
MERGE INTO gold.dim_customers target
USING silver.customers source
ON target.customer_id = source.customer_id AND target.is_current = TRUE
WHEN MATCHED AND (
target.segment IS DISTINCT FROM source.segment
OR target.tier IS DISTINCT FROM source.tier
OR target.kyc_status IS DISTINCT FROM source.kyc_status -- IS DISTINCT FROM treats NULLs as comparable values
) THEN UPDATE SET
effective_to = CURRENT_TIMESTAMP,
is_current = FALSE
WHEN NOT MATCHED THEN INSERT (
customer_id, customer_name, email, country_code,
segment, tier, kyc_status, created_at,
effective_from, is_current
) VALUES (
source.customer_id, source.customer_name, source.email,
source.country_code, source.segment, source.tier,
source.kyc_status, source.created_at,
CURRENT_TIMESTAMP, TRUE
);
-- A single MERGE closes the old version of a changed customer but cannot
-- also insert its new version in the same pass (WHEN NOT MATCHED only fires
-- for brand-new customer_ids), so a second step re-inserts current rows for
-- customers whose record was just closed
INSERT INTO gold.dim_customers (
customer_id, customer_name, email, country_code,
segment, tier, kyc_status, created_at,
effective_from, is_current
)
SELECT
s.customer_id, s.customer_name, s.email, s.country_code,
s.segment, s.tier, s.kyc_status, s.created_at,
CURRENT_TIMESTAMP, TRUE
FROM silver.customers s
WHERE NOT EXISTS (
SELECT 1 FROM gold.dim_customers t
WHERE t.customer_id = s.customer_id AND t.is_current = TRUE
);

Query Optimization
A well-designed warehouse is useless if queries are slow. We apply several optimization strategies to keep query performance within acceptable bounds as data volumes grow.
Partitioning divides large tables into smaller, more manageable segments. For time-series data like transactions, date-based partitioning is the natural choice:
-- Partitioned fact table (simplified re-declaration of fct_transactions for illustration)
CREATE TABLE gold.fct_transactions (
transaction_key BIGINT,
transaction_id VARCHAR(100),
customer_key BIGINT,
amount DECIMAL(18,2),
created_at TIMESTAMP
) PARTITION BY RANGE (created_at);
-- Monthly partitions
CREATE TABLE gold.fct_transactions_2025_01
PARTITION OF gold.fct_transactions
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE gold.fct_transactions_2025_02
PARTITION OF gold.fct_transactions
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

Materialized views pre-compute expensive aggregations that are queried frequently:
-- Materialized view for common dashboard query
CREATE MATERIALIZED VIEW gold.mv_monthly_revenue AS
SELECT
DATE_TRUNC('month', created_at) AS month,
currency,
segment,
SUM(amount) AS total_revenue,
COUNT(*) AS transaction_count,
COUNT(DISTINCT customer_key) AS unique_customers
FROM gold.fct_transactions ft
JOIN gold.dim_customers dc ON ft.customer_key = dc.customer_key
WHERE dc.is_current = TRUE
GROUP BY 1, 2, 3;
-- Refresh on schedule; CONCURRENTLY requires a unique index on the view
REFRESH MATERIALIZED VIEW CONCURRENTLY gold.mv_monthly_revenue;

Conclusion
A modern data warehouse architecture is not just about storing data — it is about organizing it so that the right people can find, trust, and use it efficiently. The medallion architecture provides clear data quality boundaries. Dimensional modeling makes analytical queries intuitive. And optimization techniques like partitioning and materialized views keep performance manageable as data grows. In Klivvr's Data Platform, these patterns work together to ensure that every team, from product to finance to compliance, has reliable access to the data they need to make decisions.