Modern Data Warehouse Architecture for Fintech
An exploration of modern data warehouse architecture patterns including the medallion architecture, dimensional modeling, and how Klivvr's Data Platform structures its warehouse for scalability and reliability.
The data warehouse is the backbone of any data-driven organization. It is where raw operational data is transformed into structured, queryable datasets that power dashboards, reports, and machine learning models. But not all warehouse architectures are equal. A poorly designed warehouse becomes a bottleneck — slow queries, duplicated logic, and brittle dependencies that break with every schema change upstream.
This article covers the architecture patterns we use in Klivvr's Data Platform, from the medallion architecture that organizes data by quality tier to the dimensional modeling techniques that make analytical queries fast and intuitive.
The Medallion Architecture
The medallion architecture organizes data into three tiers — bronze, silver, and gold — based on the level of refinement. Each tier has a clear purpose and contract with its consumers.
The bronze layer is the raw landing zone. Data arrives here exactly as it was extracted from source systems — no transformations, no deduplication, no type casting. This layer serves as the system of record and enables full reprocessing if upstream logic changes.
-- Bronze layer: raw ingestion with metadata
CREATE TABLE bronze.stripe_payments (
_raw_data JSONB NOT NULL,
_source_system VARCHAR(50) DEFAULT 'stripe',
_ingested_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
_batch_id VARCHAR(100),
_file_name VARCHAR(255)
);
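Because bronze keeps the untouched payload, ad-hoc inspection and replay are plain queries over the JSONB column. A sketch (the batch id shown is hypothetical):

```sql
-- Inspect raw payloads for one ingestion batch, e.g. while debugging
SELECT
    _raw_data->>'id'     AS payment_id,
    _raw_data->>'status' AS status,
    _ingested_at
FROM bronze.stripe_payments
WHERE _batch_id = '2025-01-15-nightly'   -- hypothetical batch id
ORDER BY _ingested_at DESC;
```

The same pattern drives full reprocessing: if silver logic changes, the silver build simply re-reads these rows.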
-- Bronze layer: append-only pattern
INSERT INTO bronze.stripe_payments (_raw_data, _batch_id, _file_name)
SELECT
raw_json,
'{{ batch_id }}',
'{{ file_name }}'
FROM staging_area.incoming_stripe_data;

The silver layer applies cleaning, deduplication, and standardization. This is where business entities take shape. Silver tables have defined schemas, enforced types, and referential integrity checks.
-- Silver layer: cleaned and typed
CREATE TABLE silver.payments AS
SELECT
(_raw_data->>'id')::VARCHAR AS payment_id,
(_raw_data->>'customer')::VARCHAR AS customer_id,
(_raw_data->>'amount')::DECIMAL(18,2) / 100 AS amount, -- Stripe amounts arrive in minor units (cents)
(_raw_data->>'currency')::VARCHAR(3) AS currency,
LOWER(_raw_data->>'status') AS status,
TO_TIMESTAMP((_raw_data->>'created')::BIGINT) AS created_at, -- Stripe sends Unix epoch seconds
(_raw_data->'metadata'->>'order_id')::VARCHAR AS order_id, -- -> keeps JSONB so ->> can read the nested field
_ingested_at AS loaded_at,
-- Deduplication: keep the latest version of each payment
ROW_NUMBER() OVER (
PARTITION BY (_raw_data->>'id')
ORDER BY _ingested_at DESC
) AS _row_num
FROM bronze.stripe_payments;
-- Only keep the latest version
DELETE FROM silver.payments WHERE _row_num > 1;
ALTER TABLE silver.payments DROP COLUMN _row_num;

The gold layer is optimized for consumption. Tables here are denormalized, pre-aggregated, and designed for specific use cases — dashboards, reports, or downstream applications.
-- Gold layer: business-ready aggregation
CREATE TABLE gold.daily_payment_summary AS
SELECT
p.created_at::DATE AS payment_date,
p.currency,
c.country_code,
c.customer_segment,
COUNT(*) AS total_transactions,
COUNT(DISTINCT p.customer_id) AS unique_customers,
SUM(CASE WHEN p.status = 'succeeded' THEN p.amount ELSE 0 END) AS gross_revenue,
SUM(CASE WHEN p.status = 'refunded' THEN p.amount ELSE 0 END) AS refund_amount,
AVG(CASE WHEN p.status = 'succeeded' THEN p.amount END) AS avg_transaction_value
FROM silver.payments p
JOIN silver.customers c ON p.customer_id = c.customer_id
GROUP BY 1, 2, 3, 4;

Dimensional Modeling
Within the gold layer, we use dimensional modeling to structure data for analytical queries. The star schema separates data into fact tables (measurable events) and dimension tables (descriptive context).
-- Fact table: transaction events
CREATE TABLE gold.fct_transactions (
transaction_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
transaction_id VARCHAR(100) NOT NULL,
customer_key BIGINT REFERENCES gold.dim_customers(customer_key),
product_key BIGINT REFERENCES gold.dim_products(product_key),
date_key INT REFERENCES gold.dim_date(date_key),
amount DECIMAL(18,2),
fee_amount DECIMAL(18,2),
net_amount DECIMAL(18,2),
currency VARCHAR(3),
status VARCHAR(20),
channel VARCHAR(50),
created_at TIMESTAMP
);
-- Dimension table: customer attributes
CREATE TABLE gold.dim_customers (
customer_key BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- surrogate key; FKs in the fact table require a PK here
customer_id VARCHAR(100) NOT NULL,
customer_name VARCHAR(255),
email VARCHAR(255),
country_code VARCHAR(2),
segment VARCHAR(50),
tier VARCHAR(20),
kyc_status VARCHAR(20),
created_at TIMESTAMP,
-- SCD Type 2 fields
effective_from TIMESTAMP NOT NULL,
effective_to TIMESTAMP DEFAULT '9999-12-31',
is_current BOOLEAN DEFAULT TRUE
);
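Fact rows point at dimensions through surrogate keys, so the fact load resolves the natural customer_id to the customer_key of the current dimension row. A sketch against the tables above (assumes dim_date is already populated; non-essential columns omitted):

```sql
-- Illustrative fact load: look up the surrogate key for each payment
INSERT INTO gold.fct_transactions (
    transaction_id, customer_key, date_key,
    amount, currency, status, created_at
)
SELECT
    p.payment_id,
    dc.customer_key,                         -- surrogate key, frozen at load time
    TO_CHAR(p.created_at, 'YYYYMMDD')::INT,  -- date_key in dim_date's format
    p.amount,
    p.currency,
    p.status,
    p.created_at
FROM silver.payments p
JOIN gold.dim_customers dc
  ON dc.customer_id = p.customer_id
 AND dc.is_current = TRUE;
```

Freezing the surrogate key at load time is what makes point-in-time analysis possible later: the fact row permanently references the dimension version that was current when the event happened.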
-- Dimension table: date for time-based analysis
CREATE TABLE gold.dim_date (
date_key INT PRIMARY KEY, -- YYYYMMDD format
full_date DATE NOT NULL,
day_of_week INT,
day_name VARCHAR(10),
week_of_year INT,
month_number INT,
month_name VARCHAR(10),
quarter INT,
year INT,
is_weekend BOOLEAN,
is_business_day BOOLEAN
);

Slowly Changing Dimensions (SCD Type 2) track historical changes. When a customer changes their segment from "standard" to "premium," the old record is closed and a new one is created. This allows analysts to correctly attribute historical transactions to the customer's segment at the time of the transaction.
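That attribution falls out of the surrogate key: each fact row keeps the customer_key of the dimension version that was current at load time, so one join recovers the historical segment while a second join on is_current returns today's view. An illustrative query against the tables above:

```sql
-- Segment at transaction time vs. the customer's current segment
SELECT
    ft.transaction_id,
    ft.amount,
    hist.segment AS segment_at_transaction,  -- version frozen into the fact row
    curr.segment AS segment_today            -- latest version of the same customer
FROM gold.fct_transactions ft
JOIN gold.dim_customers hist
  ON hist.customer_key = ft.customer_key
JOIN gold.dim_customers curr
  ON curr.customer_id = hist.customer_id
 AND curr.is_current = TRUE;
```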
-- SCD Type 2 merge for customer dimension
MERGE INTO gold.dim_customers target
USING silver.customers source
ON target.customer_id = source.customer_id AND target.is_current = TRUE
WHEN MATCHED AND (
target.segment IS DISTINCT FROM source.segment
OR target.tier IS DISTINCT FROM source.tier
OR target.kyc_status IS DISTINCT FROM source.kyc_status -- IS DISTINCT FROM treats NULLs as comparable values
) THEN UPDATE SET
effective_to = CURRENT_TIMESTAMP,
is_current = FALSE
WHEN NOT MATCHED THEN INSERT (
customer_id, customer_name, email, country_code,
segment, tier, kyc_status, created_at,
effective_from, is_current
) VALUES (
source.customer_id, source.customer_name, source.email,
source.country_code, source.segment, source.tier,
source.kyc_status, source.created_at,
CURRENT_TIMESTAMP, TRUE
);
-- A single MERGE closes the old version of a changed customer but cannot
-- also insert its new version in the same pass (WHEN NOT MATCHED only fires
-- for brand-new customer_ids), so a second step re-inserts current rows for
-- customers whose record was just closed
INSERT INTO gold.dim_customers (
customer_id, customer_name, email, country_code,
segment, tier, kyc_status, created_at,
effective_from, is_current
)
SELECT
s.customer_id, s.customer_name, s.email, s.country_code,
s.segment, s.tier, s.kyc_status, s.created_at,
CURRENT_TIMESTAMP, TRUE
FROM silver.customers s
WHERE NOT EXISTS (
SELECT 1 FROM gold.dim_customers t
WHERE t.customer_id = s.customer_id AND t.is_current = TRUE
);

Query Optimization
A well-designed warehouse is useless if queries are slow. We apply several optimization strategies to keep query performance within acceptable bounds as data volumes grow.
Partitioning divides large tables into smaller, more manageable segments. For time-series data like transactions, date-based partitioning is the natural choice:
-- Partitioned fact table (simplified re-declaration of fct_transactions for illustration)
CREATE TABLE gold.fct_transactions (
transaction_key BIGINT,
transaction_id VARCHAR(100),
customer_key BIGINT,
amount DECIMAL(18,2),
created_at TIMESTAMP
) PARTITION BY RANGE (created_at);
-- Monthly partitions
CREATE TABLE gold.fct_transactions_2025_01
PARTITION OF gold.fct_transactions
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE gold.fct_transactions_2025_02
PARTITION OF gold.fct_transactions
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

Materialized views pre-compute expensive aggregations that are queried frequently:
-- Materialized view for common dashboard query
CREATE MATERIALIZED VIEW gold.mv_monthly_revenue AS
SELECT
DATE_TRUNC('month', created_at) AS month,
currency,
segment,
SUM(amount) AS total_revenue,
COUNT(*) AS transaction_count,
COUNT(DISTINCT customer_key) AS unique_customers
FROM gold.fct_transactions ft
JOIN gold.dim_customers dc ON ft.customer_key = dc.customer_key
WHERE dc.is_current = TRUE
GROUP BY 1, 2, 3;
-- Refresh on schedule; CONCURRENTLY requires a unique index on the view
REFRESH MATERIALIZED VIEW CONCURRENTLY gold.mv_monthly_revenue;

Conclusion
A modern data warehouse architecture is not just about storing data — it is about organizing it so that the right people can find, trust, and use it efficiently. The medallion architecture provides clear data quality boundaries. Dimensional modeling makes analytical queries intuitive. And optimization techniques like partitioning and materialized views keep performance manageable as data grows. In Klivvr's Data Platform, these patterns work together to ensure that every team, from product to finance to compliance, has reliable access to the data they need to make decisions.