Customer Segmentation with TypeScript and Analytics
A practical guide to building scalable customer segmentation systems in TypeScript, covering data pipelines, clustering algorithms, and real-time segment assignment for CRM platforms.
Customer segmentation is the foundation of every meaningful CRM strategy. Without it, marketing campaigns are broadcast noise, product recommendations are irrelevant, and support teams treat a high-value enterprise client the same way they treat a trial user who signed up yesterday. At Klivvr, CVM Nova was built with segmentation as a first-class concept — not an afterthought bolted onto a contact database, but a core data primitive that flows through every layer of the platform.
This article walks through the architecture and implementation of a TypeScript-based customer segmentation system. We cover data modeling, algorithmic approaches, real-time segment assignment, and the practical patterns that make segmentation useful beyond slide decks.
Defining Segments: Schema and Data Model
Before writing a single clustering algorithm, you need a clear data model for what a segment is. In CVM Nova, a segment is a named, versioned rule set that partitions customers into groups. Segments can be rule-based (deterministic) or model-based (probabilistic), and they carry metadata that makes them auditable and composable.
interface SegmentDefinition {
  id: string;
  name: string;
  version: number;
  type: "rule-based" | "model-based";
  createdAt: Date;
  updatedAt: Date;
  rules?: SegmentRule[];
  modelRef?: string;
  description: string;
}

interface SegmentRule {
  field: string;
  operator: "eq" | "neq" | "gt" | "gte" | "lt" | "lte" | "in" | "contains";
  value: unknown;
  logicalGroup: "and" | "or";
}

interface CustomerSegmentAssignment {
  customerId: string;
  segmentId: string;
  segmentVersion: number;
  assignedAt: Date;
  confidence: number; // 1.0 for rule-based, 0-1 for model-based
  expiresAt?: Date;
}

The SegmentDefinition type captures both simple rule-based segments (customers in Egypt with more than 10 transactions) and model-based segments (customers likely to churn within 30 days). The CustomerSegmentAssignment type records which customer belongs to which segment, along with a confidence score and an optional expiration. Expiration matters because model-based segments go stale: a churn prediction from three months ago is not actionable today.
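To make the expiration rule concrete, the sketch below filters out stale assignments at read time. It is an illustration only, not CVM Nova's implementation: the helper names isAssignmentActive and activeAssignments, the sample segment IDs, and the dates are all assumptions, and the assignment type is repeated so the snippet stands alone.

```typescript
interface CustomerSegmentAssignment {
  customerId: string;
  segmentId: string;
  segmentVersion: number;
  assignedAt: Date;
  confidence: number;
  expiresAt?: Date;
}

// Hypothetical helper: an assignment with no expiresAt never expires.
function isAssignmentActive(a: CustomerSegmentAssignment, now: Date): boolean {
  return a.expiresAt === undefined || a.expiresAt.getTime() > now.getTime();
}

// Hypothetical helper: keep only the assignments that are still valid at `now`.
function activeAssignments(
  all: CustomerSegmentAssignment[],
  now: Date
): CustomerSegmentAssignment[] {
  return all.filter((a) => isAssignmentActive(a, now));
}

// Illustrative data: one rule-based assignment and two churn predictions.
const sampleAssignments: CustomerSegmentAssignment[] = [
  {
    customerId: "c-1",
    segmentId: "seg-egypt-10tx",
    segmentVersion: 3,
    assignedAt: new Date("2024-01-10T00:00:00Z"),
    confidence: 1.0,
  },
  {
    customerId: "c-1",
    segmentId: "seg-churn-fresh",
    segmentVersion: 1,
    assignedAt: new Date("2024-05-20T00:00:00Z"),
    confidence: 0.82,
    expiresAt: new Date("2024-07-01T00:00:00Z"),
  },
  {
    customerId: "c-1",
    segmentId: "seg-churn-stale",
    segmentVersion: 1,
    assignedAt: new Date("2024-02-01T00:00:00Z"),
    confidence: 0.74,
    expiresAt: new Date("2024-03-01T00:00:00Z"),
  },
];

const stillValid = activeAssignments(sampleAssignments, new Date("2024-06-01T00:00:00Z"));
console.log(stillValid.map((a) => a.segmentId)); // the stale churn prediction is dropped
```

Filtering at read time keeps expired rows available for audits while preventing them from driving new campaigns.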
Versioning is critical. When a segment definition changes, you need to know which version of the rules produced a given assignment. Without versioning, you cannot debug why a customer received a particular campaign or why a cohort analysis looks wrong.
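One way to make that lookup possible is to keep every published version addressable by segment ID and version number. The following is a minimal sketch under that assumption; SegmentRegistry, VersionedSegment, and the sample data are hypothetical names for illustration, not part of CVM Nova.

```typescript
// Trimmed-down definition: only the fields needed to show versioned lookup.
interface VersionedSegment {
  id: string;
  version: number;
  name: string;
}

// Hypothetical in-memory registry keyed by "id@version".
class SegmentRegistry {
  private byKey = new Map<string, VersionedSegment>();

  // Published versions are immutable: registering the same id/version twice is an error.
  register(def: VersionedSegment): void {
    const key = `${def.id}@${def.version}`;
    if (this.byKey.has(key)) {
      throw new Error(`Segment ${key} is already registered`);
    }
    this.byKey.set(key, def);
  }

  // Resolve the exact definition that produced an assignment.
  resolve(segmentId: string, segmentVersion: number): VersionedSegment | undefined {
    return this.byKey.get(`${segmentId}@${segmentVersion}`);
  }
}

const registry = new SegmentRegistry();
registry.register({ id: "seg-premium", version: 1, name: "Premium (total value over 5000)" });
registry.register({ id: "seg-premium", version: 2, name: "Premium (total value over 8000)" });

// An assignment that stored segmentVersion 1 still resolves to the old rules.
const producedBy = registry.resolve("seg-premium", 1);
console.log(producedBy?.name);
```

Because an assignment stores segmentVersion, debugging a campaign becomes a direct lookup rather than a guess about what the rules looked like at the time.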
Rule-Based Segmentation Engine
Rule-based segments are the workhorse of day-to-day CRM operations. Business users define them through a UI, and the system evaluates them against customer profiles. The engine needs to be fast, composable, and easy to test.
type CustomerProfile = Record<string, unknown>;

function evaluateRule(profile: CustomerProfile, rule: SegmentRule): boolean {
  const fieldValue = profile[rule.field];
  switch (rule.operator) {
    case "eq":
      return fieldValue === rule.value;
    case "neq":
      return fieldValue !== rule.value;
    case "gt":
      return typeof fieldValue === "number" && fieldValue > (rule.value as number);
    case "gte":
      return typeof fieldValue === "number" && fieldValue >= (rule.value as number);
    case "lt":
      return typeof fieldValue === "number" && fieldValue < (rule.value as number);
    case "lte":
      return typeof fieldValue === "number" && fieldValue <= (rule.value as number);
    case "in":
      return Array.isArray(rule.value) && rule.value.includes(fieldValue);
    case "contains":
      return typeof fieldValue === "string" && fieldValue.includes(rule.value as string);
    default:
      return false;
  }
}

function evaluateSegment(profile: CustomerProfile, segment: SegmentDefinition): boolean {
  // A segment with no rules matches nobody.
  if (!segment.rules || segment.rules.length === 0) return false;
  const andGroups = segment.rules.filter((r) => r.logicalGroup === "and");
  const orGroups = segment.rules.filter((r) => r.logicalGroup === "or");
  // Every "and" rule must match; if any "or" rules exist, at least one must match.
  const andResult = andGroups.every((rule) => evaluateRule(profile, rule));
  const orResult = orGroups.length === 0 || orGroups.some((rule) => evaluateRule(profile, rule));
  return andResult && orResult;
}

This engine is deliberately simple. Each rule evaluates a single field against a single value. Logical grouping handles conjunction and disjunction: every rule tagged "and" must match, and if any rules are tagged "or", at least one of them must match. For more complex conditions (nested groups, cross-field comparisons, temporal windows) we compose segments rather than complicating the rule grammar. A "high-value active customer" segment is the intersection of a "high-value" segment and an "active in last 30 days" segment, not a single segment with intertwined rules.
The composability principle keeps individual segments understandable and testable. Business users can reason about "high-value" independently from "active," and engineers can unit-test each segment in isolation.
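Composition by intersection can be sketched with plain predicate functions. This is an assumption about how composition might look, not the platform's actual mechanism: allOf, isHighValue, and activeWithinDays are hypothetical names, and the 10,000 threshold is an arbitrary example value. In practice each predicate could simply wrap a call to evaluateSegment for one segment definition.

```typescript
type CustomerProfile = Record<string, unknown>;
type SegmentPredicate = (profile: CustomerProfile) => boolean;

// Hypothetical combinator: a customer is in the composed segment
// only if every component segment matches.
const allOf = (...parts: SegmentPredicate[]): SegmentPredicate =>
  (profile) => parts.every((part) => part(profile));

// "High-value": example threshold, chosen only for illustration.
const isHighValue: SegmentPredicate = (profile) => {
  const total = profile["totalTransactionValue"];
  return typeof total === "number" && total > 10_000;
};

// "Active in last N days", relative to an explicit clock for testability.
const activeWithinDays = (days: number, now: Date): SegmentPredicate => (profile) => {
  const last = profile["lastEventAt"];
  const windowMs = days * 24 * 60 * 60 * 1000;
  return last instanceof Date && now.getTime() - last.getTime() <= windowMs;
};

const referenceNow = new Date("2024-06-30T00:00:00Z");
const isHighValueActive = allOf(isHighValue, activeWithinDays(30, referenceNow));

const engagedCustomer: CustomerProfile = {
  totalTransactionValue: 25_000,
  lastEventAt: new Date("2024-06-20T00:00:00Z"),
};
const lapsedCustomer: CustomerProfile = {
  totalTransactionValue: 25_000,
  lastEventAt: new Date("2024-01-05T00:00:00Z"),
};

console.log(isHighValueActive(engagedCustomer)); // true
console.log(isHighValueActive(lapsedCustomer)); // false
```

Each component predicate can be unit-tested on its own, and the composed segment needs only one additional test for the combination.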
Model-Based Segmentation with Clustering
Rule-based segments capture what you already know. Model-based segments reveal what you do not. Clustering algorithms group customers by behavioral similarity, surfacing natural segments that no human would have defined manually.
In CVM Nova, we run clustering as a batch job that reads customer feature vectors from the data warehouse and writes segment assignments back to the platform. The TypeScript implementation uses a k-means algorithm adapted for the high-dimensional feature spaces typical of CRM data.
interface FeatureVector {
  customerId: string;
  features: number[];
}

function kMeans(
  data: FeatureVector[],
  k: number,
  maxIterations: number = 100
): Map<string, number> {
  // Guard against inputs that would crash or leave centroids undefined.
  if (data.length === 0) return new Map();
  if (!Number.isInteger(k) || k < 1 || k > data.length) {
    throw new RangeError("k must be an integer between 1 and data.length");
  }
  const dim = data[0].features.length;
  let centroids = initializeCentroids(data, k);
  let assignments = new Map<string, number>();
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: attach each point to its nearest centroid.
    const newAssignments = new Map<string, number>();
    for (const point of data) {
      let minDist = Infinity;
      let closest = 0;
      for (let c = 0; c < k; c++) {
        const dist = euclideanDistance(point.features, centroids[c]);
        if (dist < minDist) {
          minDist = dist;
          closest = c;
        }
      }
      newAssignments.set(point.customerId, closest);
    }
    // Update step: recompute each centroid as the mean of its members.
    const sums: number[][] = Array.from({ length: k }, () => new Array(dim).fill(0));
    const counts = new Array(k).fill(0);
    for (const point of data) {
      const cluster = newAssignments.get(point.customerId)!;
      counts[cluster]++;
      for (let d = 0; d < dim; d++) {
        sums[cluster][d] += point.features[d];
      }
    }
    // An empty cluster keeps its previous centroid instead of collapsing to the origin.
    centroids = sums.map((sum, i) =>
      counts[i] > 0 ? sum.map((val) => val / counts[i]) : centroids[i]
    );
    // Converged once no point changes cluster.
    if (assignmentsEqual(assignments, newAssignments)) break;
    assignments = newAssignments;
  }
  return assignments;
}

function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}

function initializeCentroids(data: FeatureVector[], k: number): number[][] {
  // Fisher-Yates shuffle; sorting with a random comparator gives a biased order.
  const shuffled = [...data];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled.slice(0, k).map((p) => [...p.features]);
}

function assignmentsEqual(a: Map<string, number>, b: Map<string, number>): boolean {
  if (a.size !== b.size) return false;
  for (const [key, val] of a) {
    if (b.get(key) !== val) return false;
  }
  return true;
}

Feature engineering matters more than the algorithm. The features we typically include are: transaction frequency over multiple time windows (7, 30, 90 days), average transaction value, product mix (how many distinct product categories the customer uses), recency of last interaction, support ticket frequency, and demographic attributes where available. Each feature is normalized to a 0-1 range before clustering to prevent high-magnitude features from dominating the distance calculation.
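The 0-1 scaling mentioned above is commonly done with min-max normalization, applied per feature column. The sketch below assumes that approach; minMaxNormalize is a hypothetical helper name, and mapping a constant column to 0 is one convention among several.

```typescript
// Scale every column of a feature matrix into the [0, 1] range.
// rows[i][d] is feature d of customer i; all rows are assumed to have equal length.
function minMaxNormalize(rows: number[][]): number[][] {
  if (rows.length === 0) return [];
  const dim = rows[0].length;
  const mins = new Array<number>(dim).fill(Infinity);
  const maxs = new Array<number>(dim).fill(-Infinity);

  // First pass: find the minimum and maximum of each column.
  for (const row of rows) {
    for (let d = 0; d < dim; d++) {
      mins[d] = Math.min(mins[d], row[d]);
      maxs[d] = Math.max(maxs[d], row[d]);
    }
  }

  // Second pass: rescale. A constant column carries no information, so map it to 0.
  return rows.map((row) =>
    row.map((value, d) => {
      const range = maxs[d] - mins[d];
      return range === 0 ? 0 : (value - mins[d]) / range;
    })
  );
}

// Column 0: transactions per month, column 1: average transaction value.
const rawFeatures = [
  [2, 100],
  [4, 300],
  [6, 500],
];
const scaledFeatures = minMaxNormalize(rawFeatures);
console.log(scaledFeatures); // [[0, 0], [0.5, 0.5], [1, 1]]
```

Without this step, the second column would dominate the Euclidean distance simply because its raw values are two orders of magnitude larger.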
The output of clustering is a set of unlabeled groups. The next step — and the most important one — is interpretation. Data scientists review the centroid values for each cluster and assign human-readable labels: "dormant savers," "active transactors," "high-value engaged." These labels become the segment names in CVM Nova, and business users interact with them as if they were rule-based segments.
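To support that review, it helps to produce a per-cluster summary of size and mean feature values. The sketch below is one possible shape for such a report, assuming the assignment map returned by a k-means run; summarizeClusters and ClusterSummary are hypothetical names, and the sample data is invented.

```typescript
interface FeatureVector {
  customerId: string;
  features: number[];
}

interface ClusterSummary {
  cluster: number;
  size: number;
  meanFeatures: number[];
}

// Hypothetical helper: compute the size and mean feature vector of each cluster.
function summarizeClusters(
  data: FeatureVector[],
  assignments: Map<string, number>
): ClusterSummary[] {
  const acc = new Map<number, { size: number; sums: number[] }>();
  for (const point of data) {
    const cluster = assignments.get(point.customerId);
    if (cluster === undefined) continue; // unassigned customers are skipped
    const entry =
      acc.get(cluster) ??
      { size: 0, sums: new Array<number>(point.features.length).fill(0) };
    acc.set(cluster, entry);
    entry.size++;
    for (let d = 0; d < point.features.length; d++) {
      entry.sums[d] += point.features[d];
    }
  }
  // Sort by cluster index so the report is stable between runs.
  return Array.from(acc.entries())
    .sort(([a], [b]) => a - b)
    .map(([cluster, { size, sums }]) => ({
      cluster,
      size,
      meanFeatures: sums.map((s) => s / size),
    }));
}

// Features: [normalized transaction frequency, normalized recency].
const clusteredCustomers: FeatureVector[] = [
  { customerId: "a", features: [0.25, 1] },
  { customerId: "b", features: [0.75, 0] },
  { customerId: "c", features: [1, 0.5] },
];
const clusterOf = new Map<string, number>([
  ["a", 0],
  ["b", 0],
  ["c", 1],
]);

const summaries = summarizeClusters(clusteredCustomers, clusterOf);
console.log(summaries);
```

A reviewer can then compare mean values across clusters and attach a label to each index before the assignments are published as named segments.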
Real-Time Segment Assignment
Batch segmentation runs on a schedule — nightly or hourly. But some use cases demand real-time segment updates. When a customer completes a large transaction, they might cross a threshold that moves them from "standard" to "premium." Waiting until the next batch run means missing the opportunity to offer them a premium welcome experience immediately.
CVM Nova handles real-time segmentation through an event-driven pipeline. Customer events flow through a message broker, and a segment evaluation service processes each event against the active rule-based segments.
interface CustomerEvent {
  customerId: string;
  eventType: string;
  payload: Record<string, unknown>;
  timestamp: Date;
}

class RealTimeSegmentEvaluator {
  private segments: SegmentDefinition[];
  private profileCache: Map<string, CustomerProfile>;

  constructor(segments: SegmentDefinition[], profileCache: Map<string, CustomerProfile>) {
    this.segments = segments;
    this.profileCache = profileCache;
  }

  async handleEvent(event: CustomerEvent): Promise<CustomerSegmentAssignment[]> {
    const profile = await this.getUpdatedProfile(event);
    const newAssignments: CustomerSegmentAssignment[] = [];
    for (const segment of this.segments) {
      // Only rule-based segments are evaluated per event.
      if (segment.type !== "rule-based") continue;
      const matches = evaluateSegment(profile, segment);
      if (matches) {
        newAssignments.push({
          customerId: event.customerId,
          segmentId: segment.id,
          segmentVersion: segment.version,
          assignedAt: new Date(),
          confidence: 1.0,
        });
      }
    }
    return newAssignments;
  }

  private async getUpdatedProfile(event: CustomerEvent): Promise<CustomerProfile> {
    const existing = this.profileCache.get(event.customerId) ?? {};
    const updated = applyEventToProfile(existing, event);
    this.profileCache.set(event.customerId, updated);
    return updated;
  }
}

function applyEventToProfile(
  profile: CustomerProfile,
  event: CustomerEvent
): CustomerProfile {
  // Copy first so the cached object is never mutated in place.
  const updated = { ...profile };
  if (event.eventType === "transaction.completed") {
    const count = updated["transactionCount"];
    const total = updated["totalTransactionValue"];
    const amount = event.payload["amount"];
    updated["transactionCount"] = (typeof count === "number" ? count : 0) + 1;
    // Ignore a missing or non-numeric amount rather than storing NaN.
    updated["totalTransactionValue"] =
      (typeof total === "number" ? total : 0) + (typeof amount === "number" ? amount : 0);
  }
  updated["lastEventAt"] = event.timestamp;
  return updated;
}

The key design decision is maintaining a profile cache that reflects the latest state of each customer. Events are applied incrementally: a transaction event increments the count and total, and any event, including a login, refreshes the lastEventAt timestamp. The cache is periodically reconciled with the source-of-truth database to correct any drift.
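The reconciliation step might look like the sketch below. It assumes the source-of-truth rows have already been loaded into an in-memory snapshot and that only the incrementally maintained counters can drift; reconcileCache and RECONCILED_FIELDS are hypothetical names, and how the snapshot is fetched is left out.

```typescript
type CustomerProfile = Record<string, unknown>;

// Fields the real-time path maintains incrementally and that can therefore drift.
const RECONCILED_FIELDS = ["transactionCount", "totalTransactionValue"] as const;

// Hypothetical helper: overwrite drifted counters with source-of-truth values.
// Returns the IDs of customers whose cached profile was corrected.
function reconcileCache(
  cache: Map<string, CustomerProfile>,
  sourceOfTruth: Map<string, CustomerProfile>
): string[] {
  const corrected: string[] = [];
  for (const [customerId, cached] of Array.from(cache.entries())) {
    const truth = sourceOfTruth.get(customerId);
    if (truth === undefined) continue; // not in the snapshot: leave the cache alone
    const drifted = RECONCILED_FIELDS.some((field) => cached[field] !== truth[field]);
    if (drifted) {
      // Keep cache-only fields (such as lastEventAt) and replace the counters.
      const fixed: CustomerProfile = { ...cached };
      for (const field of RECONCILED_FIELDS) fixed[field] = truth[field];
      cache.set(customerId, fixed);
      corrected.push(customerId);
    }
  }
  return corrected;
}

// Illustrative data: c1 double-counted one transaction, c2 is consistent,
// c3 has not reached the database snapshot yet.
const profileCache = new Map<string, CustomerProfile>([
  ["c1", { transactionCount: 11, totalTransactionValue: 500, lastEventAt: "2024-06-01" }],
  ["c2", { transactionCount: 3, totalTransactionValue: 90 }],
  ["c3", { transactionCount: 1, totalTransactionValue: 20 }],
]);
const dbSnapshot = new Map<string, CustomerProfile>([
  ["c1", { transactionCount: 10, totalTransactionValue: 450 }],
  ["c2", { transactionCount: 3, totalTransactionValue: 90 }],
]);

const correctedIds = reconcileCache(profileCache, dbSnapshot);
console.log(correctedIds); // ["c1"]
```

Returning the corrected IDs makes drift measurable, which is useful for spotting duplicated or dropped events upstream.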
Practical Tips for Production Segmentation
Building a segmentation engine is one thing. Running it reliably in production is another. Here are the patterns we have found most valuable in CVM Nova.
First, always store the segment version alongside the assignment. When a business user edits a segment definition, old assignments remain valid under the previous version. This prevents the "phantom segment" problem where customers appear to enter and leave segments erratically because the definition keeps changing under them.
Second, implement segment overlap detection. Customers frequently belong to multiple segments, and that is fine. But when two overlapping segments trigger conflicting campaigns — a discount offer and a premium upsell to the same customer — you need a priority system. CVM Nova assigns a priority rank to each segment, and the campaign engine uses the highest-priority segment when conflicts arise.
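A priority-based tie-break can be sketched in a few lines. The shape below is an assumption for illustration: PrioritizedSegment, pickWinningSegment, and the campaign IDs are hypothetical, and the convention that a lower number means a higher priority is a choice made for this example.

```typescript
interface PrioritizedSegment {
  segmentId: string;
  priority: number; // assumption: 1 is the highest priority
  campaignId: string;
}

// Hypothetical helper: among the segments a customer matched,
// return the one whose campaign should win. Ties keep the earlier entry.
function pickWinningSegment(
  matched: PrioritizedSegment[]
): PrioritizedSegment | undefined {
  return matched.reduce<PrioritizedSegment | undefined>(
    (best, candidate) =>
      best === undefined || candidate.priority < best.priority ? candidate : best,
    undefined
  );
}

// The same customer qualifies for a discount offer and a premium upsell.
const matchedSegments: PrioritizedSegment[] = [
  { segmentId: "seg-price-sensitive", priority: 5, campaignId: "discount-offer" },
  { segmentId: "seg-premium", priority: 1, campaignId: "premium-upsell" },
];

const winner = pickWinningSegment(matchedSegments);
console.log(winner?.campaignId); // "premium-upsell"
```

Resolving the conflict in one place keeps campaign logic free of pairwise special cases between segments.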
Third, build segment health dashboards. A segment that contains zero customers is likely misconfigured. A segment that contains 95% of your customer base is likely too broad to be useful. Monitoring segment population sizes and their change over time catches both issues early.
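The two checks described above reduce to a small classification function. This is a sketch, not the dashboard's actual logic: checkSegmentHealth and SegmentHealth are hypothetical names, and the 0.95 default mirrors the 95% figure in the text but is exposed as a parameter because the right cutoff depends on the business.

```typescript
type SegmentHealth = "empty" | "too-broad" | "ok";

// Hypothetical helper: flag segments that are empty or cover almost everyone.
function checkSegmentHealth(
  population: number,
  totalCustomers: number,
  maxShare: number = 0.95
): SegmentHealth {
  if (population === 0) return "empty";
  if (totalCustomers > 0 && population / totalCustomers >= maxShare) return "too-broad";
  return "ok";
}

// Illustrative populations against a base of 1,000 customers.
const healthReport = [
  { segmentId: "seg-typo-country", status: checkSegmentHealth(0, 1000) },
  { segmentId: "seg-has-account", status: checkSegmentHealth(960, 1000) },
  { segmentId: "seg-premium", status: checkSegmentHealth(120, 1000) },
];
console.log(healthReport.map((r) => `${r.segmentId}: ${r.status}`));
```

Running the check on every scheduled evaluation and recording the result over time also surfaces sudden population swings after a definition is edited.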
Conclusion
Customer segmentation is not a feature you ship once and forget. It is a living system that evolves with your business, your data, and your customers. The architecture described here — a clear data model, a composable rule engine, batch clustering for discovery, and real-time evaluation for responsiveness — provides the foundation for segmentation that is both powerful and maintainable.
The most important lesson from building CVM Nova's segmentation system is that simplicity compounds. A simple rule engine that business users actually use is worth more than a sophisticated ML pipeline that only data scientists understand. Start with rule-based segments, instrument them thoroughly, and layer on model-based segments when you have the data and the organizational maturity to interpret them. The goal is not to build the most technically impressive segmentation system — it is to build one that makes every customer interaction more relevant.