Building a KYC Verification System

A comprehensive guide to architecting a robust KYC verification system in TypeScript, covering identity checks, document validation, and risk scoring pipelines.

By Klivvr Engineering · 8 min read

Know Your Customer (KYC) verification sits at the heart of every financial services application. Regulators mandate it, customers expect it to be fast, and engineering teams must build it to be reliable, auditable, and extensible. At Klivvr, Oasis is the TypeScript service responsible for managing the entire customer onboarding lifecycle, and the KYC verification subsystem is its most critical component.

This article walks through the architecture of the KYC verification system we built inside Oasis. We cover the domain model, the verification pipeline, integration with external providers, and the strategies we use to handle failures gracefully. Whether you are building your first KYC system or looking to improve an existing one, these patterns should translate directly to your own stack.

Domain Modeling: Verification as a State Machine

The first design decision we made was to model each verification request as an explicit state machine. A customer's KYC journey passes through well-defined states, and transitions between those states are triggered by specific events. This makes the system predictable, testable, and easy to reason about.

enum VerificationStatus {
  PENDING = "PENDING",
  DOCUMENT_SUBMITTED = "DOCUMENT_SUBMITTED",
  DOCUMENT_VERIFIED = "DOCUMENT_VERIFIED",
  IDENTITY_CHECK_IN_PROGRESS = "IDENTITY_CHECK_IN_PROGRESS",
  IDENTITY_VERIFIED = "IDENTITY_VERIFIED",
  MANUALLY_REVIEWING = "MANUALLY_REVIEWING",
  APPROVED = "APPROVED",
  REJECTED = "REJECTED",
  EXPIRED = "EXPIRED",
}
 
interface VerificationRequest {
  id: string;
  customerId: string;
  customerProvidedName: string;
  customerProvidedDOB: string;
  customerCountry: string;
  status: VerificationStatus;
  documents: VerificationDocument[];
  identityCheckResult: IdentityCheckResult | null;
  riskScore: number | null;
  createdAt: Date;
  updatedAt: Date;
  expiresAt: Date;
}
 
interface VerificationDocument {
  id: string;
  type: DocumentType;
  fileReference: string;
  extractedData: Record<string, string> | null;
  validationResult: DocumentValidationResult | null;
  uploadedAt: Date;
}
 
type DocumentType = "PASSPORT" | "NATIONAL_ID" | "DRIVERS_LICENSE" | "UTILITY_BILL" | "BANK_STATEMENT";
 
interface DocumentValidationResult {
  isValid: boolean;
  confidence: number;
  issues: string[];
  extractedFields: Record<string, string>;
}

The state machine approach has two major advantages. First, every state transition is explicit, which means we can attach side effects (sending notifications, updating audit logs, triggering downstream checks) to specific transitions rather than scattering them throughout the codebase. Second, illegal transitions are caught at compile time or during validation, preventing the system from entering an inconsistent state.

const ALLOWED_TRANSITIONS: Record<VerificationStatus, VerificationStatus[]> = {
  [VerificationStatus.PENDING]: [VerificationStatus.DOCUMENT_SUBMITTED, VerificationStatus.EXPIRED],
  [VerificationStatus.DOCUMENT_SUBMITTED]: [VerificationStatus.DOCUMENT_VERIFIED, VerificationStatus.REJECTED],
  [VerificationStatus.DOCUMENT_VERIFIED]: [VerificationStatus.IDENTITY_CHECK_IN_PROGRESS],
  [VerificationStatus.IDENTITY_CHECK_IN_PROGRESS]: [
    VerificationStatus.IDENTITY_VERIFIED,
    VerificationStatus.MANUALLY_REVIEWING,
    VerificationStatus.REJECTED,
  ],
  [VerificationStatus.IDENTITY_VERIFIED]: [VerificationStatus.APPROVED, VerificationStatus.MANUALLY_REVIEWING],
  [VerificationStatus.MANUALLY_REVIEWING]: [VerificationStatus.APPROVED, VerificationStatus.REJECTED],
  [VerificationStatus.APPROVED]: [],
  [VerificationStatus.REJECTED]: [VerificationStatus.PENDING],
  [VerificationStatus.EXPIRED]: [VerificationStatus.PENDING],
};
 
class InvalidTransitionError extends Error {}
 
function transitionStatus(current: VerificationStatus, next: VerificationStatus): void {
  const allowed = ALLOWED_TRANSITIONS[current];
  if (!allowed.includes(next)) {
    throw new InvalidTransitionError(
      `Cannot transition from ${current} to ${next}`
    );
  }
}
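As a sketch of what attaching side effects to transitions can look like, the hook registry below keys each side effect to one specific transition. It is illustrative rather than Oasis's actual implementation, and the state set is trimmed to three values for brevity:

```typescript
type Status = "PENDING" | "DOCUMENT_SUBMITTED" | "APPROVED";
type Hook = (requestId: string) => Promise<void>;

// Trimmed-down transition table for the sketch.
const ALLOWED: Record<Status, Status[]> = {
  PENDING: ["DOCUMENT_SUBMITTED"],
  DOCUMENT_SUBMITTED: ["APPROVED"],
  APPROVED: [],
};

// Hypothetical hook registry: side effects (notifications, audit writes,
// downstream checks) live in one place, keyed by the exact transition,
// instead of being scattered through the codebase.
const hooks = new Map<string, Hook[]>();

function onTransition(from: Status, to: Status, hook: Hook): void {
  const key = `${from}->${to}`;
  hooks.set(key, [...(hooks.get(key) ?? []), hook]);
}

async function applyTransition(requestId: string, current: Status, next: Status): Promise<void> {
  if (!ALLOWED[current].includes(next)) {
    throw new Error(`Cannot transition from ${current} to ${next}`);
  }
  // Run every side effect registered for this specific transition.
  for (const hook of hooks.get(`${current}->${next}`) ?? []) {
    await hook(requestId);
  }
}
```

Registering a hook for `MANUALLY_REVIEWING -> APPROVED` (say, a customer notification) then guarantees it fires on exactly that transition and no other.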

The Verification Pipeline

Once documents are submitted, they enter a multi-stage verification pipeline. Each stage is an independent processor that reads from a queue, performs its work, and writes the result back. This pipeline architecture means stages can be scaled independently, retried individually, and replaced without affecting the rest of the system.

interface VerificationStage {
  name: string;
  process(request: VerificationRequest): Promise<StageResult>;
}
 
interface StageResult {
  passed: boolean;
  data: Record<string, unknown>;
  issues: string[];
  nextStage: string | null;
}
 
interface PipelineResult {
  approved: boolean;
  results: StageResult[];
  failedAt: string | null;
}
 
class StageExecutionError extends Error {
  constructor(stageName: string, cause: Error) {
    super(`Stage "${stageName}" failed after retries: ${cause.message}`);
  }
}
 
class VerificationPipeline {
  private stages: Map<string, VerificationStage> = new Map();
 
  register(stage: VerificationStage): void {
    this.stages.set(stage.name, stage);
  }
 
  async execute(request: VerificationRequest, startStage: string): Promise<PipelineResult> {
    const results: StageResult[] = [];
    let currentStage: string | null = startStage;
 
    while (currentStage) {
      const stage = this.stages.get(currentStage);
      if (!stage) {
        throw new Error(`Unknown verification stage: ${currentStage}`);
      }
 
      const result = await this.executeWithRetry(stage, request);
      results.push(result);
 
      if (!result.passed) {
        return { approved: false, results, failedAt: currentStage };
      }
 
      currentStage = result.nextStage;
    }
 
    return { approved: true, results, failedAt: null };
  }
 
  private async executeWithRetry(
    stage: VerificationStage,
    request: VerificationRequest,
    maxRetries = 3
  ): Promise<StageResult> {
    let lastError: Error | null = null;
 
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return await stage.process(request);
      } catch (error) {
        lastError = error as Error;
        if (attempt < maxRetries) {
          await this.delay(Math.pow(2, attempt) * 1000);
        }
      }
    }
 
    throw new StageExecutionError(stage.name, lastError!);
  }
 
  private delay(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }
}

The pipeline in Oasis consists of four stages: document authenticity validation, data extraction and cross-referencing, identity verification against external databases, and risk scoring. Each stage is implemented as a class conforming to the VerificationStage interface, and the pipeline orchestrator drives the request through them sequentially.
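The chaining mechanism is worth seeing end to end: each stage names its successor via `nextStage`, and the orchestrator follows the chain until a stage fails or returns `null`. The standalone sketch below uses stub stages (the real ones are full classes like `DocumentValidationStage`) and a minimal runner:

```typescript
interface VerificationStage {
  name: string;
  process(): Promise<{ passed: boolean; nextStage: string | null }>;
}

// Stub stages for illustration; each real stage does actual work and
// decides its successor through the nextStage field.
const stages = new Map<string, VerificationStage>();
const register = (s: VerificationStage) => stages.set(s.name, s);

register({ name: "document-validation", process: async () => ({ passed: true, nextStage: "identity-verification" }) });
register({ name: "identity-verification", process: async () => ({ passed: true, nextStage: "risk-scoring" }) });
register({ name: "risk-scoring", process: async () => ({ passed: true, nextStage: null }) });

// Minimal runner: follow the chain, stop on failure or a null successor.
async function run(start: string): Promise<string[]> {
  const visited: string[] = [];
  let current: string | null = start;
  while (current) {
    const stage = stages.get(current);
    if (!stage) throw new Error(`Unknown verification stage: ${current}`);
    visited.push(current);
    const result = await stage.process();
    if (!result.passed) break;
    current = result.nextStage;
  }
  return visited;
}
```

Because the successor is data rather than hard-coded control flow, inserting a new stage between two existing ones only requires changing one `nextStage` value.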

Document Validation and Data Extraction

The document validation stage checks whether a submitted document is authentic and extracts structured data from it. We delegate the heavy lifting to specialized third-party OCR and document verification services, but the orchestration and decision logic lives entirely within Oasis.

class DocumentValidationStage implements VerificationStage {
  name = "document-validation";
 
  constructor(
    private ocrService: OCRService,
    private fraudDetector: FraudDetectionService,
    private documentStore: DocumentStore
  ) {}
 
  async process(request: VerificationRequest): Promise<StageResult> {
    const issues: string[] = [];
    const extractedData: Record<string, string> = {};
 
    for (const doc of request.documents) {
      const imageBuffer = await this.documentStore.fetch(doc.fileReference);
 
      const ocrResult = await this.ocrService.extract(imageBuffer, doc.type);
      if (ocrResult.confidence < 0.85) {
        issues.push(`Low OCR confidence (${ocrResult.confidence}) for ${doc.type}`);
      }
 
      Object.assign(extractedData, ocrResult.fields);
 
      const fraudResult = await this.fraudDetector.analyze(imageBuffer);
      if (fraudResult.isSuspicious) {
        issues.push(`Fraud detection flagged ${doc.type}: ${fraudResult.reason}`);
      }
    }
 
    const crossRefValid = this.crossReferenceFields(extractedData, request);
 
    const passed = issues.length === 0 && crossRefValid;
 
    return {
      passed,
      data: { extractedData },
      issues,
      nextStage: passed ? "identity-verification" : null,
    };
  }
 
  private crossReferenceFields(
    extracted: Record<string, string>,
    request: VerificationRequest
  ): boolean {
    const nameMatch = this.fuzzyMatch(
      extracted["full_name"] ?? "",
      request.customerProvidedName
    );
    const dobMatch = extracted["date_of_birth"] === request.customerProvidedDOB;
 
    return nameMatch && dobMatch;
  }
 
  // Simplified normalization-based comparison; production name matching
  // needs to be more tolerant (see the note below).
  private fuzzyMatch(a: string, b: string): boolean {
    const normalize = (s: string) => s.toLowerCase().replace(/\s+/g, " ").trim();
    return normalize(a) === normalize(b);
  }
}

One important lesson we learned is that fuzzy matching on names is essential. Customers frequently enter their name slightly differently from how it appears on their ID (middle names omitted, diacritics missing, different transliterations). A rigid exact-match check leads to an unacceptable rejection rate for legitimate customers.
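One way to build that tolerance is to normalize diacritics and whitespace, then allow a bounded edit distance. The Levenshtein-based sketch below is illustrative, not Oasis's actual matcher, and the `tolerance` ratio is a hypothetical tuning parameter:

```typescript
// Normalize: NFD-decompose, strip combining marks (diacritics),
// lowercase, collapse whitespace.
function normalizeName(s: string): string {
  return s
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}

// Classic dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Accept names whose normalized edit distance is within a small
// fraction of the longer name's length.
function fuzzyNameMatch(a: string, b: string, tolerance = 0.2): boolean {
  const na = normalizeName(a);
  const nb = normalizeName(b);
  const maxLen = Math.max(na.length, nb.length);
  if (maxLen === 0) return true;
  return levenshtein(na, nb) / maxLen <= tolerance;
}
```

This handles missing diacritics and small typos; a production matcher would typically also compare name tokens as sets, so an omitted middle name does not penalize the score.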

Risk Scoring and Decision Engine

After document validation and identity verification pass, the request enters the risk scoring stage. This stage aggregates signals from all previous stages, external watchlist checks, and behavioral data to produce a numeric risk score. The score determines whether the customer is automatically approved, automatically rejected, or routed to manual review.

interface RiskSignal {
  source: string;
  weight: number;
  score: number;
  details: string;
}
 
interface RiskDecision {
  score: number;
  decision: "APPROVE" | "MANUAL_REVIEW" | "REJECT";
  signals: RiskSignal[];
  evaluatedAt: Date;
}
 
class RiskScoringEngine {
  private thresholds = {
    autoApprove: 20, // at or below this score: approve automatically
    autoReject: 80,  // at or above this score: reject automatically
    // Scores between the two bounds are routed to manual review.
  };
 
  evaluate(signals: RiskSignal[]): RiskDecision {
    const weightedScore = signals.reduce((sum, signal) => {
      return sum + signal.score * signal.weight;
    }, 0);
 
    const totalWeight = signals.reduce((sum, signal) => sum + signal.weight, 0);
    const normalizedScore = totalWeight > 0 ? weightedScore / totalWeight : 0;
 
    let decision: "APPROVE" | "MANUAL_REVIEW" | "REJECT";
 
    if (normalizedScore <= this.thresholds.autoApprove) {
      decision = "APPROVE";
    } else if (normalizedScore < this.thresholds.autoReject) {
      decision = "MANUAL_REVIEW";
    } else {
      decision = "REJECT";
    }
 
    return {
      score: normalizedScore,
      decision,
      signals,
      evaluatedAt: new Date(),
    };
  }
}
 
class RiskScoringStage implements VerificationStage {
  name = "risk-scoring";
 
  constructor(
    private riskEngine: RiskScoringEngine,
    private watchlistService: WatchlistService,
    private geoService: GeoRiskService
  ) {}
 
  async process(request: VerificationRequest): Promise<StageResult> {
    const signals: RiskSignal[] = [];
 
    const watchlistResult = await this.watchlistService.screen(request.customerId);
    signals.push({
      source: "watchlist",
      weight: 3.0,
      score: watchlistResult.matchFound ? 100 : 0,
      details: watchlistResult.matchFound
        ? `Potential match: ${watchlistResult.matchDetails}`
        : "No watchlist matches",
    });
 
    const geoRisk = await this.geoService.assessRisk(request.customerCountry);
    signals.push({
      source: "geo-risk",
      weight: 1.5,
      score: geoRisk.riskLevel,
      details: `Country risk level: ${geoRisk.category}`,
    });
 
    const decision = this.riskEngine.evaluate(signals);
 
    return {
      passed: decision.decision !== "REJECT",
      data: { riskScore: decision.score, decision: decision.decision },
      issues: decision.decision === "REJECT" ? ["Risk score exceeded threshold"] : [],
      nextStage: null,
    };
  }
}

The thresholds are configurable and tuned based on historical data. We review false positive and false negative rates monthly and adjust the weights and thresholds accordingly. The key principle is that the risk engine should be aggressive about routing ambiguous cases to manual review rather than auto-rejecting. A rejected customer is a lost customer, but a customer routed to manual review is merely delayed.
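To make the weighted-average arithmetic concrete, here is a worked example using the same reduce-and-normalize computation as the engine above, with a clean watchlist result and a moderate geo-risk score (the specific numbers are illustrative):

```typescript
interface RiskSignal { source: string; weight: number; score: number; }

// Weighted mean of signal scores, as in RiskScoringEngine.evaluate.
function normalizedScore(signals: RiskSignal[]): number {
  const weighted = signals.reduce((sum, s) => sum + s.score * s.weight, 0);
  const totalWeight = signals.reduce((sum, s) => sum + s.weight, 0);
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

const signals: RiskSignal[] = [
  { source: "watchlist", weight: 3.0, score: 0 },  // no match found
  { source: "geo-risk", weight: 1.5, score: 40 },  // medium-risk country
];

// (0 * 3.0 + 40 * 1.5) / (3.0 + 1.5) = 60 / 4.5 ≈ 13.33,
// which falls below an autoApprove threshold of 20.
const score = normalizedScore(signals);
```

Note how the heavy watchlist weight works in both directions: a clean result with weight 3.0 pulls the average down sharply, while a match (score 100) would dominate the other signals and push the request toward rejection or review.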

Audit Trail and Compliance Logging

Every KYC system must produce an immutable audit trail. Regulators can and do request evidence of the decisions made during verification, and the system must be able to reconstruct exactly what happened, when, and why.

interface AuditEntry {
  id: string;
  verificationRequestId: string;
  action: string;
  actor: string;
  timestamp: Date;
  previousState: string;
  newState: string;
  metadata: Record<string, unknown>;
}
 
class AuditLogger {
  constructor(private store: AuditStore) {}
 
  async logTransition(
    requestId: string,
    from: VerificationStatus,
    to: VerificationStatus,
    actor: string,
    metadata: Record<string, unknown> = {}
  ): Promise<void> {
    const entry: AuditEntry = {
      id: crypto.randomUUID(),
      verificationRequestId: requestId,
      action: `TRANSITION_${from}_TO_${to}`,
      actor,
      timestamp: new Date(),
      previousState: from,
      newState: to,
      metadata,
    };
 
    await this.store.append(entry);
  }
 
  async getHistory(requestId: string): Promise<AuditEntry[]> {
    return this.store.findByRequestId(requestId);
  }
}

We store audit entries in an append-only table with no UPDATE or DELETE permissions granted to the application user. This guarantees immutability at the database level. Each entry captures the previous state, the new state, the actor (system or human reviewer), and a metadata blob that includes relevant context such as risk scores, matched watchlist entries, or reviewer notes.
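Because every transition is recorded with its previous and new state, the full state timeline of a request can be replayed from the log alone. A minimal sketch, with `AuditEntry` trimmed to the fields the replay needs:

```typescript
interface AuditEntry {
  timestamp: Date;
  previousState: string;
  newState: string;
}

// Rebuild the ordered state timeline for a request from its audit
// entries: sort by time, seed with the earliest previousState, then
// append each newState in order.
function reconstructTimeline(entries: AuditEntry[]): string[] {
  const ordered = [...entries].sort(
    (a, b) => a.timestamp.getTime() - b.timestamp.getTime()
  );
  if (ordered.length === 0) return [];
  return [ordered[0].previousState, ...ordered.map((e) => e.newState)];
}
```

The same replay is what answers a regulator's "what happened, when, and why" for any historical request, with the metadata blob on each entry supplying the "why".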

Conclusion

Building a KYC verification system is as much about process design as it is about code. The state machine ensures that every verification request follows a predictable path. The pipeline architecture provides isolation, retry capability, and scalability. The risk scoring engine encapsulates complex decision logic in a testable, tunable component. And the audit trail satisfies regulatory requirements while providing invaluable debugging information.

The patterns described here have served Oasis well as Klivvr has scaled to handle thousands of verification requests daily. The most important lesson is to design for the unhappy path: documents will fail validation, external services will be unavailable, and edge cases will surprise you. A system that handles these situations gracefully, with clear error messages, automatic retries, and a path to human review, is a system that earns customer trust while satisfying compliance requirements.
