WebSocket Architecture for Real-Time Messaging

How to design a WebSocket architecture for real-time messaging systems, covering connection management, heartbeat patterns, reconnection strategies, and scaling considerations used in Whisper.

technical6 min readBy Klivvr Engineering
Share:

Real-time communication requires a persistent, bidirectional channel between client and server. HTTP's request-response model does not support this — the server cannot push data to the client without the client asking first. WebSockets solve this by establishing a long-lived TCP connection that either side can use to send messages at any time.

Whisper, Klivvr's notification and messaging service, uses WebSockets as its primary real-time delivery channel. This article covers the architecture decisions behind Whisper's WebSocket layer, from connection lifecycle management to scaling patterns.

Connection Lifecycle

A WebSocket connection goes through four phases: handshake, authentication, active communication, and termination. Each phase needs explicit handling.

import { WebSocketServer, WebSocket } from "ws";
import { IncomingMessage } from "http";
 
interface ClientConnection {
  id: string;
  userId: string;
  socket: WebSocket;
  connectedAt: Date;
  lastPingAt: Date;
  subscriptions: Set<string>;
  metadata: Record<string, unknown>;
}
 
class WhisperWebSocketServer {
  private wss: WebSocketServer;
  private connections = new Map<string, ClientConnection>();
  private userConnections = new Map<string, Set<string>>();
 
  constructor(port: number) {
    this.wss = new WebSocketServer({ port });
    this.wss.on("connection", (socket, request) =>
      this.handleConnection(socket, request)
    );
  }
 
  private async handleConnection(
    socket: WebSocket,
    request: IncomingMessage
  ): Promise<void> {
    // Phase 1: Authentication
    const token = this.extractToken(request);
    if (!token) {
      socket.close(4001, "Authentication required");
      return;
    }
 
    const user = await this.authenticateToken(token);
    if (!user) {
      socket.close(4003, "Invalid token");
      return;
    }
 
    // Phase 2: Register connection
    const connectionId = crypto.randomUUID();
    const connection: ClientConnection = {
      id: connectionId,
      userId: user.id,
      socket,
      connectedAt: new Date(),
      lastPingAt: new Date(),
      subscriptions: new Set(),
      metadata: { deviceType: request.headers["x-device-type"] },
    };
 
    this.connections.set(connectionId, connection);
    this.addUserConnection(user.id, connectionId);
 
    // Phase 3: Setup message handling
    socket.on("message", (data) =>
      this.handleMessage(connection, data.toString())
    );
 
    socket.on("close", () => this.handleDisconnect(connection));
    socket.on("error", (error) => this.handleError(connection, error));
 
    // Send connection acknowledgment
    this.send(connection, {
      type: "connected",
      connectionId,
      serverTime: new Date().toISOString(),
    });
 
    // Deliver any queued messages
    await this.deliverPendingMessages(user.id, connection);
  }
 
  private send(connection: ClientConnection, data: unknown): void {
    if (connection.socket.readyState === WebSocket.OPEN) {
      connection.socket.send(JSON.stringify(data));
    }
  }
 
  private handleDisconnect(connection: ClientConnection): void {
    this.connections.delete(connection.id);
    this.removeUserConnection(connection.userId, connection.id);
  }
 
  private addUserConnection(userId: string, connId: string): void {
    if (!this.userConnections.has(userId)) {
      this.userConnections.set(userId, new Set());
    }
    this.userConnections.get(userId)!.add(connId);
  }
 
  private removeUserConnection(userId: string, connId: string): void {
    this.userConnections.get(userId)?.delete(connId);
    if (this.userConnections.get(userId)?.size === 0) {
      this.userConnections.delete(userId);
    }
  }
 
  private extractToken(request: IncomingMessage): string | null {
    const url = new URL(request.url ?? "", "ws://localhost");
    return url.searchParams.get("token");
  }
 
  private async authenticateToken(
    token: string
  ): Promise<{ id: string } | null> {
    // Verify JWT or session token
    return tokenService.verify(token);
  }
 
  private async deliverPendingMessages(
    userId: string,
    connection: ClientConnection
  ): Promise<void> {
    const pending = await messageQueue.getPending(userId);
    for (const message of pending) {
      this.send(connection, message);
      await messageQueue.acknowledge(message.id);
    }
  }
}

Heartbeat and Connection Health

WebSocket connections can silently die — the client loses network connectivity, a proxy times out, or a load balancer closes an idle connection. Without active health checking, the server holds stale connections that consume memory and produce delivery failures.

class HeartbeatManager {
  private interval: NodeJS.Timeout | null = null;
  private pingIntervalMs: number;
  private timeoutMs: number;
 
  constructor(pingIntervalMs: number = 30000, timeoutMs: number = 10000) {
    this.pingIntervalMs = pingIntervalMs;
    this.timeoutMs = timeoutMs;
  }
 
  start(
    connections: Map<string, ClientConnection>,
    onTimeout: (connection: ClientConnection) => void
  ): void {
    this.interval = setInterval(() => {
      const now = Date.now();
 
      for (const [id, connection] of connections) {
        const timeSinceLastPing = now - connection.lastPingAt.getTime();
 
        if (timeSinceLastPing > this.pingIntervalMs + this.timeoutMs) {
          // Connection has not responded to ping — terminate
          onTimeout(connection);
          continue;
        }
 
        if (timeSinceLastPing > this.pingIntervalMs) {
          // Time to send a ping
          if (connection.socket.readyState === WebSocket.OPEN) {
            connection.socket.ping();
          }
        }
      }
    }, this.pingIntervalMs / 2);
  }
 
  stop(): void {
    if (this.interval) {
      clearInterval(this.interval);
    }
  }
}

The client mirrors this with its own heartbeat handler, responding to server pings and detecting when the connection has been lost.

Message Protocol

Whisper uses a typed message protocol that distinguishes between different message categories and supports acknowledgment.

interface WhisperMessage {
  id: string;
  type: MessageType;
  channel?: string;
  payload: unknown;
  timestamp: string;
  requiresAck: boolean;
}
 
type MessageType =
  | "notification"      // Push notification content
  | "transaction_alert" // Real-time transaction event
  | "chat_message"      // In-app chat message
  | "system_event"      // System status updates
  | "presence"          // User online/offline status
  | "ack"               // Acknowledgment
  | "subscribe"         // Subscribe to a channel
  | "unsubscribe";      // Unsubscribe from a channel
 
class MessageRouter {
  private handlers = new Map<
    MessageType,
    (conn: ClientConnection, msg: WhisperMessage) => Promise<void>
  >();
 
  register(
    type: MessageType,
    handler: (conn: ClientConnection, msg: WhisperMessage) => Promise<void>
  ): void {
    this.handlers.set(type, handler);
  }
 
  async route(
    connection: ClientConnection,
    raw: string
  ): Promise<void> {
    const message: WhisperMessage = JSON.parse(raw);
    const handler = this.handlers.get(message.type);
 
    if (!handler) {
      console.warn(`No handler for message type: ${message.type}`);
      return;
    }
 
    await handler(connection, message);
 
    if (message.requiresAck) {
      connection.socket.send(
        JSON.stringify({
          id: crypto.randomUUID(),
          type: "ack",
          payload: { messageId: message.id },
          timestamp: new Date().toISOString(),
          requiresAck: false,
        })
      );
    }
  }
}

Scaling with Multiple Server Instances

A single WebSocket server has a limit on the number of concurrent connections it can handle — typically 50,000-100,000 depending on hardware and message volume. Scaling beyond that requires multiple server instances with a coordination layer.

import { Redis } from "ioredis";
 
class DistributedWhisperServer {
  private localConnections = new Map<string, ClientConnection>();
  private redis: Redis;
  private pubsub: Redis;
  private serverId: string;
 
  constructor(redis: Redis) {
    this.redis = redis;
    this.pubsub = redis.duplicate();
    this.serverId = crypto.randomUUID();
  }
 
  async start(): Promise<void> {
    // Subscribe to cross-server messages
    await this.pubsub.subscribe("whisper:broadcast");
    this.pubsub.on("message", (channel, data) => {
      const message = JSON.parse(data);
      if (message.sourceServer === this.serverId) return;
      this.deliverLocally(message.userId, message.payload);
    });
  }
 
  async sendToUser(userId: string, payload: unknown): Promise<boolean> {
    // Try local delivery first
    if (this.deliverLocally(userId, payload)) {
      return true;
    }
 
    // User might be on another server — publish to Redis
    await this.redis.publish(
      "whisper:broadcast",
      JSON.stringify({
        sourceServer: this.serverId,
        userId,
        payload,
      })
    );
 
    return true;
  }
 
  private deliverLocally(userId: string, payload: unknown): boolean {
    // Find all local connections for this user
    let delivered = false;
    for (const [, conn] of this.localConnections) {
      if (conn.userId === userId && conn.socket.readyState === WebSocket.OPEN) {
        conn.socket.send(JSON.stringify(payload));
        delivered = true;
      }
    }
    return delivered;
  }
 
  async registerConnection(userId: string, connId: string): Promise<void> {
    // Track which server handles which user
    await this.redis.sadd(`whisper:user:${userId}:servers`, this.serverId);
    await this.redis.expire(`whisper:user:${userId}:servers`, 3600);
  }
}

Conclusion

WebSocket architecture for real-time messaging requires attention to every phase of the connection lifecycle — authentication, health monitoring, message routing, and graceful termination. Whisper's architecture handles these concerns through typed message protocols, server-side heartbeat monitoring, acknowledgment-based delivery guarantees, and Redis-coordinated multi-server scaling. The result is a real-time channel that delivers notifications, transaction alerts, and chat messages to Klivvr's users with sub-second latency and reliable delivery.

Related Articles

business

Omnichannel Messaging Strategy for Fintech

How to build a unified omnichannel messaging strategy across push, SMS, email, and in-app channels — covering channel selection, message consistency, and unified customer experience.

7 min read
business

Real-Time Notifications in Banking

How real-time notifications transform the banking experience, covering transaction alerts, security notifications, compliance requirements, and the business impact of instant communication.

6 min read
technical

Scaling Notification Systems

Strategies for scaling notification systems to millions of users, covering horizontal scaling, fan-out patterns, rate limiting, and infrastructure design — with patterns from Whisper.

5 min read