Scaling Notification Systems

Strategies for scaling notification systems to millions of users, covering horizontal scaling, fan-out patterns, rate limiting, and infrastructure design — with patterns from Whisper.

Technical · 5 min read · By Klivvr Engineering

A notification system that works for 10,000 users will not work for 10 million. As user base and notification volume grow, every component faces scaling pressure: the WebSocket servers that maintain persistent connections, the message queues that buffer notifications, the push service integrations that deliver to devices, and the database that tracks delivery status.

This article covers the scaling strategies Whisper uses to deliver notifications reliably at scale.

Horizontal Scaling of WebSocket Servers

WebSocket servers are stateful — each server holds a set of active connections. Scaling requires distributing connections across multiple server instances and coordinating message delivery between them.

import Redis from "ioredis";
import WebSocket from "ws";

class ScalableWhisperCluster {
  private serverId: string;
  private redis: Redis;
  private localConnections = new Map<string, WebSocket>();
 
  constructor(redis: Redis) {
    this.redis = redis;
    this.serverId = `whisper-${process.env.HOSTNAME}-${process.pid}`;
  }
 
  async registerConnection(userId: string, socket: WebSocket): Promise<void> {
    this.localConnections.set(userId, socket);
 
    // Register in Redis: which server handles which user
    await this.redis.hset("whisper:connections", userId, this.serverId);
    // EXPIRE applies to the whole hash, not per field — the TTL is refreshed
    // on every registration as a coarse safety net against stale entries
    // left behind by crashed servers
    await this.redis.expire("whisper:connections", 3600);
  }
 
  async sendToUser(userId: string, message: unknown): Promise<boolean> {
    // Try local delivery
    const localSocket = this.localConnections.get(userId);
    if (localSocket?.readyState === WebSocket.OPEN) {
      localSocket.send(JSON.stringify(message));
      return true;
    }
 
    // Look up which server has this user
    const targetServer = await this.redis.hget("whisper:connections", userId);
    if (!targetServer || targetServer === this.serverId) {
      return false; // User not connected
    }
 
    // Route through Redis pub/sub to the correct server
    await this.redis.publish(
      `whisper:server:${targetServer}`,
      JSON.stringify({ userId, message })
    );
 
    return true;
  }
 
  async startListening(): Promise<void> {
    const subscriber = this.redis.duplicate();
    await subscriber.subscribe(`whisper:server:${this.serverId}`);
 
    subscriber.on("message", (channel, data) => {
      const { userId, message } = JSON.parse(data);
      const socket = this.localConnections.get(userId);
      if (socket?.readyState === WebSocket.OPEN) {
        socket.send(JSON.stringify(message));
      }
    });
  }
}
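The routing logic above can be exercised end-to-end with an in-memory sketch, where a shared `Map` stands in for the Redis hash and a tiny bus stands in for pub/sub. `FakeBus` and `MiniNode` are illustrative names for this sketch, not part of Whisper:

```typescript
type Payload = { userId: string; message: string };
type Handler = (payload: Payload) => void;

// Stands in for Redis pub/sub: one handler per channel, delivered synchronously.
class FakeBus {
  private channels = new Map<string, Handler>();
  subscribe(channel: string, handler: Handler): void {
    this.channels.set(channel, handler);
  }
  publish(channel: string, payload: Payload): void {
    this.channels.get(channel)?.(payload);
  }
}

class MiniNode {
  // userId -> messages delivered to this node's "socket"
  private local = new Map<string, string[]>();

  constructor(
    private id: string,
    private registry: Map<string, string>, // shared "whisper:connections" hash
    private bus: FakeBus
  ) {
    // Listen on this node's channel, like startListening() above
    bus.subscribe(`whisper:server:${id}`, ({ userId, message }) => {
      this.local.get(userId)?.push(message);
    });
  }

  connect(userId: string): void {
    this.local.set(userId, []);
    this.registry.set(userId, this.id); // register ownership
  }

  send(userId: string, message: string): boolean {
    const inbox = this.local.get(userId);
    if (inbox) {
      inbox.push(message); // local delivery
      return true;
    }
    const target = this.registry.get(userId);
    if (!target || target === this.id) return false; // user not connected
    this.bus.publish(`whisper:server:${target}`, { userId, message });
    return true;
  }

  inbox(userId: string): string[] {
    return this.local.get(userId) ?? [];
  }
}

const registry = new Map<string, string>();
const bus = new FakeBus();
const nodeA = new MiniNode("A", registry, bus);
const nodeB = new MiniNode("B", registry, bus);

nodeB.connect("user-42"); // user connects to node B
nodeA.send("user-42", "hello"); // node A routes through the bus to node B
```

The key property to notice: node A never holds the user's socket, yet delivery succeeds because ownership lives in the shared registry, exactly as with the Redis hash.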

Fan-Out Pattern for Broadcast Notifications

Some notifications target many users simultaneously — a system maintenance announcement, a new feature rollout, or a market-wide alert. The fan-out pattern distributes these bulk operations efficiently.

class NotificationFanOut {
  private batchSize: number;
  private concurrency: number;
  private pushService: UnifiedPushService;
 
  constructor(config: {
    batchSize: number;
    concurrency: number;
    pushService: UnifiedPushService;
  }) {
    this.batchSize = config.batchSize;
    this.concurrency = config.concurrency;
    this.pushService = config.pushService;
  }
 
  async broadcastToSegment(
    segmentQuery: string,
    template: NotificationTemplate,
    variables: Record<string, unknown>
  ): Promise<BroadcastResult> {
    const stats = { total: 0, sent: 0, failed: 0, skipped: 0 };
 
    // Stream users from database in batches
    let offset = 0;
    let batch: User[];
 
    do {
      batch = await this.getUserBatch(segmentQuery, offset, this.batchSize);
      offset += this.batchSize;
 
      // Process batch with controlled concurrency
      const results = await this.processBatch(batch, template, variables);
 
      stats.total += batch.length;
      stats.sent += results.filter((r) => r.success).length;
      stats.skipped += results.filter((r) => r.skipped).length;
      // Rate-limited users are skipped, not failed — count only genuine errors
      stats.failed += results.filter((r) => !r.success && !r.skipped).length;
 
    } while (batch.length === this.batchSize);
 
    return stats;
  }
 
  private async processBatch(
    users: User[],
    template: NotificationTemplate,
    variables: Record<string, unknown>
  ): Promise<Array<{ success: boolean; skipped: boolean }>> {
    const semaphore = new Semaphore(this.concurrency);
    const results: Array<{ success: boolean; skipped: boolean }> = [];
 
    await Promise.all(
      users.map(async (user) => {
        await semaphore.acquire();
        try {
          // Check rate limit before sending
          const allowed = await rateLimiter.check(user.id);
          if (!allowed) {
            results.push({ success: false, skipped: true });
            return;
          }
 
          const rendered = templateRenderer.render(
            template,
            "push",
            user.locale,
            { ...variables, userName: user.name }
          );
 
          await this.pushService.sendToUser(user.id, {
            title: rendered.title ?? "",
            body: rendered.body,
            priority: "normal",
          });
 
          results.push({ success: true, skipped: false });
        } catch {
          results.push({ success: false, skipped: false });
        } finally {
          semaphore.release();
        }
      })
    );
 
    return results;
  }
 
  private async getUserBatch(
    query: string,
    offset: number,
    limit: number
  ): Promise<User[]> {
    // segmentQuery must come from a trusted, server-side allowlist — it is
    // interpolated into the SQL, so never build it from user input. For very
    // large segments, keyset pagination (WHERE id > lastId) avoids the cost
    // of deep OFFSET scans.
    const result = await db.query(
      `SELECT id, name, locale FROM users WHERE ${query} ORDER BY id LIMIT $1 OFFSET $2`,
      [limit, offset]
    );
    return result.rows;
  }
}
 
class Semaphore {
  private permits: number;
  private waiting: Array<() => void> = [];
 
  constructor(permits: number) {
    this.permits = permits;
  }
 
  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    return new Promise((resolve) => this.waiting.push(resolve));
  }
 
  release(): void {
    if (this.waiting.length > 0) {
      this.waiting.shift()!();
    } else {
      this.permits++;
    }
  }
}
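To see the semaphore bound concurrency in practice, here is a usage sketch (the `Semaphore` class is repeated so the snippet runs standalone): ten jobs run, but no more than three are ever in flight at once.

```typescript
class Semaphore {
  private permits: number;
  private waiting: Array<() => void> = [];

  constructor(permits: number) {
    this.permits = permits;
  }

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    return new Promise((resolve) => this.waiting.push(resolve));
  }

  release(): void {
    if (this.waiting.length > 0) {
      this.waiting.shift()!(); // hand the permit straight to a waiter
    } else {
      this.permits++;
    }
  }
}

const semaphore = new Semaphore(3);
let inFlight = 0;
let maxInFlight = 0;

async function job(): Promise<void> {
  await semaphore.acquire();
  try {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated send
    inFlight--;
  } finally {
    semaphore.release();
  }
}

// Launch all ten at once; the semaphore serializes them three at a time
await Promise.all(Array.from({ length: 10 }, () => job()));
```

Note that `release()` hands the permit directly to the next waiter instead of incrementing `permits`, which keeps the concurrency ceiling exact even under contention.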

Connection Pooling and Resource Management

Each WebSocket connection consumes memory for the socket buffer, the connection state, and any subscriptions. At scale, resource management becomes critical.

class ConnectionResourceManager {
  private maxConnectionsPerServer: number;
  // Public so the health check below can read the live count directly
  currentConnections = 0;
 
  constructor(maxConnections: number = 50000) {
    this.maxConnectionsPerServer = maxConnections;
  }
 
  canAcceptConnection(): boolean {
    return this.currentConnections < this.maxConnectionsPerServer;
  }
 
  trackConnection(): void {
    this.currentConnections++;
    this.reportMetrics();
  }
 
  releaseConnection(): void {
    this.currentConnections--;
    this.reportMetrics();
  }
 
  getUtilization(): number {
    return this.currentConnections / this.maxConnectionsPerServer;
  }
 
  private reportMetrics(): void {
    metrics.gauge("whisper.connections.active", this.currentConnections);
    metrics.gauge("whisper.connections.utilization", this.getUtilization());
 
    if (this.getUtilization() > 0.8) {
      metrics.increment("whisper.connections.high_utilization_warning");
    }
  }
}
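At the accept path, the manager acts as a gate: when the server is at capacity, new connections are refused (typically with a 503 on the WebSocket upgrade) so the load balancer retries on another instance. A trimmed sketch of that gate — `ConnectionGate` is a simplified stand-in for the class above, with a no-op metrics stub so it runs standalone:

```typescript
// No-op metrics sink for the sketch; the real class reports to a backend
const metrics = {
  gauge: (_name: string, _value: number): void => {},
  increment: (_name: string): void => {},
};

class ConnectionGate {
  private current = 0;

  constructor(private max: number) {}

  canAccept(): boolean {
    return this.current < this.max;
  }

  track(): void {
    this.current++;
    metrics.gauge("whisper.connections.active", this.current);
  }

  release(): void {
    this.current--;
    metrics.gauge("whisper.connections.active", this.current);
  }
}

// Simulate five connection attempts against a three-connection cap
const gate = new ConnectionGate(3);
const accepted: boolean[] = [];

for (let i = 0; i < 5; i++) {
  if (gate.canAccept()) {
    gate.track();
    accepted.push(true);
  } else {
    accepted.push(false); // would reject the upgrade; LB retries elsewhere
  }
}
```

Rejecting at the upgrade step is deliberately cheap: the server sheds load before allocating socket buffers or subscription state for a connection it cannot serve.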

Monitoring and Alerting

At scale, visibility into system health is essential. Whisper tracks metrics at every layer.

class NotificationMetrics {
  async recordDelivery(
    channel: string,
    status: "sent" | "delivered" | "failed",
    latencyMs: number
  ): Promise<void> {
    metrics.increment(`whisper.delivery.${channel}.${status}`);
    metrics.histogram(`whisper.delivery.${channel}.latency_ms`, latencyMs);
  }
 
  async recordQueueDepth(priority: string, depth: number): Promise<void> {
    metrics.gauge(`whisper.queue.${priority}.depth`, depth);
  }
 
  async getHealthStatus(): Promise<SystemHealth> {
    const queueDepths = await this.getAllQueueDepths();
    const deliveryRate = await this.getDeliveryRate("5m");
    const errorRate = await this.getErrorRate("5m");
 
    return {
      status: errorRate > 0.05 ? "degraded" : "healthy",
      queueDepths,
      deliveryRatePerSecond: deliveryRate,
      errorRate,
      activeConnections: connectionManager.currentConnections,
      serverUtilization: connectionManager.getUtilization(),
    };
  }
}
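Health snapshots like the one returned by `getHealthStatus` typically feed an alerting layer that turns metrics into pages. A minimal sketch of threshold evaluation — the `SystemHealth` shape is reconstructed from the snippet above, and `evaluateAlerts` with its thresholds is illustrative, not Whisper's production configuration:

```typescript
interface SystemHealth {
  status: "healthy" | "degraded";
  queueDepths: Record<string, number>;
  deliveryRatePerSecond: number;
  errorRate: number;
  activeConnections: number;
  serverUtilization: number;
}

// Map a health snapshot to a list of alert names; thresholds are examples
function evaluateAlerts(health: SystemHealth): string[] {
  const alerts: string[] = [];
  if (health.errorRate > 0.05) alerts.push("high_error_rate");
  if (health.serverUtilization > 0.8) alerts.push("connection_capacity");
  for (const [priority, depth] of Object.entries(health.queueDepths)) {
    if (depth > 10_000) alerts.push(`queue_backlog:${priority}`);
  }
  return alerts;
}

const alerts = evaluateAlerts({
  status: "degraded",
  queueDepths: { critical: 120, normal: 25_000 },
  deliveryRatePerSecond: 4200,
  errorRate: 0.07,
  activeConnections: 41_000,
  serverUtilization: 0.82,
});
```

Keeping threshold logic in one pure function like this makes the alert rules unit-testable independently of the metrics pipeline.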

Conclusion

Scaling a notification system requires addressing capacity at every layer: WebSocket servers scaled horizontally with Redis coordination, fan-out patterns for bulk notifications, resource management for connection limits, and comprehensive monitoring for operational visibility. Whisper's scaling architecture ensures that notification delivery remains reliable as Klivvr's user base grows — whether delivering a single transaction alert or broadcasting to millions of users simultaneously.

Related Articles

- Omnichannel Messaging Strategy for Fintech — How to build a unified omnichannel messaging strategy across push, SMS, email, and in-app channels, covering channel selection, message consistency, and unified customer experience. (business, 7 min read)
- Real-Time Notifications in Banking — How real-time notifications transform the banking experience, covering transaction alerts, security notifications, compliance requirements, and the business impact of instant communication. (business, 6 min read)
- Reducing Notification Fatigue — Strategies for reducing notification fatigue in mobile apps, covering frequency capping, smart scheduling, user preference management, and content relevance optimization. (business, 6 min read)