Building an Analytics SDK: Architecture Decisions

Explore the key architectural decisions behind building a robust analytics SDK for iOS, from event queuing to network strategies and modular design.

technical · 8 min read · By Klivvr Engineering

Building an analytics SDK is far more nuanced than simply firing HTTP requests with event data. The architecture decisions you make early on will determine the SDK's reliability, performance, and developer adoption. A poorly designed analytics SDK can drain batteries, bloat app binaries, and lose critical data points. A well-designed one becomes invisible to the host application while delivering accurate, timely insights.

In this article, we walk through the core architectural decisions behind KlivvrAnalyticsKit and share the patterns that have proven effective for building production-grade analytics SDKs in Swift.

Core Architecture: The Event Pipeline

The foundation of any analytics SDK is its event pipeline. Events flow through several stages: creation, enrichment, validation, queuing, batching, and transmission. Each stage is a discrete responsibility, and separating them cleanly is essential.

// The event pipeline as a protocol-oriented design
protocol AnalyticsEventPipeline {
    func track(_ event: AnalyticsEvent)
}
 
protocol EventEnricher {
    func enrich(_ event: inout AnalyticsEvent)
}
 
protocol EventValidator {
    func validate(_ event: AnalyticsEvent) throws
}
 
protocol EventQueue {
    func enqueue(_ event: AnalyticsEvent)
    func dequeue(batch size: Int) -> [AnalyticsEvent]
    var count: Int { get }
}
 
protocol EventTransmitter {
    func send(_ events: [AnalyticsEvent]) async throws
}
 
// The pipeline coordinator
final class AnalyticsPipelineCoordinator: AnalyticsEventPipeline {
    private let enrichers: [EventEnricher]
    private let validator: EventValidator
    private let queue: EventQueue
    private let transmitter: EventTransmitter
    private let batchScheduler: BatchScheduler
 
    init(
        enrichers: [EventEnricher],
        validator: EventValidator,
        queue: EventQueue,
        transmitter: EventTransmitter,
        batchScheduler: BatchScheduler
    ) {
        self.enrichers = enrichers
        self.validator = validator
        self.queue = queue
        self.transmitter = transmitter
        self.batchScheduler = batchScheduler
    }
 
    func track(_ event: AnalyticsEvent) {
        var mutableEvent = event
 
        // Enrichment phase
        for enricher in enrichers {
            enricher.enrich(&mutableEvent)
        }
 
        // Validation phase
        do {
            try validator.validate(mutableEvent)
        } catch {
            Logger.warning("Event validation failed: \(error)")
            return
        }
 
        // Queuing phase
        queue.enqueue(mutableEvent)
 
        // Trigger batch check
        batchScheduler.checkBatchThreshold(queueCount: queue.count)
    }
}

This pipeline approach gives us clean separation of concerns. Each component can be tested in isolation, swapped out for different implementations, and extended without touching the rest of the system.
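To make that testability concrete: because the coordinator depends only on the EventQueue protocol, the persistent queue shown later can be replaced by an in-memory double in unit tests. A minimal sketch, with AnalyticsEvent and EventQueue repeated in reduced form so the snippet stands alone (InMemoryEventQueue is a name invented here, not part of the SDK):

```swift
import Foundation

// Reduced stand-ins for the SDK types, for illustration only
struct AnalyticsEvent: Codable {
    let id: String
    let name: String
}

protocol EventQueue {
    func enqueue(_ event: AnalyticsEvent)
    func dequeue(batch size: Int) -> [AnalyticsEvent]
    var count: Int { get }
}

// In-memory queue used in unit tests instead of the SQLite-backed one
final class InMemoryEventQueue: EventQueue {
    private var events: [AnalyticsEvent] = []

    func enqueue(_ event: AnalyticsEvent) {
        events.append(event)
    }

    func dequeue(batch size: Int) -> [AnalyticsEvent] {
        // FIFO: oldest events leave first, matching the persistent queue
        let batch = Array(events.prefix(size))
        events.removeFirst(batch.count)
        return batch
    }

    var count: Int { events.count }
}
```

Injecting this double lets a test assert on enqueue/dequeue behavior without touching disk.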

Event Queuing and Persistence

One of the most critical decisions in analytics SDK design is how you handle event persistence. Events can be lost if they only exist in memory -- an app crash, a force-quit, or a low-memory termination can wipe out undelivered events. Persistent storage is non-negotiable for reliable analytics.

// SQLite-backed persistent event queue
final class PersistentEventQueue: EventQueue {
    private let database: SQLiteDatabase
    private let serialQueue = DispatchQueue(label: "com.klivvr.analytics.queue")
 
    init(databasePath: String) throws {
        self.database = try SQLiteDatabase(path: databasePath)
        try createTableIfNeeded()
    }
 
    private func createTableIfNeeded() throws {
        try database.execute("""
            CREATE TABLE IF NOT EXISTS analytics_events (
                id TEXT PRIMARY KEY,
                payload BLOB NOT NULL,
                created_at REAL NOT NULL,
                retry_count INTEGER DEFAULT 0
            )
        """)
    }
 
    func enqueue(_ event: AnalyticsEvent) {
        serialQueue.async { [weak self] in
            guard let self else { return }
            do {
                let data = try JSONEncoder().encode(event)
                try self.database.execute(
                    "INSERT INTO analytics_events (id, payload, created_at) VALUES (?, ?, ?)",
                    parameters: [event.id, data, Date().timeIntervalSince1970]
                )
            } catch {
                Logger.error("Failed to enqueue event: \(error)")
            }
        }
    }
 
    // Note: dequeue reads without deleting. Events are removed only after
    // the transmitter confirms delivery, so a failed send cannot lose them.
    func dequeue(batch size: Int) -> [AnalyticsEvent] {
        serialQueue.sync {
            do {
                let rows = try database.query(
                    "SELECT id, payload FROM analytics_events ORDER BY created_at ASC LIMIT ?",
                    parameters: [size]
                )
                return rows.compactMap { row in
                    guard let data = row["payload"] as? Data else { return nil }
                    return try? JSONDecoder().decode(AnalyticsEvent.self, from: data)
                }
            } catch {
                Logger.error("Failed to dequeue events: \(error)")
                return []
            }
        }
    }
 
    // Called once a batch has been successfully transmitted
    func remove(ids: [String]) {
        serialQueue.async { [weak self] in
            guard let self else { return }
            let placeholders = ids.map { _ in "?" }.joined(separator: ", ")
            try? self.database.execute(
                "DELETE FROM analytics_events WHERE id IN (\(placeholders))",
                parameters: ids
            )
        }
    }
 
    var count: Int {
        serialQueue.sync {
            (try? database.scalar("SELECT COUNT(*) FROM analytics_events")) ?? 0
        }
    }
}

We chose SQLite over Core Data or flat files for several reasons. SQLite provides ACID transactions, excellent read/write performance, and a small footprint. It also handles concurrent access well when combined with a serial dispatch queue, and it survives app termination gracefully.
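One related decision: the configuration described later caps the queue at maxQueueSize, and the persistent store has to enforce that cap, or an extended offline period will grow the database without bound. Our policy is drop-oldest-first. A minimal sketch of that policy on an array for clarity (trimToCapacity is a name invented here; the SQLite version is a DELETE ordered by created_at):

```swift
import Foundation

// Capped-queue policy: once the stored count exceeds maxQueueSize, the
// oldest events are dropped first and the newest are kept.
func trimToCapacity<T>(_ events: [T], maxQueueSize: Int) -> [T] {
    guard events.count > maxQueueSize else { return events }
    return Array(events.suffix(maxQueueSize))  // keep the newest events
}
```

Dropping oldest-first biases retained data toward recency, which is usually the right trade-off for analytics.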

Batching and Transmission Strategy

Sending events one at a time wastes network resources and battery. Batching is essential, but the strategy needs to balance timeliness with efficiency. We use a multi-trigger approach: events are batched when the queue reaches a size threshold, when a time interval elapses, or when the app transitions to the background.

final class BatchScheduler {
    private let batchSize: Int
    private let flushInterval: TimeInterval
    private var timer: Timer?
    private let onFlush: () -> Void
 
    init(batchSize: Int = 20, flushInterval: TimeInterval = 30, onFlush: @escaping () -> Void) {
        self.batchSize = batchSize
        self.flushInterval = flushInterval
        self.onFlush = onFlush
        startTimer()
        observeAppLifecycle()
    }
 
    func checkBatchThreshold(queueCount: Int) {
        if queueCount >= batchSize {
            onFlush()
        }
    }
 
    private func startTimer() {
        timer = Timer.scheduledTimer(withTimeInterval: flushInterval, repeats: true) { [weak self] _ in
            self?.onFlush()
        }
    }
 
    private var lifecycleObserver: NSObjectProtocol?

    private func observeAppLifecycle() {
        // Keep the token: block-based observers must be removed explicitly
        lifecycleObserver = NotificationCenter.default.addObserver(
            forName: UIApplication.willResignActiveNotification,
            object: nil,
            queue: .main
        ) { [weak self] _ in
            self?.flushBeforeBackground()
        }
    }

    private func flushBeforeBackground() {
        var taskId: UIBackgroundTaskIdentifier = .invalid
        taskId = UIApplication.shared.beginBackgroundTask {
            // Expiration handler: end the task if the system reclaims time early
            UIApplication.shared.endBackgroundTask(taskId)
            taskId = .invalid
        }
        onFlush()
        // Allow time for the network request, then end the task
        DispatchQueue.main.asyncAfter(deadline: .now() + 5) {
            guard taskId != .invalid else { return }
            UIApplication.shared.endBackgroundTask(taskId)
            taskId = .invalid
        }
    }

    deinit {
        timer?.invalidate()
        if let lifecycleObserver {
            NotificationCenter.default.removeObserver(lifecycleObserver)
        }
    }
}

The transmission layer itself needs retry logic with exponential backoff. Network failures are common on mobile, and we cannot afford to lose events because of a transient connectivity issue.

final class HTTPEventTransmitter: EventTransmitter {
    private let endpoint: URL
    private let session: URLSession
    private let maxRetries = 3

    init(endpoint: URL, session: URLSession = .shared) {
        self.endpoint = endpoint
        self.session = session
    }
 
    func send(_ events: [AnalyticsEvent]) async throws {
        let payload = try JSONEncoder().encode(EventBatch(events: events))
        var request = URLRequest(url: endpoint)
        request.httpMethod = "POST"
        request.httpBody = payload
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
 
        try await sendWithRetry(request: request, attempt: 0)
    }
 
    private func sendWithRetry(request: URLRequest, attempt: Int) async throws {
        do {
            let (_, response) = try await session.data(for: request)
            guard let httpResponse = response as? HTTPURLResponse,
                  (200...299).contains(httpResponse.statusCode) else {
                // A production implementation should not retry 4xx responses:
                // a payload the server rejects will never succeed as-is
                throw TransmissionError.serverError
            }
        } catch {
            if attempt < maxRetries {
                let delay = pow(2.0, Double(attempt))
                try await Task.sleep(nanoseconds: UInt64(delay * 1_000_000_000))
                try await sendWithRetry(request: request, attempt: attempt + 1)
            } else {
                throw error
            }
        }
    }
}
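The delays above grow as 2^attempt seconds (1s, 2s, 4s). In production we would also add random jitter so that thousands of devices recovering from the same outage do not retry in lockstep. A sketch of the delay calculation with full jitter (backoffDelay and its default values are illustrative, not part of the transmitter above):

```swift
import Foundation

// Full-jitter exponential backoff: the delay is drawn uniformly from
// [0, min(cap, base * 2^attempt)) seconds.
func backoffDelay(attempt: Int,
                  baseDelay: TimeInterval = 1.0,
                  cap: TimeInterval = 30.0) -> TimeInterval {
    let exponential = min(cap, baseDelay * pow(2.0, Double(attempt)))
    return TimeInterval.random(in: 0..<exponential)
}
```

Full jitter keeps the expected delay growing exponentially while spreading individual retries across the whole window.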

Modular Plugin Architecture

An analytics SDK needs to support multiple analytics destinations -- Firebase, Mixpanel, Amplitude, custom backends -- without hard-coding any of them. A plugin architecture solves this elegantly.

// Analytics destination protocol
protocol AnalyticsDestination {
    var name: String { get }
    func initialize(config: [String: Any])
    func track(event: AnalyticsEvent)
    func identify(user: AnalyticsUser)
    func flush()
}
 
// The main analytics manager with plugin support
final class KlivvrAnalytics {
    static let shared = KlivvrAnalytics()
 
    private var destinations: [AnalyticsDestination] = []
    // Set during configure(with:); implicitly unwrapped because tracking
    // before configuration is a programmer error we want surfaced loudly
    private var pipeline: AnalyticsPipelineCoordinator!

    private init() {}
 
    func addDestination(_ destination: AnalyticsDestination) {
        destinations.append(destination)
    }
 
    func track(_ eventName: String, properties: [String: Any] = [:]) {
        let event = AnalyticsEvent(
            name: eventName,
            properties: properties,
            timestamp: Date()
        )
 
        // Send through pipeline for each destination
        pipeline.track(event)
 
        // Also forward to registered destinations
        for destination in destinations {
            destination.track(event: event)
        }
    }
}
 
// Example destination implementation (requires `import FirebaseAnalytics`)
final class FirebaseDestination: AnalyticsDestination {
    let name = "Firebase"
 
    func initialize(config: [String: Any]) {
        // Firebase setup
    }
 
    func track(event: AnalyticsEvent) {
        Analytics.logEvent(event.name, parameters: event.properties)
    }
 
    func identify(user: AnalyticsUser) {
        Analytics.setUserID(user.id)
    }
 
    func flush() {
        // Firebase handles its own flushing
    }
}

This pattern lets integrators compose their analytics stack without forking or subclassing. Each destination is a self-contained module that can be distributed as its own Swift package.
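As a sketch of what such a module might look like, here is a hypothetical Package.swift for a Firebase destination (the package and target names are illustrative, not Klivvr's published modules):

```swift
// swift-tools-version:5.9
import PackageDescription

// Hypothetical manifest for a self-contained destination module
let package = Package(
    name: "KlivvrAnalyticsFirebase",
    platforms: [.iOS(.v14)],
    products: [
        .library(name: "KlivvrAnalyticsFirebase", targets: ["KlivvrAnalyticsFirebase"])
    ],
    dependencies: [
        .package(url: "https://github.com/firebase/firebase-ios-sdk", from: "10.0.0")
    ],
    targets: [
        .target(
            name: "KlivvrAnalyticsFirebase",
            dependencies: [
                .product(name: "FirebaseAnalytics", package: "firebase-ios-sdk")
            ]
        )
    ]
)
```

Integrators then add only the destination packages they actually use, keeping unused vendor SDKs out of the app binary.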

Configuration and Initialization

The SDK initialization API is the first thing developers interact with, so it needs to be simple yet flexible. We use a configuration object that provides sensible defaults while allowing deep customization.

struct AnalyticsConfiguration {
    var apiKey: String
    var environment: Environment = .production
    var batchSize: Int = 20
    var flushInterval: TimeInterval = 30
    var maxQueueSize: Int = 1000
    var enableDebugLogging: Bool = false
    var destinations: [AnalyticsDestination] = []
 
    enum Environment {
        case development
        case staging
        case production
    }
}
 
extension KlivvrAnalytics {
    func configure(with config: AnalyticsConfiguration) {
        // Apply configuration
        // Initialize pipeline components
        // Register destinations
        // Start batch scheduler
    }
}
 
// Usage at app launch
KlivvrAnalytics.shared.configure(with: AnalyticsConfiguration(
    apiKey: "your-api-key",
    environment: .production,
    batchSize: 30,
    flushInterval: 60
))

Practical Tips

When building your own analytics SDK, keep these principles in mind. First, always persist events before attempting to send them; memory-only queues will lose data. Second, flush on app lifecycle transitions so that backgrounding or termination does not strand undelivered batches. Third, keep your public API surface as small as possible -- developers should be able to get started with a single configure call and a track method. Fourth, instrument the SDK itself: track delivery success rates, queue depths, and error counts so you can monitor the health of the analytics pipeline.
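That last point can start as a couple of internal counters the SDK reports periodically. A minimal sketch (PipelineDiagnostics is a name invented here, not part of KlivvrAnalyticsKit):

```swift
import Foundation

// Minimal internal health tracker; thread-safe via a lock so the
// transmitter can report from any queue.
final class PipelineDiagnostics {
    private let lock = NSLock()
    private var delivered = 0
    private var failed = 0

    func recordDelivery(count: Int) {
        lock.lock(); defer { lock.unlock() }
        delivered += count
    }

    func recordFailure(count: Int) {
        lock.lock(); defer { lock.unlock() }
        failed += count
    }

    // Fraction of events that reached the backend; nil until anything was sent
    var deliveryRate: Double? {
        lock.lock(); defer { lock.unlock() }
        let total = delivered + failed
        return total == 0 ? nil : Double(delivered) / Double(total)
    }
}
```

Reporting these numbers through the pipeline itself (as low-priority events) closes the loop: the SDK's health is visible in the same dashboards as the product data.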

Conclusion

The architecture of an analytics SDK is a study in trade-offs. You balance reliability against performance, simplicity against flexibility, and timeliness against efficiency. The pipeline architecture with persistent queuing, batched transmission, and a plugin system provides a solid foundation that scales from simple event tracking to complex, multi-destination analytics strategies. By keeping each component protocol-driven and independently testable, you build an SDK that is both maintainable and adaptable to the evolving analytics landscape.
