Ensuring Data Quality in Mobile Analytics

Establish data quality practices for mobile analytics, including validation, monitoring, testing, and governance to maintain trustworthy analytics data.

Business · 7 min read · By Klivvr Engineering

Bad analytics data is worse than no analytics data. When data is absent, teams know they are guessing. When data is present but inaccurate, teams make confident decisions based on lies. Data quality issues in mobile analytics are insidious -- they accumulate silently over months until someone discovers that the conversion rate has been double-counted since a refactor three releases ago.

This article covers how to establish and maintain data quality in mobile analytics using KlivvrAnalyticsKit, from automated validation to monitoring and governance processes.

The Dimensions of Data Quality

Data quality is not a single concern. It spans several dimensions, each requiring different strategies to maintain.

// Data quality dimensions modeled in code
struct DataQualityAssessment {
    let completeness: Double    // Are all expected properties present?
    let accuracy: Double        // Do values match reality?
    let consistency: Double     // Are naming conventions followed?
    let timeliness: Double      // Is data arriving within expected latency?
    let uniqueness: Double      // Are there duplicate events?
    let validity: Double        // Do values fall within expected ranges?
 
    var overallScore: Double {
        let weights: [Double] = [0.20, 0.25, 0.15, 0.10, 0.15, 0.15]
        let scores = [completeness, accuracy, consistency, timeliness, uniqueness, validity]
        return zip(weights, scores).map(*).reduce(0, +)
    }
 
    var grade: String {
        switch overallScore {
        case 0.95...: return "A"
        case 0.90..<0.95: return "B"
        case 0.80..<0.90: return "C"
        case 0.70..<0.80: return "D"
        default: return "F"
        }
    }
}

Completeness asks whether all expected events and properties are present. Accuracy asks whether the values are correct. Consistency checks naming conventions and schemas. Timeliness measures latency. Uniqueness catches duplicates. Validity ensures values fall within expected ranges.
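As a quick sanity check on the weighting above, the weighted sum can be computed directly. The scores below are made-up illustrative values, not real measurements:

```swift
// Weighted-sum scoring as in DataQualityAssessment above.
// The six weights mirror the struct; the scores are illustrative.
let weights = [0.20, 0.25, 0.15, 0.10, 0.15, 0.15]
let scores  = [1.0, 0.9, 1.0, 1.0, 1.0, 1.0]  // accuracy dips to 0.9
let overall = zip(weights, scores).map(*).reduce(0, +)
// ≈ 0.975, which lands in grade "A" under the thresholds in `grade`
```

Because the weights sum to 1.0, the overall score stays on the same 0-to-1 scale as the individual dimensions.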

Automated Validation at the SDK Level

The best place to catch data quality issues is before the event ever leaves the device. SDK-level validation prevents malformed events from entering the pipeline.

// Comprehensive event validator
final class EventQualityValidator: EventValidator {
    private let trackingPlan: TrackingPlan
    private let qualityMonitor: QualityMonitor
 
    init(trackingPlan: TrackingPlan, qualityMonitor: QualityMonitor) {
        self.trackingPlan = trackingPlan
        self.qualityMonitor = qualityMonitor
    }
 
    func validate(_ event: AnalyticsEvent) throws {
        var issues: [QualityIssue] = []
 
        // 1. Event name validation
        if event.name.isEmpty {
            issues.append(.critical("Empty event name"))
        } else if event.name.range(of: "^[a-z][a-z0-9_]{2,50}$", options: .regularExpression) == nil {
            issues.append(.warning("Event name '\(event.name)' doesn't match naming convention"))
        }
 
        // 2. Required properties check
        if let schema = trackingPlan.schema(for: event.name) {
            for required in schema.requiredProperties {
                if event.properties[required.name] == nil {
                    issues.append(.critical("Missing required property '\(required.name)' on '\(event.name)'"))
                }
            }
 
            // 3. Property type validation
            for (key, value) in event.properties {
                if let expectedType = schema.propertyType(for: key) {
                    if !matchesType(value: value, expected: expectedType) {
                        issues.append(.warning(
                            "Property '\(key)' has type \(type(of: value)) but expected \(expectedType)"
                        ))
                    }
                }
            }
 
            // 4. Property value range validation
            for rule in schema.validationRules {
                if let value = event.properties[rule.property] {
                    if !rule.validate(value) {
                        issues.append(.warning("Property '\(rule.property)' value '\(value)' out of range"))
                    }
                }
            }
        } else {
            issues.append(.info("Event '\(event.name)' not found in tracking plan"))
        }
 
        // 5. Timestamp sanity check
        let age = abs(event.timestamp.timeIntervalSinceNow)
        if age > 86400 * 7 { // Event older than 7 days
            issues.append(.warning("Event timestamp is \(Int(age / 86400)) days old"))
        }
 
        // Report issues to quality monitor
        qualityMonitor.report(issues: issues, for: event)
 
        // Only throw for critical issues
        let criticalIssues = issues.filter { $0.severity == .critical }
        if !criticalIssues.isEmpty {
            throw ValidationError.criticalIssues(criticalIssues.map(\.message))
        }
    }
 
    private func matchesType(value: Any, expected: PropertyType) -> Bool {
        switch expected {
        case .string: return value is String
        case .number: return value is Int || value is Double || value is Float
        case .boolean: return value is Bool
        case .array: return value is [Any]
        default: return true
        }
    }
}
 
struct QualityIssue {
    enum Severity { case info, warning, critical }
    let severity: Severity
    let message: String
 
    static func critical(_ message: String) -> QualityIssue {
        QualityIssue(severity: .critical, message: message)
    }
    static func warning(_ message: String) -> QualityIssue {
        QualityIssue(severity: .warning, message: message)
    }
    static func info(_ message: String) -> QualityIssue {
        QualityIssue(severity: .info, message: message)
    }
}
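The naming-convention regex from step 1 is easy to exercise on its own. A self-contained sketch, with the helper name as an assumption:

```swift
import Foundation

// Mirrors the check in EventQualityValidator: a lowercase letter, then
// 2-50 more characters of lowercase letters, digits, or underscores.
func matchesNamingConvention(_ name: String) -> Bool {
    name.range(of: "^[a-z][a-z0-9_]{2,50}$", options: .regularExpression) != nil
}
```

Anchoring with `^` and `$` matters: without them, `Purchase_completed!` would match on the embedded substring and slip through.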

Automated Testing for Analytics

Analytics code needs tests just like feature code. Without tests, tracking regressions slip through code review because they do not cause visible UI changes.

// Analytics test infrastructure
import XCTest

final class AnalyticsTestHelper {
    private(set) var trackedEvents: [AnalyticsEvent] = []
    private(set) var identifyCalls: [(userId: String, traits: [String: Any])] = []
 
    func reset() {
        trackedEvents.removeAll()
        identifyCalls.removeAll()
    }
 
    var lastEvent: AnalyticsEvent? { trackedEvents.last }
 
    func events(named name: String) -> [AnalyticsEvent] {
        trackedEvents.filter { $0.name == name }
    }
 
    func assertEventTracked(
        _ name: String,
        withProperties properties: [String: Any] = [:],
        file: StaticString = #file,
        line: UInt = #line
    ) {
        let matching = events(named: name)
        XCTAssertFalse(matching.isEmpty, "Expected event '\(name)' was not tracked", file: file, line: line)
 
        if !properties.isEmpty {
            let hasMatch = matching.contains { event in
                properties.allSatisfy { key, expectedValue in
                    guard let actualValue = event.properties[key] else { return false }
                    return "\(actualValue)" == "\(expectedValue)"
                }
            }
            XCTAssertTrue(
                hasMatch,
                "Event '\(name)' tracked but properties don't match. Expected: \(properties)",
                file: file,
                line: line
            )
        }
    }
 
    func assertEventNotTracked(_ name: String, file: StaticString = #file, line: UInt = #line) {
        let matching = events(named: name)
        XCTAssertTrue(matching.isEmpty, "Event '\(name)' should not have been tracked", file: file, line: line)
    }
}
 
// Example test
class PurchaseFlowAnalyticsTests: XCTestCase {
    var analyticsHelper: AnalyticsTestHelper!
    var viewModel: PurchaseViewModel!
 
    override func setUp() {
        super.setUp()
        analyticsHelper = AnalyticsTestHelper()
        KlivvrAnalytics.shared.setTestDestination(analyticsHelper)
        viewModel = PurchaseViewModel()
    }
 
    func testPurchaseCompletedEventTracked() {
        // Given
        let product = Product(id: "SKU123", name: "Widget", price: 29.99)
 
        // When
        viewModel.completePurchase(product: product)
 
        // Then
        analyticsHelper.assertEventTracked("purchase_completed", withProperties: [
            "product_id": "SKU123",
            "price": 29.99,
            "currency": "USD"
        ])
    }
 
    func testPurchaseFailureEventIncludesErrorCode() {
        // Given
        let error = PurchaseError.paymentDeclined(code: "card_declined")
 
        // When
        viewModel.handlePurchaseFailure(error)
 
        // Then
        analyticsHelper.assertEventTracked("purchase_failed", withProperties: [
            "error_code": "card_declined"
        ])
    }
 
    func testNoEventsTrackedWithoutConsent() {
        // Given
        ConsentManager.shared.revokeAllConsent()
 
        // When
        viewModel.completePurchase(product: Product(id: "SKU123", name: "Widget", price: 29.99))
 
        // Then
        analyticsHelper.assertEventNotTracked("purchase_completed")
    }
}

Data Quality Monitoring in Production

Validation and testing catch issues before they ship. Monitoring catches issues that slip through.

// Tracks quality metrics over time in production
final class QualityMonitor {
    private var issueCounters: [String: Int] = [:]
    private var eventCounts: [String: Int] = [:]
    private let reportingInterval: TimeInterval = 3600 // Hourly
 
    func report(issues: [QualityIssue], for event: AnalyticsEvent) {
        eventCounts[event.name, default: 0] += 1
 
        for issue in issues {
            let key = "\(issue.severity):\(event.name)"
            issueCounters[key, default: 0] += 1
        }
    }
 
    func generateQualityReport() -> QualityReport {
        let totalEvents = eventCounts.values.reduce(0, +)
        let totalIssues = issueCounters.values.reduce(0, +)
 
        let issueRate = totalEvents > 0 ? Double(totalIssues) / Double(totalEvents) : 0
 
        return QualityReport(
            totalEvents: totalEvents,
            totalIssues: totalIssues,
            issueRate: issueRate,
            issuesByEvent: issueCounters,
            eventVolumes: eventCounts
        )
    }
 
    struct QualityReport {
        let totalEvents: Int
        let totalIssues: Int
        let issueRate: Double
        let issuesByEvent: [String: Int]
        let eventVolumes: [String: Int]
 
        var topIssues: [(String, Int)] {
            issuesByEvent.sorted { $0.value > $1.value }.prefix(10).map { ($0.key, $0.value) }
        }
    }
}

Set up alerts for anomalies: sudden drops in event volume (tracking broke), spikes in validation errors (schema changed without updating tracking), and increases in duplicate events (retry logic misbehaving).
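The volume-drop case can be caught by comparing the most recent hour against a rolling baseline. A minimal sketch, with the function name and the 50% threshold as assumptions, not KlivvrAnalyticsKit APIs:

```swift
// Flags when the latest hourly count falls below half the baseline
// built from the preceding hours. Threshold and names are illustrative.
func volumeDropDetected(hourlyCounts: [Int], dropThreshold: Double = 0.5) -> Bool {
    guard hourlyCounts.count >= 2, let current = hourlyCounts.last else { return false }
    let baseline = Double(hourlyCounts.dropLast().reduce(0, +)) / Double(hourlyCounts.count - 1)
    guard baseline > 0 else { return false }
    return Double(current) < baseline * dropThreshold
}
```

A real deployment would also account for seasonality (weekend vs. weekday traffic) before paging anyone, but even this crude check catches the "tracking silently broke in the last release" failure mode.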

Governance and Process

Technical solutions only work when supported by organizational processes. Data quality governance establishes who is responsible, what the standards are, and how violations are addressed.

Every analytics event should have an owner -- typically the product manager for the feature area. Changes to event schemas require tracking plan updates before the code merges. Quarterly data quality audits compare the tracking plan against actual production data to identify drift. New team members should receive analytics onboarding that covers naming conventions, the tracking plan, and the testing requirements.
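The quarterly audit boils down to a set difference in both directions: events the plan defines but production never sees, and events production emits that the plan does not define. A sketch, with names assumed:

```swift
// Events in the plan that never fire, and events firing that the plan
// doesn't know about. Both directions indicate drift worth investigating.
func auditDrift(planned: Set<String>, observed: Set<String>)
    -> (untracked: Set<String>, unplanned: Set<String>) {
    (untracked: planned.subtracting(observed),
     unplanned: observed.subtracting(planned))
}
```

Untracked events usually mean dead entries to prune from the plan; unplanned events usually mean someone shipped tracking without updating it.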

Practical Tips

Treat your tracking plan as a living document, not a write-once artifact. Review it quarterly and prune events nobody queries. During development, enable a debug mode that logs validation warnings to the console so developers see issues immediately. Set up CI checks that fail when analytics tests break. Build a data quality score and review it weekly with the same rigor you apply to uptime or crash rates. When you find a data quality issue, trace it to the root cause and fix the process, not just the data.
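The debug-mode tip can be as simple as a single formatting helper gated behind a build flag. The severity enum and log format here are illustrative, not part of KlivvrAnalyticsKit:

```swift
// Sketch of a debug-only warning log line; redefines a small severity
// type so the snippet stands alone.
enum QualitySeverity: String { case info, warning, critical }

func consoleLine(event: String, severity: QualitySeverity, message: String) -> String {
    "[Analytics QA] \(severity.rawValue.uppercased()) \(event): \(message)"
}

// In a DEBUG build, route every reported issue through it:
// #if DEBUG
// print(consoleLine(event: event.name, severity: .warning, message: msg))
// #endif
```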

Conclusion

Data quality is an ongoing practice, not a one-time project. By combining SDK-level validation, automated testing, production monitoring, and governance processes, you build a system where data quality issues are caught early, tracked transparently, and resolved systematically. KlivvrAnalyticsKit provides the foundational validation and monitoring hooks that make this possible, but lasting data quality requires the organizational commitment to maintain standards over time.
