API Monitoring and Alerting Best Practices
A comprehensive guide to monitoring API gateways in production, covering the four golden signals, structured logging, distributed tracing, and actionable alerting strategies.
An API gateway processes every request your platform handles. When it degrades, everything degrades. When it goes down, everything goes down. This central position makes monitoring and alerting not just important, but existential. You need to know within seconds when something is wrong, within minutes what is causing it, and within hours how to prevent it from happening again. This article describes the monitoring and alerting practices we have built around Dispatch, our edge-deployed API gateway, and the lessons we have learned from running it in production for fintech clients where downtime is measured in lost transactions and regulatory scrutiny.
The Four Golden Signals
Google's Site Reliability Engineering handbook defines four golden signals for monitoring any distributed system: latency, traffic, errors, and saturation. For an API gateway, these translate into specific, measurable metrics.
Latency is the time a request spends in the gateway. We measure this at three points: total request duration (client to client), gateway processing time (excluding upstream wait), and upstream response time. The distinction matters because a latency spike could indicate a problem in the gateway itself, a slow upstream service, or a network issue between the two.
import { Hono } from 'hono'

const app = new Hono()

interface GatewayMetrics {
  totalDuration: number
  gatewayProcessing: number
  upstreamDuration: number
  route: string
  method: string
  statusCode: number
  upstream: string
}

app.use('*', async (c, next) => {
  const totalStart = performance.now()
  let upstreamStart = 0
  let upstreamEnd = 0
  // Instrument upstream calls by wrapping fetch. Caveat: patching
  // globalThis.fetch records only the most recent upstream call and can
  // interleave timings if one isolate serves concurrent requests; a
  // per-request fetch wrapper avoids this.
  const originalFetch = globalThis.fetch
  globalThis.fetch = async (...args) => {
    upstreamStart = performance.now()
    const result = await originalFetch(...args)
    upstreamEnd = performance.now()
    return result
  }
  try {
    await next()
  } finally {
    const totalEnd = performance.now()
    globalThis.fetch = originalFetch
    const metrics: GatewayMetrics = {
      totalDuration: totalEnd - totalStart,
      gatewayProcessing:
        totalEnd - totalStart - (upstreamEnd - upstreamStart),
      upstreamDuration: upstreamEnd - upstreamStart,
      route: c.req.routePath || c.req.path,
      method: c.req.method,
      statusCode: c.res?.status || 500,
      upstream: c.get('upstreamService') || 'unknown',
    }
    // Ship metrics after the response is sent, without blocking it
    c.executionCtx.waitUntil(reportMetrics(metrics))
  }
})

Traffic is the volume of requests the gateway handles. We track this as requests per second, broken down by route, method, and client. Traffic patterns reveal usage trends, identify abuse, and provide the baseline for anomaly detection. A sudden spike in traffic to a single endpoint might indicate a DDoS attack. A gradual increase over weeks signals organic growth that may require capacity planning.
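A per-route traffic counter can be sketched as bucketed counts: divide time into fixed windows and count requests per (method, route, window) key, then divide by the window length to get requests per second. This is an illustrative sketch, not Dispatch's actual metrics pipeline; the ten-second window and key format are assumptions:

```typescript
// Bucketed request counter: one count per (method, route, window) key.
class TrafficCounter {
  private buckets = new Map<string, number>()

  constructor(private windowSeconds = 10) {}

  record(route: string, method: string, nowMs = Date.now()): void {
    const bucket = Math.floor(nowMs / 1000 / this.windowSeconds)
    const key = `${method}:${route}:${bucket}`
    this.buckets.set(key, (this.buckets.get(key) ?? 0) + 1)
  }

  // Requests per second for a route in the window containing nowMs.
  rps(route: string, method: string, nowMs = Date.now()): number {
    const bucket = Math.floor(nowMs / 1000 / this.windowSeconds)
    const count = this.buckets.get(`${method}:${route}:${bucket}`) ?? 0
    return count / this.windowSeconds
  }
}
```

Per-key counts like these are also what anomaly detection consumes: the baseline is just the history of these buckets.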
Errors encompass both HTTP error responses (4xx and 5xx) and internal gateway failures. We distinguish between client errors (validation failures, authentication failures) and server errors (upstream timeouts, gateway bugs). The error rate -- errors as a percentage of total traffic -- is a more useful metric than the error count because it normalizes for traffic volume.
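The client/server error split and the rate normalization described above can be computed directly from response status codes. A minimal sketch (the `RequestOutcome` shape is illustrative):

```typescript
interface RequestOutcome {
  statusCode: number
}

// Error rate = errors / total, which normalizes for traffic volume.
// Client errors (4xx) and server errors (5xx) are reported separately:
// a 4xx spike usually means a misbehaving client; a 5xx spike means
// the platform itself is degrading.
function errorRates(outcomes: RequestOutcome[]): {
  clientErrorRate: number
  serverErrorRate: number
} {
  if (outcomes.length === 0) {
    return { clientErrorRate: 0, serverErrorRate: 0 }
  }
  const clientErrors = outcomes.filter(
    (o) => o.statusCode >= 400 && o.statusCode < 500
  ).length
  const serverErrors = outcomes.filter((o) => o.statusCode >= 500).length
  return {
    clientErrorRate: clientErrors / outcomes.length,
    serverErrorRate: serverErrors / outcomes.length,
  }
}
```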
Saturation in an edge-deployed gateway is less about CPU or memory (those are managed by the runtime) and more about rate limit utilization, upstream connection pool usage, and KV storage operation counts. When a client approaches their rate limit, they are close to saturation from the gateway's perspective.
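Rate-limit utilization, the saturation signal described above, reduces to used-over-limit. A small sketch; the 80% near-saturation threshold is an assumed value, not Dispatch's configured one:

```typescript
// How close a client is to their quota, independent of absolute limits.
function rateLimitUtilization(used: number, limit: number): number {
  if (limit <= 0) return 1
  return Math.min(used / limit, 1)
}

// A client above the threshold is "close to saturation" from the
// gateway's perspective and worth surfacing before requests are rejected.
function isNearSaturation(used: number, limit: number, threshold = 0.8): boolean {
  return rateLimitUtilization(used, limit) >= threshold
}
```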
Structured Logging for Edge Environments
Traditional application logging -- writing to stdout and shipping to a log aggregation service -- does not work well at the edge. Edge functions have limited execution time, and synchronous log writes would block response delivery. Dispatch uses structured, asynchronous logging that captures the information needed for debugging without impacting request latency.
interface LogEntry {
  timestamp: string
  level: 'debug' | 'info' | 'warn' | 'error'
  message: string
  requestId: string
  route: string
  method: string
  statusCode: number
  duration: number
  clientIp: string
  userAgent: string
  userId?: string
  error?: {
    name: string
    message: string
    stack?: string
  }
  metadata?: Record<string, unknown>
}

class GatewayLogger {
  private buffer: LogEntry[] = []

  log(entry: LogEntry): void {
    this.buffer.push(entry)
  }

  async flush(ctx: ExecutionContext): Promise<void> {
    if (this.buffer.length === 0) return
    const entries = [...this.buffer]
    this.buffer = []
    ctx.waitUntil(
      fetch('https://logs.klivvr.com/ingest', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          // LOG_API_KEY is injected as an environment secret
          Authorization: `Bearer ${LOG_API_KEY}`,
        },
        body: JSON.stringify({ entries }),
      })
    )
  }
}

// Usage in middleware
app.use('*', async (c, next) => {
  const logger = new GatewayLogger()
  c.set('logger', logger)
  const start = performance.now()
  try {
    await next()
  } catch (error) {
    logger.log({
      timestamp: new Date().toISOString(),
      level: 'error',
      message: 'Unhandled error in request pipeline',
      requestId: c.get('requestId') || 'unknown',
      route: c.req.routePath || c.req.path,
      method: c.req.method,
      statusCode: 500,
      duration: performance.now() - start,
      clientIp: c.req.header('CF-Connecting-IP') || '',
      userAgent: c.req.header('User-Agent') || '',
      error: {
        name: (error as Error).name,
        message: (error as Error).message,
        stack: (error as Error).stack,
      },
    })
    throw error
  } finally {
    const duration = performance.now() - start
    const status = c.res?.status ?? 500
    logger.log({
      timestamp: new Date().toISOString(),
      level: status >= 500 ? 'error' : status >= 400 ? 'warn' : 'info',
      message: `${c.req.method} ${c.req.path} ${status}`,
      requestId: c.get('requestId') || 'unknown',
      route: c.req.routePath || c.req.path,
      method: c.req.method,
      statusCode: status,
      duration,
      clientIp: c.req.header('CF-Connecting-IP') || '',
      userAgent: c.req.header('User-Agent') || '',
      userId: c.get('authenticatedUser')?.id,
    })
    logger.flush(c.executionCtx)
  }
})

Every log entry includes the request ID, which serves as the correlation key across the entire request lifecycle. When investigating an incident, an engineer can search for a request ID and see the complete chain of events: the gateway receiving the request, each middleware execution, the upstream call, and the response delivery.
Distributed Tracing Across the Gateway
Logs tell you what happened. Traces tell you how long each step took and in what order. For an API gateway that invokes multiple middleware layers and one or more upstream services, distributed tracing provides the detailed timing breakdown needed to diagnose performance issues.
Dispatch implements W3C Trace Context propagation, ensuring that traces span from the client through the gateway to upstream services:
interface TraceSpan {
  traceId: string
  spanId: string
  parentSpanId?: string
  operationName: string
  startTime: number
  duration: number
  status: 'ok' | 'error'
  tags: Record<string, string>
}

function generateSpanId(): string {
  return crypto.randomUUID().replace(/-/g, '').substring(0, 16)
}

function parseTraceParent(header: string | undefined): {
  traceId: string
  parentSpanId: string
} | null {
  if (!header) return null
  const parts = header.split('-')
  if (parts.length !== 4) return null
  return { traceId: parts[1], parentSpanId: parts[2] }
}

app.use('*', async (c, next) => {
  const incoming = parseTraceParent(c.req.header('traceparent'))
  const traceId = incoming?.traceId || crypto.randomUUID().replace(/-/g, '')
  const spanId = generateSpanId()
  const parentSpanId = incoming?.parentSpanId
  c.set('traceId', traceId)
  c.set('spanId', spanId)
  // Echo the traceparent on the response; the same value must also be
  // forwarded on outgoing upstream requests so the trace spans every hop
  c.header('traceparent', `00-${traceId}-${spanId}-01`)
  const spans: TraceSpan[] = []
  c.set('traceSpans', spans)
  const start = performance.now()
  try {
    await next()
  } finally {
    // Record the root span even when a handler throws, so failed
    // requests still appear in the trace
    spans.push({
      traceId,
      spanId,
      parentSpanId,
      operationName: `gateway.${c.req.method.toLowerCase()}.${c.req.routePath}`,
      startTime: start,
      duration: performance.now() - start,
      status: (c.res?.status ?? 500) < 500 ? 'ok' : 'error',
      tags: {
        'http.method': c.req.method,
        'http.route': c.req.routePath || c.req.path,
        'http.status_code': String(c.res?.status || 500),
        'gateway.edge_location': c.req.header('CF-Ray')?.split('-')[1] || 'unknown',
      },
    })
    c.executionCtx.waitUntil(exportSpans(spans))
  }
})

When a performance issue is reported, the trace view shows exactly where time was spent: 2ms in authentication middleware, 1ms in rate limiting, 0.5ms in request validation, 145ms waiting for the upstream service, and 1ms in response processing. This immediately directs investigation to the upstream service rather than the gateway.
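A per-step breakdown like this falls out naturally if each pipeline stage is wrapped in a helper that records a child span. The sketch below is a hypothetical helper, not Dispatch's actual tracing code; `SpanRecord` is a trimmed version of the span shape used above:

```typescript
interface SpanRecord {
  operationName: string
  startTime: number
  duration: number
  status: 'ok' | 'error'
}

// Wrap one pipeline step (auth, rate limiting, validation, ...) so it is
// recorded as its own timed span, whether it succeeds or throws.
async function withSpan<T>(
  spans: SpanRecord[],
  operationName: string,
  fn: () => Promise<T>
): Promise<T> {
  const startTime = performance.now()
  try {
    const result = await fn()
    spans.push({
      operationName,
      startTime,
      duration: performance.now() - startTime,
      status: 'ok',
    })
    return result
  } catch (err) {
    spans.push({
      operationName,
      startTime,
      duration: performance.now() - startTime,
      status: 'error',
    })
    throw err
  }
}
```

Inside a handler, `await withSpan(spans, 'gateway.auth', () => authenticate(c))` would produce the "2ms in authentication middleware" line of the trace.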
Alerting That Drives Action
The goal of alerting is to notify the right person at the right time with enough context to take action. Bad alerting -- too many alerts, vague descriptions, wrong recipients -- leads to alert fatigue, where engineers ignore notifications because they are overwhelmed with noise.
Dispatch's alerting strategy follows three principles: alert on symptoms not causes, require actionable runbooks, and escalate based on severity.
Alert on symptoms means we alert on user-facing impact, not internal metrics. We do not alert when CPU usage is high; we alert when the p95 latency exceeds the SLA threshold. We do not alert when an upstream is slow; we alert when the error rate for a specific API exceeds the baseline.
interface AlertRule {
  name: string
  condition: string
  threshold: number
  window: string
  severity: 'critical' | 'warning' | 'info'
  runbookUrl: string
  notifyChannels: string[]
}

const alertRules: AlertRule[] = [
  {
    name: 'gateway_high_error_rate',
    condition:
      'rate(gateway_requests_total{status=~"5.."}[5m]) / rate(gateway_requests_total[5m]) > 0.05',
    threshold: 0.05,
    window: '5m',
    severity: 'critical',
    runbookUrl: 'https://runbooks.klivvr.com/gateway/high-error-rate',
    notifyChannels: ['pagerduty-platform', 'slack-incidents'],
  },
  {
    name: 'gateway_p95_latency_breach',
    condition:
      'histogram_quantile(0.95, sum(rate(gateway_request_duration_seconds_bucket[5m])) by (le)) > 0.1',
    threshold: 0.1,
    window: '5m',
    severity: 'warning',
    runbookUrl: 'https://runbooks.klivvr.com/gateway/latency-breach',
    notifyChannels: ['slack-platform-alerts'],
  },
  {
    name: 'upstream_circuit_open',
    condition: 'gateway_circuit_breaker_state{state="open"} > 0',
    threshold: 1,
    window: '1m',
    severity: 'critical',
    runbookUrl: 'https://runbooks.klivvr.com/gateway/circuit-breaker-open',
    notifyChannels: ['pagerduty-platform', 'slack-incidents'],
  },
  {
    name: 'rate_limit_saturation',
    condition:
      'rate(gateway_rate_limit_rejected_total[5m]) / rate(gateway_requests_total[5m]) > 0.1',
    threshold: 0.1,
    window: '5m',
    severity: 'warning',
    runbookUrl: 'https://runbooks.klivvr.com/gateway/rate-limit-saturation',
    notifyChannels: ['slack-platform-alerts'],
  },
]

Require actionable runbooks means every alert links to a document that describes the symptom, likely causes, diagnostic steps, and remediation actions. When an engineer is paged at 3 AM, they should not need to reverse-engineer the system from scratch. The runbook provides a decision tree that gets them from alert to resolution as quickly as possible.
Escalate based on severity means critical alerts (service outages, data integrity issues) page on-call engineers immediately, while warnings (latency degradation, elevated error rates) post to Slack channels for next-business-day investigation. Information alerts are purely diagnostic and never notify anyone.
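The severity-to-destination mapping can be made explicit in code, where it is testable, rather than buried in alerting configuration. A sketch, with channel names taken from the rules above:

```typescript
type Severity = 'critical' | 'warning' | 'info'

// Critical alerts page on-call immediately; warnings post to Slack for
// next-business-day triage; info alerts are recorded but notify no one.
function routeAlert(severity: Severity): string[] {
  switch (severity) {
    case 'critical':
      return ['pagerduty-platform', 'slack-incidents']
    case 'warning':
      return ['slack-platform-alerts']
    case 'info':
      return []
  }
}
```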
Health Check Endpoints and Synthetic Monitoring
Dispatch exposes multiple health check endpoints, each testing a different layer of the system:
app.get('/health', (c) => {
  // Shallow health check: is the gateway responding?
  return c.json({ status: 'ok', version: GATEWAY_VERSION })
})

app.get('/health/deep', async (c) => {
  const checks: Record<string, { status: string; latency: number }> = {}
  // Check KV connectivity
  const kvStart = performance.now()
  try {
    await c.env.CONFIG.get('health-check-key')
    checks.kv = { status: 'ok', latency: performance.now() - kvStart }
  } catch {
    checks.kv = { status: 'error', latency: performance.now() - kvStart }
  }
  // Check each upstream service
  const upstreamServices = ['users', 'payments', 'accounts']
  for (const service of upstreamServices) {
    const start = performance.now()
    try {
      const res = await fetch(`${getUpstreamUrl(service)}/health`, {
        signal: AbortSignal.timeout(2000),
      })
      checks[service] = {
        status: res.ok ? 'ok' : 'degraded',
        latency: performance.now() - start,
      }
    } catch {
      checks[service] = {
        status: 'error',
        latency: performance.now() - start,
      }
    }
  }
  const allHealthy = Object.values(checks).every((c) => c.status === 'ok')
  const anyError = Object.values(checks).some((c) => c.status === 'error')
  return c.json(
    {
      status: anyError ? 'unhealthy' : allHealthy ? 'healthy' : 'degraded',
      checks,
      timestamp: new Date().toISOString(),
    },
    anyError ? 503 : 200
  )
})

The shallow health check (/health) is used by load balancers and uptime monitors. It should return in under 1ms and never fail unless the gateway process itself is broken. The deep health check (/health/deep) tests connectivity to all dependencies and is used by monitoring dashboards and incident investigation tools.
Dispatch also runs synthetic monitoring -- automated requests that simulate real user flows at regular intervals. Every 30 seconds, a synthetic monitor executes the following sequence: authenticate, fetch user profile, list transactions, and initiate a test payment. Each step is timed and checked for correctness. If any step fails or exceeds latency thresholds, an alert fires before real users are affected.
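The synthetic flow described above can be sketched as an ordered list of timed, correctness-checked steps. This is an illustrative runner, not Dispatch's actual monitor; note that it stops at the first failure so a broken login is not double-counted as failures in every later step:

```typescript
interface SyntheticStep {
  name: string
  maxLatencyMs: number
  run: () => Promise<boolean> // true = step passed its correctness check
}

interface StepResult {
  name: string
  ok: boolean
  latencyMs: number
}

// Execute steps in order; a step fails if it throws, returns false, or
// exceeds its latency threshold. Stop at the first failure, since later
// steps depend on earlier ones (you cannot list transactions if the
// authenticate step failed).
async function runSyntheticFlow(steps: SyntheticStep[]): Promise<StepResult[]> {
  const results: StepResult[] = []
  for (const step of steps) {
    const start = performance.now()
    let passed = false
    try {
      passed = await step.run()
    } catch {
      passed = false
    }
    const latencyMs = performance.now() - start
    const ok = passed && latencyMs <= step.maxLatencyMs
    results.push({ name: step.name, ok, latencyMs })
    if (!ok) break
  }
  return results
}
```

An alerting wrapper would fire whenever the last result in the returned array has `ok: false`.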
Dashboards That Tell a Story
Good dashboards are organized around questions, not metrics. The Dispatch monitoring dashboard answers four questions:
"Is the gateway healthy right now?" The top row shows real-time traffic, error rate, and p95 latency, color-coded green/yellow/red against SLA thresholds.
"What changed recently?" The second row shows deployments, configuration changes, and incident markers overlaid on the traffic graph. This immediately correlates changes with impact.
"Which services are affected?" The third row breaks down error rates and latency by upstream service, making it obvious which backend is causing degradation.
"What is the user impact?" The bottom row shows business metrics: transaction completion rates, API call success rates by client type (mobile, web, internal), and geographic distribution of errors.
This structure means an on-call engineer can assess the situation in under 30 seconds by scanning the dashboard top-to-bottom, without clicking into detailed views or running queries.
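The green/yellow/red coding against SLA thresholds reduces to a small pure function that every dashboard panel can share. A sketch; the thresholds in the test are illustrative, not Dispatch's real SLA values:

```typescript
type HealthColor = 'green' | 'yellow' | 'red'

// Green below the warning threshold, yellow between warning and breach,
// red at or above the breach threshold.
function slaColor(value: number, warnAt: number, breachAt: number): HealthColor {
  if (value >= breachAt) return 'red'
  if (value >= warnAt) return 'yellow'
  return 'green'
}
```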
Conclusion
Monitoring and alerting for an API gateway is not a set-and-forget exercise. It requires continuous refinement as traffic patterns change, new services are onboarded, and new failure modes are discovered. The practices described here -- golden signal monitoring, structured logging, distributed tracing, actionable alerting, and synthetic monitoring -- provide the foundation. But the real differentiator is the operational culture around these tools: treating alerts as precious signals rather than noise, maintaining runbooks as living documents, and investing in dashboards that serve the people who use them. In fintech, where a gateway outage can halt transactions across an entire platform, this investment is not optional. It is the cost of operating at the reliability level your users and regulators expect.
Related Articles
Edge Computing for Fintech: Latency and Compliance Benefits
How edge computing addresses the unique challenges of fintech platforms, including latency-sensitive transactions, data residency requirements, and regulatory compliance across jurisdictions.
API Performance Optimization: From 200ms to 20ms
A practical guide to optimizing API gateway performance, covering the specific techniques that took Dispatch's p95 latency from 200ms to under 20ms.
Request Validation with Zod and Hono
How to implement comprehensive request validation in Hono using Zod schemas, covering body parsing, query parameters, headers, and custom error formatting for API gateways.