Workflow Orchestration Patterns for Distributed Systems
A deep dive into the core orchestration patterns that power reliable distributed workflows, from sequential pipelines to parallel fan-out strategies in TypeScript.
Building reliable distributed systems is one of the hardest problems in modern software engineering. When a single business operation spans multiple services, databases, and external APIs, the question is not whether something will fail, but when. Workflow orchestration patterns provide the structural backbone that keeps these complex processes on track, even in the face of partial failures, network partitions, and service outages.
In this article, we explore the foundational orchestration patterns that Alfred uses to manage multi-step business processes. Whether you are building a payment pipeline, an order fulfillment system, or a data ingestion workflow, these patterns will help you design systems that are resilient, observable, and maintainable.
Sequential Pipeline Pattern
The sequential pipeline is the simplest orchestration pattern: steps execute one after another, and the output of each step feeds into the next. Despite its simplicity, getting it right in a distributed context requires careful thought about failure boundaries and state persistence.
import { WorkflowBuilder, StepResult } from '@alfred/core';

// inventoryService, paymentService, and shippingService are application
// service clients assumed to be defined elsewhere.
interface OrderContext {
  orderId: string;
  customerId: string;
  items: Array<{ sku: string; quantity: number; price: number }>;
  paymentId?: string;
  shipmentId?: string;
}

const orderFulfillmentWorkflow = new WorkflowBuilder<OrderContext>('order-fulfillment')
  .addStep('validate-inventory', async (ctx) => {
    const availability = await inventoryService.checkBulk(ctx.items);
    if (!availability.allAvailable) {
      return StepResult.fail('Insufficient inventory for one or more items');
    }
    return StepResult.success(ctx);
  })
  .addStep('reserve-inventory', async (ctx) => {
    await inventoryService.reserveBulk(ctx.orderId, ctx.items);
    return StepResult.success(ctx);
  })
  .addStep('process-payment', async (ctx) => {
    const payment = await paymentService.charge({
      customerId: ctx.customerId,
      // Order total: sum of price * quantity over all line items.
      amount: ctx.items.reduce((sum, i) => sum + i.price * i.quantity, 0),
      orderId: ctx.orderId,
    });
    return StepResult.success({ ...ctx, paymentId: payment.id });
  })
  .addStep('create-shipment', async (ctx) => {
    const shipment = await shippingService.create({
      orderId: ctx.orderId,
      items: ctx.items,
    });
    return StepResult.success({ ...ctx, shipmentId: shipment.id });
  })
  .build();

The key insight is that each step boundary is a persistence checkpoint. Alfred serializes the workflow context after every successful step, so if the process crashes between reserve-inventory and process-payment, it resumes from exactly where it left off. This eliminates the ghost failures that plague naive implementations, in which a payment is charged but the shipment is never created.
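To make the checkpoint idea concrete, here is a minimal in-memory sketch of a runner that persists the context after each step and skips completed steps on the next attempt. It is an illustration only: PipelineStep, Checkpoint, checkpointStore, and runWithCheckpoints are hypothetical names, the Map stands in for durable storage, and nothing here is claimed to match how Alfred implements persistence.

```typescript
// Hypothetical sketch: none of these names come from Alfred's API.
interface PipelineStep<C> {
  name: string;
  run: (ctx: C) => C; // may throw, which simulates a crash mid-workflow
}

interface Checkpoint<C> {
  nextStep: number; // index of the first step that has not completed
  ctx: C;           // context as of the last completed step
}

// Stand-in for durable storage, keyed by workflow run id.
const checkpointStore = new Map<string, string>();

function runWithCheckpoints<C>(runId: string, steps: PipelineStep<C>[], initial: C): C {
  const saved = checkpointStore.get(runId);
  let state: Checkpoint<C> =
    saved !== undefined ? (JSON.parse(saved) as Checkpoint<C>) : { nextStep: 0, ctx: initial };

  // Only run the steps that did not complete in a previous attempt.
  steps.slice(state.nextStep).forEach((step) => {
    const ctx = step.run(state.ctx);
    state = { nextStep: state.nextStep + 1, ctx };
    // Persist before moving on, so a crash in the next step resumes from here.
    checkpointStore.set(runId, JSON.stringify(state));
  });

  return state.ctx;
}
```

Calling runWithCheckpoints a second time with the same run id after a failure re-executes only the step that failed and the steps after it. Note that the failed step itself runs again, which is why step idempotency matters.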
A common mistake is cramming too much logic into a single step. Each step should represent a single logical operation with a clear rollback strategy. If a step does two things, you lose the ability to compensate for partial completion within that step.
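One way to picture "a single logical operation with a clear rollback strategy" is to pair every step with exactly one compensation and undo completed steps in reverse order when a later step fails. The sketch below assumes that shape; CompensableStep and runWithCompensation are hypothetical names for illustration, not part of Alfred's API.

```typescript
// Hypothetical sketch: each step pairs one action with one compensation.
interface CompensableStep {
  name: string;
  run: () => Promise<void>;
  compensate: () => Promise<void>;
}

type CompensationOutcome = { ok: true } | { ok: false; failedStep: string };

async function runWithCompensation(steps: CompensableStep[]): Promise<CompensationOutcome> {
  const completed: CompensableStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch {
      // Undo the steps that finished, most recent first.
      // An error thrown by a compensation propagates to the caller.
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      return { ok: false, failedStep: step.name };
    }
  }
  return { ok: true };
}
```

If a step did two things, its single compensate function would not know which of the two had actually happened when the step failed halfway, which is the problem the paragraph above describes.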
Parallel Fan-Out / Fan-In Pattern
Many real-world workflows contain steps that are independent of each other and can execute concurrently. The fan-out/fan-in pattern dispatches multiple tasks in parallel and waits for all of them to complete before proceeding.
import { WorkflowBuilder, ParallelGroup, StepResult } from '@alfred/core';

interface OnboardingContext {
  userId: string;
  email: string;
  plan: 'starter' | 'professional' | 'enterprise';
  provisioningResults?: Record<string, boolean>;
}

const customerOnboarding = new WorkflowBuilder<OnboardingContext>('customer-onboarding')
  .addStep('create-account', async (ctx) => {
    await accountService.create(ctx.userId, ctx.email, ctx.plan);
    return StepResult.success(ctx);
  })
  .addParallelGroup(
    new ParallelGroup<OnboardingContext>('provision-resources')
      .addBranch('setup-database', async (ctx) => {
        await databaseService.provisionTenant(ctx.userId, ctx.plan);
        return { database: true };
      })
      .addBranch('setup-storage', async (ctx) => {
        await storageService.createBucket(ctx.userId);
        return { storage: true };
      })
      .addBranch('setup-messaging', async (ctx) => {
        await messagingService.createQueues(ctx.userId);
        return { messaging: true };
      })
      .addBranch('send-welcome-email', async (ctx) => {
        await emailService.sendWelcome(ctx.email, ctx.plan);
        return { email: true };
      })
      .withConcurrency(3)
      .onComplete((ctx, branchResults) => {
        // Merge the per-branch result objects into a single record.
        const provisioningResults = Object.assign({}, ...branchResults);
        return StepResult.success({ ...ctx, provisioningResults });
      })
  )
  .addStep('activate-account', async (ctx) => {
    await accountService.activate(ctx.userId);
    return StepResult.success(ctx);
  })
  .build();

The withConcurrency(3) call limits how many branches run simultaneously; with four branches defined here, at most three are in flight at once. This is critical when downstream services have rate limits or when you want to avoid overwhelming a shared resource. Alfred tracks each branch independently, so if the setup-messaging branch fails, it can be retried without re-running the branches that already succeeded.
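For intuition about what a concurrency cap does, the following sketch runs a list of async tasks with at most a given number in flight, using a small pool of workers that pull from a shared queue. runWithConcurrency is a hypothetical helper written for this article; it is not how Alfred implements withConcurrency, and it has no persistence or per-branch retry.

```typescript
// Hypothetical helper, not Alfred's implementation: run async tasks with at
// most `limit` of them in flight, returning results in input order.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = [];
  const queue = tasks.map((task, index) => ({ task, index }));

  // Each worker pulls the next queued task until the queue is empty.
  async function worker(): Promise<void> {
    for (let job = queue.shift(); job !== undefined; job = queue.shift()) {
      results[job.index] = await job.task();
    }
  }

  // A limit below 1 is treated as 1 so that work always makes progress.
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  // Rejects as soon as any task rejects; tasks already started keep running.
  await Promise.all(workers);
  return results;
}
```

With four tasks and a limit of three, three tasks start immediately and the fourth starts as soon as any one of them finishes.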
A subtle consideration is how to handle partial failures in a parallel group. There are three strategies: fail the entire group if any branch fails (strict), continue and collect errors for later handling (lenient), or fail the group only if one of a designated set of critical branches fails (selective). Alfred supports all three via the failureStrategy option on the parallel group.
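The three strategies differ only in how they turn a set of branch outcomes into a single pass or fail decision. The sketch below spells that decision out; FailureStrategy, BranchOutcome, and groupSucceeded are hypothetical names, and the actual shape of Alfred's failureStrategy option is not shown in this article, so treat this as a model of the idea rather than the real configuration.

```typescript
// Hypothetical model of the three strategies; not Alfred's option format.
type FailureStrategy =
  | { kind: 'strict' }
  | { kind: 'lenient' }
  | { kind: 'selective'; critical: string[] };

interface BranchOutcome {
  name: string;
  ok: boolean;
  error?: string;
}

function groupSucceeded(outcomes: BranchOutcome[], strategy: FailureStrategy): boolean {
  const failed = outcomes.filter((o) => !o.ok).map((o) => o.name);
  switch (strategy.kind) {
    case 'strict':
      // Any failed branch fails the whole group.
      return failed.length === 0;
    case 'lenient':
      // Never fails the group; errors stay in `outcomes` for later handling.
      return true;
    case 'selective':
      // Fails only when a branch listed as critical has failed.
      return !failed.some((name) => strategy.critical.some((c) => c === name));
  }
}
```

In the onboarding example, a selective strategy that lists the three provisioning branches as critical would let the workflow proceed when only send-welcome-email fails.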
Conditional Branching Pattern
Not all workflows follow a straight line. Business logic often requires branching based on runtime conditions, routing work through entirely different paths depending on the data.
import { WorkflowBuilder, ConditionalRouter, StepResult } from '@alfred/core';

interface RefundContext {
  orderId: string;
  reason: 'defective' | 'wrong-item' | 'changed-mind' | 'never-arrived';
  amount: number;
  requiresReturn?: boolean;
  refundMethod?: 'original-payment' | 'store-credit';
}

const refundWorkflow = new WorkflowBuilder<RefundContext>('process-refund')
  .addStep('validate-refund-request', async (ctx) => {
    const order = await orderService.get(ctx.orderId);
    if (!order || order.status === 'refunded') {
      return StepResult.fail('Order not eligible for refund');
    }
    return StepResult.success(ctx);
  })
  .addConditional(
    new ConditionalRouter<RefundContext>('determine-refund-path')
      .when(
        (ctx) => ctx.reason === 'defective' || ctx.reason === 'wrong-item',
        new WorkflowBuilder<RefundContext>('full-refund-path')
          .addStep('issue-full-refund', async (ctx) => {
            await paymentService.refund(ctx.orderId, ctx.amount);
            return StepResult.success({ ...ctx, refundMethod: 'original-payment' });
          })
          .addStep('schedule-pickup', async (ctx) => {
            await shippingService.schedulePickup(ctx.orderId);
            return StepResult.success({ ...ctx, requiresReturn: true });
          })
          .build()
      )
      .when(
        (ctx) => ctx.reason === 'changed-mind',
        new WorkflowBuilder<RefundContext>('store-credit-path')
          .addStep('issue-store-credit', async (ctx) => {
            // Store credit is issued for 90% of the refund amount.
            await creditService.issue(ctx.orderId, ctx.amount * 0.9);
            return StepResult.success({ ...ctx, refundMethod: 'store-credit' });
          })
          .build()
      )
      .otherwise(
        new WorkflowBuilder<RefundContext>('investigation-path')
          .addStep('open-investigation', async (ctx) => {
            await supportService.openCase(ctx.orderId, ctx.reason);
            return StepResult.success(ctx);
          })
          .build()
      )
  )
  .addStep('notify-customer', async (ctx) => {
    await notificationService.send(ctx.orderId, 'refund-processed', ctx);
    return StepResult.success(ctx);
  })
  .build();

The conditional router evaluates predicates in order and routes the workflow context into the first matching branch. The otherwise clause acts as a catch-all. After the selected branch completes, execution continues with the next step in the parent workflow, which in this case is notify-customer.
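The first-match rule is worth seeing in isolation, because it means predicate order matters whenever two predicates can both be true. The sketch below is a hypothetical stand-alone router (Route and pickRoute are illustrative names, not Alfred's ConditionalRouter) that returns the target of the first predicate that matches and falls back to a default otherwise.

```typescript
// Hypothetical sketch of first-match routing with a catch-all.
interface Route<C, R> {
  when: (ctx: C) => boolean;
  target: R;
}

function pickRoute<C, R>(ctx: C, routes: Route<C, R>[], otherwise: R): R {
  for (const route of routes) {
    // Predicates are checked in declaration order; the first match wins.
    if (route.when(ctx)) {
      return route.target;
    }
  }
  return otherwise;
}
```

If a broad predicate is declared before a narrow one, the narrow branch is never reached, so it is usually safest to order predicates from most specific to least specific.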
This pattern is particularly powerful because each branch is itself a full workflow, meaning it can contain its own parallel groups, sub-conditionals, and retry policies. This composability is what makes orchestration patterns scale to real-world complexity.
Event-Driven Wait Pattern
Some workflows cannot run to completion in a single burst. They need to pause and wait for an external event, such as a human approval, a webhook callback, or a scheduled time. The event-driven wait pattern suspends the workflow, persists its state, and resumes it when the expected event arrives.
import { WorkflowBuilder, WaitCondition, StepResult } from '@alfred/core';

interface LoanApplicationContext {
  applicationId: string;
  applicantId: string;
  amount: number;
  creditScore?: number;
  managerApproval?: boolean;
  disbursementId?: string;
}

const loanApprovalWorkflow = new WorkflowBuilder<LoanApplicationContext>('loan-approval')
  .addStep('run-credit-check', async (ctx) => {
    const result = await creditBureau.check(ctx.applicantId);
    return StepResult.success({ ...ctx, creditScore: result.score });
  })
  .addStep('auto-decision', async (ctx) => {
    // Compare against undefined explicitly so that a score of 0 is not
    // mistaken for a missing score.
    if (ctx.creditScore !== undefined && ctx.creditScore >= 750 && ctx.amount <= 50000) {
      return StepResult.success({ ...ctx, managerApproval: true });
    }
    if (ctx.creditScore !== undefined && ctx.creditScore < 500) {
      return StepResult.fail('Credit score below minimum threshold');
    }
    // Everything in between is suspended pending a manual decision.
    return StepResult.suspend(ctx);
  })
  .addWait(
    new WaitCondition<LoanApplicationContext>('await-manager-approval')
      .forEvent('loan.approval.decision')
      .withTimeout('72h')
      .onTimeout(async (ctx) => {
        await notificationService.send(ctx.applicantId, 'application-expired');
        return StepResult.fail('Approval timed out');
      })
      .onEvent(async (ctx, event: { approved: boolean }) => {
        return StepResult.success({ ...ctx, managerApproval: event.approved });
      })
  )
  .addStep('disburse-funds', async (ctx) => {
    if (!ctx.managerApproval) {
      return StepResult.fail('Loan application denied');
    }
    const disbursement = await bankingService.transfer(ctx.applicantId, ctx.amount);
    return StepResult.success({ ...ctx, disbursementId: disbursement.id });
  })
  .build();

The addWait step tells Alfred to persist the workflow state and register a listener for the specified event. The workflow is completely unloaded from memory during the wait, freeing resources. When the event arrives, Alfred rehydrates the workflow context and resumes from exactly where it paused.
The timeout mechanism is essential. Without it, a workflow waiting for an event that never arrives would be stuck forever. Alfred monitors pending waits and fires the timeout handler when the deadline expires, giving you the chance to clean up, notify stakeholders, or escalate.
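A simple way to think about durable waits is as rows in a table, each with the event it expects and a deadline, plus a periodic sweep that expires overdue rows. The sketch below models that with an in-memory Map; PendingWait, registerWait, deliverEvent, and sweepExpired are hypothetical names, and Alfred's actual tracking table and scheduler are not described in this article.

```typescript
// Hypothetical model of a pending-wait table; not Alfred's internals.
interface PendingWait {
  workflowId: string;
  eventName: string;
  deadline: number; // epoch milliseconds
}

// Stand-in for a durable table, keyed by workflow id.
const pendingWaits = new Map<string, PendingWait>();

function registerWait(wait: PendingWait): void {
  pendingWaits.set(wait.workflowId, wait);
}

// Returns true when the event matched a pending wait and the workflow
// should be resumed; the wait is removed so it cannot fire twice.
function deliverEvent(eventName: string, workflowId: string): boolean {
  const wait = pendingWaits.get(workflowId);
  if (wait === undefined || wait.eventName !== eventName) {
    return false;
  }
  pendingWaits.delete(workflowId);
  return true;
}

// Removes every wait whose deadline has passed and invokes the timeout
// handler for each one. Returns the number of waits that expired.
function sweepExpired(now: number, onTimeout: (wait: PendingWait) => void): number {
  let expired = 0;
  pendingWaits.forEach((wait, workflowId) => {
    if (wait.deadline <= now) {
      pendingWaits.delete(workflowId);
      onTimeout(wait);
      expired++;
    }
  });
  return expired;
}
```

Because both delivery and expiry delete the row, a late event for an already expired wait is ignored instead of resuming a workflow that has already failed.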
Practical Tips for Choosing Patterns
Selecting the right orchestration pattern is as much an art as a science. Here are guidelines drawn from production experience with Alfred.
First, start with the sequential pipeline and add complexity only when you have evidence that it is needed. Premature parallelization adds debugging overhead without measurable benefit if the parallel steps each take milliseconds. Measure your step durations before reaching for the fan-out pattern.
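Measuring step durations does not need special tooling to get started. A small wrapper like the one below records how long each step takes, which is usually enough to tell whether parallelizing would pay off. StepTiming and timeStep are hypothetical names for this sketch and are not part of Alfred.

```typescript
// Hypothetical timing wrapper for an async step.
interface StepTiming {
  step: string;
  ms: number;
}

async function timeStep<T>(
  step: string,
  run: () => Promise<T>,
  timings: StepTiming[],
): Promise<T> {
  const startedAt = Date.now();
  try {
    return await run();
  } finally {
    // Recorded for both successful and failed runs.
    timings.push({ step, ms: Date.now() - startedAt });
  }
}
```

If the recorded durations for independent steps are all in the low milliseconds, running them in parallel is unlikely to change end-to-end latency in a way you can measure.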
Second, keep your workflow context serializable. Alfred persists context to durable storage at every step boundary, so anything in the context must survive a round trip through JSON serialization. Avoid putting class instances, functions, or circular references in the context.
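A quick guard against non-serializable context is to check, in tests or before persisting, that a value contains only plain JSON data. The function below is a conservative sketch under that assumption; isJsonSafe is a hypothetical helper, not an Alfred API, and it deliberately rejects values such as Date that JSON.stringify would silently convert to something else.

```typescript
// Hypothetical helper: true only for values made of plain objects, arrays,
// strings, booleans, finite numbers, and null.
function isJsonSafe(value: unknown, ancestors: object[] = []): boolean {
  if (value === null) return true;
  if (typeof value === 'string' || typeof value === 'boolean') return true;
  if (typeof value === 'number') return Number.isFinite(value);
  // undefined, functions, symbols, and bigints do not survive JSON.
  if (typeof value !== 'object') return false;

  const obj = value as object;
  // An object that contains itself (directly or indirectly) is circular.
  if (ancestors.indexOf(obj) !== -1) return false;

  // Class instances, Date, Map, Set, and similar have a non-plain prototype.
  const proto = Object.getPrototypeOf(obj);
  const isPlain = Array.isArray(obj) || proto === Object.prototype || proto === null;
  if (!isPlain) return false;

  const children: unknown[] = Array.isArray(obj)
    ? obj
    : Object.keys(obj).map((key) => (obj as Record<string, unknown>)[key]);
  const path = ancestors.concat([obj]);
  return children.every((child) => isJsonSafe(child, path));
}
```

The check is strict about properties explicitly set to undefined, since JSON.stringify drops them and the rehydrated context would have a different shape; omitting optional fields entirely is fine.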
Third, design each step to be idempotent from the start. Because Alfred may retry a step after a transient failure, every step must leave the system in the same state whether it runs once or several times with the same input; a retried process-payment step, for example, must not charge the customer twice. This is the single most important principle in workflow orchestration.
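A common way to get idempotency is to derive a stable key for each side effect, for example from the order id and the step name, and to skip the effect when that key has already been handled. The sketch below keeps the keys in memory for illustration; runOnce and effectResults are hypothetical names, and in a real system the key would need to live in durable storage or be passed to a downstream API that supports idempotency keys.

```typescript
// Hypothetical sketch: deduplicate a side effect by idempotency key.
// The in-memory Map stands in for durable storage.
const effectResults = new Map<string, Promise<unknown>>();

function runOnce<T>(key: string, effect: () => Promise<T>): Promise<T> {
  const existing = effectResults.get(key);
  if (existing !== undefined) {
    // A retry with the same key reuses the earlier result.
    return existing as Promise<T>;
  }
  const attempt = effect().catch((err) => {
    // A failed attempt is forgotten so that a later retry can run again.
    effectResults.delete(key);
    throw err;
  });
  effectResults.set(key, attempt);
  return attempt;
}
```

Storing the promise instead of the resolved value means that two overlapping calls with the same key also share a single execution.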
Fourth, use the event-driven wait pattern sparingly. Every waiting workflow consumes a slot in Alfred's tracking table. If you have millions of workflows waiting for events, you need a strategy for partitioning and garbage-collecting stale entries.
Conclusion
Workflow orchestration patterns are the building blocks of reliable distributed systems. The sequential pipeline gives you simplicity and clear failure boundaries. The parallel fan-out/fan-in pattern unlocks concurrency where it matters. Conditional branching keeps your workflows flexible in the face of complex business rules. And the event-driven wait pattern bridges the gap between synchronous execution and asynchronous real-world processes.
Alfred provides first-class support for all of these patterns through a composable, type-safe TypeScript API. By combining them thoughtfully, you can build workflows that handle the full complexity of real business processes while remaining testable, observable, and resilient. The patterns themselves are timeless, but the implementation details matter enormously, and getting them right is what separates a toy demo from a production-grade workflow engine.