In this article, I will share my thoughts on using centralised versus decentralised orchestration when building event-driven serverless systems. I have over 15 years of experience in designing and developing distributed event-driven systems. I have spent years migrating from monolithic systems to microservices and now to serverless-powered systems.
Event-driven architecture can help build loosely coupled, fault-tolerant, scalable, and easily extensible systems, which lets new functionality reach the market more quickly.
Central Orchestration in event-driven systems
Central Orchestration in event-driven systems involves a central coordinator, or orchestrator, managing the interactions between services. The orchestrator dictates the workflow, responding to events and directing services on subsequent actions.
Key Aspects:
- Centralised Control: Orchestrator dictates service interactions, ensuring a defined sequence of operations.
- Defined Workflow: Orchestrator enforces the process flow, managing the order and completion of service tasks.
- Error Handling: Centralised management of errors and recovery processes, offering a systematic approach to handle service failures.
- Monitoring and Visibility: Enhanced overview of system processes and interactions, simplifying monitoring and system state tracking.
Challenges:
- Complexity and Bottlenecks: Orchestrator can become a complex, single point of failure, potentially leading to bottlenecks.
- Tight Coupling: Services may become tightly coupled to the orchestrator, hindering scalability and flexibility.
- Scalability Concerns: The centralised nature may affect responsiveness and scalability, especially under high load or complex interactions.
In essence, while Central Orchestration offers clear process management and centralised control, it comes with challenges like potential bottlenecks, tight coupling, and scalability issues, requiring thoughtful design and robust implementation.
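To make the idea of a central coordinator concrete, here is a minimal in-memory sketch of an orchestrator that runs steps in a fixed order and undoes completed steps when one fails. It is only an illustration under stated assumptions: the `Orchestrator` class, the order workflow, and the step names (`reserve_stock`, `charge_payment`) are hypothetical and do not come from any particular product or library.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Step = Callable[[Context], Context]


class Orchestrator:
    """Hypothetical coordinator: runs steps in order, compensates on failure."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Step, Step]] = []
        self.history: List[str] = []  # central record of what happened

    def add_step(self, name: str, action: Step, compensate: Step) -> None:
        self._steps.append((name, action, compensate))

    def run(self, context: Context) -> Context:
        completed: List[Tuple[str, Step]] = []
        for name, action, compensate in self._steps:
            try:
                context = action(context)
            except Exception as exc:
                self.history.append(f"{name}:failed")
                # Undo the steps that already succeeded, most recent first.
                for done_name, undo in reversed(completed):
                    context = undo(context)
                    self.history.append(f"{done_name}:compensated")
                return {**context, "status": "failed", "error": str(exc)}
            self.history.append(f"{name}:ok")
            completed.append((name, compensate))
        return {**context, "status": "completed"}


# Illustrative steps for an assumed order workflow.
def reserve_stock(ctx: Context) -> Context:
    return {**ctx, "stock_reserved": True}


def release_stock(ctx: Context) -> Context:
    return {**ctx, "stock_reserved": False}


def charge_payment(ctx: Context) -> Context:
    if ctx["amount"] > ctx["limit"]:
        raise ValueError("payment declined")
    return {**ctx, "charged": True}


def refund_payment(ctx: Context) -> Context:
    return {**ctx, "charged": False}


def build_order_workflow() -> Orchestrator:
    workflow = Orchestrator()
    workflow.add_step("reserve_stock", reserve_stock, release_stock)
    workflow.add_step("charge_payment", charge_payment, refund_payment)
    return workflow


if __name__ == "__main__":
    workflow = build_order_workflow()
    print(workflow.run({"amount": 500, "limit": 100}))
    print(workflow.history)
```

The sketch also shows the trade-off listed above: the sequence, error handling, and history all live in one place, which is easy to observe, but every step has to be registered with (and is therefore known to) the orchestrator.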
What is Choreography
Choreography is the natural pattern of event-driven architectures. It enables services or business domain functions to subscribe to event streams or topics without requiring modifications to the upstream processor or to a central orchestration service.
Services react to the events they are interested in, without being directly instructed to do so.
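As a rough sketch of this pattern, the following in-memory publish/subscribe example shows a publisher that knows nothing about its consumers. The `EventBus` class, the `order.placed` topic, and the subscriber names are assumptions made for illustration; a real system would use a managed event stream or broker instead.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """Hypothetical in-memory stand-in for an event stream or topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> int:
        # The publisher only names the topic; it never calls a service directly.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    bus = EventBus()
    shipments: List[str] = []
    emails: List[str] = []

    # Two independent services subscribe to the same topic.
    bus.subscribe("order.placed", lambda e: shipments.append(e["order_id"]))
    bus.subscribe("order.placed", lambda e: emails.append(e["order_id"]))

    # A third consumer is added later without touching the publisher.
    analytics: List[str] = []
    bus.subscribe("order.placed", lambda e: analytics.append(e["order_id"]))

    bus.publish("order.placed", {"order_id": "o-1"})
    print(shipments, emails, analytics)
```

Adding the analytics consumer required no change to the code that publishes `order.placed`, which is the property the text describes.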
Benefits of Choreography
Choreography therefore offers:
- Decentralisation: Services operate independently, so no central coordinator can become a bottleneck or single point of failure, which enhances overall system resilience.
- Loose Coupling: Services interact through events, maintaining modularity and adaptability. This allows for scalability and flexibility without necessitating knowledge of other services’ operations.
- Flexibility and Agility: The system readily adapts to changes as services independently manage their interactions based on events, facilitating seamless evolution and modification.
- Reduced Cross-Coordination and Dependencies: Development teams can work more autonomously with minimal inter-service dependencies, streamlining the development process.
- Decreased Development Effort and Waiting Time: The need for less coordination between teams accelerates development cycles, reducing overall effort and minimising delays.
- Faster Time to Market: Enhanced agility and increased team velocity shorten the journey from development to deployment, bringing products and services to market more swiftly.
- Improved Traffic Handling: The architecture allows individual services to scale independently and process events at their own pace, enhancing the system’s capacity to manage unpredictable traffic volumes efficiently.
Disadvantages of Choreography
- Complexity in Monitoring and Observability: The increase in single-responsibility functions and lack of a centralised control point complicate system monitoring and understanding, requiring sophisticated monitoring tools.
- Challenges in Documentation and Knowledge Management: Maintaining up-to-date architectural documentation is complex due to the dynamic and decentralised nature of the system, posing challenges in fully grasping interactions and system behaviour.
- Risk of Inconsistency: Ensuring data consistency across services is complex without centralised control, necessitating robust mechanisms to handle potential event loss and duplication and to perform compensatory actions.
- Demanding Event Management: The system’s reliance on a robust infrastructure for event management necessitates reliable event delivery, effective handling of failures and retries, and adept management of backpressure.
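The duplication and retry concerns above can be illustrated with a small idempotent consumer. This is a sketch under assumptions, not a production design: the `IdempotentConsumer` class, the `id` field on events, and the in-memory set and dead-letter list are all hypothetical, and a real system would keep processed ids in durable storage.

```python
from typing import Callable, Dict, List, Set

Event = Dict[str, str]


class IdempotentConsumer:
    """Hypothetical consumer: skips duplicates, retries, then dead-letters."""

    def __init__(self, handler: Callable[[Event], None], max_attempts: int = 3) -> None:
        self._handler = handler
        self._max_attempts = max_attempts
        self._processed_ids: Set[str] = set()  # assumed in-memory; not durable
        self.dead_letters: List[Event] = []

    def handle(self, event: Event) -> str:
        event_id = event["id"]
        if event_id in self._processed_ids:
            # The same event was delivered again; do not repeat side effects.
            return "duplicate"
        for _ in range(self._max_attempts):
            try:
                self._handler(event)
            except Exception:
                continue  # treat as transient and try again
            self._processed_ids.add(event_id)
            return "processed"
        # All attempts failed: park the event for later inspection.
        self.dead_letters.append(event)
        return "dead-lettered"


if __name__ == "__main__":
    calls: List[str] = []

    def flaky_handler(event: Event) -> None:
        calls.append(event["id"])
        if len(calls) == 1:
            raise RuntimeError("transient failure")

    consumer = IdempotentConsumer(flaky_handler)
    print(consumer.handle({"id": "e-1"}))
    print(consumer.handle({"id": "e-1"}))
```

Recording the id only after the handler succeeds means a failed event can still be retried on redelivery, at the cost of requiring the handler itself to tolerate a partial earlier attempt.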
Harmonising Choreography and Orchestration in Serverless Architectures
In the evolving landscape of serverless and cell-based architectures, where the granularity of services is paramount, each component, such as an AWS Lambda function, is designed to be a self-contained, single-responsibility entity. These components are crafted to be independently deployable, testable, and replaceable, forming the building blocks of a highly modular and scalable system.
In this context, it becomes critically important to navigate the nuances between centralised orchestration and decentralised choreography, especially when constructing complex business processes with these serverless functions. Each approach has its merits and drawbacks, and understanding these is key to leveraging the full potential of serverless architectures.
Based on my experience of deploying serverless solutions, I have found that a nuanced strategy is often most effective. I advocate employing a centralised orchestration service, such as AWS Step Functions, to manage intricate workflows at the micro level within a defined boundary or context. This orchestrator acts as the maestro, directing each function on when and how to perform its task, ensuring a coherent and predictable workflow. Furthermore, at crucial milestones or on completion of specific stages, this orchestration service can publish events to designated event streams. These streams serve as conduits, broadcasting the progress or outcomes beyond the immediate boundaries of the context.
However, the architecture’s true agility and scalability come to the forefront at the macro level, where a choreography pattern shines. By setting up a system where processors, potentially spanning across various bounded contexts and business domains, can independently subscribe to and consume events from these streams, a decentralised model of interaction is established. This model fosters a loosely coupled environment, where services are not rigidly bound to each other but interact based on the flow of events, allowing for greater flexibility and resilience.
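The combination of the two levels can be sketched in a few lines. In this illustration, a workflow inside one bounded context runs its steps in a fixed order (orchestration) and publishes an event only at selected milestones, while consumers in other contexts subscribe to the stream on their own (choreography). Everything here is an assumption made for the example: `EventStream`, `run_order_workflow`, the step names, and the event types are hypothetical, and the in-memory stream stands in for whatever managed workflow and event services a real system would use.

```python
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, str]
Subscriber = Callable[[Event], None]


class EventStream:
    """Hypothetical in-memory stand-in for a cross-context event stream."""

    def __init__(self) -> None:
        self._subscribers: List[Subscriber] = []

    def subscribe(self, subscriber: Subscriber) -> None:
        self._subscribers.append(subscriber)

    def publish(self, event: Event) -> None:
        for subscriber in self._subscribers:
            subscriber(event)


# Micro level: a fixed sequence of steps; only some steps mark a milestone.
ORDER_STEPS: List[Tuple[str, Optional[str]]] = [
    ("validate", None),
    ("pay", "PaymentCaptured"),
    ("pack", "OrderCompleted"),
]


def run_order_workflow(order_id: str, stream: EventStream) -> List[str]:
    completed: List[str] = []
    for step_name, milestone in ORDER_STEPS:
        # The actual work of each step is omitted in this sketch.
        completed.append(step_name)
        if milestone is not None:
            # Macro level: announce the milestone without knowing who listens.
            stream.publish({"type": milestone, "order_id": order_id})
    return completed


if __name__ == "__main__":
    stream = EventStream()
    loyalty_points: List[str] = []
    notifications: List[str] = []

    def loyalty_context(event: Event) -> None:
        if event["type"] == "PaymentCaptured":
            loyalty_points.append(event["order_id"])

    def notification_context(event: Event) -> None:
        notifications.append(event["type"])

    stream.subscribe(loyalty_context)
    stream.subscribe(notification_context)
    print(run_order_workflow("o-42", stream))
    print(loyalty_points, notifications)
```

The workflow never references the loyalty or notification contexts, so new consumers can be added at the macro level without changing the orchestrated steps.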
Employing this hybrid approach brings a multitude of benefits. It allows teams to retain the high level of observability and monitoring capabilities that are inherent to centralised orchestration systems. This is crucial for maintaining operational insight, debugging, and ensuring the reliability of complex distributed systems. Simultaneously, it encapsulates the advantages of an event-driven architecture and the decentralised nature of choreography. This duality empowers teams and business units to operate and deliver independently, significantly reducing dependencies and coordination overhead, and thereby accelerating the time to market for new features and services.
The pivotal insight from this dual strategy is that the decision isn’t a binary choice between orchestration and choreography. Instead, the most robust and agile systems emerge from a harmonious blend of both paradigms. By thoughtfully integrating centralised orchestration for managing detailed, context-specific workflows with decentralised choreography for broader, cross-context interactions, a balanced and dynamic system is born. This system is not just a compromise between orchestration and choreography; it’s an optimised solution that harnesses the distinctive strengths of both approaches. It creates a digital ecosystem that is cohesive, agile, and robust, poised to adapt and thrive in the fast-paced and ever-evolving technological landscape.