Coordinating services and managing state

Developing applications on AWS

Ricardo Sueiras

Principal Technologist

Service Coordination

Multiple services must work together to complete a workflow.
They need a coordination strategy.
Two approaches: orchestration and choreography.

Orchestration

A central orchestrator coordinates the workflow.
Orchestrator controls the sequence and decides which service runs next.
Useful for retries, conditional logic, and error handling.
On AWS: implemented with Step Functions, coordinating services such as Lambda, ECS, and DynamoDB.
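
A minimal sketch of the orchestration idea in plain Python, not Step Functions itself. A single `run_workflow` function owns the sequence and the retry policy, while the steps know nothing about each other. The step names (`reserve_stock`, `charge_payment`, `ship_order`) and the simulated timeout are hypothetical, chosen only for illustration.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_workflow(steps: list[tuple[str, Step]], payload: dict, max_attempts: int = 3) -> dict:
    """Run each step in order, retrying a failing step up to max_attempts times."""
    history = []
    for name, step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                payload = step(payload)
                history.append((name, attempt))  # record which attempt succeeded
                break
            except Exception as exc:
                if attempt == max_attempts:
                    raise RuntimeError(f"step {name!r} failed after {attempt} attempts") from exc
    return {**payload, "history": history}


# Hypothetical services; each one only does its own work.
_calls = {"charge_payment": 0}


def reserve_stock(order: dict) -> dict:
    return {**order, "reserved": True}


def charge_payment(order: dict) -> dict:
    _calls["charge_payment"] += 1
    if _calls["charge_payment"] == 1:  # simulate one transient failure
        raise ConnectionError("payment gateway timeout")
    return {**order, "paid": True}


def ship_order(order: dict) -> dict:
    return {**order, "shipped": True}


result = run_workflow(
    [("reserve_stock", reserve_stock), ("charge_payment", charge_payment), ("ship_order", ship_order)],
    {"order_id": "o-1"},
)
```

Because the sequence and retry policy sit in one function, the `history` list gives a single place to see what ran and how many attempts each step needed.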

Orchestration

Workflow logic lives in one place, so debugging and monitoring are easier.
Tradeoff: introduces a central dependency.

Choreography

No central controller.
Services react to events independently.
Each service listens for relevant events and does its work.
No service needs to know the full workflow.
Improves loose coupling and scalability.
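
The same idea as a small in-memory sketch. The `EventBus` class below is a hypothetical stand-in for a managed service such as EventBridge or SNS, and the event names are assumptions made for illustration. Each service subscribes to one event type and publishes its own follow-up event; no function contains the full workflow.

```python
from collections import defaultdict, deque


class EventBus:
    """Toy in-memory bus: handlers subscribe by event type."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, detail):
        self._queue.append((event_type, detail))

    def run(self):
        # Deliver queued events until no service publishes anything new.
        while self._queue:
            event_type, detail = self._queue.popleft()
            for handler in self._subscribers[event_type]:
                handler(detail)


bus = EventBus()
log = []


def payment_service(detail):
    log.append(f"charged {detail['order_id']}")
    bus.publish("PaymentCompleted", detail)


def shipping_service(detail):
    log.append(f"shipped {detail['order_id']}")
    bus.publish("OrderShipped", detail)


def notification_service(detail):
    log.append(f"notified customer for {detail['order_id']}")


bus.subscribe("OrderPlaced", payment_service)
bus.subscribe("PaymentCompleted", shipping_service)
bus.subscribe("OrderShipped", notification_service)

bus.publish("OrderPlaced", {"order_id": "o-1"})
bus.run()
```

The workflow order emerges from the subscriptions rather than from a controller, which is also why it becomes harder to trace as more event types are added.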

Choreography

On AWS: EventBridge, SNS, SQS.
Highly scalable.
Tradeoff: harder to understand and debug as workflows grow.

Event-driven architecture

Services communicate by producing and consuming events.
Promotes loose coupling between services.
Promotes high scalability.
Enables parallel processing of events.
New services can subscribe without modifying existing ones.

Event-driven architecture

Tradeoffs when implementing EDA:
Design for eventual consistency.
Implement idempotency.
Failure handling is more complex.

Idempotency

Running the same operation multiple times produces the same result as running it once.
A critical design principle in asynchronous and event-driven systems.
Distributed systems may deliver duplicate events (retries, network failures, reprocessing).
Idempotent handlers preserve correctness and reliability when duplicates arrive.
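
One common way to get this property is to record the ID of every event already handled and skip repeats. The sketch below assumes each event carries a unique `event_id` field (a hypothetical name), and it uses in-memory structures where a real system would use a durable store.

```python
balances = {"acct-1": 100}
processed_ids = set()  # stand-in for a durable record of handled events


def apply_deposit(event: dict) -> bool:
    """Apply a deposit once; return False if the event was already handled."""
    if event["event_id"] in processed_ids:
        return False  # duplicate delivery: do nothing
    balances[event["account"]] += event["amount"]
    processed_ids.add(event["event_id"])
    return True


event = {"event_id": "evt-42", "account": "acct-1", "amount": 25}
first = apply_deposit(event)   # applied
second = apply_deposit(event)  # same event delivered again: skipped
```

In a distributed setting the check and the update would need to happen atomically, for example through a conditional write in the data store; this single-process sketch sidesteps that.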

Dead letter queues

Manages failures in asynchronous communication.
Isolates problematic "poison" messages.
After repeated retry failures, the message moves to the DLQ.
Main processing flow keeps running.
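
A rough in-memory sketch of that flow. The queue, the `process` function, and the `MAX_RECEIVES` threshold are all assumptions for illustration; on SQS the comparable threshold is the `maxReceiveCount` setting in the queue's redrive policy.

```python
from collections import deque

MAX_RECEIVES = 3  # hypothetical retry limit before a message is set aside


def process(message: dict) -> str:
    if message["body"] == "malformed":
        raise ValueError("cannot parse message")
    return message["body"].upper()


def drain(queue: deque, dead_letters: list) -> list:
    """Process messages; move any that keep failing to the dead letter list."""
    results = []
    while queue:
        message = queue.popleft()
        try:
            results.append(process(message))
        except ValueError:
            message["receive_count"] += 1
            if message["receive_count"] >= MAX_RECEIVES:
                dead_letters.append(message)  # isolate the poison message
            else:
                queue.append(message)  # put it back for another attempt
    return results


main_queue = deque({"body": b, "receive_count": 0} for b in ["a", "malformed", "b"])
dlq = []
processed = drain(main_queue, dlq)
```

The healthy messages are still processed, and the failing one ends up in `dlq` where it can be inspected later. In this sketch a retried message is re-queued behind newer ones, so the original order is not kept.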

Dead letter queues

Use when message ordering is not critical.
Skip for non-critical systems where occasional data loss is acceptable.
Avoid when message ordering must be preserved.

Managing state

Managing state is fundamental to cloud-native apps.
State is any data preserved between interactions.
Storing state locally on the server limits scalability and resilience.

Managing state

State management involves tradeoffs based on what you're building.
External state improves scalability and resilience.
Costs: added latency and complexity.
Things to consider: consistency, caching, and data access patterns.

Stateful

Session data is retained on the server.
Requests must route to the same instance (sticky sessions).
Limits scaling flexibility.
If the instance fails, the session is gone.

Stateless

Every request is treated independently.
No state stored on the server.
State lives in external systems.
Preferred approach for cloud-native apps.
Any instance can handle any request, so horizontal scaling is straightforward.
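
A small sketch of the stateless approach. The dictionary `session_store` is a stand-in for an external store (a database or cache), and `AppInstance` is a hypothetical application server; neither name comes from a real API. The instances keep no session data of their own, so consecutive requests for the same session can land on different instances.

```python
session_store = {}  # stand-in for an external store shared by all instances


class AppInstance:
    """Hypothetical app server that holds no session data itself."""

    def __init__(self, name: str, store: dict):
        self.name = name
        self._store = store  # reference to the shared external store

    def add_to_cart(self, session_id: str, item: str) -> dict:
        cart = self._store.get(session_id, []) + [item]  # read state from the store
        self._store[session_id] = cart                   # write it back
        return {"served_by": self.name, "cart": cart}


instance_a = AppInstance("a", session_store)
instance_b = AppInstance("b", session_store)

r1 = instance_a.add_to_cart("sess-1", "book")  # first request hits instance a
r2 = instance_b.add_to_cart("sess-1", "pen")   # next request hits instance b
```

Instance `b` sees the item added through instance `a` because the cart lives in the shared store, so no sticky sessions are needed and losing one instance does not lose the session.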

Let's practice!
