Performance and resource optimization

Deploying Applications on AWS

Dunieski Otano

Amazon Web Services Solutions Architect

Slow and expensive

The app feels slow during traffic spikes
The Lambda bill jumps the same month
Tuning makes it faster and cheaper

A speed-and-cost gauge showing a function that is both slow and expensive, pointing toward optimization

Concurrency for cost vs cold start

Reserved: cap and protect

Caps and guarantees capacity for a function
No additional charge, but does nothing for cold starts

Provisioned: pay for warmth

Keeps instances pre-initialized, so requests within the provisioned level skip cold starts
Costs more; use it only where latency truly matters
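
A minimal sketch of applying both controls with boto3. The two client methods, put_function_concurrency and put_provisioned_concurrency_config, are Lambda API calls; the helper name, the function name checkout-api, the alias live, and the numbers are hypothetical and only there for illustration.

```python
def configure_concurrency(lambda_client, function_name, alias, reserved, provisioned):
    """Apply reserved and provisioned concurrency to one function.

    lambda_client is expected to be a boto3 Lambda client, for example
    boto3.client("lambda"). It is passed in so the helper can be tried
    against a stub without AWS credentials.
    """
    # Reserved concurrency: sets capacity aside for this function and caps it.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=reserved,
    )
    # Provisioned concurrency: keeps this many environments initialized.
    # It applies to a published version or alias, not to $LATEST.
    lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,
        ProvisionedConcurrentExecutions=provisioned,
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   configure_concurrency(boto3.client("lambda"), "checkout-api", "live", 50, 5)
```

Passing the client in keeps the sketch testable; in a real deployment these settings would more likely live in an infrastructure template than in a script.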

Right-sizing memory with the duration-cost curve

Duration-cost curve as memory increases: duration line drops then flattens, total cost forms a U-shape with the cheapest optimal memory setting highlighted

More memory = more CPU = shorter duration
Cost = memory (GB) x duration (s) x price per GB-second
The cheapest point is often in the middle
Profile at several sizes, then pick
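
The curve can be worked through with a few lines of arithmetic. This is a sketch under stated assumptions: the durations are made-up profiling results, the price per GB-second is an assumed figure that should be checked against current pricing, and the per-request fee is ignored because it does not change with memory size.

```python
# Assumed price per GB-second; check current pricing for your region.
PRICE_PER_GB_SECOND = 0.0000166667

# Hypothetical profiling results: memory (MB) -> average duration (ms).
measured_ms = {128: 2400, 256: 1150, 512: 520, 1024: 270, 2048: 250, 4096: 245}


def cost_per_million(memory_mb, duration_ms):
    """Compute cost of one million invocations, ignoring the request fee."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND * 1_000_000


costs = {mb: cost_per_million(mb, ms) for mb, ms in measured_ms.items()}
cheapest_mb = min(costs, key=costs.get)

for mb in sorted(costs):
    marker = "  <- cheapest" if mb == cheapest_mb else ""
    print(f"{mb:>5} MB  {measured_ms[mb]:>5} ms  ${costs[mb]:.2f}{marker}")
```

With these made-up numbers the cheapest setting is 512 MB. Past 1024 MB the duration barely improves, so the extra memory only adds cost.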

Application-level caching

Reuse work inside the execution context
A warm Lambda keeps globals between invocations
Cache config, clients, and hot lookups
ElastiCache for a shared cache across instances

Lambda execution context caching: global client object reused across warm invocations, and ElastiCache cluster sharing hot data across multiple function instances
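
A minimal sketch of execution-context caching in a Python handler. The config loader, the TTL value, and the load counter are hypothetical stand-ins; the point is only that module-level state is reused while the environment stays warm.

```python
import time

CONFIG_TTL_SECONDS = 300  # assumed refresh interval

# Module-level state survives between invocations in a warm environment.
_config = None
_config_loaded_at = 0.0
load_count = 0  # only here to make the reuse visible


def _load_config():
    """Stand-in for a slow call such as reading a parameter store."""
    global load_count
    load_count += 1
    return {"feature_x": True}


def get_config():
    """Return the cached config, reloading it when missing or stale."""
    global _config, _config_loaded_at
    now = time.monotonic()
    if _config is None or now - _config_loaded_at > CONFIG_TTL_SECONDS:
        _config = _load_config()
        _config_loaded_at = now
    return _config


def handler(event, context):
    config = get_config()
    return {"feature_x": config["feature_x"], "config_loads": load_count}
```

Each function instance keeps its own copy of this cache, which is why data that must be shared across instances belongs in something like ElastiCache instead.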

CloudFront caching at the edge

CloudFront: cache responses at edge locations
Serve repeat requests without hitting the backend
Cache keys can include headers, query strings, cookies
Static and cacheable content scales without adding backend load

CloudFront edge caching: globe with edge locations serving cached responses to nearby users without hitting the backend, with cache key configuration showing headers and query strings
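
To see why the cache key matters, here is an illustrative sketch of the idea, not CloudFront's actual key computation. In CloudFront the included headers, query strings, and cookies are configured through a cache policy; the function and parameter names below are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def cache_key(path, query_string, headers, allowed_query=(), allowed_headers=()):
    """Build a key from the path plus only the allowlisted request parts."""
    # Keep only allowlisted query parameters, in a stable order.
    params = [(k, v) for k, v in parse_qsl(query_string) if k in allowed_query]
    kept_query = urlencode(sorted(params))
    # Header names are case-insensitive, so compare them in lowercase.
    lowered = {k.lower(): v for k, v in headers.items()}
    kept_headers = tuple(
        (name.lower(), lowered.get(name.lower(), ""))
        for name in sorted(allowed_headers)
    )
    return (path, kept_query, kept_headers)


# Two requests that differ only in parts left out of the key share one entry.
a = cache_key("/products", "page=2&utm_source=mail", {"Accept-Language": "en"},
              allowed_query=("page",))
b = cache_key("/products", "page=2&utm_source=ad", {"Accept-Language": "fr"},
              allowed_query=("page",))
print(a == b)
```

The fewer request parts the key includes, the more requests share a cached response; include only what actually changes the content.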

Choosing the right cache layer

Three stacked cache layers: CloudFront at the edge for static widely-shared content, in-memory per-instance cache for hot objects, and ElastiCache or DAX for shared high-frequency reads

Edge (CloudFront): static, widely shared content
Application (in-memory): per-instance hot objects
Data (ElastiCache/DAX): shared hot reads
Layer them; each cuts a different cost

Putting optimization together

Right-size memory from the duration-cost curve
Add provisioned concurrency only where latency matters
Cache at the edge, app, and data layers
Measure, change one thing, measure again

Optimization summary: duration-cost curve for memory sizing, provisioned concurrency decision, three cache layers diagram, and the measure-change-measure iteration cycle

Let's practice!
