Performance and resource optimization

Deploying Applications on AWS

Dunieski Otano

Amazon Web Services Solutions Architect

Slow and expensive

  • The app feels slow during traffic spikes
  • The Lambda bill jumps the same month
  • Tuning makes it faster and cheaper

A speed-and-cost gauge showing a function that is both slow and expensive, pointing toward optimization


Concurrency for cost vs cold start


Reserved: cap and protect

  • Caps a function's concurrency and guarantees it that capacity
  • No extra charge, but does nothing for cold starts

Provisioned: pay for warmth

  • Keeps instances pre-initialized, so the requests they serve skip the cold start
  • Costs more; use it only where latency truly matters
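
As a rough sketch of how the two settings differ in practice, the helpers below build the request parameters for the Lambda PutFunctionConcurrency and PutProvisionedConcurrencyConfig APIs. The function name "checkout-api", the alias "live", and the numbers are hypothetical; the apply step assumes boto3 is installed and AWS credentials are configured.

```python
def reserved_concurrency_params(function_name, limit):
    """Keyword arguments for Lambda's PutFunctionConcurrency API."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return {
        "FunctionName": function_name,
        "ReservedConcurrentExecutions": limit,
    }


def provisioned_concurrency_params(function_name, qualifier, warm):
    """Keyword arguments for Lambda's PutProvisionedConcurrencyConfig API."""
    if warm < 1:
        raise ValueError("warm must be at least 1")
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,  # a published version or alias, not $LATEST
        "ProvisionedConcurrentExecutions": warm,
    }


def apply(reserved, provisioned):
    """Send both settings to AWS. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the helpers above work without it

    client = boto3.client("lambda")
    client.put_function_concurrency(**reserved)
    client.put_provisioned_concurrency_config(**provisioned)


# Hypothetical values: cap the function at 50, keep 5 instances warm on "live".
reserved = reserved_concurrency_params("checkout-api", 50)
provisioned = provisioned_concurrency_params("checkout-api", "live", 5)
```

Note that provisioned concurrency attaches to a version or alias, which is why it takes a Qualifier and reserved concurrency does not.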

Right-sizing memory with the duration-cost curve

Duration-cost curve as memory increases: duration line drops then flattens, total cost forms a U-shape with the cheapest optimal memory setting highlighted

  • More memory = more CPU = shorter duration
  • Cost = memory (GB) × duration (s) × price per GB-second
  • The cheapest point is often in the middle
  • Profile at several sizes, then pick
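
A minimal sketch of the "profile, then pick" step. The durations are made-up measurements, and the GB-second price is an assumed x86 on-demand rate that should be checked against current pricing for your region; the per-request fee is left out because it does not change with memory.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed rate; verify against current pricing


def invocation_cost(memory_mb, duration_ms, price=PRICE_PER_GB_SECOND):
    """Compute cost of one invocation: GB x seconds x price per GB-second."""
    return (memory_mb / 1024) * (duration_ms / 1000) * price


def cheapest_setting(profile):
    """Return the memory size (MB) with the lowest cost per invocation."""
    return min(profile, key=lambda mb: invocation_cost(mb, profile[mb]))


# Hypothetical profile: memory (MB) -> average duration (ms).
# Duration drops quickly at first, then flattens.
profile = {128: 2400.0, 256: 1100.0, 512: 480.0, 1024: 300.0, 2048: 280.0}

best = cheapest_setting(profile)
for mb in sorted(profile):
    marker = "  <- cheapest" if mb == best else ""
    print(f"{mb:>5} MB  {profile[mb]:>7.0f} ms  ${invocation_cost(mb, profile[mb]):.10f}{marker}")
```

With these sample numbers the cheapest point is 512 MB: below it the long duration dominates the bill, above it the extra memory costs more than the time it saves.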

Application-level caching

  • Reuse work inside the execution context
  • A warm Lambda keeps globals between invocations
  • Cache config, clients, and hot lookups
  • ElastiCache for a shared cache across instances

Lambda execution context caching: global client object reused across warm invocations, and ElastiCache cluster sharing hot data across multiple function instances
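
One way to sketch execution-context caching in a Python handler. load_config is a hypothetical stand-in for a slow lookup (a parameter store or database call), and the five-minute TTL is an assumption; the point is only that module-level state outlives a single invocation on a warm instance.

```python
import time

_CACHE = {}          # module-level: survives between invocations on a warm instance
TTL_SECONDS = 300.0  # assumed freshness window for cached values


def load_config():
    """Hypothetical slow lookup; counts calls so the caching is visible."""
    load_config.calls += 1
    return {"feature_x": True}


load_config.calls = 0


def get_config(now=None):
    """Return the cached config, reloading it only when missing or stale."""
    now = time.monotonic() if now is None else now
    entry = _CACHE.get("config")
    if entry is None or now - entry[0] > TTL_SECONDS:
        entry = (now, load_config())
        _CACHE["config"] = entry
    return entry[1]


def handler(event, context):
    # Only the first invocation on this instance pays for load_config().
    config = get_config()
    return {"feature_x": config["feature_x"]}
```

Each concurrent instance holds its own copy of _CACHE, which is why a shared cache such as ElastiCache is still needed for data that must be consistent across instances.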


CloudFront caching at the edge

  • CloudFront: cache responses at edge locations
  • Serve repeat requests without hitting the backend
  • Cache keys can include headers, query strings, cookies
  • Static and cacheable content scales without adding backend load

CloudFront edge caching: globe with edge locations serving cached responses to nearby users without hitting the backend, with cache key configuration showing headers and query strings
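
To illustrate why cache key choices matter, here is a simplified model of a cache key. It is not CloudFront's implementation, and the function and parameter names are made up for this sketch. Only the query strings and headers you list become part of the key, so requests that differ elsewhere share one cached response.

```python
from urllib.parse import parse_qsl, urlsplit


def cache_key(url, headers, key_query=(), key_headers=()):
    """Build a key from host, path, and only the listed query strings and headers."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    kept_query = sorted((k, query[k]) for k in key_query if k in query)
    lowered = {name.lower(): value for name, value in headers.items()}
    kept_headers = sorted(
        (h.lower(), lowered[h.lower()]) for h in key_headers if h.lower() in lowered
    )
    return (parts.netloc, parts.path, tuple(kept_query), tuple(kept_headers))


# Two requests that differ only in a tracking parameter and a language header.
a = ("https://example.com/items?page=2&utm_source=mail", {"Accept-Language": "en"})
b = ("https://example.com/items?page=2&utm_source=ad", {"Accept-Language": "de"})

# Keyed on "page" only: both requests map to the same cached object.
same = cache_key(*a, key_query=("page",)) == cache_key(*b, key_query=("page",))

# Adding Accept-Language to the key splits them into two cached objects.
split = cache_key(*a, key_query=("page",), key_headers=("Accept-Language",)) != cache_key(
    *b, key_query=("page",), key_headers=("Accept-Language",)
)
```

Every value added to the key multiplies the number of distinct cached objects, so include only what actually changes the response.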


Choosing the right cache layer

Three stacked cache layers: CloudFront at the edge for static widely-shared content, in-memory per-instance cache for hot objects, and ElastiCache or DAX for shared high-frequency reads

  • Edge (CloudFront): static, widely shared content
  • Application (in-memory): per-instance hot objects
  • Data (ElastiCache/DAX): shared hot reads
  • Layer them; each cuts a different cost
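
A small sketch of how the application and data layers can stack on a read. The class name is hypothetical, and a plain dict stands in for the shared cache (ElastiCache or DAX in a real deployment), so this only shows the lookup order, not a real client.

```python
class LayeredReader:
    """Read through a per-instance cache, then a shared cache, then the source."""

    def __init__(self, shared, load_from_source):
        self.local = {}        # per-instance, in-memory layer
        self.shared = shared   # stand-in for a shared cache service
        self.load_from_source = load_from_source

    def get(self, key):
        """Return (value, layer) so the layer that answered is visible."""
        if key in self.local:
            return self.local[key], "local"
        if key in self.shared:
            value = self.shared[key]
            self.local[key] = value  # promote to the faster layer
            return value, "shared"
        value = self.load_from_source(key)
        self.shared[key] = value
        self.local[key] = value
        return value, "source"


shared_cache = {}
instance_a = LayeredReader(shared_cache, lambda key: key.upper())
instance_b = LayeredReader(shared_cache, lambda key: key.upper())

first = instance_a.get("sku-1")   # misses both caches, hits the source
second = instance_a.get("sku-1")  # served from instance A's memory
third = instance_b.get("sku-1")   # instance B finds it in the shared cache
```

The source is read once even though two instances asked for the same key, which is the saving the shared layer provides.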

Putting optimization together

  • Right-size memory from the duration-cost curve
  • Add provisioned concurrency only where latency matters
  • Cache at the edge, app, and data layers
  • Measure, change one thing, measure again

Optimization summary: duration-cost curve for memory sizing, provisioned concurrency decision, three cache layers diagram, and the measure-change-measure iteration cycle


Let's practice!
