Scaling and concurrency controls

Serverless Applications with AWS Lambda

Claudio Canales

Senior DevOps Engineer

Scaling in one picture

  • More demand -> more parallel runs.
  • Each run uses an execution environment.
  • Concurrency = runs happening now.

Demand spike -> more environments

What is concurrency?

  • One invocation = one unit of work.
  • Concurrency = how many are running now.
  • If 10 are running, concurrency = 10.

Parallel invocations

Estimating required concurrency

  • Rule of thumb: concurrency ≈ requests per second (rps) * average duration (s).
  • Reduce duration to reduce concurrency.
  • Use this to size limits safely.

Concurrency formula

Example: 50 rps at 200 ms

  • 200 ms = 0.2 seconds.
  • 50 * 0.2 ≈ 10 concurrent executions.
  • Faster code means fewer parallel runs.

Example concurrency calculation
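The rule of thumb can be written as a few lines of Python; this minimal sketch uses the slide's own numbers (50 rps at 200 ms):

```python
def estimate_concurrency(rps: float, avg_duration_s: float) -> float:
    """Rule of thumb: concurrency ≈ requests per second × average duration."""
    return rps * avg_duration_s

# The slide's example: 50 requests per second, each taking 200 ms (0.2 s)
print(estimate_concurrency(50, 0.2))  # 10.0
```

Halving the duration (e.g. to 100 ms) halves the estimate, which is why shaving duration is also a scaling lever.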

Limits: account pool vs function slice

  • Concurrency is limited at the account level (default: 1,000 per Region, raisable on request).
  • Reserve part of the pool for a critical function.
  • Other functions share what's left.

Account concurrency pool
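You can inspect the account pool with the Lambda `GetAccountSettings` API. A boto3 sketch (assumes AWS credentials are configured; the region is illustrative):

```python
def account_concurrency_limits(region: str = "us-east-1") -> dict:
    """Return the account-wide concurrency limit and the unreserved remainder."""
    import boto3  # imported inside the function so the sketch is easy to stub
    lam = boto3.client("lambda", region_name=region)
    limits = lam.get_account_settings()["AccountLimit"]
    return {
        "total": limits["ConcurrentExecutions"],
        "unreserved": limits["UnreservedConcurrentExecutions"],
    }
```

The gap between `total` and `unreserved` is what has been carved out as reserved concurrency for individual functions.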

Reserved Concurrency: a hard cap

  • A safety valve.
  • Limits parallel work.
  • Above the cap, invocations throttle.
  • A noisy function can't exhaust capacity.

Reserved concurrency cap
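Setting the cap is a single API call, `PutFunctionConcurrency`. A boto3 sketch (the function name is hypothetical):

```python
def cap_function_concurrency(function_name: str, limit: int) -> None:
    """Set reserved concurrency: a hard cap on this function's parallel runs."""
    import boto3  # assumes AWS credentials are configured
    lam = boto3.client("lambda")
    lam.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )

# Example (hypothetical function name):
# cap_function_concurrency("order-processor", 10)
```

Note the cap cuts both ways: it also subtracts that amount from the shared pool available to every other function.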

Throttling: what it looks like

  • Lambda throttles at the concurrency limit.
  • Synchronous callers get a 429 TooManyRequestsException; async events are retried.
  • You see throttles in monitoring.

Throttling signals
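Throttles surface as the `Throttles` metric in the `AWS/Lambda` CloudWatch namespace. A boto3 sketch that sums recent throttles for one function (credentials and a real function name assumed):

```python
from datetime import datetime, timedelta, timezone

def recent_throttles(function_name: str, minutes: int = 60) -> float:
    """Sum the AWS/Lambda 'Throttles' metric over the last `minutes` minutes."""
    import boto3  # assumes AWS credentials are configured
    cw = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Throttles",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=now - timedelta(minutes=minutes),
        EndTime=now,
        Period=300,              # 5-minute buckets
        Statistics=["Sum"],
    )
    return sum(dp["Sum"] for dp in resp["Datapoints"])
```

A sustained nonzero value here is the usual trigger for revisiting the concurrency settings on the next slides.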

Provisioned Concurrency: warm capacity

  • A pool of pre-initialized environments.
  • Requests start faster.
  • Configured on a published version or alias (not $LATEST).
  • You pay to keep them ready.

Provisioned concurrency warm pool
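The warm pool is created with `PutProvisionedConcurrencyConfig`. A boto3 sketch (function name and alias are hypothetical):

```python
def pre_warm(function_name: str, alias: str, count: int) -> None:
    """Keep `count` execution environments initialized and ready to serve."""
    import boto3  # assumes AWS credentials are configured
    lam = boto3.client("lambda")
    lam.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,  # must be a published version or an alias
        ProvisionedConcurrentExecutions=count,
    )

# Example (hypothetical names):
# pre_warm("checkout-api", "live", 5)
```

Unlike reserved concurrency, this setting has an ongoing cost: you pay for the warm environments whether or not traffic arrives.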

Cold start vs provisioned

Cold start

  • A one-time initialization delay.
  • First request pays the setup cost.

Provisioned

  • Init work done in advance.
  • First request doesn't pay the setup cost.

Cold vs provisioned timeline

Reserved vs provisioned (different problems)

  • Reserved controls load.
  • Provisioned controls startup latency.
  • Many production functions use both.

Reserved vs provisioned comparison

Protect downstream systems

  • Database handles only so many connections.
  • A traffic spike can turn into an outage.
  • A concurrency cap prevents it.

Concurrency cap protects a database
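Sizing the cap from the database's limits is simple arithmetic. A sketch with hypothetical numbers (the headroom factor and connection counts are assumptions, not AWS values):

```python
def safe_concurrency_cap(max_db_connections: int,
                         connections_per_invocation: int,
                         headroom: float = 0.8) -> int:
    """Cap Lambda concurrency so the database connection limit is never hit.

    `headroom` leaves a share of connections free for other clients.
    """
    return int(max_db_connections * headroom) // connections_per_invocation

# Hypothetical: a database allowing 100 connections, 1 per invocation
print(safe_concurrency_cap(100, 1))  # 80
```

The result would then be applied as the function's reserved concurrency, so a traffic spike throttles at Lambda instead of exhausting the database.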

Bursts vs steady traffic

  • Bursts create many parallel environments.
  • Slow handlers keep concurrency high.
  • Caps can smooth the spike.
  • Improving duration is also a scaling strategy.

Burst traffic vs steady state

Failures and retries increase load

  • Retries can amplify traffic.
  • A failure loop can overwhelm dependencies.
  • Concurrency limits reduce blast radius.

Retries multiply load
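One way to damp retry amplification for asynchronous invocations is `PutFunctionEventInvokeConfig`, which lowers the default of two retries. A boto3 sketch (function name is hypothetical):

```python
def limit_async_retries(function_name: str, retries: int = 0) -> None:
    """Reduce retry amplification for async invocations (valid range: 0-2)."""
    import boto3  # assumes AWS credentials are configured
    lam = boto3.client("lambda")
    lam.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=retries,
    )
```

Combined with a concurrency cap, this keeps a failure loop from repeatedly hammering a struggling dependency.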

A simple tuning workflow

  • Start with measurement.
  • Cap concurrency when throttles or pressure appear.
  • Add provisioned capacity when cold starts hurt latency.

Measure -> cap -> pre-warm

Key takeaways

  • Concurrency is the core unit for reasoning about Lambda scaling.
  • Estimate: rps * duration.
  • Reserved caps load; watch for throttles.
  • Provisioned reduces cold-start latency.

Concurrency key takeaways

Let's practice!
