Azure Compute Solutions
Florin Angelescu
Azure Cloud Architect


Applications rarely experience constant traffic:
Without scaling:


Increase or decrease the number of replicas in a deployment using kubectl scale.

Scaling a web app from two replicas to four:
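Manual scaling means issuing one `kubectl scale` command per change. The Python sketch below only builds that command for a Deployment. `kubectl_scale_args` and `scale_deployment` are hypothetical helpers, and "web-app" is a placeholder Deployment name. By default the command is printed rather than executed, so you can check it before running it against a cluster.

```python
import shlex
import subprocess
from typing import List, Optional


def kubectl_scale_args(deployment: str, replicas: int,
                       namespace: Optional[str] = None) -> List[str]:
    """Build the argument list for a manual `kubectl scale` of a Deployment."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    args = ["kubectl", "scale", f"deployment/{deployment}", f"--replicas={replicas}"]
    if namespace is not None:
        args += ["--namespace", namespace]
    return args


def scale_deployment(deployment: str, replicas: int,
                     namespace: Optional[str] = None, dry_run: bool = True) -> None:
    """Print the command (dry run) or execute it against the current kube context."""
    args = kubectl_scale_args(deployment, replicas, namespace)
    if dry_run:
        print(shlex.join(args))
    else:
        # Requires kubectl on PATH and credentials for the target AKS cluster.
        subprocess.run(args, check=True)


if __name__ == "__main__":
    # "web-app" is a hypothetical Deployment name: scale it from two replicas to four.
    scale_deployment("web-app", 4)  # prints: kubectl scale deployment/web-app --replicas=4
```

Because the replica count is fixed until someone runs the command again, manual scaling cannot react to traffic changes on its own.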


Useful for:


The Horizontal Pod Autoscaler (HPA) monitors metrics such as CPU and memory usage and automatically adjusts the number of replicas.

Usage rises above a threshold: the HPA adds replicas.

Demand falls: the HPA removes replicas.
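The Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). If the ratio falls within a tolerance of 0.1 by default, no scaling action is taken. The Python sketch below applies that rule to a single utilization metric and clamps the result to the HPA's minimum and maximum replica counts. The default bounds of 1 and 10 are assumptions for illustration. The sketch leaves out stabilization windows, scaling policies, and the handling of multiple metrics.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count suggested by the HPA scaling formula for one metric."""
    if current_replicas <= 0 or target_utilization <= 0:
        raise ValueError("replicas and target must be positive")
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas bounds (defaults here are hypothetical).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(2, 90, 50))  # usage above target: scale out -> 4
    print(desired_replicas(4, 20, 50))  # demand falls: scale in -> 2
    print(desired_replicas(2, 52, 50))  # within tolerance: no change -> 2
```

With a 50% CPU target, two replicas running at 90% become four. This is the same two-to-four change as the manual example, but the HPA makes it without anyone running a command.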



Demand decreases:
Infrastructure matches workload needs:
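Matching infrastructure to workload needs also applies at the node level. In AKS, the Cluster Autoscaler adds nodes when pods cannot be scheduled for lack of resources. It removes underutilized nodes when their pods can run elsewhere. The Python sketch below is a deliberate simplification and is not the Cluster Autoscaler's actual algorithm. It estimates a lower bound on node count from summed CPU requests and clamps that estimate to the node pool's limits. The 1900m allocatable CPU per node and the pool limits of 1 to 5 nodes are hypothetical values.

```python
import math
from typing import Sequence


def nodes_needed(pod_cpu_requests_m: Sequence[int], node_allocatable_m: int,
                 min_nodes: int = 1, max_nodes: int = 5) -> int:
    """Lower bound on node count so summed CPU requests fit, clamped to pool limits."""
    if node_allocatable_m <= 0:
        raise ValueError("node_allocatable_m must be positive")
    total_m = sum(pod_cpu_requests_m)
    # Ignores per-node bin packing, memory, and system pods: a rough estimate only.
    needed = math.ceil(total_m / node_allocatable_m)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # Hypothetical node size: 1900m allocatable CPU per node.
    print(nodes_needed([500] * 8, 1900))  # peak: 8 pods x 500m -> 3
    print(nodes_needed([500] * 2, 1900))  # demand decreases -> 1
```

Once the HPA removes replicas, fewer nodes are needed to hold the remaining requests, and the pool can shrink toward its minimum.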

Define realistic resource requests and limits

Combine Horizontal Pod Autoscaler with Cluster Autoscaler

Test scaling behavior under load

Monitor costs
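Realistic requests matter because the HPA measures resource utilization as a percentage of each container's resource request. If a pod has no request for that resource, the HPA does not act on the metric. The Python sketch below uses hypothetical usage figures to show how a request set too low inflates measured utilization. Inflated utilization makes the HPA scale out more than the workload needs, and the Cluster Autoscaler then adds nodes to match.

```python
from typing import Sequence


def average_cpu_utilization(usages_m: Sequence[int], request_m: int) -> float:
    """Average CPU utilization across pods, as a percentage of the per-pod request."""
    if request_m <= 0:
        raise ValueError("a positive CPU request is required for utilization targets")
    if not usages_m:
        raise ValueError("at least one pod is required")
    return 100.0 * sum(usages_m) / (len(usages_m) * request_m)


if __name__ == "__main__":
    usage = [200, 200]  # hypothetical measured usage of two pods, in millicores
    print(average_cpu_utilization(usage, 250))  # realistic request -> 80.0
    print(average_cpu_utilization(usage, 100))  # request set too low -> 200.0
```

The pods do the same work in both cases, but the second figure is far above a typical utilization target and would trigger unnecessary scale-out and higher costs.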
