Monitoring and troubleshooting AWS
John Q. Martin
Principal Consultant



Metric: CPUUtilization > 80%
Period: 5 min | Eval Periods: 3 | Datapoints to Alarm: 2 of 3
Period 1: 85% (breach) | Period 2: 75% (ok) | Period 3: 90% (breach)
Result: ALARM (2 of 3 periods breached)
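The "2 of 3" evaluation above can be sketched as a small shell function. This is a simplified illustration, not CloudWatch's actual implementation: the function name `evaluate_alarm` is hypothetical, it handles integer values only, and it ignores missing-data handling.

```shell
# Sketch of "M out of N" alarm evaluation (illustrative only).
# Usage: evaluate_alarm THRESHOLD DATAPOINTS_TO_ALARM VALUE...
# The VALUE arguments are the datapoints in the evaluation window.
# The body runs in a subshell "( ... )" so its variables do not leak out.
evaluate_alarm() (
  threshold=$1
  datapoints_to_alarm=$2
  shift 2
  breaches=0
  for value in "$@"; do
    # Mirrors GreaterThanThreshold: strictly greater counts as a breach.
    if [ "$value" -gt "$threshold" ]; then
      breaches=$((breaches + 1))
    fi
  done
  if [ "$breaches" -ge "$datapoints_to_alarm" ]; then
    echo ALARM
  else
    echo OK
  fi
)

# The three periods from the example: 85%, 75%, 90% against a threshold of 80%.
evaluate_alarm 80 2 85 75 90
```

With the values from the example, two of the three datapoints exceed 80, so the function reports ALARM. A single breach in the window would leave it at OK.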


aws cloudwatch put-metric-alarm \
--alarm-name HighCPUUtilization \
--alarm-description "Alert when CPU exceeds 80%" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period 300 \
--evaluation-periods 2 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic

aws cloudwatch put-composite-alarm \
--alarm-name CriticalSystemHealth \
--alarm-description "Critical when CPU and Memory both high" \
--actions-enabled \
--alarm-actions arn:aws:sns:us-east-1:123456789012:critical-alerts \
--alarm-rule "ALARM(HighCPUAlarm) AND ALARM(HighMemoryAlarm)"

# Alternative rule combining OR and NOT (use in place of the rule above):
--alarm-rule "(ALARM(HighErrorRate) OR ALARM(HighLatency)) AND NOT ALARM(MaintenanceMode)"


Example: CPU Warning at 75%, Critical at 90%, each with different SNS topics
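A minimal sketch of the two-level setup described above. The instance ID and the critical-alerts topic are reused from the other commands in this deck; the `warning-alerts` topic, the `create_cpu_alarm` helper, the `AWS_CMD` variable, and the evaluation settings are assumptions for illustration. By default the script only prints the commands it would run.

```shell
# Placeholder values; replace with your own resources.
INSTANCE_ID="i-1234567890abcdef0"
WARNING_TOPIC="arn:aws:sns:us-east-1:123456789012:warning-alerts"   # hypothetical topic
CRITICAL_TOPIC="arn:aws:sns:us-east-1:123456789012:critical-alerts"

# Dry run by default: prints each command. Set AWS_CMD=aws to create the alarms.
AWS_CMD="${AWS_CMD:-echo aws}"

# create_cpu_alarm SEVERITY THRESHOLD TOPIC_ARN
# Alarm names follow a <Service>-<Metric>-<Resource>-<Severity> pattern.
create_cpu_alarm() {
  $AWS_CMD cloudwatch put-metric-alarm \
    --alarm-name "EC2-CPUUtilization-${INSTANCE_ID}-$1" \
    --metric-name CPUUtilization \
    --namespace AWS/EC2 \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --datapoints-to-alarm 2 \
    --threshold "$2" \
    --comparison-operator GreaterThanThreshold \
    --dimensions "Name=InstanceId,Value=${INSTANCE_ID}" \
    --alarm-actions "$3"
}

create_cpu_alarm Warning 75 "$WARNING_TOPIC"
create_cpu_alarm Critical 90 "$CRITICAL_TOPIC"
```

Keeping both alarms behind one helper means the only differences between severities are the threshold and the notification target.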
# Anomaly detection alarm; the --evaluation-periods value is illustrative.
aws cloudwatch put-metric-alarm \
--alarm-name AnomalousTraffic \
--comparison-operator LessThanLowerOrGreaterThanUpperThreshold \
--evaluation-periods 2 \
--threshold-metric-id e1 \
--metrics '[
{"Id":"m1","MetricStat":{"Metric":{"Namespace":"AWS/ApplicationELB",
"MetricName":"RequestCount"},"Period":300,"Stat":"Average"}},
{"Id":"e1","Expression":"ANOMALY_DETECTION_BAND(m1, 2)"}
]'


<Service>-<Metric>-<Resource>-<Severity>

aws cloudwatch set-alarm-state \
--alarm-name MyAlarm \
--state-value ALARM \
--state-reason "Testing alarm notification"
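After forcing the state, it can help to read it back and confirm the change registered. The sketch below wraps both steps; the `test_alarm_notification` helper and the `AWS_CMD` dry-run variable are assumptions for illustration. The forced state is temporary, since the alarm returns to its real state at its next evaluation.

```shell
# Dry run by default: prints each command. Set AWS_CMD=aws to run against your account.
AWS_CMD="${AWS_CMD:-echo aws}"

# test_alarm_notification ALARM_NAME
test_alarm_notification() {
  # Force the alarm into ALARM so its configured actions fire.
  $AWS_CMD cloudwatch set-alarm-state \
    --alarm-name "$1" \
    --state-value ALARM \
    --state-reason "Testing alarm notification"

  # Read the current state back as plain text.
  $AWS_CMD cloudwatch describe-alarms \
    --alarm-names "$1" \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}

test_alarm_notification MyAlarm
```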