cloudwatch

Installs: 74
Rank: #10538

Install

npx skills add https://github.com/itsmostafa/aws-agent-skills --skill cloudwatch
AWS CloudWatch
Amazon CloudWatch provides monitoring and observability for AWS resources and applications. It collects metrics, logs, and events, enabling you to monitor, troubleshoot, and optimize your AWS environment.
Table of Contents
Core Concepts
Common Patterns
CLI Reference
Best Practices
Troubleshooting
References
Core Concepts
Metrics
Time-ordered data points published to CloudWatch. Key components:
- Namespace: Container for metrics (e.g., AWS/Lambda)
- Metric name: Name of the measurement (e.g., Invocations)
- Dimensions: Name-value pairs for filtering (e.g., FunctionName=MyFunc)
- Statistics: Aggregations (Sum, Average, Min, Max, SampleCount, pN)
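To make the statistics concrete, the sketch below aggregates one period's raw data points into the aggregations listed above. It is plain Python, not a CloudWatch API call. The function name `summarize_period` is hypothetical, and the nearest-rank percentile is a simplifying assumption that may differ from how CloudWatch computes pN.

```python
import math


def summarize_period(values, percentile=90):
    """Aggregate one period's raw data points into CloudWatch-style statistics.

    Hypothetical helper for illustration only. The percentile uses the
    nearest-rank method, which is an assumption, not CloudWatch's algorithm.
    """
    if not values:
        raise ValueError("a period with no data points has no statistics")
    ordered = sorted(values)
    # Nearest-rank: smallest value with at least `percentile`% of points at or below it.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return {
        "SampleCount": len(ordered),
        "Sum": sum(ordered),
        "Average": sum(ordered) / len(ordered),
        "Minimum": ordered[0],
        "Maximum": ordered[-1],
        f"p{percentile}": ordered[rank - 1],
    }


if __name__ == "__main__":
    # Five invocation durations (ms) recorded in a single period.
    print(summarize_period([120, 80, 95, 400, 105]))
```

A single slow call (400 ms) pulls Average well above the typical value, which is why percentiles are often a better alarm input than averages for latency-style metrics.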
Logs
Log data from AWS services and applications:
- Log groups: Collections of log streams
- Log streams: Sequences of log events from the same source
- Log events: Individual log entries with timestamp and message
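The hierarchy maps directly onto a PutLogEvents request: one log group, one log stream, and a batch of events with millisecond timestamps. A minimal sketch, assuming hypothetical names (`build_put_log_events_request`, `/myapp/example`, `worker-1`); it only builds the keyword arguments, and the final comment shows where they would be handed to boto3.

```python
import time


def build_put_log_events_request(group, stream, messages, now_ms=None):
    """Build keyword arguments for the boto3 logs client's put_log_events.

    `messages` is a list of (offset_ms, text) pairs relative to `now_ms`.
    Hypothetical helper; the group and stream names are illustrative.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # timestamps are epoch milliseconds
    events = [
        {"timestamp": now_ms + offset_ms, "message": text}
        for offset_ms, text in messages
    ]
    # Events in one batch are sent in chronological order.
    events.sort(key=lambda e: e["timestamp"])
    return {"logGroupName": group, "logStreamName": stream, "logEvents": events}


if __name__ == "__main__":
    request = build_put_log_events_request(
        "/myapp/example",
        "worker-1",
        [(0, "job finished"), (-1500, "job started")],
    )
    print(request)
    # With credentials configured: boto3.client("logs").put_log_events(**request)
```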
Alarms
Automated actions based on metric thresholds:
- States: OK, ALARM, INSUFFICIENT_DATA
- Actions: SNS notifications, Auto Scaling, EC2 actions

Common Patterns

Create a Metric Alarm

AWS CLI:

    # CPU utilization alarm for EC2
    aws cloudwatch put-metric-alarm \
      --alarm-name "HighCPU-i-1234567890abcdef0" \
      --metric-name CPUUtilization \
      --namespace AWS/EC2 \
      --statistic Average \
      --period 300 \
      --threshold 80 \
      --comparison-operator GreaterThanThreshold \
      --evaluation-periods 2 \
      --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
      --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
      --ok-actions arn:aws:sns:us-east-1:123456789012:alerts

boto3:

    import boto3

    cloudwatch = boto3.client('cloudwatch')

    cloudwatch.put_metric_alarm(
        AlarmName='HighCPU-i-1234567890abcdef0',
        MetricName='CPUUtilization',
        Namespace='AWS/EC2',
        Statistic='Average',
        Period=300,
        Threshold=80.0,
        ComparisonOperator='GreaterThanThreshold',
        EvaluationPeriods=2,
        Dimensions=[
            {'Name': 'InstanceId', 'Value': 'i-1234567890abcdef0'}
        ],
        AlarmActions=['arn:aws:sns:us-east-1:123456789012:alerts'],
        OKActions=['arn:aws:sns:us-east-1:123456789012:alerts']
    )

Lambda Error Rate Alarm

    aws cloudwatch put-metric-alarm \
      --alarm-name "LambdaErrorRate-MyFunction" \
      --metrics '[
        {
          "Id": "errors",
          "MetricStat": {
            "Metric": {
              "Namespace": "AWS/Lambda",
              "MetricName": "Errors",
              "Dimensions": [{"Name": "FunctionName", "Value": "MyFunction"}]
            },
            "Period": 60,
            "Stat": "Sum"
          },
          "ReturnData": false
        },
        {
          "Id": "invocations",
          "MetricStat": {
            "Metric": {
              "Namespace": "AWS/Lambda",
              "MetricName": "Invocations",
              "Dimensions": [{"Name": "FunctionName", "Value": "MyFunction"}]
            },
            "Period": 60,
            "Stat": "Sum"
          },
          "ReturnData": false
        },
        {
          "Id": "errorRate",
          "Expression": "errors/invocations*100",
          "Label": "Error Rate",
          "ReturnData": true
        }
      ]' \
      --threshold 5 \
      --comparison-operator GreaterThanThreshold \
      --evaluation-periods 3 \
      --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts

Query Logs with Insights

    # Find errors in Lambda logs
    aws logs start-query \
      --log-group-name /aws/lambda/MyFunction \
      --start-time $(date -d '1 hour ago' +%s) \
      --end-time $(date +%s) \
      --query-string '
        fields @timestamp, @message
        | filter @message like /ERROR/
        | sort @timestamp desc
        | limit 50
      '

    # Get query results
    aws logs get-query-results --query-id <query-id>
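`get-query-results` returns each row as a list of field/value pairs rather than a flat object, which is awkward to work with directly. A small sketch of flattening that shape in Python; the helper name `rows_to_dicts` and the sample response are made up for illustration, and dropping the `@ptr` field is a convenience choice, not a requirement.

```python
def rows_to_dicts(response, drop_fields=("@ptr",)):
    """Flatten Logs Insights rows ([{field, value}, ...]) into plain dicts."""
    flattened = []
    for row in response.get("results", []):
        flattened.append({
            cell["field"]: cell["value"]
            for cell in row
            if cell["field"] not in drop_fields
        })
    return flattened


if __name__ == "__main__":
    # Made-up response in the shape returned by get-query-results.
    sample = {
        "status": "Complete",
        "results": [
            [
                {"field": "@timestamp", "value": "2024-01-01 12:00:00.000"},
                {"field": "@message", "value": "ERROR something failed"},
                {"field": "@ptr", "value": "opaque-pointer"},
            ]
        ],
    }
    for record in rows_to_dicts(sample):
        print(record["@timestamp"], record["@message"])
```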

boto3:

    import time

    import boto3

    logs = boto3.client('logs')

    # Start query
    response = logs.start_query(
        logGroupName='/aws/lambda/MyFunction',
        startTime=int(time.time()) - 3600,
        endTime=int(time.time()),
        queryString='''
            fields @timestamp, @message
            | filter @message like /ERROR/
            | sort @timestamp desc
            | limit 50
        '''
    )
    query_id = response['queryId']

    # Wait for results; stop on any terminal status so a failed
    # or cancelled query cannot loop forever
    while True:
        result = logs.get_query_results(queryId=query_id)
        if result['status'] in ('Complete', 'Failed', 'Cancelled', 'Timeout'):
            break
        time.sleep(1)

    for row in result['results']:
        print(row)

Create Metric Filter

Extract metrics from log patterns:

    # Create metric filter for error count
    aws logs put-metric-filter \
      --log-group-name /aws/lambda/MyFunction \
      --filter-name ErrorCount \
      --filter-pattern "ERROR" \
      --metric-transformations \
        metricName=ErrorCount,metricNamespace=MyApp,metricValue=1,defaultValue=0

Publish Custom Metrics

    import boto3

    cloudwatch = boto3.client('cloudwatch')

    cloudwatch.put_metric_data(
        Namespace='MyApp',
        MetricData=[
            {
                'MetricName': 'OrdersProcessed',
                'Value': 1,
                'Unit': 'Count',
                'Dimensions': [
                    {'Name': 'Environment', 'Value': 'Production'},
                    {'Name': 'OrderType', 'Value': 'Standard'}
                ]
            }
        ]
    )

Create Dashboard

    cat > dashboard.json << 'EOF'
    {
      "widgets": [
        {
          "type": "metric",
          "x": 0, "y": 0, "width": 12, "height": 6,
          "properties": {
            "title": "Lambda Invocations",
            "metrics": [
              ["AWS/Lambda", "Invocations", "FunctionName", "MyFunction"]
            ],
            "period": 60,
            "stat": "Sum",
            "region": "us-east-1"
          }
        },
        {
          "type": "log",
          "x": 12, "y": 0, "width": 12, "height": 6,
          "properties": {
            "title": "Recent Errors",
            "query": "SOURCE '/aws/lambda/MyFunction' | filter @message like /ERROR/ | limit 20",
            "region": "us-east-1"
          }
        }
      ]
    }
    EOF

    aws cloudwatch put-dashboard \
      --dashboard-name MyAppDashboard \
      --dashboard-body file://dashboard.json

CLI Reference

Metrics Commands

| Command | Description |
|---------|-------------|
| aws cloudwatch put-metric-data | Publish custom metrics |
| aws cloudwatch get-metric-data | Retrieve metric values |
| aws cloudwatch get-metric-statistics | Get aggregated statistics |
| aws cloudwatch list-metrics | List available metrics |

Alarms Commands

| Command | Description |
|---------|-------------|
| aws cloudwatch put-metric-alarm | Create or update alarm |
| aws cloudwatch describe-alarms | List alarms |
| aws cloudwatch set-alarm-state | Manually set alarm state |
| aws cloudwatch delete-alarms | Delete alarms |

Logs Commands

| Command | Description |
|---------|-------------|
| aws logs create-log-group | Create log group |
| aws logs put-log-events | Write log events |
| aws logs filter-log-events | Search log events |
| aws logs start-query | Start Insights query |
| aws logs put-metric-filter | Create metric filter |
| aws logs put-retention-policy | Set log retention |

Best Practices

Metrics
- Use dimensions wisely: every unique combination of dimension values is a separate metric, so too many cause a metric explosion
- Aggregate before publishing: batch custom metrics
- Use high-resolution metrics (1-second) only when needed
- Set meaningful units for custom metrics

Alarms
- Use composite alarms for complex conditions
- Set appropriate evaluation periods to avoid flapping
- Include OK actions to track recovery
- Use anomaly detection for dynamic thresholds

Logs
- Set retention policies; don't keep logs forever
- Use structured logging (JSON) for better querying
- Create metric filters for key events
- Use Contributor Insights for top-N analysis

Cost Optimization
- Delete unused dashboards
- Reduce log retention for non-critical logs
- Avoid high-resolution metrics unless necessary
- Use log subscription filters instead of polling

Troubleshooting

Missing Metrics

Causes:
- Service not publishing yet (wait 1-5 minutes)
- Wrong namespace/dimensions
- Detailed monitoring not enabled (EC2 basic monitoring reports only every 5 minutes)

Debug:

    # List metrics for a namespace
    aws cloudwatch list-metrics \
      --namespace AWS/Lambda \
      --dimensions Name=FunctionName,Value=MyFunction

Alarm Stuck in INSUFFICIENT_DATA

Causes:
- Metric not being published
- Dimensions mismatch
- Evaluation period too short

Debug:

    # Check if metric has data
    aws cloudwatch get-metric-statistics \
      --namespace AWS/Lambda \
      --metric-name Invocations \
      --dimensions Name=FunctionName,Value=MyFunction \
      --start-time $(date -d '1 hour ago' -u +%Y-%m-%dT%H:%M:%SZ) \
      --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
      --period 60 \
      --statistics Sum

Log Events Not Appearing

Causes:
- IAM permissions missing
- CloudWatch Logs agent not running
- Log group doesn't exist

Debug:

    # Check log streams
    aws logs describe-log-streams \
      --log-group-name /aws/lambda/MyFunction \
      --order-by LastEventTime \
      --descending \
      --limit 5

High CloudWatch Costs

Check usage:

    # Get ingested log volume (IncomingBytes) per day for the last 7 days
    aws cloudwatch get-metric-statistics \
      --namespace AWS/Logs \
      --metric-name IncomingBytes \
      --dimensions Name=LogGroupName,Value=/aws/lambda/MyFunction \
      --start-time $(date -d '7 days ago' -u +%Y-%m-%dT%H:%M:%SZ) \
      --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
      --period 86400 \
      --statistics Sum

References
- CloudWatch User Guide
- CloudWatch Logs User Guide
- CloudWatch API Reference
- CloudWatch CLI Reference
- Logs Insights Query Syntax
- boto3 CloudWatch
