ecs

Install

npx skills add https://github.com/itsmostafa/aws-agent-skills --skill ecs

AWS ECS

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service. Run containers on AWS Fargate (serverless) or EC2 instances.

Table of Contents

- Core Concepts
- Common Patterns
- CLI Reference
- Best Practices
- Troubleshooting
- References

Core Concepts

Cluster

Logical grouping of tasks or services. Can contain Fargate tasks, EC2 instances, or both.

Task Definition

Blueprint for your application. Defines containers, resources, networking, and IAM roles.

Task

Running instance of a task definition. Can run standalone or as part of a service.

Service

Maintains desired count of tasks. Handles deployments, load balancing, and auto scaling.

Launch Types

| Type | Description | Use Case |
|------|-------------|----------|
| Fargate | Serverless, pay per task | Most workloads |
| EC2 | Self-managed instances | GPU, Windows, specific requirements |

Common Patterns

Create a Fargate Cluster

AWS CLI:

Create cluster

aws ecs create-cluster --cluster-name my-cluster

With capacity providers

aws ecs create-cluster \
  --cluster-name my-cluster \
  --capacity-providers FARGATE FARGATE_SPOT \
  --default-capacity-provider-strategy \
    capacityProvider=FARGATE,weight=1 \
    capacityProvider=FARGATE_SPOT,weight=1
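After creating a cluster, it can be useful to confirm that it reports ACTIVE before creating services in it. The following is a minimal sketch: `cluster_status` is a hypothetical helper name (not an AWS CLI command), and it assumes credentials and a default region are already configured.

```shell
# cluster_status is a hypothetical helper, not part of the AWS CLI.
# It prints the status of a single cluster (for example ACTIVE or INACTIVE).
# If the cluster does not exist, the query matches nothing and the CLI
# prints "None" in text output mode.
cluster_status() {
  aws ecs describe-clusters \
    --clusters "$1" \
    --query 'clusters[0].status' \
    --output text
}

# Usage (requires AWS credentials):
#   cluster_status my-cluster
```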

Register Task Definition

cat > task-definition.json << 'EOF'
{
  "family": "web-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "portMappings": [
        { "containerPort": 8080, "protocol": "tcp" }
      ],
      "environment": [
        {"name": "NODE_ENV", "value": "production"}
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}
EOF

aws ecs register-task-definition --cli-input-json file://task-definition.json
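On Fargate, the task-level `cpu` and `memory` values must be one of the supported combinations, otherwise registration is rejected. The sketch below checks a pair locally before calling the API. `valid_fargate_size` is a hypothetical helper, and the table of combinations reflects the Fargate task sizes as documented at the time of writing; treat it as an assumption and check the current ECS documentation before relying on it.

```shell
# valid_fargate_size is a hypothetical helper, not part of the AWS CLI.
# Arguments: cpu in CPU units, memory in MiB.
# Returns 0 if the pair is a supported Fargate task size, non-zero otherwise.
valid_fargate_size() {
  cpu=$1
  mem=$2
  case "$cpu" in
    256)
      # 0.25 vCPU only allows three fixed memory sizes.
      case "$mem" in
        512|1024|2048) return 0 ;;
        *) return 1 ;;
      esac
      ;;
    512)   min=1024;  max=4096;   step=1024 ;;
    1024)  min=2048;  max=8192;   step=1024 ;;
    2048)  min=4096;  max=16384;  step=1024 ;;
    4096)  min=8192;  max=30720;  step=1024 ;;
    8192)  min=16384; max=61440;  step=4096 ;;
    16384) min=32768; max=122880; step=8192 ;;
    *) return 1 ;;
  esac
  # Memory must be inside the range and on an increment boundary.
  [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] &&
    [ $(( (mem - min) % step )) -eq 0 ]
}

# Usage:
#   valid_fargate_size 256 512 && echo "ok" || echo "unsupported size"
```

The task definition above uses 256 CPU units with 512 MiB, which is the smallest supported size.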

Create Service with Load Balancer

aws ecs create-service \
  --cluster my-cluster \
  --service-name web-service \
  --task-definition web-app:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-12345678,subnet-87654321],securityGroups=[sg-12345678],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/1234567890123456,containerName=web,containerPort=8080" \
  --health-check-grace-period-seconds 60
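`create-service` returns as soon as the service is accepted, not when its tasks are healthy. One way to block until the service settles is the CLI's built-in `services-stable` waiter. The sketch below wraps it; `wait_for_service` is a hypothetical helper name, and printing recent events on failure is an illustrative choice rather than anything the source prescribes.

```shell
# wait_for_service is a hypothetical wrapper around the built-in waiter.
# It prints "stable" on success. On timeout or error it prints the most
# recent service event messages to stderr and returns 1.
wait_for_service() {
  cluster=$1
  service=$2
  if aws ecs wait services-stable --cluster "$cluster" --services "$service"; then
    echo "stable"
  else
    echo "service did not become stable" >&2
    aws ecs describe-services \
      --cluster "$cluster" \
      --services "$service" \
      --query 'services[0].events[:5].message' \
      --output text >&2
    return 1
  fi
}

# Usage (requires AWS credentials):
#   wait_for_service my-cluster web-service
```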

Run Standalone Task

aws ecs run-task \
  --cluster my-cluster \
  --task-definition my-batch-job:1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-12345678],securityGroups=[sg-12345678],assignPublicIp=ENABLED}"
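For batch jobs it is often useful to start the task, wait for it to finish, and read the container's exit code. The following sketch combines `run-task`, the `tasks-stopped` waiter, and `describe-tasks`. `run_and_wait` is a hypothetical helper, and it assumes the task has a single container of interest (the first one in the task).

```shell
# run_and_wait is a hypothetical helper, not part of the AWS CLI.
# Arguments: cluster, task definition, then any extra run-task options
# (for example --launch-type and --network-configuration).
# Prints the exit code of the first container once the task has stopped.
run_and_wait() {
  cluster=$1
  taskdef=$2
  shift 2
  task_arn=$(aws ecs run-task \
    --cluster "$cluster" \
    --task-definition "$taskdef" \
    "$@" \
    --query 'tasks[0].taskArn' \
    --output text) || return 1
  # "None" means no task was started (for example a placement failure).
  [ -n "$task_arn" ] && [ "$task_arn" != "None" ] || return 1
  aws ecs wait tasks-stopped --cluster "$cluster" --tasks "$task_arn" || return 1
  aws ecs describe-tasks \
    --cluster "$cluster" \
    --tasks "$task_arn" \
    --query 'tasks[0].containers[0].exitCode' \
    --output text
}

# Usage (requires AWS credentials):
#   run_and_wait my-cluster my-batch-job:1 --launch-type FARGATE \
#     --network-configuration "awsvpcConfiguration={subnets=[subnet-12345678],securityGroups=[sg-12345678],assignPublicIp=ENABLED}"
```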

Update Service (Deploy New Image)

Register new task definition with updated image

aws ecs register-task-definition --cli-input-json file://task-definition.json

Update service to use new version

aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --task-definition web-app:2 \
  --force-new-deployment
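Hard-coding the revision number (`web-app:2`) is easy to get wrong in scripts. As a sketch of one alternative: calling `describe-task-definition` with only the family name returns the latest active revision, which can then be passed to `update-service`. `latest_task_definition` and `deploy_latest` are hypothetical helper names.

```shell
# Hypothetical helpers, not part of the AWS CLI.

# Prints the ARN of the latest ACTIVE revision in a task definition family.
latest_task_definition() {
  aws ecs describe-task-definition \
    --task-definition "$1" \
    --query 'taskDefinition.taskDefinitionArn' \
    --output text
}

# Points a service at the latest revision of a family and prints the
# task definition ARN the service now references.
deploy_latest() {
  cluster=$1
  service=$2
  family=$3
  arn=$(latest_task_definition "$family") || return 1
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --task-definition "$arn" \
    --query 'service.taskDefinition' \
    --output text
}

# Usage (requires AWS credentials):
#   deploy_latest my-cluster web-service web-app
```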

Auto Scaling

Register scalable target

aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 10

Target tracking policy

aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 120
  }'
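Both commands above identify the service by a resource ID of the form `service/<cluster>/<service>`. The sketch below builds that ID in one place and uses it to list the attached policies and recent scaling activity. `ecs_resource_id` and `show_scaling` are hypothetical helper names, and the selected output fields are an illustrative choice.

```shell
# Hypothetical helpers, not part of the AWS CLI.

# Builds the Application Auto Scaling resource ID for an ECS service.
ecs_resource_id() {
  printf 'service/%s/%s\n' "$1" "$2"
}

# Lists scaling policy names and the five most recent scaling activities.
show_scaling() {
  rid=$(ecs_resource_id "$1" "$2")
  aws application-autoscaling describe-scaling-policies \
    --service-namespace ecs \
    --resource-id "$rid" \
    --query 'ScalingPolicies[].PolicyName' \
    --output text
  aws application-autoscaling describe-scaling-activities \
    --service-namespace ecs \
    --resource-id "$rid" \
    --max-items 5 \
    --query 'ScalingActivities[].[StartTime,StatusCode,Description]' \
    --output text
}

# Usage (requires AWS credentials):
#   show_scaling my-cluster web-service
```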

CLI Reference

Cluster Management

| Command | Description |
|---------|-------------|
| aws ecs create-cluster | Create cluster |
| aws ecs describe-clusters | Get cluster details |
| aws ecs list-clusters | List clusters |
| aws ecs delete-cluster | Delete cluster |

Task Definitions

| Command | Description |
|---------|-------------|
| aws ecs register-task-definition | Create task definition |
| aws ecs describe-task-definition | Get task definition |
| aws ecs list-task-definitions | List task definitions |
| aws ecs deregister-task-definition | Deregister version |

Services

| Command | Description |
|---------|-------------|
| aws ecs create-service | Create service |
| aws ecs update-service | Update service |
| aws ecs describe-services | Get service details |
| aws ecs delete-service | Delete service |

Tasks

| Command | Description |
|---------|-------------|
| aws ecs run-task | Run standalone task |
| aws ecs stop-task | Stop running task |
| aws ecs describe-tasks | Get task details |
| aws ecs list-tasks | List tasks |

Best Practices

Security

- Use task roles for AWS API access (not access keys)
- Use execution roles for ECR/Secrets access
- Store secrets in Secrets Manager or Parameter Store
- Use private subnets with NAT gateway
- Enable CloudTrail for API auditing

Performance

- Right-size CPU/memory: monitor and adjust
- Use Fargate Spot for fault-tolerant workloads (up to 70% savings)
- Enable Container Insights for monitoring
- Use service discovery for internal communication

Reliability

- Deploy across multiple AZs
- Configure health checks properly
- Set appropriate deregistration delay
- Use circuit breaker for deployments

aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --deployment-configuration '{
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }'
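With the circuit breaker enabled, the outcome of a deployment is reported in the `rolloutState` field of each deployment (`IN_PROGRESS`, `COMPLETED`, or `FAILED`). A minimal sketch for reading it from the primary deployment follows; `rollout_state` is a hypothetical helper name.

```shell
# rollout_state is a hypothetical helper, not part of the AWS CLI.
# It prints the rolloutState of the service's PRIMARY deployment.
rollout_state() {
  aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --query 'services[0].deployments[?status==`PRIMARY`] | [0].rolloutState' \
    --output text
}

# Usage (requires AWS credentials):
#   rollout_state my-cluster web-service
```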

Cost Optimization

- Use Fargate Spot for batch workloads
- Right-size task resources
- Scale to zero when not needed
- Use capacity providers for mixed Fargate/Spot

Troubleshooting

Task Fails to Start

Check:

View stopped tasks

aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks "$(aws ecs list-tasks --cluster my-cluster --desired-status STOPPED --query 'taskArns[0]' --output text)"
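The full `describe-tasks` output is long; the fields that usually explain a failed start are `stopCode` and `stoppedReason`. The sketch below prints only those two. `stopped_reason` is a hypothetical helper name, and it expects a task ARN or ID as its second argument.

```shell
# stopped_reason is a hypothetical helper, not part of the AWS CLI.
# It prints the stop code and the human-readable stop reason of one task.
stopped_reason() {
  aws ecs describe-tasks \
    --cluster "$1" \
    --tasks "$2" \
    --query 'tasks[0].[stopCode,stoppedReason]' \
    --output text
}

# Usage (requires AWS credentials; task-arn is a placeholder):
#   stopped_reason my-cluster task-arn
```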

Common causes:

- Image not found (ECR permissions)
- Secrets access denied
- Network configuration (subnets, security groups)
- Resource limits exceeded

Container Keeps Restarting

Debug:

Check CloudWatch logs

aws logs get-log-events \
  --log-group-name /ecs/web-app \
  --log-stream-name "ecs/web/abc123"

Check task details

aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks task-arn \
  --query 'tasks[0].containers[0].{reason:reason,exitCode:exitCode}'
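The `exitCode` returned above follows the usual Unix convention: values above 128 mean the process was terminated by signal `exitCode - 128`. The sketch below turns a code into a short hint. `explain_exit_code` is a hypothetical helper, and the wording of the hints is a rule of thumb, not an official ECS diagnosis.

```shell
# explain_exit_code is a hypothetical helper, not part of the AWS CLI.
# It maps a container exit code to a short, heuristic explanation.
explain_exit_code() {
  code=$1
  case "$code" in
    ""|None) echo "no exit code recorded" ;;
    0)   echo "exited normally" ;;
    137) echo "killed by SIGKILL (often out of memory or a forced stop)" ;;
    139) echo "segmentation fault (SIGSEGV)" ;;
    143) echo "terminated by SIGTERM (often a normal stop request)" ;;
    *)
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application error (exit code $code)"
      fi
      ;;
  esac
}

# Usage:
#   explain_exit_code 137
```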

Causes:

- Health check failing
- Application crashing
- Out of memory

Service Stuck Deploying

Check deployment status

aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].deployments'

Check events

aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].events[:5]'

Causes:

- Health check failing on new tasks
- Not enough capacity
- Target group health checks failing

Cannot Pull Image from ECR

Check execution role has:

{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage"
  ],
  "Resource": "*"
}
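One quick way to check for these permissions is to see whether the AWS managed policy `AmazonECSTaskExecutionRolePolicy`, which includes the ECR actions listed above, is attached to the execution role. The sketch below does that; `has_execution_policy` is a hypothetical helper, and it only inspects attached managed policies, so a role that grants the same actions through an inline or custom policy will not be detected.

```shell
# has_execution_policy is a hypothetical helper, not part of the AWS CLI.
# Returns 0 if AmazonECSTaskExecutionRolePolicy is attached to the role.
# It does not look at inline policies or custom managed policies.
has_execution_policy() {
  aws iam list-attached-role-policies \
    --role-name "$1" \
    --query 'AttachedPolicies[].PolicyName' \
    --output text |
    tr '\t' '\n' |
    grep -qx 'AmazonECSTaskExecutionRolePolicy'
}

# Usage (requires AWS credentials):
#   has_execution_policy ecsTaskExecutionRole && echo "attached" || echo "not attached"
```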

Also check:

- Tasks in a private subnet have a route to ECR: either VPC endpoints for ECR or a NAT gateway
- Security group allows HTTPS outbound

References

- ECS Developer Guide
- ECS API Reference
- ECS CLI Reference
- boto3 ECS
