Autoscaling Configuration

Overview
Implement autoscaling strategies to automatically adjust resource capacity based on demand, ensuring cost efficiency while maintaining performance and availability.
When to Use
- Traffic-driven workload scaling
- Time-based scheduled scaling
- Resource utilization optimization
- Cost reduction
- High-traffic event handling
- Batch processing optimization
- Database connection pooling

Implementation Examples

1. Kubernetes Horizontal Pod Autoscaler
hpa-configuration.yaml
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Utilization targets are percentages of the containers' resource requests
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Pods metrics come from the custom metrics API, so this entry needs a
    # metrics adapter (for example, prometheus-adapter) installed
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min   # apply whichever policy removes fewer pods
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max   # apply whichever policy adds more pods
```
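The Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The controller skips a change when the ratio is within a tolerance (0.1 by default), and the result is bounded by minReplicas and maxReplicas. The following minimal sketch reproduces only that arithmetic for a single metric. It ignores pod readiness, missing metrics, stabilization windows, and the behavior policies, and `hpa_desired` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch of the HPA replica formula for one metric (hypothetical helper).
# Assumes the default 0.1 tolerance; ignores readiness, missing metrics,
# stabilization windows, and scaling policies.
# Usage: hpa_desired CURRENT_REPLICAS CURRENT_VALUE TARGET_VALUE MIN MAX
hpa_desired() {
  local current=$1 value=$2 target=$3 min=$4 max=$5
  awk -v c="$current" -v v="$value" -v t="$target" \
      -v lo="$min" -v hi="$max" -v tol=0.1 'BEGIN {
    diff = v / t - 1
    if (diff < 0) diff = -diff
    if (diff <= tol) {
      d = c                 # within tolerance: keep the current count
    } else {
      x = c * v / t         # currentReplicas * current / target
      d = int(x)
      if (d < x) d++        # round up (ceil)
    }
    if (d < lo) d = lo      # never below minReplicas
    if (d > hi) d = hi      # never above maxReplicas
    print d
  }'
}
```

For example, `hpa_desired 4 90 70 2 20` (4 pods averaging 90% CPU against the 70% target above) prints 6. `hpa_desired 4 75 70 2 20` prints 4 because the ratio is within the tolerance.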
Vertical Pod Autoscaler for resource optimization
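```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    # "Off" only publishes recommendations. The HPA above already scales this
    # Deployment on CPU and memory; letting VPA resize those same resources
    # ("Auto") makes the two controllers work against each other.
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: myapp
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1000m
          memory: 512Mi
        controlledResources:
          - cpu
          - memory
```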
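VPA keeps each recommendation within the container policy's minAllowed and maxAllowed bounds. The following minimal sketch shows what the bounds above do to a raw recommendation. It converts the quantity formats used in this document (millicores, plain cores, whole Mi and Gi) and clamps them. The helper names are hypothetical, and other Kubernetes quantity suffixes are deliberately not handled.

```bash
#!/bin/bash
# Hypothetical helpers: clamp a raw recommendation to the myapp container
# policy bounds. Only the units used in this document are handled.

# CPU: "250m" -> 250, "0.5" -> 500, "2" -> 2000 (millicores, rounded)
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
  esac
}

# Memory: "64Mi" -> 64, "1Gi" -> 1024 (MiB); whole numbers only
mem_to_mib() {
  case "$1" in
    *Mi) echo "${1%Mi}" ;;
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *)   echo "unsupported memory quantity: $1" >&2; return 1 ;;
  esac
}

# Clamp an integer into [lo, hi]
clamp() {
  local v=$1 lo=$2 hi=$3
  if (( v < lo )); then v=$lo; fi
  if (( v > hi )); then v=$hi; fi
  echo "$v"
}

# Bounds copied from the VerticalPodAutoscaler above
vpa_cap_cpu() {            # 50m .. 1000m
  local m
  m=$(cpu_to_millicores "$1") || return 1
  clamp "$m" 50 1000
}
vpa_cap_memory() {         # 64Mi .. 512Mi
  local m
  m=$(mem_to_mib "$1") || return 1
  clamp "$m" 64 512
}
```

For instance, a 2-core CPU recommendation is capped to 1000 millicores, and a 1Gi memory recommendation is capped to 512 MiB.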
2. AWS Auto Scaling
aws-autoscaling.yaml
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-config
  namespace: production
data:
  setup-asg.sh: |
    #!/bin/bash
    set -euo pipefail

    ASG_NAME="myapp-asg"
    MIN_SIZE=2
    MAX_SIZE=10
    DESIRED_CAPACITY=3
    TARGET_CPU=70
    # Memory is not a predefined EC2 Auto Scaling metric; tracking it needs the
    # CloudWatch agent plus a customized metric specification.

    echo "Creating Auto Scaling Group..."

    # Launch template user data must be base64-encoded
    # (-w0 is GNU coreutils: no line wrapping)
    USER_DATA=$(printf '#!/bin/bash\ncd /app && docker-compose up -d\n' | base64 -w0)

    # Create launch template ("|| true" tolerates "already exists" on re-runs,
    # but it also hides any other failure)
    aws ec2 create-launch-template \
      --launch-template-name myapp-template \
      --version-description "Production version" \
      --launch-template-data '{
        "ImageId": "ami-0c55b159cbfafe1f0",
        "InstanceType": "t3.medium",
        "KeyName": "myapp-key",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "UserData": "'"$USER_DATA"'",
        "TagSpecifications": [{
          "ResourceType": "instance",
          "Tags": [{"Key": "Name", "Value": "myapp-instance"}]
        }]
      }' || true

    # Create Auto Scaling Group
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name "$ASG_NAME" \
      --launch-template LaunchTemplateName=myapp-template \
      --min-size "$MIN_SIZE" \
      --max-size "$MAX_SIZE" \
      --desired-capacity "$DESIRED_CAPACITY" \
      --availability-zones us-east-1a us-east-1b us-east-1c \
      --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp/abcdef123456 \
      --health-check-type ELB \
      --health-check-grace-period 300 \
      --tags "Key=Name,Value=myapp,PropagateAtLaunch=true"

    # Create CPU target-tracking policy. EC2 Auto Scaling target tracking has no
    # ScaleOutCooldown/ScaleInCooldown fields; instance warm-up is set instead.
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name "$ASG_NAME" \
      --policy-name myapp-cpu-scaling \
      --policy-type TargetTrackingScaling \
      --estimated-instance-warmup 300 \
      --target-tracking-configuration '{
        "TargetValue": '"$TARGET_CPU"',
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        }
      }'

    echo "Auto Scaling Group created: $ASG_NAME"
```
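EC2 Auto Scaling requires min size ≤ desired capacity ≤ max size and returns a validation error otherwise. A small pre-flight check can catch an inconsistent `MIN_SIZE`/`MAX_SIZE`/`DESIRED_CAPACITY` combination before any AWS call is made. The following minimal sketch uses a hypothetical `validate_capacity` function that is not part of the AWS CLI.

```bash
#!/bin/bash
# Hypothetical pre-flight check for the ASG sizing variables in setup-asg.sh.
# EC2 Auto Scaling requires MIN_SIZE <= DESIRED_CAPACITY <= MAX_SIZE.
validate_capacity() {
  local min=$1 max=$2 desired=$3
  if (( min > max )); then
    echo "invalid: min size ($min) is greater than max size ($max)" >&2
    return 1
  fi
  if (( desired < min || desired > max )); then
    echo "invalid: desired capacity ($desired) is outside [$min, $max]" >&2
    return 1
  fi
  echo "ok"
}
```

Calling `validate_capacity "$MIN_SIZE" "$MAX_SIZE" "$DESIRED_CAPACITY"` just before `create-auto-scaling-group` stops the script under `set -e` when the values disagree. The same check can be applied to any capacity a scheduled job will later set.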
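```yaml
# Scheduled scaling: two CronJobs, because a CronJob has exactly one schedule.
# Schedules use the kube-controller-manager time zone unless spec.timeZone is set.
# The pods need AWS credentials (for example, an IAM role bound to the service
# account) that allow autoscaling:SetDesiredCapacity.
# The CPU target-tracking policy keeps adjusting desired capacity, so the group
# can drift from these values; raising --min-size holds a floor instead.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-up
  namespace: production
spec:
  schedule: "0 8 * * 1-5"    # 08:00, Monday to Friday
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 10
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-down
  namespace: production
spec:
  schedule: "0 18 * * 1-5"   # 18:00, Monday to Friday
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 3
          restartPolicy: OnFailure
```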
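Between them, the two schedules define a weekly pattern: 10 instances from 08:00 to 18:00 Monday through Friday, and 3 at all other times, including the whole weekend. The following hypothetical helper reports the value set by the most recent of the two jobs at a given time. It assumes cron's day-of-week numbering (0 = Sunday) and the same time zone as the CronJob controller, and it ignores later adjustments made by the CPU target-tracking policy.

```bash
#!/bin/bash
# Hypothetical helper: desired capacity set by the most recent of the two
# CronJobs for a day of week (0=Sunday .. 6=Saturday) and hour (0-23).
scheduled_capacity() {
  # 10# forces base 10 so values like "08" from `date +%H` are not read as octal
  local dow=$((10#$1)) hour=$((10#$2))
  if (( dow >= 1 && dow <= 5 && hour >= 8 && hour < 18 )); then
    echo 10   # after "0 8 * * 1-5" and before "0 18 * * 1-5"
  else
    echo 3    # after the weekday 18:00 job, through nights and the weekend
  fi
}
```

To check the current value, assuming the controller runs in UTC, call `scheduled_capacity "$(date -u +%w)" "$(date -u +%H)"`.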
3. Custom Metrics Autoscaling
custom-metrics-hpa.yaml
```yaml
# Alternative to myapp-hpa: both target the myapp Deployment, so apply only one.
# All three entries are Pods metrics read from the custom metrics API, which
# requires a metrics adapter such as prometheus-adapter.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2   # keep at least two replicas for availability
  maxReplicas: 50
  metrics:
    # Queue depth from custom metrics
    - type: Pods
      pods:
        metric:
          name: job_queue_depth
        target:
          type: AverageValue
          averageValue: "100"
    # Request rate from custom metrics
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    # Custom business metric
    - type: Pods
      pods:
        metric:
          name: active_connections
        target:
          type: AverageValue
          averageValue: "500"
```
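When an HPA lists several metrics, Kubernetes computes a replica proposal for each metric and scales to the largest one. For a Pods metric with an AverageValue target, each proposal uses the same ratio formula as for resource metrics. The following hypothetical sketch runs sample values (not measurements) through that rule. It assumes the default 0.1 tolerance and ignores behavior policies and readiness.

```bash
#!/bin/bash
# Hypothetical sketch: one proposal per metric, then the HPA takes the largest.
# Each metric argument is NAME=CURRENT_AVERAGE:TARGET_AVERAGE.
# Assumes the default 0.1 tolerance; ignores behavior policies and readiness.
hpa_multi_metric() {
  local current=$1 min=$2 max=$3
  shift 3
  local best=0 spec name avg target proposal
  for spec in "$@"; do
    name=${spec%%=*}
    avg=${spec#*=}
    avg=${avg%%:*}
    target=${spec##*:}
    proposal=$(awk -v c="$current" -v a="$avg" -v t="$target" 'BEGIN {
      diff = a / t - 1
      if (diff < 0) diff = -diff
      if (diff <= 0.1) { print c; exit }   # within tolerance: no change
      x = c * a / t
      d = int(x); if (d < x) d++           # ceil(currentReplicas * avg / target)
      print d
    }')
    echo "$name -> $proposal" >&2          # per-metric proposal, for inspection
    if (( proposal > best )); then best=$proposal; fi
  done
  if (( best < min )); then best=$min; fi
  if (( best > max )); then best=$max; fi
  echo "$best"
}
```

With 4 replicas, `hpa_multi_metric 4 2 50 job_queue_depth=250:100 http_requests_per_second=900:1000 active_connections=300:500` prints 10. The queue-depth metric alone drives the decision, even though the other two metrics would keep or shrink the replica count.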
Prometheus ServiceMonitor for custom metrics
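```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp          # selects Services (not Pods) with this label
  endpoints:
    - port: metrics       # name of the Service port that serves /metrics
      interval: 30s
      path: /metrics
```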
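A ServiceMonitor only gets these series into Prometheus. For the HPA to read them, an adapter such as prometheus-adapter must also serve them through the custom metrics API (custom.metrics.k8s.io). A quick first check is whether the application's /metrics output contains the series at all. The following sketch assumes the application exports `job_queue_depth`, `http_requests_per_second`, and `active_connections` under exactly those names. If the adapter derives a name (for example, a per-second rate computed from a counter), check the underlying series instead. `missing_metrics` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: print each required metric name that has no sample line
# in the Prometheus text exposition read from stdin; non-zero exit if any miss.
missing_metrics() {
  local payload name missing=0
  payload=$(cat)
  for name in "$@"; do
    # A sample line starts with the metric name followed by "{" or a space;
    # "# HELP" and "# TYPE" comment lines never match.
    if ! grep -Eq "^${name}[{ ]" <<<"$payload"; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `curl -s "$METRICS_URL" | missing_metrics job_queue_depth http_requests_per_second active_connections` runs the check, where `METRICS_URL` is a placeholder for the pod's metrics endpoint (for instance, one reached through `kubectl port-forward`).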
4. Autoscaling Script
```bash
#!/bin/bash
# autoscaling-setup.sh - Complete autoscaling configuration
set -euo pipefail

ENVIRONMENT="${1:-production}"
DEPLOYMENT="${2:-myapp}"

echo "Setting up autoscaling for $DEPLOYMENT in $ENVIRONMENT"

# Create HPA (the heredoc delimiter is unquoted, so ${DEPLOYMENT} and
# ${ENVIRONMENT} are expanded before kubectl sees the manifest)
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${DEPLOYMENT}-hpa
  namespace: ${ENVIRONMENT}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${DEPLOYMENT}
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
EOF

echo "HPA applied successfully"

# Monitor autoscaling (-w keeps watching until interrupted, e.g. with Ctrl+C)
echo "Monitoring autoscaling events..."
kubectl get hpa "${DEPLOYMENT}-hpa" -n "$ENVIRONMENT" -w
```
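The behavior block in this script lets the HPA at most double the replica count per 15-second period when scaling up, and remove at most half when scaling down. The following simplified model of that limit shows how quickly the HPA could move between minReplicas and maxReplicas. It makes several assumptions:

- Rounding is ceil when growing and floor when shrinking.
- The stabilization window is ignored, so the 300-second scale-down window would slow shrinking further in practice.
- Each step is one full period.

The function names are hypothetical.

```bash
#!/bin/bash
# Simplified model of the script's HPA behavior block (hypothetical helpers).
UP_PERCENT=100    # scaleUp:   type Percent, value 100, periodSeconds 15
DOWN_PERCENT=50   # scaleDown: type Percent, value 50,  periodSeconds 15

# Replica count after one 15-second period, given the count the metrics ask for.
next_replicas() {
  local current=$1 desired=$2 bound
  if (( desired > current )); then
    # Largest count allowed this period: ceil(current * (100 + UP_PERCENT) / 100)
    bound=$(( (current * (100 + UP_PERCENT) + 99) / 100 ))
    if (( desired < bound )); then bound=$desired; fi
  elif (( desired < current )); then
    # Smallest count allowed this period: floor(current * (100 - DOWN_PERCENT) / 100)
    bound=$(( current * (100 - DOWN_PERCENT) / 100 ))
    if (( desired > bound )); then bound=$desired; fi
  else
    bound=$current
  fi
  echo "$bound"
}

# Print every replica count visited on the way from CURRENT (>= 1) to DESIRED.
trajectory() {
  local current=$1 desired=$2 path=$1
  if (( current < 1 )); then
    echo "current replicas must be at least 1" >&2
    return 1
  fi
  while (( current != desired )); do
    current=$(next_replicas "$current" "$desired")
    path+=" $current"
  done
  echo "$path"
}
```

Under these assumptions, `trajectory 2 20` prints `2 4 8 16 20`. That is four periods, or about one minute, to go from minReplicas to maxReplicas.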
5. Monitoring Autoscaling
autoscaling-monitoring.yaml
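```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-alerts
  namespace: monitoring
data:
  alerts.yaml: |
    # Metric and label names follow kube-state-metrics v2
    # (kube_horizontalpodautoscaler_*, label "horizontalpodautoscaler");
    # kube-state-metrics v1 used kube_hpa_* and the label "hpa".
    groups:
      - name: autoscaling
        rules:
          - alert: HpaMaxedOut
            expr: |
              kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_status_desired_replicas
              and
              kube_horizontalpodautoscaler_status_desired_replicas == kube_horizontalpodautoscaler_spec_max_replicas
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "HPA {{ $labels.horizontalpodautoscaler }} is at maximum replicas"
          - alert: HpaMinedOut
            expr: |
              kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_status_desired_replicas
              and
              kube_horizontalpodautoscaler_status_desired_replicas == kube_horizontalpodautoscaler_spec_min_replicas
            for: 30m
            labels:
              severity: info
            annotations:
              summary: "HPA {{ $labels.horizontalpodautoscaler }} is at minimum replicas"
          - alert: AsgCapacityLow
            # aws_autoscaling_group_* series come from a CloudWatch exporter;
            # their names and labels depend on that exporter's configuration.
            expr: |
              aws_autoscaling_group_desired_capacity / aws_autoscaling_group_max_size < 0.2
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "ASG {{ $labels.auto_scaling_group_name }} is running below 20% of its maximum size"
```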
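The two HPA rules fire only when the controller has settled: current replicas equal desired replicas, and desired sits at the configured bound. The ASG rule compares desired capacity with maximum size. The following minimal sketch evaluates the same conditions for a single snapshot, which is useful for checking thresholds by hand. It ignores the `for:` durations, and the function names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers mirroring the alert expressions for one snapshot.

# Prints "max" (HpaMaxedOut condition), "min" (HpaMinedOut condition) or "ok".
hpa_alert_state() {
  local current=$1 desired=$2 min=$3 max=$4
  if (( current == desired && desired == max )); then
    echo "max"
  elif (( current == desired && desired == min )); then
    echo "min"
  else
    echo "ok"
  fi
}

# AsgCapacityLow: desired / max < 0.2, kept in integers as desired * 5 < max.
asg_capacity_low() {
  local desired=$1 max=$2
  if (( desired * 5 < max )); then echo "low"; else echo "ok"; fi
}
```

With `MIN_SIZE=2` and `MAX_SIZE=10` from the setup script, desired capacity never drops below 2. The ratio therefore never falls below 0.2, and `AsgCapacityLow` cannot fire for that group unless the threshold or the group sizes change.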
Best Practices

✅ DO
- Set appropriate min/max replicas
- Monitor the metric aggregation window
- Implement cooldown periods
- Use multiple metrics
- Test scaling behavior
- Monitor scaling events
- Plan for peak loads
- Implement fallback strategies

❌ DON'T
- Set min replicas to 1 for production workloads (a single replica is a single point of failure)
- Scale too aggressively
- Ignore cooldown periods
- Rely on a single metric
- Forget to test scaling
- Scale below resource needs
- Neglect monitoring
- Deploy without capacity tests

Scaling Metrics
- CPU Utilization: the most common metric
- Memory Utilization: heap-bound applications
- Request Rate: API-driven scaling
- Queue Depth: asynchronous job processing
- Custom Metrics: business-specific indicators

Resources
- Kubernetes HPA Documentation
- AWS Auto Scaling
- KEDA - Event Scaling
- Vertical Pod Autoscaler