senior-cloud-architect


Install

npx skills add https://github.com/borghei/claude-skills --skill senior-cloud-architect

Senior Cloud Architect

Expert-level cloud architecture and infrastructure design.

Core Competencies

- Multi-cloud architecture
- AWS, GCP, Azure platforms
- Cloud-native design patterns
- Cost optimization
- Security and compliance
- Migration strategies
- Disaster recovery
- Infrastructure automation

Cloud Platform Comparison

| Service    | AWS           | GCP                | Azure            |
| ---------- | ------------- | ------------------ | ---------------- |
| Compute    | EC2, ECS, EKS | GCE, GKE           | VMs, AKS         |
| Serverless | Lambda        | Cloud Functions    | Azure Functions  |
| Storage    | S3            | Cloud Storage      | Blob Storage     |
| Database   | RDS, DynamoDB | Cloud SQL, Spanner | SQL DB, CosmosDB |
| ML         | SageMaker     | Vertex AI          | Azure ML         |
| CDN        | CloudFront    | Cloud CDN          | Azure CDN        |

AWS Architecture

Well-Architected Framework

Pillars:

Operational Excellence

- Infrastructure as Code
- Monitoring and observability
- Incident response
- Continuous improvement

Security

- Identity and access management
- Data protection
- Infrastructure protection
- Incident response

Reliability

- Fault tolerance
- Disaster recovery
- Change management
- Failure testing

Performance Efficiency

- Right-sizing resources
- Monitoring performance
- Trade-off decisions
- Keeping current

Cost Optimization

- Cost awareness
- Right-sizing
- Reserved capacity
- Efficient resources

Sustainability

- Region selection
- Efficient algorithms
- Hardware utilization
- Data management

Reference Architecture

┌─────────────────────────────────────────────────────────────┐
│                       Route 53 (DNS)                        │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                      CloudFront (CDN)                       │
│               WAF (Web Application Firewall)                │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                  Application Load Balancer                  │
└──────────┬───────────────────────────────────┬──────────────┘
           │                                   │
┌──────────▼──────────┐             ┌──────────▼──────────┐
│   ECS/EKS Cluster   │             │   ECS/EKS Cluster   │
│       (AZ-a)        │             │       (AZ-b)        │
└──────────┬──────────┘             └──────────┬──────────┘
           │                                   │
┌──────────▼───────────────────────────────────▼──────────┐
│                   ElastiCache (Redis)                   │
└─────────────────────────────┬───────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────┐
│                      RDS Multi-AZ                       │
│                   (Primary + Standby)                   │
└─────────────────────────────────────────────────────────┘

Terraform AWS Module

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.project}-${var.environment}"
  cidr = var.vpc_cidr

  azs             = ["${var.region}a", "${var.region}b", "${var.region}c"]
  private_subnets = var.private_subnets
  public_subnets  = var.public_subnets

  enable_nat_gateway   = true
  single_nat_gateway   = var.environment != "production"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = local.common_tags
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "${var.project}-${var.environment}"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  eks_managed_node_groups = {
    main = {
      instance_types = var.node_instance_types
      min_size       = var.node_min_size
      max_size       = var.node_max_size
      desired_size   = var.node_desired_size
    }
  }

  tags = local.common_tags
}

module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.0"

  identifier = "${var.project}-${var.environment}"

  engine               = "postgres"
  engine_version       = "15"
  family               = "postgres15"
  major_engine_version = "15"
  instance_class       = var.db_instance_class

  allocated_storage     = var.db_allocated_storage
  max_allocated_storage = var.db_max_allocated_storage

  db_name  = var.db_name
  username = var.db_username
  port     = 5432

  # Multi-AZ only in production; other environments run a single instance.
  multi_az = var.environment == "production"

  # Assumes database_subnets is set on the vpc module so that it creates a
  # subnet group, and that module.security_group is defined elsewhere.
  db_subnet_group_name   = module.vpc.database_subnet_group
  vpc_security_group_ids = [module.security_group.security_group_id]

  backup_retention_period = var.environment == "production" ? 30 : 7
  skip_final_snapshot     = var.environment != "production"

  tags = local.common_tags
}
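The modules above take `var.vpc_cidr`, `var.private_subnets`, and `var.public_subnets` as inputs without checking them. As a minimal sketch, assuming the subnets are passed as CIDR strings, a pre-flight check could confirm that every subnet sits inside the VPC range and that none overlap. The function name `validate_subnets` and the sample ranges are hypothetical and do not come from the original configuration.

```python
import ipaddress


def validate_subnets(vpc_cidr: str, subnets: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the layout looks valid."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = [ipaddress.ip_network(s) for s in subnets]
    problems = []

    # Every subnet must be fully contained in the VPC range.
    for net in nets:
        if not net.subnet_of(vpc):
            problems.append(f"{net} is outside {vpc}")

    # No two subnets may share addresses.
    for i, first in enumerate(nets):
        for second in nets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")

    return problems


if __name__ == "__main__":
    # Hypothetical values standing in for var.vpc_cidr and the subnet lists.
    issues = validate_subnets("10.0.0.0/16", ["10.0.1.0/24", "10.0.2.0/24", "10.0.101.0/24"])
    print(issues or "subnet layout OK")
```

Running a check like this before `terraform plan` catches layout mistakes earlier than the provider would.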

Cost Optimization

Reserved vs On-Demand vs Spot

| Type          | Discount | Commitment   | Use Case           |
| ------------- | -------- | ------------ | ------------------ |
| On-Demand     | 0%       | None         | Variable workloads |
| Reserved      | 30-72%   | 1 or 3 years | Steady-state       |
| Savings Plans | 30-72%   | 1 or 3 years | Flexible compute   |
| Spot          | 60-90%   | None         | Fault-tolerant     |

Cost Optimization Strategies
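Picking a purchase option is one such strategy. As a rough sketch, the discount ranges from the pricing comparison above can be applied to an on-demand hourly rate to get a monthly cost range per option. The 730-hour month, the `PurchaseOption` type, and the sample hourly rate are assumptions made for illustration; real discounts depend on instance family, region, and term.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: average hours in a month


@dataclass(frozen=True)
class PurchaseOption:
    name: str
    min_discount: float  # lower bound of the discount range, as a fraction
    max_discount: float  # upper bound of the discount range, as a fraction


# Discount ranges copied from the comparison above.
OPTIONS = [
    PurchaseOption("On-Demand", 0.0, 0.0),
    PurchaseOption("Reserved", 0.30, 0.72),
    PurchaseOption("Savings Plans", 0.30, 0.72),
    PurchaseOption("Spot", 0.60, 0.90),
]


def monthly_cost_range(on_demand_hourly: float, option: PurchaseOption) -> tuple[float, float]:
    """Return (best_case, worst_case) monthly cost for one always-on instance."""
    base = on_demand_hourly * HOURS_PER_MONTH
    best = round(base * (1 - option.max_discount), 2)
    worst = round(base * (1 - option.min_discount), 2)
    return best, worst


if __name__ == "__main__":
    hourly_rate = 0.10  # hypothetical on-demand price in USD per hour
    for option in OPTIONS:
        best, worst = monthly_cost_range(hourly_rate, option)
        print(f"{option.name:<14} ${best:>7.2f} to ${worst:>7.2f} per month")
```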

Right-sizing:

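from datetime import datetime, timedelta, timezone

import boto3


def analyze_utilization(instance_id: str, days: int = 14) -> str:
    """Classify an EC2 instance for right-sizing from its CPU utilization.

    Memory is not covered: EC2 does not publish memory metrics to CloudWatch
    unless the CloudWatch agent is installed on the instance.
    """
    cloudwatch = boto3.client('cloudwatch')
    end_time = datetime.now(timezone.utc)

    # Fetch hourly average and maximum CPU for the look-back window.
    metrics = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=end_time - timedelta(days=days),
        EndTime=end_time,
        Period=3600,
        Statistics=['Average', 'Maximum']
    )

    datapoints = metrics['Datapoints']
    if not datapoints:
        # Stopped or newly launched instances return no data; avoid dividing by zero.
        return 'insufficient_data'

    avg_cpu = sum(p['Average'] for p in datapoints) / len(datapoints)
    max_cpu = max(p['Maximum'] for p in datapoints)

    if avg_cpu < 10 and max_cpu < 30:
        return 'downsize'
    elif avg_cpu > 80:
        return 'upsize'
    else:
        return 'optimal'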
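The thresholds in the function above can be exercised without AWS credentials by separating the classification from the CloudWatch call. The sketch below assumes datapoints shaped like the `Datapoints` entries returned by `get_metric_statistics`. The name `classify_cpu` and the `insufficient_data` label for empty input are choices made for this sketch.

```python
def classify_cpu(datapoints: list[dict]) -> str:
    """Apply the right-sizing thresholds to a list of CloudWatch-style datapoints."""
    if not datapoints:
        # Assumption of this sketch: report missing data instead of dividing by zero.
        return "insufficient_data"

    avg_cpu = sum(p["Average"] for p in datapoints) / len(datapoints)
    max_cpu = max(p["Maximum"] for p in datapoints)

    if avg_cpu < 10 and max_cpu < 30:
        return "downsize"   # consistently idle, with no meaningful peaks
    if avg_cpu > 80:
        return "upsize"     # sustained high load
    return "optimal"


if __name__ == "__main__":
    # Synthetic datapoints: low average but one large spike, so not a downsize candidate.
    sample = [
        {"Average": 4.0, "Maximum": 9.0},
        {"Average": 6.0, "Maximum": 85.0},
    ]
    print(classify_cpu(sample))
```

Keeping the thresholds in a pure function also makes it easy to tune them against historical data before acting on a recommendation.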

Cost Allocation Tags:

required_tags:
  - Environment: production|staging|development
  - Project: project-name
  - Owner: team-name
  - CostCenter: cost-center-id

automation:
  - Untagged resources alert after 24 hours
  - Auto-terminate development resources after 7 days
  - Weekly cost reports by tag
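A tagging policy like the one above only helps if something checks it. As a minimal sketch, the required tags can be expressed as patterns and applied to a resource's tag dictionary. The only value constraint taken from the policy is the `Environment` enumeration; treating the other three tags as "any non-empty string" is an assumption, and `tag_violations` is a hypothetical helper name.

```python
import re

# Required tags from the policy above, mapped to the values they accept.
REQUIRED_TAGS = {
    "Environment": re.compile(r"production|staging|development"),
    "Project": re.compile(r".+"),      # assumption: any non-empty value
    "Owner": re.compile(r".+"),        # assumption: any non-empty value
    "CostCenter": re.compile(r".+"),   # assumption: any non-empty value
}


def tag_violations(tags: dict[str, str]) -> list[str]:
    """Return one message per missing or invalid required tag."""
    problems = []
    for key, pattern in REQUIRED_TAGS.items():
        value = tags.get(key)
        if value is None:
            problems.append(f"missing tag: {key}")
        elif not pattern.fullmatch(value):
            problems.append(f"invalid value for {key}: {value!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical resource tags, as they might come back from a describe call.
    resource_tags = {"Environment": "prod", "Project": "checkout"}
    for problem in tag_violations(resource_tags):
        print(problem)
```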

Cost Dashboard

┌─────────────────────────────────────────────────────────────┐
│                    Monthly Cost Summary                     │
├─────────────────────────────────────────────────────────────┤
│  Total: $45,231                vs Last Month: +5%           │
│                                                             │
│  By Service:                   By Environment:              │
│  ├── EC2: $18,500 (41%)        ├── Production: $38,000      │
│  ├── RDS: $12,000 (27%)        ├── Staging: $4,500          │
│  ├── S3: $3,200 (7%)           └── Development: $2,731      │
│  ├── Lambda: $1,800 (4%)                                    │
│  └── Other: $9,731 (21%)       Savings Opportunity: $8,200  │
│                                                             │
│  Recommendations:                                           │
│  • Convert 12 instances to Reserved (save $4,200/mo)        │
│  • Delete 5 unused EBS volumes (save $180/mo)               │
│  • Resize 8 over-provisioned instances (save $1,800/mo)     │
└─────────────────────────────────────────────────────────────┘
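The per-service percentages in the dashboard add up to exactly 100, which rounding each share independently does not guarantee. One way to get that property is largest-remainder rounding, sketched below with the dashboard's own figures. Whether the original dashboard was produced this way is not stated, so treat the method and the function name `percentage_shares` as assumptions.

```python
def percentage_shares(costs: dict[str, int]) -> dict[str, int]:
    """Split 100 whole percentage points across entries using largest-remainder rounding."""
    total = sum(costs.values())
    if total <= 0:
        raise ValueError("total cost must be positive")

    # Start from the floor of each share, using integer arithmetic throughout.
    shares = {name: (amount * 100) // total for name, amount in costs.items()}
    remainders = {name: (amount * 100) % total for name, amount in costs.items()}

    # Hand the leftover points to the entries with the largest remainders.
    leftover = 100 - sum(shares.values())
    for name in sorted(costs, key=lambda n: remainders[n], reverse=True)[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Whole-dollar figures taken from the dashboard above.
    by_service = {"EC2": 18500, "RDS": 12000, "S3": 3200, "Lambda": 1800, "Other": 9731}
    for service, share in percentage_shares(by_service).items():
        print(f"{service}: ${by_service[service]:,} ({share}%)")
```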

Disaster Recovery

DR Strategies

| Strategy          | RTO     | RPO       | Cost |
| ----------------- | ------- | --------- | ---- |
| Backup & Restore  | Hours   | Hours     | $    |
| Pilot Light       | Minutes | Minutes   | $$   |
| Warm Standby      | Minutes | Seconds   | $$$  |
| Multi-Site Active | Seconds | Near-zero | $$$$ |

Multi-Region Architecture

┌────────────────────────────────────────────────────────────┐
│                    Global Load Balancer                    │
│                   (Route 53 / Cloud DNS)                   │
└──────────────┬─────────────────────────────┬───────────────┘
               │                             │
┌──────────────▼─────────────┐  ┌────────────▼───────────────┐
│       Primary Region       │  │      Secondary Region      │
│        (us-east-1)         │  │        (us-west-2)         │
│                            │  │                            │
│  ┌──────────────────────┐  │  │  ┌──────────────────────┐  │
│  │  Application Layer   │  │  │  │  Application Layer   │  │
│  │       (Active)       │  │  │  │   (Standby/Active)   │  │
│  └──────────┬───────────┘  │  │  └──────────┬───────────┘  │
│             │              │  │             │              │
│  ┌──────────▼───────────┐  │  │  ┌──────────▼───────────┐  │
│  │       Database       │──┼──┼──│       Database       │  │
│  │      (Primary)       │  │  │  │    (Read Replica)    │  │
│  └──────────────────────┘  │  │  └──────────────────────┘  │
└────────────────────────────┘  └────────────────────────────┘
               │
               │  Cross-Region Replication
               ▼
    ┌──────────────────────┐
    │      S3 Backup       │
    │    (Multi-Region)    │
    └──────────────────────┘
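The DR strategies table above trades recovery speed against cost. A minimal sketch of how that trade-off could drive a choice: pick the cheapest strategy whose typical RTO and RPO fit the stated targets. The table gives only qualitative values, so the numeric ceilings below (for example, treating "Minutes" as up to one hour) are assumptions chosen for illustration, as is the function name `choose_dr_strategy`.

```python
from typing import Optional

# (name, assumed RTO ceiling in seconds, assumed RPO ceiling in seconds, cost tier)
# The ceilings stand in for the table's "Hours", "Minutes", "Seconds" and "Near-zero".
STRATEGIES = [
    ("Backup & Restore", 24 * 3600, 24 * 3600, 1),
    ("Pilot Light", 3600, 3600, 2),
    ("Warm Standby", 3600, 60, 3),
    ("Multi-Site Active", 60, 1, 4),
]


def choose_dr_strategy(rto_seconds: int, rpo_seconds: int) -> Optional[str]:
    """Return the cheapest strategy that meets both targets, or None if none does."""
    candidates = [
        s for s in STRATEGIES
        if s[1] <= rto_seconds and s[2] <= rpo_seconds
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s[3])[0]


if __name__ == "__main__":
    # Hypothetical targets: recover within 4 hours, lose at most 1 hour of data.
    print(choose_dr_strategy(rto_seconds=4 * 3600, rpo_seconds=3600))
```

In practice the ceilings should come from measured failover drills for the workload, not from fixed constants.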

Backup Strategy

backup_policy:
  database:
    frequency: continuous
    retention: 35 days
    cross_region: true
    encryption: aws/rds

  application_data:
    frequency: daily
    retention: 90 days
    versioning: enabled
    lifecycle:
      - transition_to_ia: 30 days
      - transition_to_glacier: 90 days
      - expiration: 365 days

  configuration:
    frequency: on_change
    retention: unlimited
    storage: git + s3
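The `application_data` lifecycle rules above amount to a mapping from object age to storage tier. The sketch below spells that mapping out so the boundaries are explicit. It assumes each transition applies on the day the threshold is reached; `EXPIRED` is a label invented here for objects past the expiration rule, and the function name is hypothetical.

```python
def storage_class_for_age(age_days: int) -> str:
    """Map an object's age in days to the tier implied by the lifecycle rules."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= 365:
        return "EXPIRED"       # expiration: 365 days
    if age_days >= 90:
        return "GLACIER"       # transition_to_glacier: 90 days
    if age_days >= 30:
        return "STANDARD_IA"   # transition_to_ia: 30 days
    return "STANDARD"


if __name__ == "__main__":
    for age in (0, 29, 30, 89, 90, 364, 365):
        print(f"{age:>3} days -> {storage_class_for_age(age)}")
```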

Security Architecture

Network Security

┌─────────────────────────────────────────────────────────────┐
│                             VPC                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     Public Subnet                     │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │   NAT GW    │  │     ALB     │  │    Bastion    │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │                    Private Subnet                     │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │  App Tier   │  │  App Tier   │  │   App Tier    │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │                      Data Subnet                      │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │     RDS     │  │    Redis    │  │ Elasticsearch │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

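IAM Best Practices

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LeastPrivilegeExample",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/uploads/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/Team": "engineering"
        },
        "IpAddress": {
          "aws:SourceIp": ["10.0.0.0/8"]
        }
      }
    }
  ]
}

Note: aws:SourceIp is not available for requests that arrive through a VPC endpoint. To restrict by private address, as the 10.0.0.0/8 range here suggests, use aws:VpcSourceIp for that traffic instead.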
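The statement above grants access only when four things hold at once: the action is listed, the resource matches the wildcard, the principal carries the right team tag, and the request comes from the allowed network. The sketch below mirrors that logic to make the "all conditions must match" behaviour concrete. It is a simplified illustration, not IAM's actual evaluation engine: it ignores explicit denies, other policies, and policy variables, and `statement_allows` is a hypothetical name.

```python
import fnmatch
import ipaddress
from typing import Optional

# Values copied from the policy statement above.
ALLOWED_ACTIONS = {"s3:GetObject", "s3:PutObject"}
RESOURCE_PATTERN = "arn:aws:s3:::my-bucket/uploads/*"
REQUIRED_TEAM = "engineering"
ALLOWED_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def statement_allows(action: str, resource: str, team: Optional[str], source_ip: str) -> bool:
    """Return True only if the request satisfies every element of the statement."""
    return (
        action in ALLOWED_ACTIONS
        # Shell-style wildcard matching approximates the '*' in the resource ARN.
        and fnmatch.fnmatchcase(resource, RESOURCE_PATTERN)
        and team == REQUIRED_TEAM
        and ipaddress.ip_address(source_ip) in ALLOWED_NETWORK
    )


if __name__ == "__main__":
    # Hypothetical request: an engineering principal reading an uploaded object.
    print(statement_allows(
        "s3:GetObject",
        "arn:aws:s3:::my-bucket/uploads/report.csv",
        "engineering",
        "10.20.30.40",
    ))
```

For real policies, the IAM policy simulator is the authoritative way to test a request against the full evaluation logic.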

Reference Materials

- references/aws_patterns.md - AWS architecture patterns
- references/gcp_patterns.md - GCP architecture patterns
- references/multi_cloud.md - Multi-cloud strategies
- references/cost_optimization.md - Cost optimization guide

Scripts

Infrastructure cost analyzer

python scripts/cost_analyzer.py --account production --period monthly

DR validation

python scripts/dr_test.py --region us-west-2 --type failover

Security audit

python scripts/security_audit.py --framework cis --output report.html

Resource inventory

python scripts/inventory.py --accounts all --format csv
