Infrastructure Engineering Skill
Comprehensive guide for modern infrastructure engineering covering DevOps practices, multi-cloud platforms (AWS, Azure, GCP, Cloudflare), FinOps cost optimization, and DevSecOps security practices.
When to Use This Skill
Use this skill when:
DevOps: Setting up CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins), implementing GitOps workflows (ArgoCD, Flux)
AWS: Deploying to EC2, Lambda, ECS, EKS, managing S3, RDS, using CloudFormation/CDK
Azure: Working with Azure VMs, App Service, AKS, Azure Functions, Storage Accounts
GCP: Managing Compute Engine, GKE, Cloud Run, Cloud Storage, App Engine
Cloudflare: Deploying Workers, R2 storage, D1 databases, Pages applications
Kubernetes: Managing clusters, deployments, services, ingress, Helm charts, operators
Docker: Containerizing applications, multi-stage builds, Docker Compose, registries
FinOps: Analyzing cloud costs, optimizing spend, reserved instances, spot instances, rightsizing
DevSecOps: Security scanning (SAST/DAST), vulnerability management, secrets management, compliance
IaC: Terraform, CloudFormation, Pulumi, configuration management
Monitoring: Setting up observability, logging, metrics, alerting, distributed tracing
Platform Selection Guide
When to Use AWS
Best For:
General-purpose cloud computing at scale
Mature ecosystem with 200+ services
Enterprise workloads with compliance requirements
Hybrid cloud with AWS Outposts
Extensive third-party integrations
Advanced networking and security controls
Key Services:
EC2 (virtual machines, flexible compute)
Lambda (serverless functions, event-driven)
ECS/EKS (container orchestration)
S3 (object storage, industry standard)
RDS (managed relational databases)
DynamoDB (NoSQL, global tables)
CloudFormation/CDK (infrastructure as code)
IAM (identity and access management)
VPC (virtual private cloud networking)
Cost Profile: Pay-as-you-go, reserved instances (up to 72% discount), savings plans, spot instances (up to 90% discount)
When to Use Azure
Best For:
Microsoft-centric organizations (.NET, Active Directory)
Hybrid cloud scenarios (Azure Arc, Stack)
Enterprise agreements with Microsoft
Windows Server and SQL Server workloads
Integration with Microsoft 365 and Dynamics
Strong compliance certifications (90+ certifications)
Key Services:
Virtual Machines (Windows/Linux compute)
App Service (PaaS for web apps)
AKS (managed Kubernetes)
Azure Functions (serverless compute)
Storage Accounts (Blob, File, Queue, Table)
SQL Database (managed SQL Server)
Active Directory (identity management)
ARM Templates/Bicep (infrastructure as code)
Cost Profile: Pay-as-you-go, reserved instances, Azure Hybrid Benefit for Windows/SQL Server licenses
When to Use Cloudflare
Best For:
Edge-first applications with global distribution
Ultra-low latency requirements (<50ms)
Static sites with serverless functions
Zero egress cost scenarios (R2 storage)
WebSocket/real-time applications (Durable Objects)
AI/ML at the edge (Workers AI)
Key Products:
Workers (serverless functions)
R2 (object storage, S3-compatible)
D1 (SQLite database with global replication)
KV (key-value store)
Pages (static hosting + functions)
Durable Objects (stateful compute)
Browser Rendering (headless browser automation)
Cost Profile: Pay-per-request, generous free tier, zero egress fees
When to Use Kubernetes
Best For:
Container orchestration at scale
Microservices architectures with 10+ services
Multi-cloud and hybrid deployments
Self-healing and auto-scaling workloads
Complex deployment strategies (blue/green, canary)
Service mesh architectures (Istio, Linkerd)
Stateful applications with operators
Key Features:
Declarative configuration (YAML manifests)
Automated rollouts and rollbacks
Service discovery and load balancing
Self-healing (restarts failed containers)
Horizontal pod autoscaling
Secret and configuration management
Storage orchestration
Batch job execution
Managed Options: EKS (AWS), AKS (Azure), GKE (GCP), managed k8s providers
Cost Profile: Cluster management fees + node costs (optimize with spot instances, cluster autoscaling)
When to Use Docker
Best For:
Local development consistency
Microservices architectures
Multi-language stack applications
Traditional VPS/VM deployments
Foundation for Kubernetes workloads
CI/CD build environments
Database containerization (dev/test)
Key Capabilities:
Application isolation and portability
Multi-stage builds for optimization
Docker Compose for multi-container apps
Volume management for data persistence
Network configuration and service discovery
Cross-platform compatibility (amd64, arm64)
BuildKit for improved build performance
Cost Profile: Infrastructure cost only (compute + storage), no orchestration overhead
When to Use Google Cloud
Best For:
Enterprise-scale applications
Data analytics and ML pipelines (BigQuery, Vertex AI)
Hybrid/multi-cloud deployments
Kubernetes at scale (GKE)
Managed databases (Cloud SQL, Firestore, Spanner)
Complex IAM and compliance requirements
Key Services:
Compute Engine (VMs)
GKE (managed Kubernetes)
Cloud Run (containerized serverless)
App Engine (PaaS)
Cloud Storage (object storage)
Cloud SQL (managed databases)
Cost Profile: Pay-as-you-go, sustained use discounts, committed use discounts
Quick Start
AWS Lambda Function
Install AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install
Configure credentials
aws configure
Create Lambda function with SAM
sam init --runtime python3.11
sam build && sam deploy --guided
See: references/aws-lambda.md
AWS EKS Kubernetes Cluster
Install eksctl
brew install eksctl # or curl download
Create cluster
eksctl create cluster \
  --name my-cluster \
  --region us-west-2 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 1 \
  --nodes-max 4
See: references/kubernetes-basics.md
Azure Deployment
Install Azure CLI
curl -L https://aka.ms/InstallAzureCli | bash
Login and create resources
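az login
az group create --name myResourceGroup --location eastus
# az webapp create requires an App Service plan; myPlan and the B1 SKU are placeholder choices
az appservice plan create --resource-group myResourceGroup \
  --name myPlan --sku B1 --is-linux
az webapp create --resource-group myResourceGroup \
  --plan myPlan --name myapp --runtime "NODE:18-lts"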
See: references/azure-basics.md
Cloudflare Workers
Install Wrangler CLI
npm install -g wrangler
Create and deploy Worker
wrangler init my-worker
cd my-worker
wrangler deploy
See: references/cloudflare-workers-basics.md
Kubernetes Deployment
Create deployment
kubectl create deployment nginx --image=nginx:latest
kubectl expose deployment nginx --port=80 --type=LoadBalancer
Apply from manifest
kubectl apply -f deployment.yaml
Check status
kubectl get pods,services,deployments
See: references/kubernetes-basics.md
Docker Container
Create Dockerfile
cat > Dockerfile <<'EOF'
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --production flag
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
EOF
Build and run
docker build -t myapp .
docker run -p 3000:3000 myapp
See: references/docker-basics.md
Reference Navigation
AWS (Amazon Web Services)
aws-overview.md - AWS fundamentals, account setup, IAM basics
aws-ec2.md - EC2 instances, AMIs, security groups, auto-scaling
aws-lambda.md - Serverless functions, SAM, event sources, layers
aws-ecs-eks.md - Container orchestration, ECS vs EKS, Fargate
aws-s3-rds.md - S3 storage, RDS databases, backup strategies
aws-cloudformation.md - Infrastructure as code, CDK, best practices
aws-networking.md - VPC, subnets, security groups, load balancers
Azure (Microsoft Azure)
azure-basics.md - Azure fundamentals, subscriptions, resource groups
azure-compute.md - VMs, App Service, AKS, Azure Functions
azure-storage.md - Storage Accounts, Blob, Files, managed disks
Cloudflare Platform
cloudflare-platform.md - Edge computing overview, key components
cloudflare-workers-basics.md - Getting started, handler types, basic patterns
cloudflare-workers-advanced.md - Advanced patterns, performance, optimization
cloudflare-workers-apis.md - Runtime APIs, bindings, integrations
cloudflare-r2-storage.md - R2 object storage, S3 compatibility, best practices
cloudflare-d1-kv.md - D1 SQLite database, KV store, use cases
browser-rendering.md - Puppeteer/Playwright automation on Cloudflare
Kubernetes & Container Orchestration
kubernetes-basics.md - Core concepts, pods, deployments, services
kubernetes-advanced.md - StatefulSets, operators, custom resources
kubernetes-networking.md - Ingress, service mesh, network policies
helm-charts.md - Package management, charts, repositories
Docker Containerization
docker-basics.md - Core concepts, Dockerfile, images, containers
docker-compose.md - Multi-container apps, networking, volumes
docker-security.md - Image scanning, secrets, best practices
Google Cloud Platform
gcloud-platform.md - GCP overview, gcloud CLI, authentication
gcloud-services.md - Compute Engine, GKE, Cloud Run, App Engine
CI/CD & GitOps
cicd-github-actions.md - GitHub Actions workflows, runners, secrets
cicd-gitlab.md - GitLab CI/CD pipelines, artifacts, caching
gitops-argocd.md - ArgoCD setup, app of apps pattern, sync policies
gitops-flux.md - Flux controllers, GitOps toolkit, multi-tenancy
FinOps (Cost Optimization)
finops-basics.md - Cost optimization principles, FinOps lifecycle
finops-aws.md - AWS cost optimization, RI, savings plans, spot
finops-azure.md - Azure cost management, reservations, hybrid benefit
finops-gcp.md - GCP cost optimization, committed use, sustained use
finops-tools.md - Cost analysis tools, Kubecost, CloudHealth, Infracost
DevSecOps (Security)
devsecops-basics.md - Security best practices, shift-left security
devsecops-scanning.md - SAST, DAST, SCA, container scanning
secrets-management.md - Vault, AWS Secrets Manager, sealed secrets
compliance.md - SOC2, HIPAA, PCI-DSS, audit logging
Infrastructure as Code
terraform-basics.md - Terraform fundamentals, providers, state
terraform-advanced.md - Modules, workspaces, remote state
cloudformation-basics.md - CloudFormation templates, stacks, change sets
Utilities & Scripts
scripts/cloudflare-deploy.py - Automate Cloudflare Worker deployments
scripts/docker-optimize.py - Analyze and optimize Dockerfiles
scripts/cost-analyzer.py - Cloud cost analysis and reporting
scripts/security-scanner.py - Automated security scanning
Common Workflows
Multi-Cloud Architecture
Edge Layer: Cloudflare Workers (global routing, caching)
Compute Layer: AWS ECS/Lambda or Azure App Service (application logic)
Data Layer: AWS RDS or Azure SQL (persistent storage)
CDN/Storage: Cloudflare R2 or AWS S3 (static assets)
Benefits:
- Best-of-breed services per layer
- Geographic redundancy
- Cost optimization across providers
AWS ECS Deployment with CI/CD
GitHub Actions workflow
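name: Deploy to ECS
on: push
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # Outline only: each step still needs a `uses:` or `run:` entry before the workflow is valid
      - name: Build Docker image
      - name: Push to ECR
      - name: Update ECS task definition
      - name: Deploy to ECS service
      - name: Wait for deployment stabilization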
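The last three steps of the workflow (update the task definition, deploy to the service, wait for stabilization) can also be scripted. The following is a minimal Python sketch, assuming a boto3 ECS client is passed in as `ecs`, that the service runs a single-container task definition, and that the image was already built and pushed to ECR. `deploy_image` and `READ_ONLY_FIELDS` are hypothetical names introduced for illustration.

```python
from typing import Any

# Fields that describe_task_definition returns but register_task_definition rejects.
READ_ONLY_FIELDS = {
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy", "deregisteredAt",
}


def deploy_image(ecs: Any, cluster: str, service: str, image: str) -> str:
    """Register a new task definition revision using `image` and roll the service to it.

    `ecs` is assumed to behave like boto3.client("ecs"). Only the first
    container definition is updated (assumption: single-container task).
    Returns the ARN of the newly registered task definition.
    """
    # Look up the task definition the service currently runs.
    svc = ecs.describe_services(cluster=cluster, services=[service])["services"][0]
    current = ecs.describe_task_definition(
        taskDefinition=svc["taskDefinition"]
    )["taskDefinition"]

    # Copy it without the read-only fields, then swap in the new image.
    new_def = {k: v for k, v in current.items() if k not in READ_ONLY_FIELDS}
    new_def["containerDefinitions"] = [dict(c) for c in new_def["containerDefinitions"]]
    new_def["containerDefinitions"][0]["image"] = image

    # Register the new revision and point the service at it.
    registered = ecs.register_task_definition(**new_def)["taskDefinition"]
    arn = registered["taskDefinitionArn"]
    ecs.update_service(cluster=cluster, service=service, taskDefinition=arn)

    # Block until the service reports a steady state.
    ecs.get_waiter("services_stable").wait(cluster=cluster, services=[service])
    return arn
```

In a pipeline this might be called as `deploy_image(boto3.client("ecs"), "my-cluster", "my-service", image_uri)`, where boto3 is a third-party dependency and the cluster, service, and image names are placeholders.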
Kubernetes GitOps with ArgoCD
Git repository structure
/apps
  /production
    - deployment.yaml
    - service.yaml
    - ingress.yaml
  /staging
    - deployment.yaml
ArgoCD syncs cluster state from Git
Changes: Git commit → ArgoCD detects → Auto-sync to cluster
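The detect-and-sync loop above amounts to comparing the desired state in Git with the live state in the cluster. The sketch below illustrates that idea in plain Python over dictionaries. It is not ArgoCD's implementation, and `diff_state` and `sync` are hypothetical names. Pruning is opt-in here, mirroring the common GitOps default of not deleting resources unless explicitly enabled.

```python
def diff_state(desired: dict, live: dict) -> dict:
    """Return the resource names to create, update, or delete so `live` matches `desired`."""
    return {
        "create": sorted(k for k in desired if k not in live),
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        "delete": sorted(k for k in live if k not in desired),
    }


def sync(desired: dict, live: dict, prune: bool = False) -> dict:
    """Return the live state after one sync pass; Git (`desired`) wins on conflicts."""
    plan = diff_state(desired, live)
    result = dict(live)
    for name in plan["create"] + plan["update"]:
        result[name] = desired[name]
    if prune:
        # Remove resources that no longer exist in Git.
        for name in plan["delete"]:
            del result[name]
    return result


# Example: Git declares a new image and a service; the cluster has a stale ConfigMap.
desired = {"deployment/web": {"image": "web:2"}, "service/web": {"port": 80}}
live = {"deployment/web": {"image": "web:1"}, "configmap/old": {"k": "v"}}
print(diff_state(desired, live))
```

Running `sync` repeatedly is idempotent: once the live state matches Git, the plan is empty and nothing changes, which is what makes automatic drift correction safe to run on a schedule.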
Multi-Stage Docker Build
Build stage
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
Production stage
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
USER node
CMD ["node", "dist/server.js"]
FinOps Cost Optimization Workflow
1. Discovery: Identify untagged resources
2. Analysis: Right-size instances (CPU/memory utilization)
3. Optimization:
- Convert to reserved instances (predictable workloads)
- Use spot instances (fault-tolerant workloads)
- Schedule start/stop (dev environments)
4. Monitoring: Set budget alerts, track savings
5. Governance: Enforce tagging policies
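The discovery and analysis steps can be sketched as a small report over an inventory export. This is an illustrative Python sketch, not the behavior of scripts/cost-analyzer.py. The required tag set, the 20% CPU and 30% memory thresholds, and the `Resource` shape are all assumptions to adjust per organization.

```python
from dataclasses import dataclass, field

# Hypothetical tagging policy; replace with the organization's own required tags.
REQUIRED_TAGS = ("owner", "environment", "cost-center")


@dataclass
class Resource:
    resource_id: str
    avg_cpu_percent: float
    avg_memory_percent: float
    tags: dict = field(default_factory=dict)


def missing_tags(resource: Resource, required=REQUIRED_TAGS) -> list:
    """Discovery: list the required tags this resource lacks."""
    return [t for t in required if t not in resource.tags]


def is_oversized(resource: Resource, cpu_threshold=20.0, memory_threshold=30.0) -> bool:
    """Analysis: flag a resource only when both CPU and memory sit below their thresholds."""
    return (resource.avg_cpu_percent < cpu_threshold
            and resource.avg_memory_percent < memory_threshold)


def review(resources) -> dict:
    """Build a report of untagged resources and rightsizing candidates."""
    report = {"untagged": [], "rightsize": []}
    for r in resources:
        if missing_tags(r):
            report["untagged"].append(r.resource_id)
        if is_oversized(r):
            report["rightsize"].append(r.resource_id)
    return report
```

Requiring both metrics to be low avoids downsizing a memory-bound instance just because its CPU is idle.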
DevSecOps Security Pipeline
1. Code Commit
2. SAST Scan: SonarQube, Semgrep (static code analysis)
3. Dependency Check: Snyk, Trivy (vulnerability scanning)
4. Build: Docker image
5. Container Scan: Trivy, Grype (image vulnerabilities)
6. DAST Scan: OWASP ZAP (runtime security testing)
7. Deploy: Only if all scans pass
8. Runtime Protection: Falco, AWS GuardDuty
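Step 7 ("Deploy: Only if all scans pass") needs a concrete definition of "pass". One way to express it is a severity gate, sketched below in Python. The finding format (a dict with `id` and `severity`) and the default threshold of `high` are assumptions. Real scanners such as Trivy or Semgrep emit their own report formats, which would need mapping into this shape.

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def blocking_findings(findings, fail_on="high"):
    """Return the findings at or above the `fail_on` severity."""
    threshold = SEVERITY_RANK[fail_on]
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= threshold]


def may_deploy(scan_results, fail_on="high"):
    """Decide whether the pipeline may proceed to deployment.

    `scan_results` maps a stage name (e.g. "sast", "container") to its findings.
    Returns (allowed, blocked), where `blocked` lists the blocking findings per stage.
    """
    blocked = {}
    for stage, findings in scan_results.items():
        hits = blocking_findings(findings, fail_on)
        if hits:
            blocked[stage] = hits
    return (not blocked, blocked)


# Example: one low-severity SAST finding and one critical dependency finding.
results = {
    "sast": [{"id": "S1", "severity": "low"}],
    "dependencies": [{"id": "DEP-1", "severity": "critical"}],
    "container": [],
}
allowed, blocked = may_deploy(results)
print(allowed, sorted(blocked))
```

In CI, the gate would typically exit non-zero when `allowed` is false so that the deploy job never starts.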
Terraform Infrastructure Deployment
1. Write: Define infrastructure in .tf files
2. Init: terraform init (download providers)
3. Plan: terraform plan (preview changes)
4. Apply: terraform apply (create/update resources)
5. State: Store state in S3 with DynamoDB locking
6. Modules: Reuse common patterns across environments
Best Practices
DevOps
CI/CD: Automate testing and deployment, use feature flags for progressive rollouts
GitOps: Declarative infrastructure, Git as single source of truth, automated sync
Monitoring: Implement observability (logs, metrics, traces), set up alerting
Incident Management: Runbooks, postmortems, blameless culture
Automation: Infrastructure as code, configuration management, self-service platforms
Security (DevSecOps)
Shift Left: Security scanning early in pipeline (SAST, dependency checks)
Secrets Management: Use Vault, AWS Secrets Manager, or sealed secrets (never in code/Git)
Container Security: Run as non-root, minimal base images, regular scanning
Network Security: Zero-trust architecture, service mesh, network policies
Access Control: Least privilege IAM, MFA, temporary credentials
Compliance: Audit logging, encryption at rest/transit, regular security reviews
Runtime Protection: Security monitoring, intrusion detection, automated response
Cost Optimization (FinOps)
Tagging: Enforce resource tagging for cost allocation and tracking
Rightsizing: Analyze utilization, downsize over-provisioned resources
Reserved Capacity: Purchase RI/savings plans for predictable workloads (up to 72% discount)
Spot/Preemptible: Use for fault-tolerant workloads (up to 90% discount)
Scheduling: Auto-stop dev/test environments during off-hours
Storage Optimization: Lifecycle policies, archive to cheaper tiers, delete orphaned resources
Monitoring: Budget alerts, cost anomaly detection, chargeback/showback
Governance: Approval workflows for expensive resources, quota management
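To compare purchasing options, it helps to put the discounts into a formula: cost = hourly rate × hours × (1 − discount). The Python sketch below does that for a workload split into a reserved baseline and a spot burst. The hourly rate, hours, and discount values in the example are invented inputs, not published prices. The 72% and 90% figures above are upper bounds, so realistic discounts are usually lower.

```python
def monthly_cost(hourly_rate: float, hours: float, discount: float = 0.0) -> float:
    """Cost of running `hours` at `hourly_rate` with a fractional discount (0.0 to 1.0)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0.0 and 1.0")
    return hourly_rate * hours * (1.0 - discount)


def blended_monthly_cost(hourly_rate: float, baseline_hours: float, burst_hours: float,
                         reserved_discount: float, spot_discount: float) -> float:
    """Baseline hours billed at the reserved discount, burst hours at the spot discount."""
    return (monthly_cost(hourly_rate, baseline_hours, reserved_discount)
            + monthly_cost(hourly_rate, burst_hours, spot_discount))


# Example with made-up numbers: $0.10/hour, 730 baseline hours, 200 burst hours.
on_demand = monthly_cost(0.10, 730 + 200)
blended = blended_monthly_cost(0.10, 730, 200, reserved_discount=0.40, spot_discount=0.70)
print(f"on-demand: ${on_demand:.2f}, blended: ${blended:.2f}")
```

With these example inputs the on-demand bill is $93.00 and the blended bill is $49.80, which shows why separating predictable from fault-tolerant capacity is worth doing before purchasing commitments.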
Kubernetes
Resource Management: Set requests/limits, use horizontal pod autoscaling
High Availability: Multi-zone clusters, pod disruption budgets, anti-affinity rules
Security: RBAC, Pod Security Standards (PodSecurityPolicy was removed in Kubernetes 1.25), network policies, admission controllers
Observability: Prometheus metrics, distributed tracing, centralized logging
GitOps: ArgoCD/Flux for declarative deployments, automatic drift correction
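Horizontal pod autoscaling follows a documented rule: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0 (10% by default) and bounded by the configured minimum and maximum. The Python sketch below reproduces that calculation to help reason about scaling behavior. The function name and the default bounds are hypothetical, and the real controller also handles details such as pod readiness and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the Horizontal Pod Autoscaler replica calculation."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6, while 4 replicas at 62% stay at 4 because the ratio is within the tolerance.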
Performance
Compute: Auto-scaling, load balancing, multi-region for low latency
Caching: CDN, in-memory caching (Redis/Memcached), edge computing
Storage: Choose appropriate tier (SSD vs HDD), enable caching, CDN for static assets
Containers: Multi-stage builds, minimal images, layer caching
Databases: Connection pooling, read replicas, query optimization, indexing
Development
Local Development: Docker Compose for consistent environments, dev containers
Testing: Unit, integration, end-to-end tests in CI/CD pipeline
Infrastructure as Code: Terraform/CloudFormation for repeatability
Documentation: Architecture diagrams, runbooks, API documentation
Version Control: Git for code and infrastructure, semantic versioning
Decision Matrix
Need: Choose
Compute
Sub-50ms latency globally: Cloudflare Workers
Serverless functions (AWS ecosystem): AWS Lambda
Serverless functions (Azure ecosystem): Azure Functions
Containerized workloads (managed): AWS ECS/Fargate, Azure AKS, GCP Cloud Run
Kubernetes at scale: AWS EKS, Azure AKS, GCP GKE
VMs with full control: AWS EC2, Azure VMs, GCP Compute Engine
Storage
Object storage (S3-compatible): AWS S3, Cloudflare R2 (zero egress), Azure Blob
Block storage for VMs: AWS EBS, Azure Managed Disks, GCP Persistent Disk
File storage (NFS/SMB): AWS EFS, Azure Files, GCP Filestore
Database
Managed SQL (AWS): AWS RDS (PostgreSQL, MySQL, SQL Server)
Managed SQL (Azure): Azure SQL Database
Managed SQL (GCP): Cloud SQL
NoSQL key-value: AWS DynamoDB, Azure Cosmos DB, Cloudflare KV
Global SQL (edge reads): Cloudflare D1, AWS Aurora Global
CI/CD & GitOps
GitHub-integrated CI/CD: GitHub Actions
Self-hosted CI/CD: GitLab CI/CD, Jenkins
Kubernetes GitOps: ArgoCD, Flux
Cost Optimization
Predictable workloads: Reserved Instances, Savings Plans
Fault-tolerant workloads: Spot Instances (AWS), Preemptible VMs (GCP)
Dev/test environments: Auto-scheduling, budget alerts
Security
Secrets management: HashiCorp Vault, AWS Secrets Manager, Azure Key Vault
Container scanning: Trivy, Snyk, AWS ECR scanning
SAST/DAST: SonarQube, Semgrep, OWASP ZAP
Special Use Cases
Static site + edge functions: Cloudflare Pages, AWS Amplify
WebSocket/real-time: Cloudflare Durable Objects, AWS API Gateway WebSocket
ML/AI pipelines: AWS SageMaker, GCP Vertex AI, Azure ML
Browser automation: Cloudflare Browser Rendering, AWS Lambda + Puppeteer
Resources
Cloud Providers
AWS Docs: https://docs.aws.amazon.com
Azure Docs: https://docs.microsoft.com/azure
GCP Docs: https://cloud.google.com/docs
Cloudflare Docs: https://developers.cloudflare.com
Container & Orchestration
Docker Docs: https://docs.docker.com
Kubernetes Docs: https://kubernetes.io/docs
Helm: https://helm.sh/docs
CI/CD & GitOps
GitHub Actions: https://docs.github.com/actions
GitLab CI: https://docs.gitlab.com/ee/ci/
ArgoCD: https://argo-cd.readthedocs.io
Flux: https://fluxcd.io/docs
Infrastructure as Code
Terraform: https://developer.hashicorp.com/terraform
AWS CDK: https://docs.aws.amazon.com/cdk
Pulumi: https://www.pulumi.com/docs
Security & Compliance
OWASP: https://owasp.org
CIS Benchmarks: https://www.cisecurity.org/cis-benchmarks
HashiCorp Vault: https://developer.hashicorp.com/vault
FinOps & Cost Optimization
FinOps Foundation: https://www.finops.org
AWS Cost Optimization: https://aws.amazon.com/pricing/cost-optimization
Kubecost: https://www.kubecost.com
Implementation Checklist
AWS Lambda Deployment
Install AWS CLI and SAM CLI
Configure AWS credentials (access key, secret key)
Create Lambda function with SAM template
Configure IAM role and policies
Test locally with sam local invoke
Deploy with sam deploy
Set up CloudWatch monitoring and alarms
AWS EKS Kubernetes Cluster
Install kubectl, eksctl, aws-cli
Configure AWS credentials
Create EKS cluster with eksctl
Configure kubectl context
Install cluster autoscaler
Set up Helm for package management
Deploy applications with kubectl/Helm
Configure ingress controller (ALB/NGINX)
Azure Deployment
Install Azure CLI
Login with az login
Create resource group
Deploy App Service or AKS
Configure continuous deployment
Set up monitoring with Application Insights
Kubernetes on Any Cloud
Install kubectl and helm
Connect to cluster (update kubeconfig)
Create namespaces for environments
Apply RBAC policies
Deploy applications (deployments, services)
Configure ingress for external access
Set up monitoring (Prometheus, Grafana)
Implement GitOps with ArgoCD/Flux
CI/CD Pipeline (GitHub Actions)
Create .github/workflows/deploy.yml
Configure secrets (cloud credentials, API keys)
Add build and test jobs
Add container build and push to registry
Add deployment job to cloud platform
Set up branch protection rules
Enable status checks and notifications
FinOps Cost Optimization
Implement resource tagging strategy
Enable cost allocation tags
Set up budget alerts
Analyze resource utilization (CloudWatch, Azure Monitor)
Identify rightsizing opportunities
Purchase reserved instances for predictable workloads
Configure auto-scaling and scheduling
Regular cost reviews and optimization
DevSecOps Security
Add SAST scanning to CI/CD (SonarQube, Semgrep)
Add dependency scanning (Snyk, Trivy)
Implement container image scanning
Set up secrets management (Vault, cloud provider)
Configure security groups and network policies
Enable audit logging
Implement security monitoring and alerting
Regular vulnerability assessments
Cloudflare Workers
Install Wrangler CLI
Create Worker project
Configure wrangler.toml (bindings, routes)
Test locally with wrangler dev
Deploy with wrangler deploy
Docker
Write Dockerfile with multi-stage builds
Create .dockerignore file
Test build locally
Push to registry (ECR, ACR, Google Artifact Registry, Docker Hub)
Deploy to target platform