infra-engineer

Installs: 70
Rank: #11021

Install

npx skills add https://github.com/samhvw8/dotfiles --skill infra-engineer

Infrastructure Engineering Skill

Comprehensive guide for modern infrastructure engineering covering DevOps practices, multi-cloud platforms (AWS, Azure, GCP, Cloudflare), FinOps cost optimization, and DevSecOps security practices.

When to Use This Skill

Use this skill when:

- DevOps: Setting up CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins), implementing GitOps workflows (ArgoCD, Flux)
- AWS: Deploying to EC2, Lambda, ECS, EKS, managing S3, RDS, using CloudFormation/CDK
- Azure: Working with Azure VMs, App Service, AKS, Azure Functions, Storage Accounts
- GCP: Managing Compute Engine, GKE, Cloud Run, Cloud Storage, App Engine
- Cloudflare: Deploying Workers, R2 storage, D1 databases, Pages applications
- Kubernetes: Managing clusters, deployments, services, ingress, Helm charts, operators
- Docker: Containerizing applications, multi-stage builds, Docker Compose, registries
- FinOps: Analyzing cloud costs, optimizing spend, reserved instances, spot instances, rightsizing
- DevSecOps: Security scanning (SAST/DAST), vulnerability management, secrets management, compliance
- IaC: Terraform, CloudFormation, Pulumi, configuration management
- Monitoring: Setting up observability, logging, metrics, alerting, distributed tracing

Platform Selection Guide

When to Use AWS

Best For:

- General-purpose cloud computing at scale
- Mature ecosystem with 200+ services
- Enterprise workloads with compliance requirements
- Hybrid cloud with AWS Outposts
- Extensive third-party integrations
- Advanced networking and security controls

Key Services:

- EC2 (virtual machines, flexible compute)
- Lambda (serverless functions, event-driven)
- ECS/EKS (container orchestration)
- S3 (object storage, industry standard)
- RDS (managed relational databases)
- DynamoDB (NoSQL, global tables)
- CloudFormation/CDK (infrastructure as code)
- IAM (identity and access management)
- VPC (virtual private cloud networking)

Cost Profile: Pay-as-you-go, reserved instances (up to 72% discount), savings plans, spot instances (up to 90% discount)
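To make the discount ceilings above concrete, here is a rough sketch of the arithmetic in Python. The $0.10 hourly on-demand rate and the 730-hour month are illustrative assumptions, not AWS prices. The 72% and 90% figures are the upper bounds quoted above, so real savings will usually be smaller.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(hourly_rate: float, discount: float = 0.0,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Return the monthly cost after applying a fractional discount (0.0 to 1.0)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return round(hourly_rate * (1.0 - discount) * hours, 2)


# Hypothetical instance priced at $0.10/hour on demand.
on_demand = monthly_cost(0.10)
reserved_best_case = monthly_cost(0.10, discount=0.72)  # "up to 72%" ceiling
spot_best_case = monthly_cost(0.10, discount=0.90)      # "up to 90%" ceiling

print(f"on-demand: ${on_demand:.2f}")
print(f"reserved (best case): ${reserved_best_case:.2f}")
print(f"spot (best case): ${spot_best_case:.2f}")
```

Treat the best-case numbers as a lower bound on cost when comparing purchase options; actual discounts depend on term, payment option, instance family, and spot market conditions.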

When to Use Azure

Best For:

- Microsoft-centric organizations (.NET, Active Directory)
- Hybrid cloud scenarios (Azure Arc, Azure Stack)
- Enterprise agreements with Microsoft
- Windows Server and SQL Server workloads
- Integration with Microsoft 365 and Dynamics
- Strong compliance certifications (90+ certifications)

Key Services:

- Virtual Machines (Windows/Linux compute)
- App Service (PaaS for web apps)
- AKS (managed Kubernetes)
- Azure Functions (serverless compute)
- Storage Accounts (Blob, File, Queue, Table)
- SQL Database (managed SQL Server)
- Microsoft Entra ID, formerly Azure Active Directory (identity management)
- ARM Templates/Bicep (infrastructure as code)

Cost Profile: Pay-as-you-go, reserved instances, Azure Hybrid Benefit for Windows/SQL Server licenses

When to Use Cloudflare

Best For:

- Edge-first applications with global distribution
- Ultra-low latency requirements (<50ms)
- Static sites with serverless functions
- Zero egress cost scenarios (R2 storage)
- WebSocket/real-time applications (Durable Objects)
- AI/ML at the edge (Workers AI)

Key Products:

- Workers (serverless functions)
- R2 (object storage, S3-compatible)
- D1 (SQLite database with global replication)
- KV (key-value store)
- Pages (static hosting + functions)
- Durable Objects (stateful compute)
- Browser Rendering (headless browser automation)

Cost Profile: Pay-per-request, generous free tier, zero egress fees

When to Use Kubernetes

Best For:

- Container orchestration at scale
- Microservices architectures with 10+ services
- Multi-cloud and hybrid deployments
- Self-healing and auto-scaling workloads
- Complex deployment strategies (blue/green, canary)
- Service mesh architectures (Istio, Linkerd)
- Stateful applications with operators

Key Features:

- Declarative configuration (YAML manifests)
- Automated rollouts and rollbacks
- Service discovery and load balancing
- Self-healing (restarts failed containers)
- Horizontal pod autoscaling
- Secret and configuration management
- Storage orchestration
- Batch job execution

Managed Options: EKS (AWS), AKS (Azure), GKE (GCP), and other managed Kubernetes providers

Cost Profile: Cluster management fees + node costs (optimize with spot instances, cluster autoscaling)

When to Use Docker

Best For:

- Local development consistency
- Microservices architectures
- Multi-language stack applications
- Traditional VPS/VM deployments
- Foundation for Kubernetes workloads
- CI/CD build environments
- Database containerization (dev/test)

Key Capabilities:

- Application isolation and portability
- Multi-stage builds for optimization
- Docker Compose for multi-container apps
- Volume management for data persistence
- Network configuration and service discovery
- Cross-platform compatibility (amd64, arm64)
- BuildKit for improved build performance

Cost Profile: Infrastructure cost only (compute + storage), no orchestration overhead

When to Use Google Cloud

Best For:

- Enterprise-scale applications
- Data analytics and ML pipelines (BigQuery, Vertex AI)
- Hybrid/multi-cloud deployments
- Kubernetes at scale (GKE)
- Managed databases (Cloud SQL, Firestore, Spanner)
- Complex IAM and compliance requirements

Key Services:

- Compute Engine (VMs)
- GKE (managed Kubernetes)
- Cloud Run (containerized serverless)
- App Engine (PaaS)
- Cloud Storage (object storage)
- Cloud SQL (managed databases)

Cost Profile: Pay-as-you-go, sustained use discounts, committed use discounts

Quick Start

AWS Lambda Function

Install AWS CLI

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install

Configure credentials

aws configure

Create Lambda function with SAM

sam init --runtime python3.11
sam build && sam deploy --guided

See: references/aws-lambda.md

AWS EKS Kubernetes Cluster

Install eksctl

brew install eksctl # or curl download

Create cluster

eksctl create cluster \
  --name my-cluster \
  --region us-west-2 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 1 \
  --nodes-max 4

See: references/kubernetes-basics.md

Azure Deployment

Install Azure CLI

curl -L https://aka.ms/InstallAzureCli | bash

Login and create resources

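az login
az group create --name myResourceGroup --location eastus
# A web app runs inside an App Service plan; the plan name and SKU here are placeholders
az appservice plan create --resource-group myResourceGroup \
  --name myPlan --sku B1 --is-linux
az webapp create --resource-group myResourceGroup \
  --plan myPlan --name myapp --runtime "NODE:18-lts"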

See: references/azure-basics.md

Cloudflare Workers

Install Wrangler CLI

npm install -g wrangler

Create and deploy Worker

wrangler init my-worker
cd my-worker
wrangler deploy

See: references/cloudflare-workers-basics.md

Kubernetes Deployment

Create deployment

kubectl create deployment nginx --image=nginx:latest
kubectl expose deployment nginx --port=80 --type=LoadBalancer

Apply from manifest

kubectl apply -f deployment.yaml

Check status

kubectl get pods,services,deployments

See: references/kubernetes-basics.md
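The `kubectl apply -f deployment.yaml` step assumes a manifest already exists. As a minimal sketch, the Python below builds an `apps/v1` Deployment as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The helper name `build_deployment`, the image tag, the replica count, and the resource values are all placeholder assumptions, not values taken from this skill.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 2, port: int = 80) -> dict:
    """Build a minimal apps/v1 Deployment manifest as a dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            # Placeholder requests/limits; tune per workload.
                            "resources": {
                                "requests": {"cpu": "100m", "memory": "128Mi"},
                                "limits": {"cpu": "500m", "memory": "256Mi"},
                            },
                        }
                    ]
                },
            },
        },
    }


# Pinned tag used as an example instead of "latest" (an assumption).
manifest = build_deployment("nginx", "nginx:1.27")
print(json.dumps(manifest, indent=2))
```

One way to use it, assuming the script is saved under a name of your choosing, is to redirect the output to `deployment.json` and pass that file to `kubectl apply -f`.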

Docker Container

Create Dockerfile

cat > Dockerfile <<EOF
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
EOF

Build and run

docker build -t myapp .
docker run -p 3000:3000 myapp

See: references/docker-basics.md

Reference Navigation

AWS (Amazon Web Services)

- aws-overview.md - AWS fundamentals, account setup, IAM basics
- aws-ec2.md - EC2 instances, AMIs, security groups, auto-scaling
- aws-lambda.md - Serverless functions, SAM, event sources, layers
- aws-ecs-eks.md - Container orchestration, ECS vs EKS, Fargate
- aws-s3-rds.md - S3 storage, RDS databases, backup strategies
- aws-cloudformation.md - Infrastructure as code, CDK, best practices
- aws-networking.md - VPC, subnets, security groups, load balancers

Azure (Microsoft Azure)

- azure-basics.md - Azure fundamentals, subscriptions, resource groups
- azure-compute.md - VMs, App Service, AKS, Azure Functions
- azure-storage.md - Storage Accounts, Blob, Files, managed disks

Cloudflare Platform

- cloudflare-platform.md - Edge computing overview, key components
- cloudflare-workers-basics.md - Getting started, handler types, basic patterns
- cloudflare-workers-advanced.md - Advanced patterns, performance, optimization
- cloudflare-workers-apis.md - Runtime APIs, bindings, integrations
- cloudflare-r2-storage.md - R2 object storage, S3 compatibility, best practices
- cloudflare-d1-kv.md - D1 SQLite database, KV store, use cases
- browser-rendering.md - Puppeteer/Playwright automation on Cloudflare

Kubernetes & Container Orchestration

- kubernetes-basics.md - Core concepts, pods, deployments, services
- kubernetes-advanced.md - StatefulSets, operators, custom resources
- kubernetes-networking.md - Ingress, service mesh, network policies
- helm-charts.md - Package management, charts, repositories

Docker Containerization

- docker-basics.md - Core concepts, Dockerfile, images, containers
- docker-compose.md - Multi-container apps, networking, volumes
- docker-security.md - Image scanning, secrets, best practices

Google Cloud Platform

- gcloud-platform.md - GCP overview, gcloud CLI, authentication
- gcloud-services.md - Compute Engine, GKE, Cloud Run, App Engine

CI/CD & GitOps

- cicd-github-actions.md - GitHub Actions workflows, runners, secrets
- cicd-gitlab.md - GitLab CI/CD pipelines, artifacts, caching
- gitops-argocd.md - ArgoCD setup, app of apps pattern, sync policies
- gitops-flux.md - Flux controllers, GitOps toolkit, multi-tenancy

FinOps (Cost Optimization)

- finops-basics.md - Cost optimization principles, FinOps lifecycle
- finops-aws.md - AWS cost optimization, RI, savings plans, spot
- finops-azure.md - Azure cost management, reservations, hybrid benefit
- finops-gcp.md - GCP cost optimization, committed use, sustained use
- finops-tools.md - Cost analysis tools, Kubecost, CloudHealth, Infracost

DevSecOps (Security)

- devsecops-basics.md - Security best practices, shift-left security
- devsecops-scanning.md - SAST, DAST, SCA, container scanning
- secrets-management.md - Vault, AWS Secrets Manager, sealed secrets
- compliance.md - SOC2, HIPAA, PCI-DSS, audit logging

Infrastructure as Code

- terraform-basics.md - Terraform fundamentals, providers, state
- terraform-advanced.md - Modules, workspaces, remote state
- cloudformation-basics.md - CloudFormation templates, stacks, change sets

Utilities & Scripts

- scripts/cloudflare-deploy.py - Automate Cloudflare Worker deployments
- scripts/docker-optimize.py - Analyze and optimize Dockerfiles
- scripts/cost-analyzer.py - Cloud cost analysis and reporting
- scripts/security-scanner.py - Automated security scanning

Common Workflows

Multi-Cloud Architecture

Edge Layer: Cloudflare Workers (global routing, caching)

Compute Layer: AWS ECS/Lambda or Azure App Service (application logic)

Data Layer: AWS RDS or Azure SQL (persistent storage)

CDN/Storage: Cloudflare R2 or AWS S3 (static assets)

Benefits:
- Best-of-breed services per layer
- Geographic redundancy
- Cost optimization across providers

AWS ECS Deployment with CI/CD

GitHub Actions workflow

name: Deploy to ECS
on: push
jobs:
  deploy:
    # Outline of the steps this job performs, in order:
    # - Build Docker image
    # - Push to ECR
    # - Update ECS task definition
    # - Deploy to ECS service
    # - Wait for deployment stabilization

Kubernetes GitOps with ArgoCD

Git repository structure

/apps
  /production
    - deployment.yaml
    - service.yaml
    - ingress.yaml
  /staging
    - deployment.yaml

ArgoCD syncs cluster state from Git

Changes: Git commit → ArgoCD detects → Auto-sync to cluster
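The commit, detect, sync loop above boils down to comparing desired state (what Git declares) with live state (what the cluster runs). The sketch below shows that comparison with plain dictionaries. It is a conceptual illustration only, not how ArgoCD is implemented, and the resource names and fields are made up for the example.

```python
def diff_state(desired: dict, live: dict) -> dict:
    """Classify resources by the action needed to make live match desired."""
    return {
        "create": sorted(k for k in desired if k not in live),
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        # Deleting extra resources corresponds to pruning, which ArgoCD
        # only performs when pruning is enabled for the application.
        "delete": sorted(k for k in live if k not in desired),
    }


def sync(desired: dict, live: dict) -> dict:
    """Return the live state after applying every pending action."""
    actions = diff_state(desired, live)
    result = dict(live)
    for name in actions["create"] + actions["update"]:
        result[name] = desired[name]
    for name in actions["delete"]:
        del result[name]
    return result


# Hypothetical desired state (from Git) and live state (from the cluster).
desired = {
    "deployment/web": {"image": "web:1.1", "replicas": 3},
    "service/web": {"port": 80},
}
live = {
    "deployment/web": {"image": "web:1.0", "replicas": 3},
    "configmap/old": {"key": "value"},
}

print(diff_state(desired, live))
print(sync(desired, live) == desired)
```

After a sync with pruning, the live state equals the desired state, which is the property that makes Git the single source of truth.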

Multi-Stage Docker Build

Build stage

FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

Production stage

FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
USER node
CMD ["node", "dist/server.js"]

FinOps Cost Optimization Workflow

1. Discovery: Identify untagged resources

2. Analysis: Right-size instances (CPU/memory utilization)

3. Optimization:

- Convert to reserved instances (predictable workloads)

- Use spot instances (fault-tolerant workloads)

- Schedule start/stop (dev environments)

4. Monitoring: Set budget alerts, track savings

5. Governance: Enforce tagging policies
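The discovery and analysis steps above can be sketched as a small report over an inventory of resources. Everything specific here is an assumption made for illustration: the required tag keys, the 20% CPU threshold, the `Resource` shape, and the sample data. A real version would pull inventory and utilization from the cloud provider's APIs.

```python
from dataclasses import dataclass, field

REQUIRED_TAGS = {"owner", "environment", "cost-center"}  # assumed tagging policy
CPU_THRESHOLD = 20.0  # assumed: average CPU % below this suggests rightsizing


@dataclass
class Resource:
    resource_id: str
    tags: dict = field(default_factory=dict)
    avg_cpu_percent: float = 0.0


def review(resources: list) -> dict:
    """Flag resources that miss required tags or look over-provisioned."""
    report = {"untagged": [], "rightsize": []}
    for resource in resources:
        if REQUIRED_TAGS - set(resource.tags):
            report["untagged"].append(resource.resource_id)
        if resource.avg_cpu_percent < CPU_THRESHOLD:
            report["rightsize"].append(resource.resource_id)
    return report


# Hypothetical inventory with average CPU utilization over some period.
full_tags = {"owner": "team-a", "environment": "prod", "cost-center": "1001"}
inventory = [
    Resource("i-web-1", full_tags, 62.0),
    Resource("i-web-2", full_tags, 12.0),
    Resource("i-batch-1", {"owner": "team-b"}, 75.0),
    Resource("i-dev-1", {}, 3.0),
]

report = review(inventory)
print(report)
```

CPU alone is a coarse signal; memory, network, and burst patterns usually need to be checked before downsizing.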

DevSecOps Security Pipeline

1. Code Commit

2. SAST Scan: SonarQube, Semgrep (static code analysis)

3. Dependency Check: Snyk, Trivy (vulnerability scanning)

4. Build: Docker image

5. Container Scan: Trivy, Grype (image vulnerabilities)

6. DAST Scan: OWASP ZAP (runtime security testing)

7. Deploy: Only if all scans pass

8. Runtime Protection: Falco, AWS GuardDuty
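Step 7 ("Deploy: Only if all scans pass") is a gate over the earlier scan results. A minimal sketch of such a gate follows. The severity scale, the default blocking threshold of "high", and the `ScanResult` shape are assumptions for the example, and none of the listed tools emit this structure directly; their reports would need to be parsed into it.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["low", "medium", "high", "critical"]  # assumed severity scale


@dataclass
class ScanResult:
    stage: str       # e.g. "sast", "dependency", "container", "dast"
    findings: list   # severity strings, one per finding


def blocking_findings(result: ScanResult, threshold: str = "high") -> list:
    """Return the findings at or above the blocking threshold."""
    limit = SEVERITY_ORDER.index(threshold)
    return [s for s in result.findings if SEVERITY_ORDER.index(s) >= limit]


def can_deploy(results: list, threshold: str = "high") -> bool:
    """Allow deployment only if no stage has a blocking finding."""
    return all(not blocking_findings(r, threshold) for r in results)


# Hypothetical results collected from the pipeline stages.
results = [
    ScanResult("sast", ["low"]),
    ScanResult("dependency", ["medium"]),
    ScanResult("container", ["critical", "low"]),
    ScanResult("dast", []),
]

failed_stages = [r.stage for r in results if blocking_findings(r)]
print("deploy allowed:", can_deploy(results))
print("blocking stages:", failed_stages)
```

In a CI job, the same check would typically end with a non-zero exit code when `can_deploy` returns `False`, so the deploy step never runs.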

Terraform Infrastructure Deployment

1. Write: Define infrastructure in .tf files

2. Init: terraform init (download providers)

3. Plan: terraform plan (preview changes)

4. Apply: terraform apply (create/update resources)

5. State: Store state in S3 with DynamoDB locking

6. Modules: Reuse common patterns across environments
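Steps 2 to 4 are often scripted so that the applied changes are exactly the ones that were reviewed. The sketch below only builds the command lines: it saves the plan to a file and applies that file. The helper name `terraform_commands`, the `infra` directory, and the `tfplan` file name are assumptions for the example. The flags used (`-chdir`, `-input=false`, `-out`) are standard Terraform CLI options.

```python
import shlex


def terraform_commands(workdir: str, plan_file: str = "tfplan") -> list[list[str]]:
    """Build the init, plan, and apply commands for a non-interactive run."""
    base = ["terraform", f"-chdir={workdir}"]
    return [
        base + ["init", "-input=false"],                       # download providers
        base + ["plan", "-input=false", f"-out={plan_file}"],  # save the reviewed plan
        base + ["apply", "-input=false", plan_file],           # apply exactly that plan
    ]


# "infra" is a hypothetical directory containing the .tf files.
commands = terraform_commands("infra")
for command in commands:
    print(shlex.join(command))
```

To execute them, each list can be passed to `subprocess.run(command, check=True)`, which stops the sequence at the first failing step.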

Best Practices

DevOps

- CI/CD: Automate testing and deployment, use feature flags for progressive rollouts
- GitOps: Declarative infrastructure, Git as single source of truth, automated sync
- Monitoring: Implement observability (logs, metrics, traces), set up alerting
- Incident Management: Runbooks, postmortems, blameless culture
- Automation: Infrastructure as code, configuration management, self-service platforms

Security (DevSecOps)

- Shift Left: Security scanning early in pipeline (SAST, dependency checks)
- Secrets Management: Use Vault, AWS Secrets Manager, or sealed secrets (never in code/Git)
- Container Security: Run as non-root, minimal base images, regular scanning
- Network Security: Zero-trust architecture, service mesh, network policies
- Access Control: Least privilege IAM, MFA, temporary credentials
- Compliance: Audit logging, encryption at rest/transit, regular security reviews
- Runtime Protection: Security monitoring, intrusion detection, automated response

Cost Optimization (FinOps)

- Tagging: Enforce resource tagging for cost allocation and tracking
- Rightsizing: Analyze utilization, downsize over-provisioned resources
- Reserved Capacity: Purchase RI/savings plans for predictable workloads (up to 72% discount)
- Spot/Preemptible: Use for fault-tolerant workloads (up to 90% discount)
- Scheduling: Auto-stop dev/test environments during off-hours
- Storage Optimization: Lifecycle policies, archive to cheaper tiers, delete orphaned resources
- Monitoring: Budget alerts, cost anomaly detection, chargeback/showback
- Governance: Approval workflows for expensive resources, quota management

Kubernetes

- Resource Management: Set requests/limits, use horizontal pod autoscaling
- High Availability: Multi-zone clusters, pod disruption budgets, anti-affinity rules
- Security: RBAC, Pod Security Standards (PodSecurityPolicy was removed in Kubernetes 1.25), network policies, admission controllers
- Observability: Prometheus metrics, distributed tracing, centralized logging
- GitOps: ArgoCD/Flux for declarative deployments, automatic drift correction

Performance

- Compute: Auto-scaling, load balancing, multi-region for low latency
- Caching: CDN, in-memory caching (Redis/Memcached), edge computing
- Storage: Choose appropriate tier (SSD vs HDD), enable caching, CDN for static assets
- Containers: Multi-stage builds, minimal images, layer caching
- Databases: Connection pooling, read replicas, query optimization, indexing

Development

- Local Development: Docker Compose for consistent environments, dev containers
- Testing: Unit, integration, end-to-end tests in CI/CD pipeline
- Infrastructure as Code: Terraform/CloudFormation for repeatability
- Documentation: Architecture diagrams, runbooks, API documentation
- Version Control: Git for code and infrastructure, semantic versioning

Decision Matrix

Compute

Need | Choose
Sub-50ms latency globally | Cloudflare Workers
Serverless functions (AWS ecosystem) | AWS Lambda
Serverless functions (Azure ecosystem) | Azure Functions
Containerized workloads (managed) | AWS ECS/Fargate, Azure AKS, GCP Cloud Run
Kubernetes at scale | AWS EKS, Azure AKS, GCP GKE
VMs with full control | AWS EC2, Azure VMs, GCP Compute Engine

Storage

Need | Choose
Object storage (S3-compatible) | AWS S3, Cloudflare R2 (zero egress), Azure Blob
Block storage for VMs | AWS EBS, Azure Managed Disks, GCP Persistent Disk
File storage (NFS/SMB) | AWS EFS, Azure Files, GCP Filestore

Database

Need | Choose
Managed SQL (AWS) | AWS RDS (PostgreSQL, MySQL, SQL Server)
Managed SQL (Azure) | Azure SQL Database
Managed SQL (GCP) | Cloud SQL
NoSQL key-value | AWS DynamoDB, Azure Cosmos DB, Cloudflare KV
Global SQL (edge reads) | Cloudflare D1, AWS Aurora Global

CI/CD & GitOps

Need | Choose
GitHub-integrated CI/CD | GitHub Actions
Self-hosted CI/CD | GitLab CI/CD, Jenkins
Kubernetes GitOps | ArgoCD, Flux

Cost Optimization

Need | Choose
Predictable workloads | Reserved Instances, Savings Plans
Fault-tolerant workloads | Spot Instances (AWS), Preemptible VMs (GCP)
Dev/test environments | Auto-scheduling, budget alerts

Security

Need | Choose
Secrets management | HashiCorp Vault, AWS Secrets Manager, Azure Key Vault
Container scanning | Trivy, Snyk, AWS ECR scanning
SAST/DAST | SonarQube, Semgrep, OWASP ZAP

Special Use Cases

Need | Choose
Static site + edge functions | Cloudflare Pages, AWS Amplify
WebSocket/real-time | Cloudflare Durable Objects, AWS API Gateway WebSocket
ML/AI pipelines | AWS SageMaker, GCP Vertex AI, Azure ML
Browser automation | Cloudflare Browser Rendering, AWS Lambda + Puppeteer

Resources

Cloud Providers

- AWS Docs: https://docs.aws.amazon.com
- Azure Docs: https://docs.microsoft.com/azure
- GCP Docs: https://cloud.google.com/docs
- Cloudflare Docs: https://developers.cloudflare.com

Container & Orchestration

- Docker Docs: https://docs.docker.com
- Kubernetes Docs: https://kubernetes.io/docs
- Helm: https://helm.sh/docs

CI/CD & GitOps

- GitHub Actions: https://docs.github.com/actions
- GitLab CI: https://docs.gitlab.com/ee/ci/
- ArgoCD: https://argo-cd.readthedocs.io
- Flux: https://fluxcd.io/docs

Infrastructure as Code

- Terraform: https://developer.hashicorp.com/terraform
- AWS CDK: https://docs.aws.amazon.com/cdk
- Pulumi: https://www.pulumi.com/docs

Security & Compliance

- OWASP: https://owasp.org
- CIS Benchmarks: https://www.cisecurity.org/cis-benchmarks
- HashiCorp Vault: https://developer.hashicorp.com/vault

FinOps & Cost Optimization

- FinOps Foundation: https://www.finops.org
- AWS Cost Optimization: https://aws.amazon.com/pricing/cost-optimization
- Kubecost: https://www.kubecost.com

Implementation Checklist

AWS Lambda Deployment

- Install AWS CLI and SAM CLI
- Configure AWS credentials (access key, secret key)
- Create Lambda function with SAM template
- Configure IAM role and policies
- Test locally with sam local invoke
- Deploy with sam deploy
- Set up CloudWatch monitoring and alarms

AWS EKS Kubernetes Cluster

- Install kubectl, eksctl, aws-cli
- Configure AWS credentials
- Create EKS cluster with eksctl
- Configure kubectl context
- Install cluster autoscaler
- Set up Helm for package management
- Deploy applications with kubectl/Helm
- Configure ingress controller (ALB/NGINX)

Azure Deployment

- Install Azure CLI
- Login with az login
- Create resource group
- Deploy App Service or AKS
- Configure continuous deployment
- Set up monitoring with Application Insights

Kubernetes on Any Cloud

- Install kubectl and helm
- Connect to cluster (update kubeconfig)
- Create namespaces for environments
- Apply RBAC policies
- Deploy applications (deployments, services)
- Configure ingress for external access
- Set up monitoring (Prometheus, Grafana)
- Implement GitOps with ArgoCD/Flux

CI/CD Pipeline (GitHub Actions)

- Create .github/workflows/deploy.yml
- Configure secrets (cloud credentials, API keys)
- Add build and test jobs
- Add container build and push to registry
- Add deployment job to cloud platform
- Set up branch protection rules
- Enable status checks and notifications

FinOps Cost Optimization

- Implement resource tagging strategy
- Enable cost allocation tags
- Set up budget alerts
- Analyze resource utilization (CloudWatch, Azure Monitor)
- Identify rightsizing opportunities
- Purchase reserved instances for predictable workloads
- Configure auto-scaling and scheduling
- Regular cost reviews and optimization

DevSecOps Security

- Add SAST scanning to CI/CD (SonarQube, Semgrep)
- Add dependency scanning (Snyk, Trivy)
- Implement container image scanning
- Set up secrets management (Vault, cloud provider)
- Configure security groups and network policies
- Enable audit logging
- Implement security monitoring and alerting
- Regular vulnerability assessments

Cloudflare Workers

- Install Wrangler CLI
- Create Worker project
- Configure wrangler.toml (bindings, routes)
- Test locally with wrangler dev
- Deploy with wrangler deploy

Docker

- Write Dockerfile with multi-stage builds
- Create .dockerignore file
- Test build locally
- Push to registry (ECR, ACR, GCR, Docker Hub)
- Deploy to target platform
