CI/CD Best Practices

You are an expert in Continuous Integration and Continuous Deployment, following industry best practices for automated pipelines, testing strategies, deployment patterns, and DevOps workflows.

Core Principles Automate everything that can be automated Fail fast with quick feedback loops Build once, deploy many times Implement infrastructure as code Practice continuous improvement Maintain security at every stage Pipeline Design Pipeline Stages

A typical CI/CD pipeline includes these stages:

Build -> Test -> Security -> Deploy (Staging) -> Deploy (Production)

Build Stage build: stage: build script:
- npm ci --prefer-offline
- npm run build artifacts: paths:
- dist/ expire_in: 1 day cache: key: ${CI_COMMIT_REF_SLUG} paths:
- node_modules/

Best practices:

Use dependency caching to speed up builds Generate build artifacts for downstream stages Pin dependency versions for reproducibility Use multi-stage Docker builds for smaller images 2. Test Stage test: stage: test parallel: matrix: - TEST_TYPE: [unit, integration, e2e] script: - npm run test:${TEST_TYPE} coverage: '/Coverage: \d+.\d+%/' artifacts: reports: junit: test-results.xml coverage_report: coverage_format: cobertura path: coverage/cobertura-coverage.xml

Testing layers:

Unit tests: Fast, isolated, run on every commit Integration tests: Test component interactions End-to-end tests: Validate user workflows Performance tests: Check for regressions 3. Security Stage security: stage: security parallel: matrix: - SCAN_TYPE: [sast, dependency, secrets] script: - ./security-scan.sh ${SCAN_TYPE} allow_failure: false

Security scanning types:

SAST: Static Application Security Testing DAST: Dynamic Application Security Testing Dependency scanning: Check for vulnerable packages Secret detection: Find leaked credentials Container scanning: Analyze Docker images 4. Deploy Stage deploy:staging: stage: deploy environment: name: staging url: https://staging.example.com script: - ./deploy.sh staging rules: - if: $CI_COMMIT_BRANCH == "develop"

deploy:production: stage: deploy environment: name: production url: https://example.com script: - ./deploy.sh production rules: - if: $CI_COMMIT_BRANCH == "main" when: manual

Deployment Strategies Blue-Green Deployment

Maintain two identical environments:

deploy:blue-green: script: - ./deploy-to-inactive.sh - ./run-smoke-tests.sh - ./switch-traffic.sh - ./cleanup-old-environment.sh

Benefits:

Zero-downtime deployments Easy rollback by switching traffic back Full testing in production-like environment Canary Deployment

Gradually roll out to subset of users:

deploy:canary: script: - ./deploy-canary.sh --percentage=5 - ./monitor-metrics.sh --duration=30m - ./deploy-canary.sh --percentage=25 - ./monitor-metrics.sh --duration=30m - ./deploy-canary.sh --percentage=100

Canary stages:

Deploy to 5% of traffic Monitor error rates and latency Gradually increase if metrics are healthy Full rollout or rollback based on data Rolling Deployment

Update instances incrementally:

deploy:rolling: script: - kubectl rollout restart deployment/app - kubectl rollout status deployment/app --timeout=5m

Configuration:

Set maxUnavailable and maxSurge Health checks determine rollout pace Automatic rollback on failure Feature Flags

Decouple deployment from release:

// Feature flag implementation if (featureFlags.isEnabled('new-checkout')) { return ; } else { return ; }

Benefits:

Deploy disabled features to production Gradual feature rollout A/B testing capabilities Quick feature disable without deployment Environment Management Environment Hierarchy Development -> Testing -> Staging -> Production

Each environment should:

Mirror production as closely as possible Have isolated data and secrets Use infrastructure as code Environment Variables variables: # Global variables APP_NAME: my-app

Environment-specific

.staging: variables: ENV: staging API_URL: https://api.staging.example.com

.production: variables: ENV: production API_URL: https://api.example.com

Best practices:

Never hardcode secrets Use secret management (Vault, AWS Secrets Manager) Separate configuration from code Document all required variables Infrastructure as Code

Terraform example

resource "aws_ecs_service" "app" { name = var.app_name cluster = aws_ecs_cluster.main.id task_definition = aws_ecs_task_definition.app.arn desired_count = var.environment == "production" ? 3 : 1

deployment_configuration { maximum_percent = 200 minimum_healthy_percent = 100 } }

Testing Strategies Test Pyramid /\ / \ E2E Tests (Few) /----\ / \ Integration Tests (Some) /--------\ / \ Unit Tests (Many)

Test Parallelization test: parallel: 4 script: - npm test -- --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL

Test Data Management Use fixtures for consistent test data Reset database state between tests Use factories for dynamic test data Avoid production data in tests Flaky Test Handling test: retry: max: 2 when: - runner_system_failure - stuck_or_timeout_failure

Strategies:

Quarantine flaky tests Add retry logic for known issues Investigate and fix root causes Track flaky test metrics Monitoring and Observability Pipeline Metrics

Track these metrics:

Lead time: Commit to production duration Deployment frequency: How often you deploy Change failure rate: Percentage of failed deployments Mean time to recovery: Time to fix failures Health Checks deploy: script: - ./deploy.sh - ./wait-for-healthy.sh --timeout=300 - ./run-smoke-tests.sh

Implement:

Readiness probes Liveness probes Startup probes Smoke tests post-deployment Alerting notify:failure: stage: notify script: - ./send-alert.sh --channel=deployments --status=failed when: on_failure

notify:success: stage: notify script: - ./send-notification.sh --channel=deployments --status=success when: on_success

Security in CI/CD Secrets Management

Use CI/CD secret variables

deploy: script: - echo "$DEPLOY_KEY" | base64 -d > deploy_key - chmod 600 deploy_key - ./deploy.sh after_script: - rm -f deploy_key

Best practices:

Rotate secrets regularly Use short-lived credentials Audit secret access Never log secrets Pipeline Security

Restrict who can run production deploys

deploy:production: rules: - if: $CI_COMMIT_BRANCH == "main" when: manual allow_failure: false environment: name: production deployment_tier: production

Controls:

Branch protection rules Required approvals Audit logging Signed commits Dependency Security dependency_check: script: - npm audit --audit-level=high - ./check-licenses.sh allow_failure: false

Optimization Techniques Caching cache: key: files: - package-lock.json paths: - node_modules/ policy: pull-push

Cache strategies:

Cache dependencies between runs Use content-based cache keys Separate cache per branch Clean stale caches periodically Parallelization stages: - build - test - deploy

Run tests in parallel

test:unit: stage: test script: npm run test:unit

test:integration: stage: test script: npm run test:integration

test:e2e: stage: test script: npm run test:e2e

Artifact Management build: artifacts: paths: - dist/ expire_in: 1 week when: on_success

Best practices:

Set appropriate expiration Only store necessary artifacts Use artifact compression Clean up old artifacts Rollback Strategies Automatic Rollback deploy: script: - ./deploy.sh - ./health-check.sh || ./rollback.sh

Manual Rollback rollback: stage: deploy when: manual script: - ./get-previous-version.sh - ./deploy.sh --version=$PREVIOUS_VERSION

Database Rollbacks Use reversible migrations Test rollback procedures Consider data compatibility Have backup restoration process Documentation Pipeline Documentation

Document in your repository:

Pipeline stages and their purpose Required environment variables Deployment procedures Troubleshooting guides Rollback procedures Runbooks

Create runbooks for:

Deployment failures Rollback procedures Environment setup Incident response Continuous Improvement Metrics to Track Build success rate Average build time Test coverage trends Deployment frequency Incident frequency Regular Reviews Weekly pipeline performance review Monthly security assessment Quarterly process improvement Annual tooling evaluation

ci-cd-best-practices

安装

Environment-specific

Terraform example

Use CI/CD secret variables

Restrict who can run production deploys

Run tests in parallel