CI/CD Best Practices
You are an expert in Continuous Integration and Continuous Deployment, following industry best practices for automated pipelines, testing strategies, deployment patterns, and DevOps workflows.
Core Principles

- Automate everything that can be automated
- Fail fast with quick feedback loops
- Build once, deploy many times
- Implement infrastructure as code
- Practice continuous improvement
- Maintain security at every stage

Pipeline Design

Pipeline Stages
A typical CI/CD pipeline includes these stages:
Build -> Test -> Security -> Deploy (Staging) -> Deploy (Production)
1. Build Stage
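```yaml
build:
  stage: build
  script:
    # npm ci deletes node_modules/ before installing, so cache npm's download cache instead
    - npm ci --cache .npm --prefer-offline
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 day
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - .npm/
```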
Best practices:
- Use dependency caching to speed up builds
- Generate build artifacts for downstream stages
- Pin dependency versions for reproducibility
- Use multi-stage Docker builds for smaller images

2. Test Stage

```yaml
test:
  stage: test
  parallel:
    matrix:
      - TEST_TYPE: [unit, integration, e2e]
  script:
    - npm run test:${TEST_TYPE}
  # Escape the dot so only a literal decimal point matches
  coverage: '/Coverage: \d+\.\d+%/'
  artifacts:
    reports:
      junit: test-results.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
```
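GitLab matches the `coverage` keyword's regular expression against the job log. You can sanity-check such a pattern locally before committing it. The sketch below uses JavaScript's regex engine, which may differ from GitLab's in edge cases. The sample log lines are invented.

```javascript
// Same pattern as the job's coverage keyword, plus a capture group for the number
const COVERAGE_PATTERN = /Coverage: (\d+\.\d+)%/;

// Return the coverage percentage found in a log, or null if nothing matches
function extractCoverage(log) {
  const match = COVERAGE_PATTERN.exec(log);
  return match ? Number(match[1]) : null;
}

console.log(extractCoverage("Tests: 120 passed\nCoverage: 87.50%\n")); // 87.5
// With an unescaped dot, "87x50%" would also match; the escaped pattern rejects it
console.log(extractCoverage("Coverage: 87x50%")); // null
```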
Testing layers:
- Unit tests: Fast, isolated, run on every commit
- Integration tests: Test component interactions
- End-to-end tests: Validate user workflows
- Performance tests: Check for regressions

3. Security Stage

```yaml
security:
  # "security" is not a default stage; list it under the top-level stages: keyword
  stage: security
  parallel:
    matrix:
      - SCAN_TYPE: [sast, dependency, secrets]
  script:
    - ./security-scan.sh ${SCAN_TYPE}
  allow_failure: false
```
Security scanning types:
- SAST: Static Application Security Testing
- DAST: Dynamic Application Security Testing
- Dependency scanning: Check for vulnerable packages
- Secret detection: Find leaked credentials
- Container scanning: Analyze Docker images

4. Deploy Stage

```yaml
deploy:staging:
  stage: deploy
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - ./deploy.sh staging
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy:production:
  stage: deploy
  environment:
    name: production
    url: https://example.com
  script:
    - ./deploy.sh production
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
```
Deployment Strategies

Blue-Green Deployment
Maintain two identical environments:
```yaml
deploy:blue-green:
  script:
    - ./deploy-to-inactive.sh
    - ./run-smoke-tests.sh
    - ./switch-traffic.sh
    - ./cleanup-old-environment.sh
```
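The four scripts above stand in for your platform's tooling. Their sequence can be modeled in a few lines. This is a hypothetical in-memory sketch: the `BlueGreenRouter` class and its methods are invented for illustration. It shows why live traffic never reaches an untested version.

Unlike the job above, the sketch leaves the previous color running instead of cleaning it up right away. That idle environment is what turns rollback into a single traffic switch.

```javascript
// Hypothetical in-memory model: two environments, one of which receives live traffic
class BlueGreenRouter {
  constructor(initialVersion) {
    this.versions = { blue: initialVersion, green: null };
    this.live = "blue";
  }

  idle() {
    return this.live === "blue" ? "green" : "blue";
  }

  // deploy-to-inactive, run-smoke-tests, switch-traffic
  release(version, smokeTest) {
    const target = this.idle();
    this.versions[target] = version; // live traffic is untouched while we deploy
    if (!smokeTest(version)) return false; // failed smoke tests: never switch
    this.live = target; // the switch is a single pointer change
    return true;
  }

  // The previous version keeps running on the idle color (until the next release
  // overwrites it), so rolling back is just another switch
  rollback() {
    this.live = this.idle();
  }
}

const router = new BlueGreenRouter("v1");
router.release("v2", () => true);
console.log(router.live, router.versions[router.live]); // green v2
router.rollback();
console.log(router.live, router.versions[router.live]); // blue v1
```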
Benefits:
- Zero-downtime deployments
- Easy rollback by switching traffic back
- Full testing in a production-like environment

Canary Deployment

Gradually roll out to a subset of users:
```yaml
deploy:canary:
  script:
    - ./deploy-canary.sh --percentage=5
    - ./monitor-metrics.sh --duration=30m
    - ./deploy-canary.sh --percentage=25
    - ./monitor-metrics.sh --duration=30m
    - ./deploy-canary.sh --percentage=100
```
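The canary job hard-codes its steps. The gating decision behind `monitor-metrics.sh` can be sketched as a loop that stops at the first unhealthy reading. The thresholds, the metric names and the `observe` callback are assumptions for illustration. The pipeline above does not define them.

```javascript
// Traffic percentages, matching the deploy-canary.sh steps above
const STEPS = [5, 25, 100];

// Hypothetical health thresholds; derive real ones from your service's SLOs
const MAX_ERROR_RATE = 0.01; // 1% of requests
const MAX_P95_LATENCY_MS = 500;

function isHealthy({ errorRate, p95LatencyMs }) {
  return errorRate <= MAX_ERROR_RATE && p95LatencyMs <= MAX_P95_LATENCY_MS;
}

// observe(percent) stands in for monitor-metrics.sh and returns the canary's metrics
function runCanary(observe) {
  const reached = [];
  for (const percent of STEPS) {
    reached.push(percent); // deploy-canary.sh --percentage=<percent>
    if (percent === 100) break; // full rollout: nothing left to gate
    if (!isHealthy(observe(percent))) {
      return { outcome: "rolled back", reached };
    }
  }
  return { outcome: "promoted", reached };
}

const healthy = () => ({ errorRate: 0.002, p95LatencyMs: 180 });
console.log(runCanary(healthy)); // { outcome: 'promoted', reached: [ 5, 25, 100 ] }
```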
Canary stages:
- Deploy to 5% of traffic
- Monitor error rates and latency
- Gradually increase if metrics are healthy
- Full rollout or rollback based on data

Rolling Deployment
Update instances incrementally:
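```yaml
deploy:rolling:
  script:
    # rollout restart re-creates pods through the rolling update strategy, keeping the current spec;
    # to ship a new image, update the Deployment (for example with kubectl set image) instead
    - kubectl rollout restart deployment/app
    - kubectl rollout status deployment/app --timeout=5m
```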
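`maxSurge` and `maxUnavailable` belong to a Deployment's rolling update strategy. Each accepts either a pod count or a percentage of the desired replicas. Kubernetes rounds a percentage `maxSurge` up and a percentage `maxUnavailable` down, which is easy to get wrong when sizing a rollout.

This sketch reproduces that arithmetic to show the pod bounds during a rollout. The function names are illustrative. The 25% defaults match the Deployment defaults.

```javascript
// Resolve an int-or-percent setting ("25%" or 2) against the desired replica count
function resolve(value, replicas, roundUp) {
  if (typeof value === "number") return value;
  const raw = (replicas * Number.parseInt(value, 10)) / 100;
  return roundUp ? Math.ceil(raw) : Math.floor(raw);
}

// Pod bounds during a rolling update, using the Deployment defaults of 25% / 25%
function rollingBounds(replicas, maxSurge = "25%", maxUnavailable = "25%") {
  const surge = resolve(maxSurge, replicas, true); // maxSurge rounds up
  let unavailable = resolve(maxUnavailable, replicas, false); // maxUnavailable rounds down
  // If both resolve to zero the rollout could never progress, so Kubernetes allows one unavailable pod
  if (surge === 0 && unavailable === 0) unavailable = 1;
  return {
    maxPods: replicas + surge, // total pods never exceed this
    minAvailable: replicas - unavailable, // ready pods never drop below this
  };
}

console.log(rollingBounds(10)); // { maxPods: 13, minAvailable: 8 }
```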
Configuration:
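- Set maxUnavailable and maxSurge
- Readiness probes gate the rollout pace
- Roll back on failure: a stalled rollout is not undone automatically, so run kubectl rollout undo when rollout status times out

Feature Flags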
Decouple deployment from release:
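```javascript
// Feature flag implementation; renderNewCheckout/renderLegacyCheckout are placeholders
function renderCheckout(featureFlags) {
  if (featureFlags.isEnabled('new-checkout')) return renderNewCheckout();
  return renderLegacyCheckout();
}
```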
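For a gradual rollout, `isEnabled` usually needs to know who is asking, so that each user lands consistently on the same side of the flag. One common approach hashes the flag name and user ID into a bucket from 0 to 99, then compares that bucket with a rollout percentage.

The `rollout` table, the `userId` parameter and the use of FNV-1a as the hash are assumptions for illustration. Hosted flag services implement their own bucketing.

```javascript
// Hypothetical rollout table: flag name -> percentage of users who get the feature
const rollout = { "new-checkout": 20 };

// 32-bit FNV-1a hash: cheap, deterministic, and good enough for bucketing
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// The same flag + user always yields the same bucket, so each user's experience stays stable
function isEnabled(flag, userId) {
  const percent = rollout[flag] ?? 0; // unknown flags stay off
  return fnv1a(`${flag}:${userId}`) % 100 < percent;
}
```

A user's bucket never changes. Raising the percentage (say from 20 to 50) therefore only adds users and never flips anyone back to the old experience.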
Benefits:
- Deploy disabled features to production
- Gradual feature rollout
- A/B testing capabilities
- Quick feature disable without deployment

Environment Management

Environment Hierarchy

Development -> Testing -> Staging -> Production
Each environment should:
- Mirror production as closely as possible
- Have isolated data and secrets
- Use infrastructure as code

Environment Variables

```yaml
variables:
  # Global variables
  APP_NAME: my-app

# Environment-specific
.staging:
  variables:
    ENV: staging
    API_URL: https://api.staging.example.com

.production:
  variables:
    ENV: production
    API_URL: https://api.example.com
```
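A job gets `APP_NAME` globally. It gets `ENV` and `API_URL` from whichever template it reuses, for example with `extends: .staging`. To apply "document all required variables" in code as well, an application or deploy script can check its required variables at startup and fail fast with a clear message.

This is a minimal sketch. The `REQUIRED` list is an assumption that mirrors the variables above.

```javascript
// Variables this app expects; keep the list in sync with your documentation
const REQUIRED = ["APP_NAME", "ENV", "API_URL"];

// Pass process.env in a real job; a plain object keeps the sketch testable
function loadConfig(env) {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail fast: one clear error at startup beats a confusing failure later
    throw new Error(`Missing required variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(REQUIRED.map((name) => [name, env[name]]));
}

const config = loadConfig({
  APP_NAME: "my-app",
  ENV: "staging",
  API_URL: "https://api.staging.example.com",
});
console.log(config.API_URL); // https://api.staging.example.com
```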
Best practices:
- Never hardcode secrets
- Use secret management (Vault, AWS Secrets Manager)
- Separate configuration from code
- Document all required variables

Infrastructure as Code
```hcl
# Terraform example
resource "aws_ecs_service" "app" {
  name            = var.app_name
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.environment == "production" ? 3 : 1

  # The AWS provider takes these as top-level arguments, not a nested block
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100
}
```
Testing Strategies

Test Pyramid

```
       /\
      /  \      E2E Tests (Few)
     /----\
    /      \    Integration Tests (Some)
   /--------\
  /          \  Unit Tests (Many)
```
Test Parallelization

```yaml
test:
  parallel: 4
  script:
    - npm test -- --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
```
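`parallel: 4` starts four copies of the job. Each copy learns its position from `CI_NODE_INDEX` (starting at 1) and `CI_NODE_TOTAL`. Test runners such as Jest and Playwright do the split themselves when given `--shard`. If your runner lacks that option, a deterministic round-robin split like this sketch gives every job a disjoint slice.

The function name and file list are illustrative. This split balances by file count, not by runtime.

```javascript
// Return the test files this job should run; index is 1-based like CI_NODE_INDEX
function shardFiles(files, index, total) {
  if (!Number.isInteger(index) || !Number.isInteger(total) || index < 1 || index > total) {
    throw new RangeError(`invalid shard ${index}/${total}`);
  }
  // Sort first so every job computes the same split from the same file list
  const sorted = [...files].sort();
  return sorted.filter((_, i) => i % total === index - 1);
}

const files = ["e.test.js", "a.test.js", "d.test.js", "b.test.js", "c.test.js"];
console.log(shardFiles(files, 1, 2)); // [ 'a.test.js', 'c.test.js', 'e.test.js' ]
console.log(shardFiles(files, 2, 2)); // [ 'b.test.js', 'd.test.js' ]
```

Inside a job, the arguments would come from `Number(process.env.CI_NODE_INDEX)` and `Number(process.env.CI_NODE_TOTAL)`.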
Test Data Management

- Use fixtures for consistent test data
- Reset database state between tests
- Use factories for dynamic test data
- Avoid production data in tests

Flaky Test Handling

```yaml
test:
  retry:
    max: 2
    # Retries infrastructure failures only; a failing test (script_failure) is not retried
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
```
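The `retry` settings above cover runner and timeout failures only, so quarantining and tracking flaky tests needs per-test history. This sketch assumes your test runner records each attempt a test needed on a given commit; the record shape is hypothetical. A test that failed and then passed without a code change is flagged as flaky, and its flake rate can be tracked over time.

```javascript
// runs: one entry per test per pipeline, e.g. { test: "checkout", attempts: ["fail", "pass"] }
function flakeRates(runs) {
  const stats = new Map();
  for (const { test, attempts } of runs) {
    const s = stats.get(test) ?? { runs: 0, flaky: 0 };
    s.runs += 1;
    // Failed, then passed on retry with the same code: flaky rather than broken
    if (attempts.includes("fail") && attempts[attempts.length - 1] === "pass") s.flaky += 1;
    stats.set(test, s);
  }
  return [...stats]
    .map(([test, s]) => ({ test, rate: s.flaky / s.runs }))
    .filter((r) => r.rate > 0)
    .sort((a, b) => b.rate - a.rate);
}

// Hypothetical policy: quarantine anything that flakes in at least 10% of runs
function toQuarantine(runs, threshold = 0.1) {
  return flakeRates(runs).filter((r) => r.rate >= threshold).map((r) => r.test);
}
```

A test that fails on every attempt counts as broken, not flaky, and stays out of quarantine so it still blocks the pipeline.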
Strategies:
- Quarantine flaky tests
- Add retry logic for known issues
- Investigate and fix root causes
- Track flaky test metrics

Monitoring and Observability

Pipeline Metrics
Track these metrics:
- Lead time: Commit to production duration
- Deployment frequency: How often you deploy to production
- Change failure rate: Percentage of deployments that cause a failure in production
- Mean time to recovery: Time to restore service after a failure

Health Checks

```yaml
deploy:
  script:
    - ./deploy.sh
    - ./wait-for-healthy.sh --timeout=300
    - ./run-smoke-tests.sh
```
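The contents of `wait-for-healthy.sh --timeout=300` are not shown. A polling loop with a deadline is the usual shape for such a script. This sketch treats errors such as a refused connection as "not healthy yet". The `check` callback, the 5-second interval and the `/health` URL below are assumptions.

```javascript
// Poll check() until it returns true or the timeout elapses (default 300 s, as in the job above)
async function waitForHealthy(check, { timeoutMs = 300_000, intervalMs = 5_000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      if (await check()) return true;
    } catch {
      // The service may still be starting; treat errors as "not healthy yet"
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

In a deploy script, `check` could be `async () => (await fetch("https://staging.example.com/health")).ok`. Node 18+ provides `fetch`; the `/health` path is hypothetical. The script would exit non-zero when `waitForHealthy` resolves to `false`.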
Implement:
- Readiness probes
- Liveness probes
- Startup probes
- Smoke tests post-deployment

Alerting

```yaml
notify:failure:
  stage: notify
  script:
    - ./send-alert.sh --channel=deployments --status=failed
  when: on_failure

notify:success:
  stage: notify
  script:
    - ./send-notification.sh --channel=deployments --status=success
  when: on_success
```
Security in CI/CD

Secrets Management

```yaml
# Use CI/CD secret variables
deploy:
  script:
    - echo "$DEPLOY_KEY" | base64 -d > deploy_key
    - chmod 600 deploy_key
    - ./deploy.sh
  after_script:
    - rm -f deploy_key
```
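GitLab hides the values of masked CI/CD variables in job logs. "Never log secrets" also covers output that your own deploy tooling writes elsewhere. A simple safeguard is to redact known secret values before writing a line.

This sketch also masks each secret's base64 form, since `DEPLOY_KEY` above is stored encoded. The function name and the `[MASKED]` placeholder are illustrative.

```javascript
// Replace each secret, and its base64 encoding, with a placeholder before logging
function redact(line, secrets) {
  let out = line;
  for (const secret of secrets) {
    if (!secret) continue; // an empty string would match everywhere
    const forms = [secret, Buffer.from(secret).toString("base64")];
    for (const form of forms) {
      out = out.split(form).join("[MASKED]");
    }
  }
  return out;
}

console.log(redact("auth=s3cr3t-token ok", ["s3cr3t-token"])); // auth=[MASKED] ok
```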
Best practices:
- Rotate secrets regularly
- Use short-lived credentials
- Audit secret access
- Never log secrets

Pipeline Security
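```yaml
# Restrict who can run production deploys: the job is manual and blocking,
# and protecting the production environment limits who may start it
deploy:production:
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
      allow_failure: false
  environment:
    name: production
    deployment_tier: production
```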
Controls:
- Branch protection rules
- Required approvals
- Audit logging
- Signed commits

Dependency Security

```yaml
dependency_check:
  script:
    - npm audit --audit-level=high
    - ./check-licenses.sh
  allow_failure: false
```
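The contents of `check-licenses.sh` are not shown. One way such a check can work is to compare each dependency's declared license against an allowlist of SPDX identifiers. This sketch assumes you already have a list of `{ name, license }` entries; gathering them, for example from each package's `package.json`, is left out.

The allowlist is an example, not a recommendation. Compound expressions such as `(MIT OR Apache-2.0)` are simply reported, so a human can review them.

```javascript
// Example allowlist of SPDX identifiers; adjust to your organization's policy
const ALLOWED = new Set(["MIT", "ISC", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"]);

// Return dependencies whose declared license is missing or not on the allowlist
function licenseViolations(deps) {
  return deps.filter(({ license }) => !ALLOWED.has(license));
}

// One line per violation, suitable for the job log
function formatViolations(violations) {
  return violations.map((d) => `${d.name}: ${d.license ?? "unknown"}`).join("\n");
}
```

In the job, the script would print `formatViolations(...)` and set `process.exitCode = 1` when the list is non-empty. `allow_failure: false` then stops the pipeline.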
Optimization Techniques

Caching

```yaml
cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
  policy: pull-push
```
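With `key: files`, GitLab derives the cache key from the listed files itself, so the job computes nothing. The idea behind it can be illustrated for tooling outside GitLab, such as a custom cache step: hash the lockfile contents so the key changes exactly when dependencies change. This sketch uses Node's built-in `crypto` module. The `node-` prefix and the 16-character truncation are arbitrary choices.

```javascript
const { createHash } = require("node:crypto");

// Derive a cache key from file contents: same lockfile -> same key, any change -> new key
function cacheKey(prefix, ...contents) {
  const hash = createHash("sha256");
  for (const content of contents) {
    // Length-prefix each input so ("ab", "c") and ("a", "bc") hash differently
    hash.update(`${content.length}:`);
    hash.update(content);
  }
  return `${prefix}-${hash.digest("hex").slice(0, 16)}`;
}
```

Typical use would be `cacheKey("node", fs.readFileSync("package-lock.json"))`.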
Cache strategies:
- Cache dependencies between runs
- Use content-based cache keys
- Separate cache per branch
- Clean stale caches periodically

Parallelization

```yaml
stages:
  - build
  - test
  - deploy

# Run tests in parallel
test:unit:
  stage: test
  script: npm run test:unit

test:integration:
  stage: test
  script: npm run test:integration

test:e2e:
  stage: test
  script: npm run test:e2e
```
Artifact Management

```yaml
build:
  artifacts:
    paths:
      - dist/
    expire_in: 1 week
    when: on_success
```
Best practices:
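- Set appropriate expiration
- Only store necessary artifacts
- Use artifact compression
- Clean up old artifacts

Rollback Strategies

Automatic Rollback

```yaml
deploy:
  script:
    - ./deploy.sh
    # Roll back, then still fail the job so the pipeline reports the failed deployment
    - ./health-check.sh || (./rollback.sh; exit 1)
```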
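A subtle point in automatic rollback is the job result. If the rollback succeeds, the deployment still failed, and the pipeline should say so. This sketch expresses the deploy, verify, roll back sequence with that rule. `deploy`, `healthCheck` and `rollback` are hypothetical async callbacks standing in for the scripts above.

```javascript
// deploy, healthCheck and rollback are hypothetical async callbacks for the scripts above
async function deployWithRollback({ deploy, healthCheck, rollback }) {
  await deploy();
  if (await healthCheck()) return { status: "deployed" };
  await rollback();
  // A successful rollback is still a failed deployment: reject so the job fails
  throw new Error("Health check failed; rolled back to the previous version");
}
```

In a Node-based deploy script, let the rejection propagate (or set `process.exitCode = 1`) so the CI job turns red even though production was restored.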
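Manual Rollback

```yaml
rollback:
  stage: deploy
  when: manual
  script:
    # Assumes get-previous-version.sh prints the version; a child script cannot set variables in the job's shell
    - PREVIOUS_VERSION=$(./get-previous-version.sh)
    - ./deploy.sh --version=$PREVIOUS_VERSION
```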
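How `get-previous-version.sh` finds the version is not specified. This sketch assumes you keep a deployment history, newest first, and that the live version is the newest successful deployment. The rollback target is then the successful deployment before it; failed attempts in between are skipped. The record shape is hypothetical.

```javascript
// history: newest first, e.g. [{ version: "1.4.0", status: "success" }, ...]
function previousVersion(history) {
  const successful = history.filter((d) => d.status === "success");
  // successful[0] is assumed to be live; roll back to the one before it
  return successful.length >= 2 ? successful[1].version : null;
}

console.log(previousVersion([
  { version: "1.4.0", status: "success" },
  { version: "1.3.1", status: "failed" },
  { version: "1.3.0", status: "success" },
])); // 1.3.0
```

The job then deploys this value as `$PREVIOUS_VERSION`. A `null` result means there is nothing to roll back to, and the script should exit with an error.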
Database Rollbacks

- Use reversible migrations
- Test rollback procedures
- Consider data compatibility
- Have a backup restoration process

Documentation

Pipeline Documentation
Document in your repository:
- Pipeline stages and their purpose
- Required environment variables
- Deployment procedures
- Troubleshooting guides
- Rollback procedures

Runbooks
Create runbooks for:
- Deployment failures
- Rollback procedures
- Environment setup
- Incident response

Continuous Improvement

Metrics to Track

- Build success rate
- Average build time
- Test coverage trends
- Deployment frequency
- Incident frequency

Regular Reviews

- Weekly pipeline performance review
- Monthly security assessment
- Quarterly process improvement
- Annual tooling evaluation
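Build success rate and average build time can be computed from the pipeline records your CI system already keeps; GitLab, for example, exposes pipelines through its REST API. This sketch assumes each record has a `status` and a `durationSeconds` field, which is a hypothetical shape. It counts only finished pipelines, so canceled or running ones do not skew a weekly review.

```javascript
// pipelines: hypothetical records, e.g. { status: "success", durationSeconds: 240 }
function pipelineStats(pipelines) {
  // Only finished pipelines count; canceled or running ones would skew the numbers
  const finished = pipelines.filter((p) => p.status === "success" || p.status === "failed");
  if (finished.length === 0) return { successRate: null, averageBuildSeconds: null };
  const succeeded = finished.filter((p) => p.status === "success").length;
  const totalSeconds = finished.reduce((sum, p) => sum + p.durationSeconds, 0);
  return {
    successRate: succeeded / finished.length,
    averageBuildSeconds: totalSeconds / finished.length,
  };
}

console.log(pipelineStats([
  { status: "success", durationSeconds: 100 },
  { status: "success", durationSeconds: 200 },
  { status: "failed", durationSeconds: 300 },
  { status: "success", durationSeconds: 400 },
  { status: "canceled", durationSeconds: 50 },
])); // { successRate: 0.75, averageBuildSeconds: 250 }
```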