shipping-and-launch

安装量: 2.8K
排名: #1967

安装

npx skills add https://github.com/addyosmani/agent-skills --skill shipping-and-launch

Shipping and Launch Overview Ship with confidence. The goal is not just to deploy — it's to deploy safely, with monitoring in place, a rollback plan ready, and a clear understanding of what success looks like. Every launch should be reversible, observable, and incremental. When to Use Deploying a feature to production for the first time Releasing a significant change to users Migrating data or infrastructure Opening a beta or early access program Any deployment that carries risk (all of them) The Pre-Launch Checklist Code Quality All tests pass (unit, integration, e2e) Build succeeds with no warnings Lint and type checking pass Code reviewed and approved No TODO comments that should be resolved before launch No console.log debugging statements in production code Error handling covers expected failure modes Security No secrets in code or version control npm audit shows no critical or high vulnerabilities Input validation on all user-facing endpoints Authentication and authorization checks in place Security headers configured (CSP, HSTS, etc.) Rate limiting on authentication endpoints CORS configured to specific origins (not wildcard) Performance Core Web Vitals within "Good" thresholds No N+1 queries in critical paths Images optimized (compression, responsive sizes, lazy loading) Bundle size within budget Database queries have appropriate indexes Caching configured for static assets and repeated queries Accessibility Keyboard navigation works for all interactive elements Screen reader can convey page content and structure Color contrast meets WCAG 2.1 AA (4.5:1 for text) Focus management correct for modals and dynamic content Error messages are descriptive and associated with form fields No accessibility warnings in axe-core or Lighthouse Infrastructure Environment variables set in production Database migrations applied (or ready to apply) DNS and SSL configured CDN configured for static assets Logging and error reporting configured Health check endpoint exists and responds Documentation README updated with any new setup requirements API documentation current ADRs written for any architectural decisions Changelog updated User-facing documentation updated (if applicable) Feature Flag Strategy Ship behind feature flags to decouple deployment from release: // Feature flag check const flags = await getFeatureFlags ( userId ) ; if ( flags . taskSharing ) { // New feature: task sharing return < TaskSharingPanel task = { task } /

; } // Default: existing behavior return null ; Feature flag lifecycle: 1. DEPLOY with flag OFF → Code is in production but inactive 2. ENABLE for team/beta → Internal testing in production environment 3. GRADUAL ROLLOUT → 5% → 25% → 50% → 100% of users 4. MONITOR at each stage → Watch error rates, performance, user feedback 5. CLEAN UP → Remove flag and dead code path after full rollout Rules: Every feature flag has an owner and an expiration date Clean up flags within 2 weeks of full rollout Don't nest feature flags (creates exponential combinations) Test both flag states (on and off) in CI Staged Rollout The Rollout Sequence 1. DEPLOY to staging └── Full test suite in staging environment └── Manual smoke test of critical flows 2. DEPLOY to production (feature flag OFF) └── Verify deployment succeeded (health check) └── Check error monitoring (no new errors) 3. ENABLE for team (flag ON for internal users) └── Team uses the feature in production └── 24-hour monitoring window 4. CANARY rollout (flag ON for 5% of users) └── Monitor error rates, latency, user behavior └── Compare metrics: canary vs. baseline └── 24-48 hour monitoring window └── Advance only if all thresholds pass (see table below) 5. GRADUAL increase (25% -> 50% -> 100%) └── Same monitoring at each step └── Ability to roll back to previous percentage at any point 6. FULL rollout (flag ON for all users) └── Monitor for 1 week └── Clean up feature flag Rollout Decision Thresholds Use these thresholds to decide whether to advance, hold, or roll back at each stage: Metric Advance (green) Hold and investigate (yellow) Roll back (red) Error rate Within 10% of baseline 10-100% above baseline 2x baseline P95 latency Within 20% of baseline 20-50% above baseline 50% above baseline Client JS errors No new error types New errors at <0.1% of sessions New errors at >0.1% of sessions Business metrics Neutral or positive Decline <5% (may be noise) Decline >5% When to Roll Back Roll back immediately if: Error rate increases by more than 2x baseline P95 latency increases by more than 50% User-reported issues spike Data integrity issues detected Security vulnerability discovered Monitoring and Observability What to Monitor Application metrics: ├── Error rate (total and by endpoint) ├── Response time (p50, p95, p99) ├── Request volume ├── Active users └── Key business metrics (conversion, engagement) Infrastructure metrics: ├── CPU and memory utilization ├── Database connection pool usage ├── Disk space ├── Network latency └── Queue depth (if applicable) Client metrics: ├── Core Web Vitals (LCP, INP, CLS) ├── JavaScript errors ├── API error rates from client perspective └── Page load time Error Reporting // Set up error boundary with reporting class ErrorBoundary extends React . Component { componentDidCatch ( error : Error , info : React . ErrorInfo ) { // Report to error tracking service reportError ( error , { componentStack : info . componentStack , userId : getCurrentUser ( ) ?. id , page : window . location . pathname , } ) ; } render ( ) { if ( this . state . hasError ) { return < ErrorFallback onRetry = { ( ) => this . setState ( { hasError : false } ) } /

; } return this . props . children ; } } // Server-side error reporting app . use ( ( err : Error , req : Request , res : Response , next : NextFunction ) => { reportError ( err , { method : req . method , url : req . url , userId : req . user ?. id , } ) ; // Don't expose internals to users res . status ( 500 ) . json ( { error : { code : 'INTERNAL_ERROR' , message : 'Something went wrong' } , } ) ; } ) ; Post-Launch Verification In the first hour after launch: 1. Check health endpoint returns 200 2. Check error monitoring dashboard (no new error types) 3. Check latency dashboard (no regression) 4. Test the critical user flow manually 5. Verify logs are flowing and readable 6. Confirm rollback mechanism works (dry run if possible) Rollback Strategy Every deployment needs a rollback plan before it happens:

Rollback Plan for [Feature/Release]

Trigger Conditions

Error rate > 2x baseline

P95 latency > [X]ms

User reports of [specific issue]

Rollback Steps 1. Disable feature flag (if applicable) OR 1. Deploy previous version: git revert <commit> && git push 2. Verify rollback: health check, error monitoring 3. Communicate: notify team of rollback

Database Considerations

Migration [X] has a rollback: npx prisma migrate rollback - Data inserted by new feature: [preserved / cleaned up]

Time to Rollback

Feature flag: < 1 minute

Redeploy previous version: < 5 minutes

Database rollback: < 15 minutes See Also For security pre-launch checks, see references/security-checklist.md For performance pre-launch checklist, see references/performance-checklist.md For accessibility verification before launch, see references/accessibility-checklist.md Common Rationalizations Rationalization Reality "It works in staging, it'll work in production" Production has different data, traffic patterns, and edge cases. Monitor after deploy. "We don't need feature flags for this" Every feature benefits from a kill switch. Even "simple" changes can break things. "Monitoring is overhead" Not having monitoring means you discover problems from user complaints instead of dashboards. "We'll add monitoring later" Add it before launch. You can't debug what you can't see. "Rolling back is admitting failure" Rolling back is responsible engineering. Shipping a broken feature is the failure. Red Flags Deploying without a rollback plan No monitoring or error reporting in production Big-bang releases (everything at once, no staging) Feature flags with no expiration or owner No one monitoring the deploy for the first hour Production environment configuration done by memory, not code "It's Friday afternoon, let's ship it" Verification Before deploying: Pre-launch checklist completed (all sections green) Feature flag configured (if applicable) Rollback plan documented Monitoring dashboards set up Team notified of deployment After deploying: Health check returns 200 Error rate is normal Latency is normal Critical user flow works Logs are flowing Rollback tested or verified ready

返回排行榜