/incident-response

If you see unfamiliar placeholders or need to check which tools are connected, see CONNECTORS.md.

Manage an incident from detection through postmortem.

Usage

    /incident-response $ARGUMENTS

Modes

    /incident-response new [description]   # Start a new incident
    /incident-response update [status]     # Post a status update
    /incident-response postmortem          # Generate postmortem from incident data

If no mode is specified, ask what phase the incident is in.

How It Works

    ┌─────────────────────────────────────────────────────────────────┐
    │                        INCIDENT RESPONSE                        │
    ├─────────────────────────────────────────────────────────────────┤
    │  Phase 1: TRIAGE                                                │
    │  ✓ Assess severity (SEV1-4)                                     │
    │  ✓ Identify affected systems and users                          │
    │  ✓ Assign roles (IC, comms, responders)                         │
    │                                                                 │
    │  Phase 2: COMMUNICATE                                           │
    │  ✓ Draft internal status update                                 │
    │  ✓ Draft customer communication (if needed)                     │
    │  ✓ Set up war room and cadence                                  │
    │                                                                 │
    │  Phase 3: MITIGATE                                              │
    │  ✓ Document mitigation steps taken                              │
    │  ✓ Track timeline of events                                     │
    │  ✓ Confirm resolution                                           │
    │                                                                 │
    │  Phase 4: POSTMORTEM                                            │
    │  ✓ Blameless postmortem document                                │
    │  ✓ Timeline reconstruction                                      │
    │  ✓ Root cause analysis (5 whys)                                 │
    │  ✓ Action items with owners                                     │
    └─────────────────────────────────────────────────────────────────┘

Severity Classification

| Level | Criteria | Response Time |
|-------|----------|---------------|
| SEV1 | Service down, all users affected | Immediate, all-hands |
| SEV2 | Major feature degraded, many users affected | Within 15 min |
| SEV3 | Minor feature issue, some users affected | Within 1 hour |
| SEV4 | Cosmetic or low-impact issue | Next business day |

Communication Guidance

Provide clear, factual updates at regular cadence. Include: what's happening, who's affected, what we're doing, when the next update is.

Output: Status Update

Incident Update: [Title]

**Severity:** SEV[1-4] | **Status:** Investigating | Identified | Monitoring | Resolved
**Impact:** [Who/what is affected]
**Last Updated:** [Timestamp]

Current Status

[What we know now]

Actions Taken

[Action 1]

[Action 2]

Next Steps

[What's happening next and ETA]

Timeline

| Time | Event |
|------|-------|
| [HH:MM] | [Event] |

Output: Postmortem

Postmortem: [Incident Title]

**Date:** [Date] | **Duration:** [X hours] | **Severity:** SEV[X]
**Authors:** [Names] | **Status:** Draft

Summary

[2-3 sentence plain-language summary]

Impact

[Users affected]

[Duration of impact]

[Business impact if quantifiable]

Timeline

| Time (UTC) | Event |
|------------|-------|
| [HH:MM] | [Event] |

Root Cause

[Detailed explanation of what caused the incident]

5 Whys

1. Why did [symptom]? → [Because...]
2. Why did [cause 1]? → [Because...]
3. Why did [cause 2]? → [Because...]
4. Why did [cause 3]? → [Because...]
5. Why did [cause 4]? → [Root cause]

What Went Well

[Things that worked]

What Went Poorly

[Things that didn't work]


Lessons Learned

[Key takeaways for the team]

If Connectors Available

If ~~monitoring is connected:
- Pull alert details and metrics
- Show graphs of affected metrics

If ~~incident management is connected:
- Create or update incident in PagerDuty/Opsgenie
- Page on-call responders

If ~~chat is connected:
- Post status updates to incident channel
- Create war room channel

Tips

- Start writing immediately: don't wait for complete information. Update as you learn more.
- Keep updates factual: what we know, what we've done, what's next. No speculation.
- Postmortems are blameless: focus on systems and processes, not individuals.
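
Example: rendering a status update

The severity table and the status update template above can be sketched in code. This is a minimal illustration, not part of the command itself. The names `StatusUpdate`, `render`, and `response_target` are hypothetical. The timestamp format and the bulleted rendering of actions are assumptions, since the template only specifies `[Timestamp]` and `[Action N]` placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Response targets copied from the Severity Classification table.
RESPONSE_TARGETS = {
    "SEV1": "Immediate, all-hands",
    "SEV2": "Within 15 min",
    "SEV3": "Within 1 hour",
    "SEV4": "Next business day",
}

# Status values listed in the status update template.
VALID_STATUSES = ("Investigating", "Identified", "Monitoring", "Resolved")


def response_target(severity: str) -> str:
    """Look up the response time for a severity level (hypothetical helper)."""
    try:
        return RESPONSE_TARGETS[severity]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None


@dataclass
class StatusUpdate:
    """Hypothetical container for the fields of the status update template."""
    title: str
    severity: str
    status: str
    impact: str
    current_status: str
    actions_taken: list = field(default_factory=list)
    next_steps: str = ""
    timeline: list = field(default_factory=list)  # (HH:MM, event) pairs

    def render(self, now: Optional[datetime] = None) -> str:
        """Fill in the template; rejects values the document does not list."""
        response_target(self.severity)  # raises ValueError on unknown severity
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        # Assumed timestamp format: UTC, minute precision.
        now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
        lines = [
            f"Incident Update: {self.title}",
            f"**Severity:** {self.severity} | **Status:** {self.status}",
            f"**Impact:** {self.impact}",
            f"**Last Updated:** {now:%Y-%m-%d %H:%M} UTC",
            "",
            "Current Status",
            self.current_status,
            "",
            "Actions Taken",
        ]
        lines += [f"- {action}" for action in self.actions_taken]
        lines += ["", "Next Steps", self.next_steps, ""]
        lines += ["Timeline", "| Time | Event |", "|------|-------|"]
        lines += [f"| {time} | {event} |" for time, event in self.timeline]
        return "\n".join(lines)


if __name__ == "__main__":
    update = StatusUpdate(
        title="Checkout errors",
        severity="SEV2",
        status="Identified",
        impact="Many users cannot complete checkout",
        current_status="A recent deploy is the suspected cause.",
        actions_taken=["Rolled back the deploy"],
        next_steps="Monitor error rate; next update in 30 minutes.",
        timeline=[("14:05", "Alert fired")],
    )
    print(update.render())
```

Validating severity and status before rendering keeps an update from going out with a value the classification table or the template does not define.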
