Scott Klein's Technical Incident Management Workflow
The founder of Statuspage.io offers his guide to managing technical incidents, with an emphasis on establishing your values and communicating frequently
By Scott Klein
This workflow follows Scott Klein's recommendations for internal and external communication during incidents, with external status updates published through Statuspage.io. In the Allma app, it will replace the Allma technical incident management workflow.
Before the incident
- Define your incident values [See Scott’s values template]
- Define your incident workflow
- Prepare an incident declaration guide
- Define severity levels
- Define roles—and figure out who on your team feels most comfortable doing what under pressure
- Choose appropriate tooling
- Prepare communications templates [See Scott’s communications template]
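As one way to make the severity levels and roles above concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of Scott's workflow: the level names (`SEV1` to `SEV3`), the update intervals, the role names, and the `assign_roles` helper are all hypothetical and should be replaced with whatever your team agrees on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Severity(Enum):
    # Hypothetical levels; define your own to match your organization.
    SEV1 = 1  # e.g. widespread outage
    SEV2 = 2  # e.g. partial degradation
    SEV3 = 3  # e.g. minor issue with a workaround


class Role(Enum):
    # Hypothetical roles; the workflow only says roles should be defined.
    INCIDENT_LEAD = "incident lead"
    COMMUNICATIONS = "communications"
    RESPONDER = "responder"


@dataclass(frozen=True)
class SeverityPolicy:
    description: str
    update_interval_minutes: int  # how often to post an update
    external_updates: bool        # whether users are told as well


# Assumed values for illustration only.
POLICIES: Dict[Severity, SeverityPolicy] = {
    Severity.SEV1: SeverityPolicy("Widespread outage", 30, True),
    Severity.SEV2: SeverityPolicy("Partial degradation", 60, True),
    Severity.SEV3: SeverityPolicy("Minor issue", 240, False),
}


def assign_roles(preferences: Dict[str, List[Role]]) -> Dict[Role, str]:
    """Give each role to the first unassigned person comfortable with it."""
    assignments: Dict[Role, str] = {}
    for role in Role:
        for person, comfortable in preferences.items():
            if role in comfortable and person not in assignments.values():
                assignments[role] = person
                break
    return assignments
```

Writing the policy down as data before an incident means nobody has to decide the update cadence under pressure; the person handling communications can simply look it up for the declared severity.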
During the incident
- Identify: Declare the incident. Reference your incident declaration guide, incident values, and overall workflow for guidance.
- Assign roles.
- Indicate the severity level, defining the known impact on the organization and its users.
- Communicate both internally and externally at a cadence appropriate to the severity level and overall impact.
- Mitigate: Take actions to mitigate the incident, communicating throughout.
- Monitor: Monitor your implemented fix while continuing to communicate.
- Resolve: The fix worked! Resolve the incident, communicate both internally and externally, and schedule a post-incident retrospective.
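The four phases above can be read as a small state machine in which every phase change carries a status update. The Python sketch below shows that idea under stated assumptions: the `Incident` class, the `advance` method, and the transition table are hypothetical, and the path from Monitor back to Mitigate (for a fix that does not hold) is an addition of this sketch, not something the workflow spells out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Phase(Enum):
    IDENTIFY = "identify"
    MITIGATE = "mitigate"
    MONITOR = "monitor"
    RESOLVE = "resolve"


# Allowed phase changes. MONITOR -> MITIGATE is an assumption that
# covers a fix that turns out not to hold.
TRANSITIONS = {
    Phase.IDENTIFY: {Phase.MITIGATE},
    Phase.MITIGATE: {Phase.MONITOR},
    Phase.MONITOR: {Phase.MITIGATE, Phase.RESOLVE},
    Phase.RESOLVE: set(),
}


@dataclass
class Incident:
    title: str
    phase: Phase = Phase.IDENTIFY
    updates: List[str] = field(default_factory=list)

    def advance(self, new_phase: Phase, message: str) -> None:
        """Move to new_phase, refusing to do so without a status update."""
        if new_phase not in TRANSITIONS[self.phase]:
            raise ValueError(
                f"cannot move from {self.phase.value} to {new_phase.value}"
            )
        if not message.strip():
            raise ValueError("every phase change needs a status update")
        self.phase = new_phase
        self.updates.append(f"[{new_phase.value}] {message}")


# Usage: walk one incident through the phases.
incident = Incident("Checkout errors")
incident.advance(Phase.MITIGATE, "Rolling back the latest deploy.")
incident.advance(Phase.MONITOR, "Rollback complete; watching error rates.")
incident.advance(Phase.RESOLVE, "Error rates are back to normal.")
```

Requiring a message on each transition is one way to encode "communicating throughout": the incident cannot change phase silently.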
In a retrospective, identify the underlying behaviors that caused the problem and determine action items to reduce the probability of recurrence. Communicate internally and externally as needed.