How are puppies related to tabletop exercises?
Curated scenarios, recommended practices, and advice for tabletop moderators from Allma users
By Jean Lethuillier
In heat-of-the-moment situations like technical incidents, different teams, both technical and nontechnical, need to collaborate. When you run incident management tabletop exercises in your organization, more people become familiar with the necessary procedures and tools (and fewer people have to learn in the moment) because they’ve already gone through mock scenarios beforehand. Tabletop exercises build your organization’s muscle for incident response in a safe environment, together.
You may be thinking about running a tabletop exercise and are wondering where to start. Trust us, it’s normal to be nervous, especially if this is your first time being a tabletop moderator. The good news? Many have come before you, and many more will come after you. We’ve interviewed Allma users—this blog post is really written by them!—and put together their curated scenarios, best practices, and general advice. Here’s what they recommend.
1. Make it light—the more enjoyable the exercise is, the better it can function as a team-bonding activity
It all starts with the scenario you choose. We recommend keeping it light to create a sense of psychological safety, so participants feel comfortable declaring the incident, contributing, and getting the right people involved. Here are two scenarios you can use to get your creative juices flowing:
Puppies have invaded your data center! No one knows where they came from, but the impact is widespread. The puppies are chewing on the wires linked to several customer environments. These customers have pinged Customer Support and Customer Success indicating that they cannot access key functionalities in your product. To make things more complicated, these puppies are insanely adorable and the technical teams responsible are getting distracted and forgetting to update stakeholders.
Your business, Spoiled Puppies, specializes in custom puppy toys and treats. The holidays are coming and the marketing team is about to launch the biggest promotion your business has ever run. The engineering team has spent the last two weeks preparing for this event, deciding which areas need more capacity and scaling up the infrastructure to handle the load. The promotion email goes out, and after a few hours, your message queue is not processing incoming requests fast enough. If the backlog keeps growing, it could overload your infrastructure and cause downtime.
2. Balance it with reality—alerts and artifacts lend gravitas
While keeping your scenario light, keep the details real. Ensure reality is threaded throughout your tabletop exercise by preparing alerts, artifacts, and an incident collaboration tool, all of which are elements of a real incident.
Think through how your teams are alerted before an incident. For example, many teams use PagerDuty or OpsGenie for alerting and surface these alerts in Slack. Consider triggering a mock alert, such as “Escalated: #000: Puppies in the data center to [on-call engineer]”.
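If you would rather script the mock alert than type it by hand, here is a minimal Python sketch. It assumes you have a Slack incoming webhook set up for the exercise channel. The function names, the "[TABLETOP EXERCISE]" prefix, and the webhook URL placeholder are our own illustrations, not part of PagerDuty, OpsGenie, or Allma.

```python
import json
import urllib.request


def build_mock_alert(incident_id, title, on_call):
    """Build a Slack message payload for a mock escalation alert.

    The "[TABLETOP EXERCISE]" prefix is an assumption added here so that
    nobody mistakes the mock alert for a real page.
    """
    text = f"[TABLETOP EXERCISE] Escalated: #{incident_id}: {title} to {on_call}"
    return {"text": text}


def post_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


payload = build_mock_alert("000", "Puppies in the data center", "[on-call engineer]")
print(payload["text"])

# To actually send it, pass your own webhook URL (placeholder shown here):
# post_to_slack("https://hooks.slack.com/services/XXX/YYY/ZZZ", payload)
```

Keeping the message builder separate from the sender lets you preview the alert text before anything reaches a shared channel.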
Preparing artifacts can be tricky, so don’t overthink it. It can be as simple as a graph indicating a spike in error messages. Feel free to use the generic graphs we’ve provided below, or create your own! (We typically grab ours from our monitoring dashboards.)
Graph: Puppies in the data center
Graph: Spoiled Puppies incoming messages
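If you don’t have a dashboard screenshot handy, you can also fake a simple artifact. The sketch below is one possible approach, not something our users prescribed: it generates made-up per-minute error counts with a sharp spike and renders them as a text chart you can paste into Slack. All names and numbers in it (baseline, spike timing, seed) are illustrative assumptions.

```python
import random


def synthetic_error_counts(minutes=30, baseline=5, spike_start=18,
                           spike_factor=8, seed=42):
    """Return per-minute error counts: a noisy baseline, then a sharp spike."""
    rng = random.Random(seed)  # fixed seed so the artifact is reproducible
    counts = []
    for minute in range(minutes):
        level = baseline * spike_factor if minute >= spike_start else baseline
        counts.append(level + rng.randint(0, baseline))  # add a little noise
    return counts


def render_text_chart(counts, width=40):
    """Render counts as horizontal bars scaled to at most `width` characters."""
    peak = max(counts)
    lines = []
    for minute, count in enumerate(counts):
        bar = "#" * max(1, round(count / peak * width))
        lines.append(f"t+{minute:02d}m | {bar} {count}")
    return "\n".join(lines)


counts = synthetic_error_counts()
chart = render_text_chart(counts)
print(chart)
```

Adjust spike_start and spike_factor to match the story you want the graph to tell, for example a slow climb for the message queue scenario.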
3. Prepare yourself and your participants. Do not overcomplicate it!
We recommend that tabletop moderators make a simple checklist. Here is an example to help you prepare:
- Identify and schedule a one-hour window for the exercise
- Prepare a scenario, alerts, and artifacts
- Invite at least 5 people (we recommend closer to 20), preferably from different teams, to take part in the exercise
- If you are using Allma as part of your exercise, you can test out the flow by selecting the “Tabletop” workflow using /allma new in Slack
- Brief your participants – see sample text below:
You are invited to our next incident management tabletop exercise! As we are growing rapidly, we will be holding these exercises regularly with different groups of people. Their purpose is to ensure that everyone, new and current, is familiar with our latest incident management processes and to improve cross-team collaboration during these events. Your mock scenario will be provided day-of. Afterwards, we’ll review what went well, what did not, and gather your feedback.
To do beforehand
- Review our incident management process [here].
- Our interactions during the exercise will be in Slack. We will be using Allma, a collaboration tool for critical work, during this exercise. You can get a feel for the tool beforehand by putting /allma new into Slack and selecting the “Sandbox” workflow.
To do day-of
- Bring your curiosity for learning and be yourself!
Jean Lethuillier has been in customer success for the last 11 years, starting as a consultant and practitioner, before moving into leadership positions. Her goal is to always bring the voice of the customer into every corner of Allma. Prior to Allma, Jean led customer success at Cybereason, kununu, and Fuze.