Edition e1.0.3. Updated March 23, 2023.
You’re reading an excerpt of The Holloway Guide to Remote Work, a book by Katie Wilde, Juan Pablo Buriticá, and over 50 other contributors.
“If everything is important, then nothing is.” ―Patrick Lencioni, author, The Five Dysfunctions of a Team
Before teams learn how to react to incidents and emergencies, they first need a definition of what counts as urgent. Teams should have a clear understanding of what warrants shifting out of the default way of working, and how and when they should raise the issue.
By having clear guidelines regarding urgent matters, teams and individuals can also learn how to protect their own focus—they understand that unless something is classified as highly urgent, it can wait. You can facilitate asynchronous work by making sure everyone is able to defer non-urgent items. This allows individuals to devote all their energy to the task at hand without refreshing their email or checking the work chat every 20 minutes to see if there’s something important that needs solving.
Caution: If there’s no definition of critical systems or operations, then anything that people with authority ask for becomes highly urgent. Bosses aren’t always conscious of their power. They may report a website bug on a channel assuming it will be prioritized and eventually solved, when instead the team interprets this as extremely important and drops everything to make the change. It’s critical for teams to have explicit permission to defer solving problems reported by people with authority, or to have a framework for asking whether there’s a need to reprioritize.
An incident priority matrix is a document that outlines how to gauge the priority of an incident to determine if it should be classified as an emergency or not.
Urgency depends on your organization’s context, but a few examples show how to think about it in your business. You likely don’t need to wake up your on-call engineer at 3 a.m. because your blog went down; it can probably wait a few hours. If you are a transportation company and your dispatching system goes offline, that is a different story. Michael Churchman wrote a simple guide that you can use to build an incident priority matrix. Urgency comes down to impact and the context of your operations.
Impact is generally based on the scope of an incident’s effects—how many departments, users, or key services are affected. A large number of near-simultaneous reports that a specific service is unavailable, for example, may be a good indication of a high-impact incident; while a report of a problem from a single user, unaccompanied by any similar reports, is more likely to indicate a low-impact incident. For many IT departments, the guidelines for determining incident impact might look something like this:
High impact:
A critical system is down.
One or more departments are affected.
A significant number of staff members are not able to perform their functions.
The incident affects a large number of customers.
The incident has the potential for major financial loss or damage to the organization’s reputation.
Other criteria, depending on the function of the organization and the affected systems, could include such things as threats to public safety, potential loss of life, or major property damage.
Moderate impact:
Some staff members or customers are affected.
None of the services lost are critical.
Financial loss and damage to the organization’s reputation are possible, but limited in scope.
There is no threat to life, public safety, or physical property.
Low impact:
Only a small number of users are affected.
It is not always easy to draw a strict distinction between incident impact and incident urgency, but for the most part, urgency in this context can be defined as how quickly a problem will begin to have an effect on the system or people who rely on it. The failure of a payroll system may have a high impact, for example, but if it occurs at the beginning of a pay cycle, it is likely to be less urgent than the loss of a customer-relations database that is put to heavy use on a daily basis.
High urgency:
A service that is critical for day-to-day operations is unavailable.
The incident’s sphere of impact is expanding rapidly, or quick action may make it possible to limit its scope.
Time-sensitive work or customer actions are affected.
The incident affects high-status individuals or organizations (for example, upper management or major clients).
Low urgency:
Affected services are optional and used infrequently.
The effects of the incident appear to be stable.
Important or time-sensitive work is not affected.
Important: Note that for both impact and urgency, meeting a single criterion (rather than all or a majority of criteria) in a category is generally sufficient. Best practice is to place incidents in the highest category for which they qualify.
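The "single criterion, highest category" rule can be sketched in a few lines of Python. This is only an illustration: the criterion names below are hypothetical shorthand for the guidelines above, not part of any existing tool, and the negative criteria (such as "none of the services lost are critical") are left out for brevity.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Hypothetical criterion names paraphrasing the impact guidelines above.
IMPACT_CRITERIA = {
    Level.HIGH: {
        "critical_system_down",
        "department_affected",
        "many_staff_blocked",
        "many_customers_affected",
        "major_financial_or_reputation_risk",
    },
    Level.MODERATE: {
        "some_staff_or_customers_affected",
        "limited_financial_or_reputation_risk",
    },
    Level.LOW: {"few_users_affected"},
}

# Hypothetical criterion names paraphrasing the urgency guidelines above.
URGENCY_CRITERIA = {
    Level.HIGH: {
        "critical_daily_service_unavailable",
        "scope_expanding_rapidly",
        "time_sensitive_work_affected",
        "high_status_party_affected",
    },
    Level.LOW: {"optional_service_affected", "effects_stable"},
}


def classify(observed, criteria, default=Level.LOW):
    """Return the highest level for which at least one criterion is met."""
    for level in sorted(criteria, reverse=True):
        if criteria[level] & set(observed):
            return level
    # Assumption: an incident matching no criterion falls back to the default.
    return default


observed = {"few_users_affected", "critical_system_down"}
print(classify(observed, IMPACT_CRITERIA).name)  # HIGH
```

Because the levels are checked from highest to lowest, an incident that meets one high-impact criterion is rated high even if it also matches lower categories.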
The best way to build an incident priority matrix is to outline what the organization values and why, and then write a brief guide that makes it easy for people to understand impact and urgency. Dealing with an emergency is not the right time for anyone to have to go over a multi-page manual to make a decision.
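A brief guide of this kind can be as small as a lookup table. The sketch below is one hypothetical way to encode a matrix in Python. The priority numbers and the choice of which priorities count as an emergency are placeholder assumptions that each organization would set for itself; only the impact and urgency levels come from the guidelines in this section.

```python
from enum import Enum


class Impact(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


class Urgency(Enum):
    LOW = "low"
    HIGH = "high"


# (impact, urgency) -> priority, where 1 is the most serious.
# Every cell here is an illustrative assumption, not a recommendation.
PRIORITY_MATRIX = {
    (Impact.HIGH, Urgency.HIGH): 1,
    (Impact.HIGH, Urgency.LOW): 2,
    (Impact.MODERATE, Urgency.HIGH): 2,
    (Impact.MODERATE, Urgency.LOW): 3,
    (Impact.LOW, Urgency.HIGH): 3,
    (Impact.LOW, Urgency.LOW): 4,
}

# Assumption: only priority 1 justifies interrupting people immediately.
EMERGENCY_PRIORITIES = {1}


def priority(impact, urgency):
    """Look up the priority for an incident's impact and urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]


def is_emergency(impact, urgency):
    """Return True when the incident warrants an immediate interruption."""
    return priority(impact, urgency) in EMERGENCY_PRIORITIES


# A dispatching outage at a transportation company: high impact, high urgency.
print(is_emergency(Impact.HIGH, Urgency.HIGH))  # True
# A payroll failure at the start of a pay cycle: high impact, low urgency.
print(is_emergency(Impact.HIGH, Urgency.LOW))  # False
```

Keeping the whole matrix in one small table means it can be read in seconds, which matters more during an incident than covering every edge case.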
After defining what urgency looks like, you can focus on how to deal with it.
The first step in handling an emergency is being informed about it. Since we’ve built the “homeostasis” of our distributed work practices around asynchronous communication channels, this means we have to step out of being “on track” and signal to our organization that attention is needed somewhere. Here, interrupting is not only OK; it’s the right thing to do.
This doesn’t mean any interruption is welcome. Before we are in a state of emergency, we should also outline clear protocols so we can all understand as a team that what is happening is serious and needs to be looked at. We should explicitly define what this looks like.