Understand Critical Incidents

Learning Objectives

After completing this unit, you’ll be able to:

  • Define what qualifies an incident as critical.
  • Explain the differences between the categories, severities, and types of critical incidents at Salesforce.
  • Describe what happens during an incident response.
  • Name the U.S. federal agency that publishes the National Incident Management System.

Let’s Get Started

In this module, we introduce you to the world of major incidents, or critical incidents, as we call them here at Salesforce. We teach you how to prepare for them and stay informed if you experience a Salesforce critical incident. And we introduce you to how we use our own Salesforce suite of products to build an industry-leading incident management solution.

Cloud Service Industry Incidents

Disruption is unavoidable in today’s digital world

Service-impacting events are inevitable. Digital transformation has enabled companies to innovate, create new service offerings, and grow their businesses. Research firm Gartner reported that worldwide end-user spending on public cloud services is forecast to grow 20.7 percent to total $591.8 billion in 2023, up from $490.3 billion in 2022. But delivering these services has also introduced a new challenge: ensuring the availability and uptime of those services.

Critical incidents impacting the cloud computing industry are a major economic risk to businesses globally. The emerging risks report suggests that an incident that takes one of the top-three cloud service providers offline for 3 to 6 days would result in total losses of between $5.3 billion and $19 billion. 

Beyond their direct financial costs, critical incidents can also lead to compliance and regulatory penalties and badly damage a company’s brand and reputation. Not all incidents can be prevented, so when they do happen, your company’s critical incident response can protect your brand by supporting your customers with transparent and timely communications.

Graphic showing a warning triangle.

What Makes an Incident Critical?

At Salesforce, an incident is an issue with our customers’ Salesforce service(s) that impacts their ability to use Salesforce products. It can be caused by anything from a simple misconfiguration on a customer org to an outage at a data center that hosts our cloud platforms.  

A critical incident differs from a typical support case in its potential customer impact and the urgency with which it needs to be addressed. Critical incidents require an elevated response to avoid possible damage to our customers’ reputation and revenue streams, and to our own.

Types of Incidents

Incidents are categorized by our customers’ experience. For example, is their Salesforce product running slowly? Is part of the functionality not working correctly? Or is it down or inaccessible? These different experiences are broadly categorized as follows.

  • Service disruption—The product/service is unavailable, or its performance is too slow to be considered usable.
  • Performance degradation—The product/service performance is intermittent or slower than expected but usable.
  • Feature disruption/degradation—A specific feature within a service is slow (or unusable).

Graphic representing one person on the left versus several people on the right to reinforce the difference between an incident affecting one customer or many customers.

Incident or Escalation

Another way of categorizing incidents is by how many customers they affect. At Salesforce, we take a holistic approach to critical incident and escalation management. Because every customer is unique, customers have different issues with varying levels of escalation, some simple and some more severe. Whether 1 or 1,000 customers are affected, we aim to provide the same level of service. An escalation affects a single customer, whereas an incident typically affects more than one customer and occurs when a Salesforce cloud or data center that provides multitenant services is impacted. The classification matters because the response is different for each one.

Incident Categories

Incidents are distinguished by severity, ranging from severity 4 (Sev4) with the least customer impact up to severity 0 (Sev0) with the most customer impact. The severity category of an incident is assigned by Salesforce engineering and support teams according to predetermined criteria. This categorization of incidents helps us provide customers with a response appropriate to their situation.

Escalation Severities (1 customer)    Incident Severities (more than 1 customer)    Critical Incident
Sev1                                  Sev2, Sev1, Sev0                              Yes
Sev4, Sev3, Sev2                      Sev4, Sev3                                    No
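
The severity table above can be read as a simple lookup. The following is a minimal Python sketch of that lookup, not Salesforce's actual tooling: the names Scope, CRITICAL_SEVERITIES, VALID_SEVERITIES, and is_critical are hypothetical and exist only for illustration. It assumes severities are represented as the integers 0 through 4.

```python
from enum import Enum


class Scope(Enum):
    """How many customers are affected (hypothetical names)."""
    ESCALATION = "escalation"  # exactly one customer
    INCIDENT = "incident"      # more than one customer


# Severities the table lists for each scope.
VALID_SEVERITIES = {
    Scope.ESCALATION: {1, 2, 3, 4},
    Scope.INCIDENT: {0, 1, 2, 3, 4},
}

# Severities the table marks as a critical incident for each scope.
CRITICAL_SEVERITIES = {
    Scope.ESCALATION: {1},
    Scope.INCIDENT: {0, 1, 2},
}


def is_critical(scope: Scope, severity: int) -> bool:
    """Return True if the table classifies this combination as critical."""
    if severity not in VALID_SEVERITIES[scope]:
        raise ValueError(f"Sev{severity} is not listed for {scope.value}")
    return severity in CRITICAL_SEVERITIES[scope]


if __name__ == "__main__":
    print(is_critical(Scope.ESCALATION, 2))  # a Sev2 escalation
    print(is_critical(Scope.INCIDENT, 2))    # a Sev2 incident
```

Note that the same severity number can fall on either side of the line: a Sev2 is critical when it affects more than one customer but not when it affects a single customer.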

Graphic showing several people communicating as part of an incident response.

Salesforce Critical Incident Response

The objective of incident response is to respond predictably, consistently, and quickly to restore services. The Salesforce corporate incident response brings together several teams from across Salesforce to resolve the incident collaboratively. Depending on the incident, the resolution teams may include Engineering, Support, Crisis Management, Security, Communications, Legal, Public Relations, and appropriate executives up to and including our founders, Marc Benioff and Parker Harris.  

Graphic showing a representation of a process.

Incident Response Playbook

Salesforce uses the Corporate Incident Response Playbook to respond with specific plays that restore service quickly and minimize the impact on customers. The Playbook defines who should do what, and when, during an incident to ensure the following.

  • Response to incidents is predictable, consistent, and fast.
  • All resources responding to the incident are fully trained in their role and responsibilities.
  • Internal stakeholders are called to action immediately and have a clear view of progress and status.
  • Companywide resources (people, tools, information, and so forth) are available to protect customers’ trust in Salesforce.

Incident Management System (IMS)

When creating the guidelines for our incident response, we chose to root our processes in crisis management principles first and technology management principles second. We worked with consultants in this field to develop our specific response playbook. However, much of this information is publicly available, so you can build your guidelines similarly. Check out the National Incident Management System (NIMS) operational guide from the Federal Emergency Management Agency (FEMA), which you can download on fema.gov (link in Resources section).

This rigor, combined with training, retrospective analysis, and continuous process improvement, lets us execute tasks during any kind of incident with no surprises and no learning curve. The only difference is the kind of incident. Everyone already understands their role and responsibilities because we practice them daily in incident response.

Now that you know what critical incidents are, move on to the next unit to learn about the team that manages them inside Salesforce.

Resources