Remediate and Recover from an Incident

Learning Objectives

After completing this unit, you’ll be able to:

  • Describe how to remediate systems.
  • Explain necessary actions to clean up impacted systems.
  • List system recovery processes for restoration of production systems.

Remediate Systems

You’ve triaged the incident and contained the threat. It’s now time to remediate the issue. Just like a plumber may inspect your pipes to find which one is leaking and repair it, an incident responder inspects your IT environment to remediate the issue so that the whole system or network doesn’t flood with threats.

A plumber fixes a leaky pipe.

Remediation is the phase in which the threat is eradicated and the focus shifts to getting things back to the way they should be. It can involve everything from an enterprise-wide password reset to pulling a network cable and rebuilding an infected box. The military term “clear and hold” is a good analogy for understanding remediation and its importance. A counterinsurgency tactic, clear and hold involves clearing an area of enemies and then holding it to prevent those enemies from reoccupying it.

For example, if there is a compromised device on a network, a remediation measure would be to identify the malware and clean up the machine. Responders should also work with operations to scan remaining systems for the specific indicators of compromise (IOCs) to determine if they are susceptible to the malware, and to help prevent future issues. 
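As a rough illustration of scanning remaining systems for a specific IOC, the sketch below walks a directory tree and flags any file whose SHA-256 hash appears in a set of known-bad hashes. The function names (`sha256_of`, `find_ioc_matches`) and the idea of representing IOCs as file hashes are assumptions made for this example; real IOCs also include things like IP addresses, domains, and registry keys, and most teams would rely on dedicated endpoint tooling rather than a script like this.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_ioc_matches(root: Path, ioc_hashes: set[str]) -> list[Path]:
    """List files under root whose hash matches a known-bad hash.

    ioc_hashes is a hypothetical set of lowercase SHA-256 hex strings,
    for example taken from the incident investigation.
    """
    matches = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and sha256_of(path) in ioc_hashes:
            matches.append(path)
    return matches
```

Calling `find_ioc_matches(Path("/some/share"), known_bad_hashes)` on each remaining system would return the paths that need a closer look; an empty list means no file matched the hashes supplied, not that the system is clean.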

Remediation is critical because without it, infected devices can keep trying to send information back to an attacker. This is where a well-trained response team can be the difference between a successful remediation and a repeat incident. The team ensures that vulnerabilities are remediated and that compromised accounts have their passwords changed, or are removed altogether and replaced with other, more secure methods of access.

Clean Up Impacted Systems

Let’s follow along with Summer, an incident responder at a technology company that specializes in internet-related services and products. Summer is working to remediate and clean up the company’s systems after an incident, validate that no residual effects exist from the attack, and bring affected systems back into production.

First, Summer reimages the affected systems, wiping them and rebuilding them from a clean baseline. She also makes sure to remove malicious content related to the attack. 

Next, Summer works to prevent the root cause from recurring. She identifies what caused the incident, and in this case, she discovers that an attacker exploited an unpatched system that contained a specific vulnerability. She patches the weakness to make sure no one can exploit it again.

She also reviews the entire IT computing environment, updates installed software, and works with the vulnerability management team to ensure they patch any other systems or networks containing the same vulnerability as soon as possible. Summer knows it’s important that she collaborates with various teams to ensure these clean-up steps are performed. 
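To make the idea of finding other systems with the same vulnerability concrete, here is a minimal sketch that checks a software inventory for hosts still running a version older than the patched one. The inventory structure, the host and package names, and the assumption that versions are simple dotted numbers are all hypothetical; a real vulnerability management team would normally pull this from a scanner or asset database.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '2.4.10' into (2, 4, 10)."""
    return tuple(int(part) for part in text.split("."))


def hosts_needing_patch(inventory: list[dict], package: str, fixed_version: str) -> list[str]:
    """Return names of hosts whose installed package is older than the fix.

    inventory is a hypothetical list of records shaped like
    {"name": "web-01", "packages": {"examplepkg": "2.4.9"}}.
    """
    fixed = parse_version(fixed_version)
    return [
        host["name"]
        for host in inventory
        if package in host["packages"]
        and parse_version(host["packages"][package]) < fixed
    ]


# Hypothetical inventory used only for illustration.
inventory = [
    {"name": "web-01", "packages": {"examplepkg": "2.4.9"}},
    {"name": "web-02", "packages": {"examplepkg": "2.4.10"}},
    {"name": "db-01", "packages": {"otherpkg": "1.0.0"}},
]
unpatched = hosts_needing_patch(inventory, "examplepkg", "2.4.10")
```

Comparing versions as tuples of integers avoids the string-comparison trap where "2.4.9" would sort after "2.4.10".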

Once all this is done, Summer scans the system for additional malware using up-to-date vulnerability scanning, antivirus, and anti-malware software, and verifies that known weaknesses have been removed. Because vulnerability scanners detect only known weaknesses, it’s crucial to update them with new signatures whenever the organization detects an indicator of compromise that hasn't been seen before. She also recommends to her management that they conduct a penetration test on the affected system or network to find and eradicate any additional vulnerabilities.

Recover and Restore Systems

Once Summer has finished cleaning up the system and verified that no threats remain, it’s time to recover and restore the systems by bringing them back online to full operation. To do so, she works with the system owner to decide when to restore services, guided by information from the incident response team (IRT). Before the system goes live, Summer tests it to verify that it is clean and fully functional.

Once the system is restored, Summer works with the operations team to monitor it after the incident, observing operations and checking for abnormal behavior. Finally, she documents lessons learned from the incident with the IRT, including areas for improvement in how the organization prevents, detects, protects against, responds to, and recovers from similar incidents. She also considers what the system owner can implement on the restored systems to protect them from a recurrence of the same incident.
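One simple way to picture checking for abnormal behavior after recovery is to compare new readings against a baseline captured during normal operation. The sketch below flags readings that sit more than a chosen number of standard deviations from the baseline mean. The metric (for example, outbound connections per minute), the function name, and the default threshold of 3.0 are assumptions for illustration only; production monitoring tools use far richer detection logic.

```python
from statistics import mean, stdev


def abnormal_readings(baseline: list[float], recent: list[float], threshold: float = 3.0) -> list[float]:
    """Return recent readings that deviate strongly from the baseline.

    baseline needs at least two values so a standard deviation exists.
    threshold is the number of standard deviations treated as abnormal
    (an illustrative default, not a recommended setting).
    """
    center = mean(baseline)
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: anything different is worth a look.
        return [value for value in recent if value != center]
    return [value for value in recent if abs(value - center) / spread > threshold]


# Hypothetical readings of a single metric before and after recovery.
baseline = [100, 102, 98, 101, 99]
flagged = abnormal_readings(baseline, [100, 103, 250])
```

Here only the reading of 250 is flagged, since 100 and 103 fall well within three standard deviations of the baseline mean of 100.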

Now you understand more about how to remediate and recover from an incident. Lastly, let’s turn to how to report on incident response. 
