Respond to Alerts and Recover from Incidents
After completing this unit, you’ll be able to:
- Describe how to respond to and triage security alerts and alarms.
- Explain the importance of remediating operational gaps and updating policies, procedures, and controls.
Respond to Security Threats and Incidents
Let’s meet Chloe, a security operations engineer at a consulting firm. Like many of us, Chloe often thinks about global cybercrime as she uses technology in her personal life and at work. Chloe recently read that the first six months of 2019 alone saw more than 3,800 publicly disclosed breaches, exposing an incredible 4.1 billion records from companies around the globe. That’s enough to keep anyone awake at night.
Chloe knows that both the velocity and the volume of attacks are increasing. As a security operations engineer, she is responsible for protecting her company from cybercrimes such as hacked devices, crashed websites, and breached networks. She knows that she can’t achieve total security, but she has a crucial role to play in defending systems, monitoring them, and responding to security alerts.
While cybersecurity incidents are inevitable, Chloe’s goal is to detect a breach quickly when one occurs and to contain it before the damage spreads to more systems.
Chloe uses the company’s security information and event management (SIEM) software to analyze a high volume of automated alerts of possible threat activity. The alerts give her timely information about possible security issues, vulnerabilities, and exploits. Her first task is to verify whether an alert is legitimate; she then prioritizes the legitimate alerts by severity.
As Chloe undertakes this preliminary investigation, she links related information together, uses scoring mechanisms to gauge the severity of alerts, and tries to identify suspicious events. She triages alerts, prioritizing issues and analyzing them to confirm whether a real security incident is taking place.
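The scoring and prioritization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Alert` fields, the `SEVERITY_WEIGHT` table, the scoring formula, and the threshold of 15 are all hypothetical and are not taken from any particular SIEM product.

```python
from dataclasses import dataclass

# Hypothetical weights; a real SIEM defines its own scoring model.
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 6, "critical": 10}


@dataclass
class Alert:
    alert_id: str
    severity: str              # one of the SEVERITY_WEIGHT keys
    asset_criticality: int     # 1 (test machine) to 5 (business-critical)
    corroborating_events: int  # related events linked to this alert


def score(alert: Alert) -> int:
    """Combine severity, asset value, and corroboration into one number."""
    base = SEVERITY_WEIGHT[alert.severity] * alert.asset_criticality
    # Each linked event adds weight, capped at 5 so noise cannot dominate.
    return base + min(alert.corroborating_events, 5) * 2


def triage(alerts: list[Alert], threshold: int = 15) -> list[Alert]:
    """Keep alerts at or above the threshold, highest score first."""
    kept = [a for a in alerts if score(a) >= threshold]
    return sorted(kept, key=score, reverse=True)


alerts = [
    Alert("A-1", "low", 2, 0),
    Alert("A-2", "high", 4, 3),
    Alert("A-3", "critical", 5, 1),
    Alert("A-4", "medium", 5, 0),
]
queue = triage(alerts)
print([a.alert_id for a in queue])  # ['A-3', 'A-2', 'A-4']
```

In this sketch, a low-severity alert on an unimportant asset falls below the threshold, while the remaining alerts are queued so that the most urgent one is investigated first.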
When she identifies a security incident, Chloe springs into action to resolve it, using the SIEM to gather the additional context she needs to respond to the event. For example, she investigates which other systems were accessed from the same IP address or with the same user credentials, since such activity could indicate that an attacker is moving laterally through the network to compromise other systems.
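As a rough illustration of that kind of pivot, the following Python sketch searches a list of authentication events for every host reached from a suspicious IP address or with a suspicious account. The event records, field names, and the `hosts_touched` helper are hypothetical; in practice this would be a query in the SIEM’s own search language.

```python
# Hypothetical authentication events, as they might be exported from a SIEM.
events = [
    {"src_ip": "10.0.0.5", "user": "svc-backup", "host": "web-01"},
    {"src_ip": "10.0.0.5", "user": "svc-backup", "host": "db-01"},
    {"src_ip": "10.0.0.9", "user": "alice", "host": "web-01"},
    {"src_ip": "10.0.0.5", "user": "jdoe", "host": "file-01"},
    {"src_ip": "10.0.0.7", "user": "svc-backup", "host": "hr-01"},
]


def hosts_touched(events, src_ip=None, user=None):
    """Return the set of hosts reached from src_ip or with the user account."""
    hosts = set()
    for event in events:
        ip_match = src_ip is not None and event["src_ip"] == src_ip
        user_match = user is not None and event["user"] == user
        if ip_match or user_match:
            hosts.add(event["host"])
    return hosts


# Pivot on both indicators tied to the confirmed incident.
related = hosts_touched(events, src_ip="10.0.0.5", user="svc-backup")
print(sorted(related))  # ['db-01', 'file-01', 'hr-01', 'web-01']
```

A result that spans several unrelated hosts, as here, is the kind of pattern that would prompt a closer look for lateral movement.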
In responding to alerts, Chloe works with the incident responders in the security operations center (SOC) to implement controls and processes for security events. The map she and the team follow is the incident response plan, which helps them effectively identify the threat, minimize the damage, and reduce the impact of a cyberattack. Once a security incident has been identified, Chloe and her coworkers race to gather more data, identify the source of the attack, contain it, recover data, and restore system operations to normal.
To contain the threat, she may need to take a system off the network, power it down, or isolate traffic to and from it. She also makes a forensic backup of the system involved in the incident, both so that any altered or destroyed data can be recovered from a trustworthy copy and so that digital evidence of the attacker’s moves is preserved. As she does so, she communicates closely with the business owner of the system, since any further loss of availability can harm the organization.
Remediate Operational Gaps and Recover from Incidents
Once Chloe has contained the incident, she works with her team to identify broader security gaps related to the attack and to plan mitigation steps that prevent additional attacks. Chloe performs root cause analysis to understand the source of the problem and diagnose why it occurred.
She helps the business owner of the affected assets or systems restore them reliably, and takes further steps to harden them against similar attacks in the future. She remediates the vulnerabilities and misconfigurations that allowed the incident to occur, and assesses other systems in the IT environment for similar vulnerabilities and misconfigurations. She makes sure she understands the attack and the associated mitigation steps, and gathers additional forensic data to document the attacker’s actions.
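Checking the rest of the environment for the same weaknesses can be pictured as comparing each system’s configuration against the root-cause findings. The Python sketch below assumes a hypothetical inventory dictionary and hypothetical setting names (`openssl_version`, `ssh_password_auth`); a real assessment would rely on a vulnerability scanner or configuration management database.

```python
# Hypothetical inventory of configuration facts per host.
inventory = {
    "web-01": {"openssl_version": "1.0.2", "ssh_password_auth": True},
    "web-02": {"openssl_version": "3.0.13", "ssh_password_auth": False},
    "db-01": {"openssl_version": "3.0.13", "ssh_password_auth": True},
}

# Hypothetical root-cause findings: settings that enabled the incident.
findings = {"openssl_version": "1.0.2", "ssh_password_auth": True}


def similar_exposure(inventory, findings):
    """Map each host to the risky settings it shares with the findings."""
    exposed = {}
    for host, config in sorted(inventory.items()):
        matched = [key for key, bad in findings.items() if config.get(key) == bad]
        if matched:
            exposed[host] = matched
    return exposed


exposed = similar_exposure(inventory, findings)
for host, issues in exposed.items():
    print(host, issues)
```

Hosts that share none of the risky settings are left out of the result, so the output doubles as a remediation worklist.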
Once the cause is known and the team is confident they have remediated all vulnerabilities and misconfigurations, the system is returned to a stable state. Chloe may have to work with system owners to rebuild the system from scratch, or she may be able to restore it from a known-good backup that has not been compromised. Throughout this process, Chloe works closely with a variety of stakeholders, including the business unit, customers, legal teams, law enforcement, and regulatory bodies, to update them on the incident and the associated remediation.
Following the system restoration, Chloe works with teams across the technology organization to review policies, procedures, and controls and to ensure any necessary updates are made. She works with system owners to implement patches and check configurations across the IT environment to further harden systems. Chloe knows that this isn’t a once-and-done activity: Remediation and mitigation are ongoing efforts. She continues to work with the organization’s audit team on periodic reviews of security processes and threat research in order to mature organizational processes. She also helps the company ensure that it complies with applicable laws, regulations, frameworks, policies, and procedures, both on an ongoing basis and following an incident or breach.
Finally, Chloe summarizes the results of this remediation and shares the lessons learned with relevant stakeholders. Her goal is to bring positive changes to the overall security of the organization, so she suggests improvements to avoid such incidents in the future. She asks herself: How could the risk have been identified sooner? How could the team have responded more quickly? What other improvements should be made to this and other systems? She knows that an incident is always a chance to think about how things could be done better.
Sum It Up
Great work! In this module, you’ve learned how security operations engineers identify critical assets, associated vulnerabilities, and relevant threats. You’ve learned more about how to harden systems against attack, and about other security services that SOCs provide to the business. You’ve also discovered the crucial role you play in detecting anomalous activity and in responding to and recovering from incidents. Along with the information you reviewed in the first module, Security Operations, you should now have a better understanding of what it takes to be a security operations engineer. To learn more about the in-demand cybersecurity skills needed for a job in security operations engineering or another field, and to hear from security practitioners, visit the Cybersecurity Learning Hub on Trailhead.
- National Institute of Standards and Technology (NIST): Computer Security Incident Handling Guide
- NIST: Guide for Cybersecurity Event Recovery
- NIST Special Publication 800-53 (Rev. 4), Security and Privacy Controls for Federal Information Systems and Organizations: SI-5 Security Alerts, Advisories, and Directives