Assist with Incident Response and Recovery

Learning Objectives

After completing this unit, you’ll be able to:

Explain how to respond to and remediate cloud-based attacks.
Describe how to repair and recover from incidents and technology failures.

Respond to and Remediate Cloud-Based Attacks

Maintaining a safe and secure environment for sensitive data is a cloud security engineer’s highest priority. Incident response is a key aspect of an overall cloud security program. A data incident is defined as a breach of your cloud system’s security, leading to the accidental or unlawful destruction, loss, alteration, unauthorized disclosure of, or access to sensitive data on cloud systems you manage or control.

An engineer responding to an alert in a cloud system

Whatever the size of your organization, your job as a cloud security engineer involves working with a trained incident response team tasked with taking immediate action, and helping respond to detected incidents in the cloud environment.

Incident response is the process by which an organization handles a data breach or cyberattack, including how it manages the consequences. The goal of incident response is to protect customers’ data and restore the system to normal. The first step you take after a security incident occurs is to assemble your team. You work with your internal incident response team, as well as trusted partners who may be involved in the investigation or response and who provide additional expertise and valuable scrutiny. Your incident response team may include specialists in cloud incident management, cloud security and privacy, and product engineering, among others.

Next, you detect and ascertain the source of the incident. The focus of this phase is to monitor security events to detect, alert, and report on potential security incidents. When an incident has been detected, you perform triage and analysis to determine what has occurred. You work with the incident response team to analyze network traffic and system access to identify suspicious activity. You scan for security threats using penetration tests, intrusion detection, and software security reviews. You review source code to discover hidden vulnerabilities and design flaws, and verify that security controls are implemented.
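As an illustration of analyzing system access to identify suspicious activity, here is a minimal Python sketch that counts failed sign-ins per source address and flags the noisy ones. The record fields (user, source_ip, outcome), the sample data, and the threshold are all assumptions made for this example. A real investigation would read events from your CSP's audit logs and weigh many more signals.

```python
from collections import Counter

# Hypothetical access-log records; field names and values are illustrative only.
SAMPLE_EVENTS = [
    {"user": "alice", "source_ip": "203.0.113.7", "outcome": "failure"},
    {"user": "bob", "source_ip": "203.0.113.7", "outcome": "failure"},
    {"user": "carol", "source_ip": "203.0.113.7", "outcome": "failure"},
    {"user": "alice", "source_ip": "203.0.113.7", "outcome": "failure"},
    {"user": "dave", "source_ip": "198.51.100.2", "outcome": "failure"},
    {"user": "dave", "source_ip": "198.51.100.2", "outcome": "success"},
]


def flag_suspicious_sources(events, max_failures=3):
    """Return, in sorted order, source IPs with more than max_failures failed sign-ins."""
    failures = Counter(
        event["source_ip"] for event in events if event["outcome"] == "failure"
    )
    return sorted(ip for ip, count in failures.items() if count > max_failures)


if __name__ == "__main__":
    for ip in flag_suspicious_sources(SAMPLE_EVENTS):
        print(f"Review sign-in activity from {ip}")
```

A simple count like this only narrows down where to look. The triage decision still belongs to you and the incident response team.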

Once you’ve ascertained the source, you work with the incident response team and system owners to contain, neutralize, and recover from the incident and remediate affected systems. You assess the damage and severity, and notify affected teams and customers. You analyze other systems and parts of the network for further evidence of malicious activity. Finally, you take steps to help prevent the same event from happening again. Lessons learned from the incident help facilitate prioritization of future engineering efforts.

Repair and Recover from Incidents and Technology Failures

Even though not all incidents can be prevented, the cloud can make it easier to detect, react to, and recover from them. This is because your CSP likely provides you with security visibility and automation controls that enable you to be proactive rather than reactive. Part of your role as a cloud security engineer is to set up your cloud environments in a way that lets you mitigate a security event faster, more cheaply, and more effectively.

You’ve already learned a lot about how to be proactive by establishing strong cloud security governance, putting controls in place to mitigate threats and vulnerabilities, and taking detection seriously to provide visibility and transparency into the operation of your cloud deployments. You also play a role in planning for incident response to help recovery go more smoothly. You ensure all cloud users in your organization have a basic understanding of security incident response processes and how to react to security issues. You also simulate both expected and unexpected security events within your cloud environment to understand the effectiveness of your preparation.

Note that creating an appropriate incident response and forensics plan that matches your operating model is extremely important. In the cloud, you may be responsible for managing customer data, platforms, applications, identity and access management systems, operating systems, networks, and firewall configurations, among other items. Meanwhile, you may be using a cloud infrastructure service provider that manages some of the software and hardware underlying your cloud applications. Understanding which party has responsibility for managing each piece of this infrastructure is key to both properly managing security controls and responding to and recovering from incidents or technology failures successfully.
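One way to make these ownership boundaries usable during an incident is to record them in a lookup that responders can query. The Python sketch below is only an illustration: the component names, the "customer" and "provider" labels, and the assignments are hypothetical, and the real split depends on your service model and your contract with your provider.

```python
# Hypothetical ownership map. The actual division of responsibility depends on
# your service model and provider agreement; these entries are examples only.
RESPONSIBILITY = {
    "customer data": "customer",
    "applications": "customer",
    "identity and access management": "customer",
    "firewall configuration": "customer",
    "underlying hardware": "provider",
    "underlying software": "provider",
}


def responders_for(affected_components, matrix=RESPONSIBILITY):
    """Group affected components by responsible party.

    Components missing from the matrix are grouped under "unassigned" so that
    ownership gaps surface during planning rather than during an incident.
    """
    grouped = {}
    for component in affected_components:
        owner = matrix.get(component, "unassigned")
        grouped.setdefault(owner, []).append(component)
    return grouped


if __name__ == "__main__":
    print(responders_for(["customer data", "underlying hardware", "dns"]))
```

Anything that lands under "unassigned" is a planning gap worth resolving before an incident occurs.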

Recovering from an Incident and Automating Responses

When an incident or technology failure does occur, you repair and recover by coordinating and communicating with impacted teams. You work with your stakeholders, legal counsel, and organizational leadership to determine the goal of responding to an incident (such as mitigating the issue, recovering the affected resources, or preserving data for forensics and attribution). You implement responsive controls to drive remediation of potential deviations from your security baselines.

You know what you have and what you need to successfully recover, and you preserve logs, snapshots, and other evidence by copying them to a centralized cloud account. When you see issues or incidents repeat, you build automated mechanisms where possible to respond to common situations. Some examples of what you can use automated mechanisms for include:

Ensuring permissions are configured in exactly the same way every time
Making sure cloud storage buckets are not publicly readable or writable
Monitoring any changes in permissions for individual users, entire roles, or databases
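To make the last two examples concrete, here is a minimal Python sketch of both checks. It works on plain dictionaries rather than any real provider API: the bucket and grant structure, the principal markers treated as public, and the role names are all assumptions for illustration. In practice you would populate these structures from your CSP's configuration or audit data.

```python
# Assumed markers meaning "anyone"; actual values vary by provider.
PUBLIC_PRINCIPALS = {"*", "allUsers"}


def find_public_grants(buckets):
    """Return (bucket_name, permission) pairs for read/write grants to a public principal."""
    findings = []
    for bucket in buckets:
        for grant in bucket["grants"]:
            is_public = grant["principal"] in PUBLIC_PRINCIPALS
            if is_public and grant["permission"] in {"read", "write"}:
                findings.append((bucket["name"], grant["permission"]))
    return findings


def permission_changes(baseline, current):
    """Compare two {role: set_of_permissions} snapshots and report what changed per role."""
    changes = {}
    for role in sorted(set(baseline) | set(current)):
        before = baseline.get(role, set())
        after = current.get(role, set())
        if before != after:
            changes[role] = {"added": after - before, "removed": before - after}
    return changes


if __name__ == "__main__":
    # Hypothetical inventory used only to exercise the two checks.
    buckets = [
        {"name": "logs", "grants": [{"principal": "security-team", "permission": "read"}]},
        {"name": "assets", "grants": [{"principal": "*", "permission": "read"}]},
    ]
    print(find_public_grants(buckets))
    print(permission_changes({"analyst": {"read"}}, {"analyst": {"read", "write"}}))
```

Checks like these could run on a schedule or in response to configuration-change events, with findings routed to the incident response team.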

When you identify gaps in your processes, tools, or people, you take the time to plan to fix them, and conduct additional simulations to find gaps and improve processes.

One of the most important things you can do to prevent, detect, respond to, and recover from incidents is to create a receptive and adaptive security culture. This means fostering a culture in which security teams are viewed as both enabling and securing the business and developers, and where all stakeholders feel shared ownership for maintaining secure cloud systems.

From the leadership team down, it’s important to promote a culture of acceptance and invite everyone to be a part of the organization’s security, including providing a clear channel for anyone to open a high-severity ticket whenever they believe there’s a potential risk or threat. Receive these reports with an eager and open mind, and more importantly, make it clear to non-security staff that you welcome them. A successful incident response program requires constant vigilance, modification, and reevaluation.

Sum It Up

In this module, you’ve been introduced to methods for assessing your existing infrastructure, identifying threats, and adopting cloud computing. You’ve learned more about how to build and operationalize cloud technologies, monitor cloud systems and detect threats, and assist with incident response and recovery.

Along with the information you reviewed in the Cloud Security Engineering module, you should now have a better understanding of what it takes to be a cloud security engineer. You can learn more about security practitioners and the in-demand cybersecurity skills needed to get a job in cloud security engineering or another cybersecurity role by visiting the Cybersecurity Learning Hub on Trailhead.
