Introduction to the Incident Response Process

From Security for Everyone (edition e1.0.0). Updated October 9, 2023.

Incident response is a well-established practice in the technology space, and much has been written about it. This introduction gives you a high-level overview of how incident response processes work and the typical actions and considerations associated with each stage.

The first thing to note is that, for the most part, incident response is not linear. It is a triggered process that loops between a number of stages until all evidence and impact of the incident are resolved.

Figure: The stages of incident response.

The process itself is typically made up of four stages of action:

  1. Identification

  2. Verification

  3. Containment

  4. Remediation

Stage 1: Identification

During the Identification stage, an incident is detected through one of your established information sources. This information is passed to a first-line responder, who triggers the incident response plan.

Example: Actions to Take During the Identification Phase

| Task | Owner | Output |
| --- | --- | --- |
| Initiate logging and timeline. Start the record for the incident. Note the nature and content of information received/identified in the security channel. | Initial Responder | Documented audit trail in the security channel |
| Verification of information source. Where the information leading to the incident acknowledgment was received from outside the organization, it is important to review the source for credibility, agenda, and risk. | Initial Responder | Verification activities and findings noted in the security channel |
| High-level triage. Before an incident can be confirmed, a basic assessment should be made. This aims to eliminate known false positives and confirm reported or suspected issues. Triage will vary by incident type. | Initial Responder, Support | Triage notes in security channel |
| Initiate incident response. Create a channel for the incident within Slack. Notify the security channel of this new channel and ask for conversation to be moved to the incident-specific space. | Incident Responder, Incident Lead | Creation of new incident-specific document or communications log |
| (Optional) Activate on call. If the incident has occurred outside of normal working hours, the on-call system should be used to contact and activate on-call staff. | Incident Responder, Incident Lead | On-call staff available to respond |
| Allocate roles. Assign incident lead, deputy, and communications lead roles. Notify other named parties with incident responsibilities (see Roles and Responsibilities). | Incident Lead | List of allocated roles and contact details in the incident security channel |
| Classification of the incident. Using the classification guidance in this document, classify the issue. Peer review this decision with another member of the incident response team. | Incident Lead | Classification of incident made and documented in the incident security channel |
| (High severity or above) Executive briefing. Where an incident is of high severity or highly public in nature, a brief should be given to the executive team. They may have questions or concerns that should be addressed. The communications lead or executive liaison should act as the ongoing mediator with this group. | Incident Lead, Comms Lead | A concise executive summary of the incident and its status delivered to the executive team and stored in the incident-specific security channel |
| Incident response briefing. Initial responder to brief the incident team and answer any initial questions. This marks the end of the active responsibility for the initial responder (unless they have been assigned the lead or deputy role). | All Incident Team | Meeting held with the incident team. Team briefed and, if appropriate, the initial responder relieved of duty. Minutes of meeting documented in incident-specific security channel |
| (Optional) Update public status page. If the incident is directly affecting customers or public-facing systems, an appropriate update should be made on the status page or update channels. External messages should be QA’d by the incident lead and a member of the senior leadership. | Comms Lead, Incident Lead | Update to status page mechanisms where appropriate |
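
The "initiate logging and timeline" task above can be kept very simple. The following is a minimal sketch of an append-only incident log, assuming nothing about the tooling you actually use; the class names, field names, and sample notes are hypothetical and only illustrate the idea of recording who noted what, and when.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LogEntry:
    timestamp: datetime
    author: str
    note: str


@dataclass
class IncidentLog:
    """Append-only record of what was seen, decided, and done (hypothetical)."""
    title: str
    entries: list[LogEntry] = field(default_factory=list)

    def record(self, author: str, note: str) -> LogEntry:
        # Timestamps are stored in UTC so responders in different
        # time zones read the same timeline.
        entry = LogEntry(datetime.now(timezone.utc), author, note)
        self.entries.append(entry)
        return entry

    def timeline(self) -> list[str]:
        # One human-readable line per entry, in the order recorded.
        return [
            f"{e.timestamp.isoformat(timespec='seconds')} [{e.author}] {e.note}"
            for e in self.entries
        ]


# Made-up example entries for illustration only.
log = IncidentLog("Suspicious login report")
log.record("initial responder", "Report received from outside the organization")
log.record("initial responder", "Source reviewed for credibility; triage started")
print("\n".join(log.timeline()))
```

In practice the same record could live in a chat channel or a shared document; what matters is that entries are timestamped and never edited after the fact.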

Stage 2: Verification

Before an incident is treated as confirmed, the accuracy and extent of the issue must be verified. This stage of incident response is focused on the confirmation of the issue and clarification of the scope or extent to which it affects your company, its systems, and users.

Verification includes the identification of the issue across multiple data sources and the reproduction of any suspicious performance behavior in a controlled manner (by organizational staff or on organizational equipment). Even if the verification process flags this incident as a false alarm or inaccurate, it should still be documented.

Example: Actions to Take During the Verification Phase

| Task | Owner | Output |
| --- | --- | --- |
| Identify affected customers and systems. It is crucial that the extent of the incident is understood and recorded. Where appropriate, this should include a breakdown of customers affected or systems/hosts at risk. | Incident Lead, Deputy | List of affected systems or customers in incident-specific security channel |
| Access and monitor all logs for the affected accounts or systems. (Optional) Where relevant or appropriate, increase logging levels to ensure sufficient granularity. | Deputy | Updates and findings in incident-specific security channel |
| Establish a timeline of events. Record all findings and investigative paths in the incident security channel. | Scribe | Updates and findings in incident-specific security channel |
| Reproduce issue on the non-production environment. For issues that are caused by specific bugs or actions, these must be tested and documented. | Deputy | Updates and findings in incident-specific security channel |
| Identify other potential issue areas. Where an issue is caused by a specific bug or action, extend testing to all associated use cases or similar interaction points where possible. | Deputy | Updates and findings in incident-specific security channel |
| Investigate root cause or sequence of events leading to incident. Where time allows, ensure that the issue being investigated is the root cause and not the side effect of another, more serious issue. This will require cross-log investigation and timeline analysis. | Deputy | Updates and findings in incident-specific security channel |
| Confirm issue across account types, geographic location, etc. (the scope of the incident). It is crucial that the full scope or extent of the issue is understood. For platform or system issues that are public facing, this includes ruling out privilege and geographic distinctions. Test assumptions and systems from both inside and outside organizational networks to avoid testing environment bias. | Deputy | Updates and findings in incident-specific security channel |
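
Establishing a timeline and doing cross-log investigation usually means interleaving events from several sources into one chronological view. The sketch below shows one way to do that, assuming each source is already sorted by time; the `Event` type, the source names, and the sample events are all made up for illustration.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    source: str
    message: str


def merge_timelines(*sources: Iterable[Event]) -> Iterator[Event]:
    """Merge per-source event streams into one chronological timeline.

    Each input must already be sorted by timestamp. heapq.merge then
    yields events lazily, in order, across all sources.
    """
    return heapq.merge(*sources, key=lambda e: e.timestamp)


# Hypothetical sample data, not taken from any real incident.
auth_log = [
    Event(datetime(2023, 1, 1, 9, 0), "auth", "failed login for admin"),
    Event(datetime(2023, 1, 1, 9, 5), "auth", "successful login for admin"),
]
app_log = [
    Event(datetime(2023, 1, 1, 9, 2), "app", "password reset requested"),
]

timeline = list(merge_timelines(auth_log, app_log))
for event in timeline:
    print(event.timestamp.isoformat(), event.source, event.message)
```

If your log sources use different clocks or time zones, normalize the timestamps before merging, otherwise the ordering can mislead the investigation.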

Stage 3: Containment

Once identified and confirmed, the issue should be contained such that its impact on your systems and customers can be limited. Where possible, affected systems should be isolated from healthy systems. This may include preventative account suspension, removal from networks, or password reset activities if an account has been compromised.

All containment activities should be documented as part of the incident log, and the implications of that containment communicated to affected stakeholders.

Danger: Containment steps are very specific to the individual incident and scenario type. The following are generic steps and should be used as a guideline, not as a comprehensive and complete approach.

Example: Actions to Take During the Containment Phase

| Task | Owner | Output |
| --- | --- | --- |
| Initiate customer contact. Where customers are affected, directly contact each customer. Contact should aim to reassure and acknowledge rather than provide technical detail. Required actions must be well tested inside the organization before external communications are sent. | Comms Lead | Customer contact drafts and actual messages |
| Isolate compromised host(s). Where a host is assumed compromised, remove it from the network wherever possible or lock down ingress and egress to a single controlled IP. Avoid powering down or restarting the host until an image or snapshot can be made. | Incident Lead | List of compromised hosts plus results from checking the isolation is successful |
| Suspend compromised account(s). Where an account has (or is suspected to have) been compromised, it should be suspended. Suspension should aim to preserve all access or event logs for the account. Where the account is central to core operations, this should be reflected in the incident severity and classification. A decision must be made as to whether the account can be suspended safely without disrupting availability. | Incident Lead | Suspended account list and access to the relevant access and event logs for said accounts |
| Seize relevant hardware or equipment. Where hardware such as a laptop is believed to be the cause of or affected by an incident, it should be taken by the incident team for investigation and eventual remediation. Temporary clean devices may be issued as an interim solution; however, these should provide the minimum to get the job done and be replaced once the incident is resolved. | Incident Lead | Seized hardware list including asset tag and assigned owner |

Stage 4: Remediation

Once contained, the issue must be remediated. This stage may vary in length and complexity based on the incident. If dealing with a security issue or an issue involving complex or legacy systems, consultation with domain experts is strongly recommended.

Changes made during the remediation phase should be undertaken in a controlled and documented manner, ensuring that each change is tested before the next is applied. Chaotic or uncontrolled changes increase the likelihood of introducing additional issues into the system or hiding potentially simple solutions.
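
One way to picture "each change is tested before the next is applied" is a loop that applies a single change, checks it, and stops at the first failed check. This is only a sketch under that assumption; the `Change` type, the `apply_in_order` helper, and the example changes are hypothetical and stand in for whatever change process you already have.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    description: str
    apply: Callable[[], None]   # performs the change
    verify: Callable[[], bool]  # returns True if the change worked


def apply_in_order(changes: list[Change]) -> list[str]:
    """Apply changes one at a time, stopping at the first failed check."""
    audit: list[str] = []
    for change in changes:
        change.apply()
        if not change.verify():
            # Stop here so later changes cannot hide the cause.
            audit.append(f"FAILED: {change.description}")
            break
        audit.append(f"OK: {change.description}")
    return audit


# Made-up system state standing in for real hosts and configuration.
state = {"patched": False, "config": "old"}


def patch() -> None:
    state["patched"] = True


def update_config() -> None:
    state["config"] = "new"


audit = apply_in_order([
    Change("apply vendor patch", patch, lambda: state["patched"]),
    Change("update configuration", update_config, lambda: state["config"] == "new"),
])
print(audit)
```

The returned list doubles as the documented record of what was changed and in what order.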

Remediation can only be deemed successful once the verification step has been repeated and end-to-end tests have been conducted. For vulnerabilities outside of your company’s control, this might include following security news feeds, running available check tools, and increasing monitoring for the duration of the issue.

Verification, containment, and remediation will continue as a repeating loop until all the issues have been addressed and system behavior has returned to normal.
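
The loop between stages can be sketched as a small state machine. This is an illustration rather than a prescribed implementation: the `RESOLVED` state and the `issues_remaining` flag are assumptions added to give the loop an exit, and are not stages named in the process itself.

```python
from enum import Enum


class Stage(Enum):
    IDENTIFICATION = 1
    VERIFICATION = 2
    CONTAINMENT = 3
    REMEDIATION = 4
    RESOLVED = 5  # assumed terminal state, not one of the four stages


def next_stage(current: Stage, issues_remaining: bool) -> Stage:
    """Return the stage that follows `current`.

    After remediation the process returns to verification; it only
    ends once a verification pass finds nothing left to address.
    """
    if current is Stage.IDENTIFICATION:
        return Stage.VERIFICATION
    if current is Stage.VERIFICATION:
        return Stage.CONTAINMENT if issues_remaining else Stage.RESOLVED
    if current is Stage.CONTAINMENT:
        return Stage.REMEDIATION
    if current is Stage.REMEDIATION:
        return Stage.VERIFICATION
    return Stage.RESOLVED


# Walk one hypothetical incident: the first verification confirms an
# issue, the repeat verification after remediation finds none.
findings = iter([True, False])
stage = Stage.IDENTIFICATION
path = [stage]
while stage is not Stage.RESOLVED:
    remaining = next(findings) if stage is Stage.VERIFICATION else False
    stage = next_stage(stage, remaining)
    path.append(stage)

print(" -> ".join(s.name for s in path))
```

A real incident may pass through verification, containment, and remediation several times before the final verification comes back clean.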

Danger: Remediation steps are very specific to the individual incident and scenario type. The following are generic steps and should be used as a guideline, not as a comprehensive and complete approach. As always, if you are unsure how to proceed or don’t have the skills in your team, reach out to professionals for help. Companies specializing in incident response and forensics will have the skills and experience you need to respond.

Example: Actions to Take During the Remediation Phase

| Task | Owner | Output |
| --- | --- | --- |
| Patching and systems updates. Where applicable, apply vendor patches or assess the availability of application or framework updates. | Incident Lead | List of systems updated and patches applied in incident-specific security channels |
| Address privacy issues. If the privacy of any personal data has been compromised, the privacy officer must assess the impact and determine the appropriate action to take in remediation. | Privacy Officer | Assessment on whether further action is required |
| Address software flaws. Where an incident relates to a vulnerability or issue with an in-house application, ensure that code is fixed and tested before deployment. Ensure that all instances of the flaw or issue are addressed and not just the initial instance. Engage external assistance where appropriate. | Deputy | Changes to code base linked to specific commits and tests |
| Address configuration issues. Where an incident relates to a misconfiguration, ensure that this is addressed in the build systems or scripts and the host is rebuilt with the new configuration. Avoid fixing in place on deployed servers where possible to avoid configuration creep. | Incident Lead | Rebuilt hosts and updated host build files |
| Initiate backup recovery. Where data has been lost or compromised, ensure that a backup is available and prepared for restore. | Incident Lead | Estimated recovery time and recovered data |
| Re-image or rebuild equipment or machines. Where equipment has been compromised or affected by an incident, re-image or rebuild from a trusted base image. Do not attempt to fix individual issues such as malware or viruses in place. | Incident Lead | Rebuilt hardware |
| Address gaps in logging and audit. If the incident highlighted gaps in logs or audit trails, address these and ensure logs are centralized, securely stored, and monitored. | Deputy | Logging and audit for the acknowledged gaps |
| (Optional) Engage an external specialist to assess and retest remediation. For serious or complex incidents, ensure an objective specialist has reviewed and retested the remediation issues. | Incident Lead | Assessment results and report |
| Communicate with affected customers. Once remediation is complete, the affected customers should be briefed. Where action is required on their part (such as resetting a password), this must be clear and concise. Communication content and a distribution list should be QA’d by the incident lead and a senior leader before sending. | Comms Lead | Draft communications, sign-off, and actual communications |
| (Optional) Executive brief. For high severity issues, an executive brief should be compiled upon remediation. This should address any concerns and explain the risks and effects of the incident in concise terms. | Incident Owner, IT Manager | Executive briefing document |

Ongoing Incident Response Actions

Unlike the actions we have discussed above, this last set of suggested tasks is ongoing. These are things you need to do frequently at all stages of the incident response process. The aim here is to ensure you always have a good record of what you have done or discovered and that you are always taking steps to learn more about the situation as it evolves.

This documentation and discovery not only helps with post-incident reviews but makes it much easier to share the load during an incident and let people swap in and out.

Example: Ongoing Incident Responses and Outputs

| Task | Owner | Output |
| --- | --- | --- |
| Record all actions, findings, and communications in the log. | All | Documented audit trail |
| Access and monitor all logs and audit trails for the affected accounts or systems. (Optional) Where relevant or appropriate, increase logging levels to ensure sufficient granularity. | All | None |
| Identify, document, and challenge all assumptions (ongoing). | All | Documented audit trail |

Whatever the incident you face, this process provides a stable and predictable set of activities and actions that you and your team can use to respond. When we put our knowledge of this incident response process into a repeatable document, we form what is known as an incident response plan, your grab-and-go guide to surviving in stressful times.

How to Create an Incident Response Plan

There are many ways to document these plans—stick with what works for your internal culture and documentation style. Rather than define the document template, we will look at the sections you need to include and why they are important.

Section 1: Defining Incident Severity and Classification Levels

Like many of the subjects we have discussed in this book, just because something is an incident, it doesn’t mean the world is ending. Security isn’t always critical and that’s OK.
