Systems thinking & CTI: Scenario-Based Incident Response Playbooks
This article provides lessons learned to help teams cut approximately 4 hours off the initial response to a security incident. It is for threat & risk teams supporting incident response functions.
Alternatively, the contents help you ask managed service providers or vendors specific questions about how they would support you in scenario-based incident response.
Cheers!
Why is this even new?
Incident response playbooks describe what you do if a given situation occurs. They provide guidance and predetermined lists of actions so that you don’t have to scramble when an incident occurs.
Scenario-based incident response playbooks emphasize deeper integration between the threat model and the exact playbook. For example, there are interlinkages between different functions, using a common taxonomy or data model.
Mapping out how exactly to respond to a given situation predates modern technology. What has changed over time is the effectiveness of sharing & collaboration. Various templates and tips for incident response playbooks are readily available; just Google ‘cyber incident response playbook’ and you will be hit with thousands of pages of content.
The real question is why do teams still struggle with an initial response to certain events?
Feedback we've gathered from colleagues, clients, and other practitioners suggests that the biggest struggles are:
The most often mentioned reason for these struggles: there’s simply too little priority placed on keeping the versions current. This is just one of those items that is not considered important enough or urgent enough for teams to focus on directly except when it’s time to use it.
To help chip away at this problem, I worked with my team at Venation to understand which areas need attention and which don’t.
Full disclosure, we provide commercial offerings to mitigate these struggles but I still felt that we should provide folks the means to first evaluate this themselves.
This article contains key takeaways so that teams can do an honest self-assessment of their status.
Stakeholder integration & communication flow
First, how do team A and team B communicate? And if they communicate, what do they communicate?
Based on my personal experience, before we start thinking about playbooks we should be thinking about people. Who communicates with whom at specific points in the incident management process? Or, if your organization does not have an incident response process, what is your ‘who do I call’ list?
In complex environments I regularly support cyber threat intelligence (CTI) teams that leverage the concept of stakeholders. This logic is not just for CTI teams. It applies to entire organisations and is a crucial mechanism to understand who works together.
I often recommend starting with a Stakeholder Matrix, because I find that a lot of teams don’t use one. This is especially true for incident response teams, where the priority is on the daily grind.
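As a minimal sketch of what a Stakeholder Matrix could look like as data, consider the Python below. The field names, the example teams, the contact addresses, and the helper functions `who_to_call` and `uncovered_phases` are all assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stakeholder:
    name: str      # team or role (hypothetical examples below)
    contact: str   # how to reach them during an incident
    needs: tuple   # information this stakeholder expects to receive
    phases: tuple  # incident phases in which they must be involved


# Hypothetical entries; replace with your own organisation's stakeholders.
MATRIX = [
    Stakeholder("Incident Response", "ir-oncall@example.org",
                ("scenario context", "indicators"),
                ("identification", "containment")),
    Stakeholder("CTI", "cti@example.org",
                ("incident findings",),
                ("preparation", "identification")),
    Stakeholder("Legal", "legal@example.org",
                ("breach scope",),
                ("containment", "recovery")),
]


def who_to_call(phase: str) -> list:
    """Return the names of stakeholders that must be involved in a phase."""
    return [s.name for s in MATRIX if phase in s.phases]


def uncovered_phases(all_phases: list) -> list:
    """Return phases for which the matrix lists no stakeholder at all."""
    return [p for p in all_phases if not who_to_call(p)]
```

Even a small structure like this makes gaps visible: any phase returned by `uncovered_phases` has nobody on the ‘who do I call’ list.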
Communication flow
I found that it is extremely practical to make it crystal clear what the exact flow of communication is.
When I say crystal clear, I mean going above and beyond to make sure that nothing is left implicit. When you review these playbooks long enough, the trend you will see is that they are highly likely to have been written based on internal knowledge.
The key questions to ask to understand the current status of communication flow are:
See an example of how you can think about a data flow below:
One key lesson learned is setting up relationships with law enforcement, especially the communication flow to and from law enforcement. This might sound like overdoing it, but knowing who to call in this case can be very beneficial in having a timely response. In addition, in some cases it might be needed due to compliance or legislative reasons.
For example, in the Netherlands, there is a comprehensive network of local and national law enforcement agencies, as well as government agencies such as the Nationaal Cyber Security Centrum (NCSC-NL) and the Digital Trust Center, which provide guidance in these areas.
Key information needed before or during an incident
Once the communication flow is sorted out, then you explore exactly what information is being shared between teams. Exactly what this should be is something I cannot document here. This is unique to each company and the secret sauce of each team.
I do want to share two observations: structure & level of detail.
Playbook Structure
The most effective structures we have found so far when building incident response playbooks include the following:
Level of detail & the scenario-based approach
Once all elements are included, you need to decide with your teams what the right level of detail is for the playbook. Are you already at the right level of detail or do you need to go one level deeper?
This is where the scenario-based approach provides peak value: when your team operates with a common set of threat or risk scenarios, you can easily use this as a template to start from, then go deeper based on the exact demand of the Incident Response function.
Here’s an example of one of my team’s scenarios:
Your CTI function should be equipped to develop similar scenarios, which in turn changes discussions about level of detail into ‘this is the starting point, where exactly do you need more details and what is that level?’. Much more specific and actionable.
I strongly believe this contributes to the actionable and timely nature of CTI functions. In addition, it just saves you loads of time for everything from the workflow to the exact investigative steps.
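To make the link between scenarios and playbooks concrete, here is a minimal sketch in Python. The scenario IDs, the playbook names, and both helper functions are hypothetical; the only idea taken from the text is that playbooks reference a shared set of scenarios through a common identifier.

```python
# Hypothetical scenario catalogue maintained by the CTI function.
SCENARIOS = {
    "SC-01": "Ransomware with double extortion",
    "SC-02": "Business email compromise",
    "SC-03": "Insider data theft",
}

# Hypothetical playbooks, each linked to scenarios by ID.
PLAYBOOKS = {
    "PB-ransomware": {"scenarios": ["SC-01"]},
    "PB-phishing": {"scenarios": ["SC-02", "SC-09"]},
}


def scenarios_without_playbook(scenarios: dict, playbooks: dict) -> list:
    """Return scenario IDs that no playbook refers to."""
    covered = {sid for pb in playbooks.values() for sid in pb["scenarios"]}
    return sorted(set(scenarios) - covered)


def unknown_scenario_links(scenarios: dict, playbooks: dict) -> dict:
    """Return, per playbook, the scenario IDs missing from the catalogue."""
    result = {}
    for name, pb in playbooks.items():
        missing = [sid for sid in pb["scenarios"] if sid not in scenarios]
        if missing:
            result[name] = missing
    return result
```

The first check shows where a scenario has no playbook yet; the second catches playbooks that point at a scenario the CTI function has retired or never defined.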
Few other lessons:
Operational Security
I cannot end this segment without talking about operational security. Having worked directly with red teams most of my career, I found that crucial actionable intelligence is often derived from having access to the security team's current playbooks. The same applies to your adversaries.
You have to make sure access to these materials is need-to-know and that this access is periodically reviewed. If you make it available in a wider security function, then make sure that you are aware of what information is being disclosed in case something happens.
This is often not considered nor included in daily operations.
Yes. You need to have a process.
Most likely your teams already have a process. Good! Is it working?
If you don’t, then consider that making up the process as you go along while dealing with the stress of an ongoing incident is like reading the car manual while trying to change the brakes on a car heading for a cliff. By having a ready-made and approved IR process, your organization avoids the chaos of trying to figure out who is responsible for what, what the lines of communication are, and how decisions get made. You will also need to have a back-up plan in place should everything be compromised, but we will get to that later in the article.
An incident response process is a structured approach to managing and mitigating security incidents. It typically consists of five stages and provides a roadmap for handling incidents.
Two widely recognised frameworks for incident response are the NIST (National Institute of Standards and Technology) and SANS (SysAdmin, Audit, Network, Security) frameworks.
This article is not about incident management, but here’s a TLDR primer: During the preparation phase, organisations establish incident response plans, define roles and responsibilities, and implement necessary security measures. The identification stage involves detecting and validating incidents through monitoring systems and alerts. Once an incident is confirmed, containment measures are implemented to prevent further damage. The eradication phase focuses on removing the cause of the incident and restoring systems to a secure state. Finally, the recovery stage involves restoring normal operations, conducting post-incident analysis, and implementing improvements to prevent future incidents.
Maintenance
This is the segment you don’t explicitly see in the aforementioned processes. Especially when talking about playbooks, this might be one of the most important items. A playbook needs to be updated regularly! This means periodic review and sign-off by a person with authority. Next, the playbook should be exercised and the results should be fed back into it.
If your incident management process does not explicitly include updating & maintaining the playbooks, I encourage you to do so.
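The maintenance loop described above (review, sign-off, exercise, feed back) can be tracked with very little tooling. The sketch below is one way to do it in Python; the `PlaybookStatus` fields and the 180-day review interval are assumptions, so pick whatever fits your own cadence.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PlaybookStatus:
    name: str
    last_review: date
    signed_off_by: Optional[str]    # person with authority, if any
    last_exercise: Optional[date]   # None if never exercised


def maintenance_gaps(pb: PlaybookStatus, today: date,
                     max_age: timedelta = timedelta(days=180)) -> list:
    """List which maintenance steps are missing for a playbook."""
    gaps = []
    if today - pb.last_review > max_age:   # assumed review interval
        gaps.append("review overdue")
    if not pb.signed_off_by:
        gaps.append("no sign-off")
    if pb.last_exercise is None or pb.last_exercise < pb.last_review:
        gaps.append("not exercised since last review")
    return gaps
```

A playbook with an empty list of gaps has been reviewed recently, signed off, and exercised after its last review.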
This is also the key area where you can integrate your CTI function with the Incident Response function. This is a highly tangible integration, as this provides specific Priority Intelligence Requirements for what to focus on. For example, if the Scenario is about Ransomware & Double Extortion, you can task your CTI function to investigate this further and propose updates to the playbooks based on current research.
I specifically recommend that your CTI function contributes to:
Added benefit: you can slap a ‘threat informed’ sticker on them as well!
Workflows
Visualisation plays a crucial role in incident response, providing a clear representation of roles, responsibilities, and activities within the incident playbook. A visual workflow allows stakeholders to quickly grasp the sequence of activities, dependencies, and decision points. This enhances communication, coordination, and overall efficiency during incident response. It might sound silly, but we remain a visually oriented species.
You can simply start by using Microsoft PowerPoint or Visio, but other simple charting solutions like draw.io will also do. Everybody loves a good workflow.
Exercising
Can you still remember the last time you practiced at least one of your incident playbooks?
Once a playbook is set up, it shouldn't collect dust on a shelf but should be vetted and tested on a regular basis to make sure that it is still effective and constitutes the optimal response.
When you explore a scenario-based approach, you will naturally develop a selection of relevant or urgent scenarios which need to be trained. This is perfect for training based on the playbook.
There are two key lessons learned to keep in mind:
Cadence:
Types:
As a personal example, I often create overarching exercise scenarios that cover multiple playbooks. This forces the team to think on their feet and quickly go over the different playbooks. It also allows you to create a narrative that can be used in multiple exercises, for example using different exercises for different phases of the scenario.
Value measurement
One area of research I believe we need to be more upfront about is measuring the value add of a given playbook. This brings us to a crucial discussion, as it certainly involves rating something done by someone. It’s not for all organizations. My key takeaway is that it’s better to think about this before someone asks you for it. This way, you can say with clear eyes what the exact value is.
Some of the industry’s most commonly tracked metrics to create empirical evidence are:
These metrics are designed to help tech teams understand how often incidents occur and how quickly the team bounces back from those incidents. Whatever metrics you choose, I recommend measuring both exercise and real incident performance and comparing the two. The difference gives you a lot of data points to work with.
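Comparing exercise and real incident performance can start as a simple calculation. The sketch below assumes a single metric, time to contain in minutes, and uses made-up numbers; the metric choice and the function name `mean_gap` are illustrative only.

```python
from statistics import mean

# Hypothetical time-to-contain measurements, in minutes.
EXERCISE_MINUTES = [90, 120, 75]
INCIDENT_MINUTES = [180, 150, 210]


def mean_gap(exercise: list, real: list) -> float:
    """Mean real-incident time minus mean exercise time.

    A positive value means the team is slower in real incidents
    than in exercises.
    """
    return mean(real) - mean(exercise)


gap = mean_gap(EXERCISE_MINUTES, INCIDENT_MINUTES)
print(f"Real incidents take {gap} minutes longer on average")
```

Tracking this gap per playbook over time shows whether exercises are realistic enough and whether the playbook holds up under real pressure.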
Technology
Technology is mostly dependent on your own IT infrastructure.
Many prime vendors provide onboard solutions that include incident response playbooks. They tend to be locked into their own format and require some development work to customise it to your environment if possible.
From my experience, some playbooks exist in Confluence while others live in OneNote (or even on paper in a vault). This really depends on the organisation and the IT architectural stack you are using.
My first key lesson learned is that you are best off creating them in the same place you use for documentation in your security operations center or cyber fusion center, taking operational security into account, obviously.
Second, make sure to have a plan for an out-of-band solution in case both the incident management system and the playbooks themselves are compromised. This is one of those things that you won’t take seriously until it’s really too late. Because if something happens then we will ‘spin up an out-of-band Slack’, right? My friend Martyn Gill builds a tool for this exact use case for SMBs with their company ORNA. Full disclosure, Martyn helps out with Venation and I personally don’t get any benefits from marketing (maybe now I am!).
Some other good examples from different entities:
Role of LLM & AI:
The key takeaway here: automate as much as possible. Just make sure to have a granular understanding of what is AI-supported and what is not, so that your teams can track this over time. In addition, this allows you to monitor specific areas that have a higher chance of hallucinations.
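One way to keep that granular understanding is to flag each playbook step explicitly. The Python sketch below is an assumption-heavy illustration: the step names, the `ai_supported` and `human_review` flags, and both helper functions are hypothetical and not a recommendation of which steps to automate.

```python
# Hypothetical playbook steps, each flagged as AI-supported or not.
STEPS = [
    {"step": "Summarise alert context", "ai_supported": True, "human_review": True},
    {"step": "Isolate affected host", "ai_supported": False, "human_review": True},
    {"step": "Draft stakeholder update", "ai_supported": True, "human_review": False},
]


def ai_share(steps: list) -> float:
    """Fraction of steps that are AI-supported, for tracking over time."""
    return sum(1 for s in steps if s["ai_supported"]) / len(steps)


def unreviewed_ai_steps(steps: list) -> list:
    """AI-supported steps without human review, where hallucinations could slip through."""
    return [s["step"] for s in steps if s["ai_supported"] and not s["human_review"]]
```

Recording `ai_share` at each playbook review gives you the trend, and `unreviewed_ai_steps` points at the places to monitor first.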
We recommend using LLMs for:
We discourage using LLMs for:
Wrapping up
Adopting a scenario-based approach within the incident response process offers numerous benefits. Most notably, it can also significantly reduce the stress on frontline defenders during an incident.
Four actions you should take today:
If you like this article, then you’ll love the curated threat scenario repository we have at Venation.
Together with my Venation team I customize and build playbooks for teams, and train their incident response teams through tabletop exercises or hands-on-keyboard simulations.
Check out more information via www.venation.digital.
Cheers!