The Incident Response Plan Nobody Tests


Every enterprise I’ve worked with has an incident response plan. A well-formatted PDF sitting in a SharePoint folder somewhere. It has phases: Identification, Containment, Eradication, Recovery, Lessons Learned. It has escalation procedures, communication templates, role assignments, and a beautiful flow chart that maps the decision tree.

It has never been tested.

The gap between “having a plan” and “being able to execute a plan under pressure at 3am” is the gap between safety and disaster. And most organizations don’t discover the size of that gap until they’re in the middle of a real incident — when the stakes are highest and the capacity for learning is lowest.

Why Untested Plans Fail

Incident response plans fail for predictable reasons that would be discovered instantly if the plan were tested even once. Here are the patterns I’ve observed across dozens of organizations:

Contact Lists Are Outdated

The plan says to call the VP of Engineering. That person left six months ago. Their phone number still points to their personal cell, which now belongs to someone in a different time zone who is deeply confused by a call about a data breach at 2am.

This sounds like a minor administrative issue. It isn’t. During a security incident, every minute spent tracking down the correct contact is a minute the attacker is operating uncontained. A 30-minute delay in reaching the right person can mean the difference between containing a breach to one system and watching it propagate across your infrastructure.

Contact lists degrade continuously. People change roles, change phone numbers, change on-call schedules. A contact list that was accurate six months ago is unreliable. A contact list that was accurate twelve months ago is fiction.
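One way to catch this decay before an incident does is a scheduled staleness check. The sketch below is illustrative only: the `Contact` record, the `last_verified` field, and the 90-day threshold are assumptions of mine, not part of any particular plan or tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contact:
    role: str
    name: str
    phone: str
    last_verified: date  # assumed field: when someone last confirmed this entry works


def stale_contacts(contacts, today, max_age_days=90):
    """Return contacts whose last verification is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [c for c in contacts if c.last_verified < cutoff]


# Hypothetical sample data for illustration only.
contacts = [
    Contact("VP of Engineering", "A. Example", "555-0100", date(2024, 1, 10)),
    Contact("Incident Commander", "B. Example", "555-0101", date(2024, 6, 1)),
]

for c in stale_contacts(contacts, today=date(2024, 6, 30)):
    print(f"STALE: {c.role} ({c.name}) last verified {c.last_verified}")
```

The threshold matters less than the habit: whatever number you pick, something should complain loudly when an entry ages past it.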

Tooling Access Doesn’t Exist

The plan says to check the WAF logs. Three people have WAF console access. Two are on vacation. The third doesn’t remember their password. By the time someone gets access, the incident has been running for four hours — four hours of potential data exfiltration, lateral movement, or system compromise.

The plan says to isolate the compromised server. The engineer on call has never used the network segmentation tool. They open a ticket with IT. IT’s SLA is four hours. The incident has become a day-long event by the time the first containment step is executed.

Tooling access during incidents requires three things: multiple people with active credentials, regular verification that those credentials work, and documented procedures for emergency access when normal channels fail.
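The first two requirements can be checked mechanically. Here is a minimal sketch, assuming you keep a record of who has access to each tool and when their login was last confirmed to work. The tool names, people, two-person minimum, and 30-day window are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical access records: tool -> {person: date their login was last verified}.
ACCESS = {
    "waf-console": {
        "alice": date(2024, 6, 20),
        "bob": date(2023, 11, 2),
        "carol": date(2024, 6, 25),
    },
    "network-segmentation": {
        "dave": date(2024, 6, 28),
    },
}


def access_gaps(access, today, min_people=2, max_age_days=30):
    """Return tools with fewer than min_people recently verified credentials."""
    cutoff = today - timedelta(days=max_age_days)
    gaps = {}
    for tool, people in access.items():
        # Only count people whose credentials were confirmed inside the window.
        verified = sorted(p for p, checked in people.items() if checked >= cutoff)
        if len(verified) < min_people:
            gaps[tool] = verified
    return gaps


for tool, verified in access_gaps(ACCESS, today=date(2024, 6, 30)).items():
    print(f"GAP: {tool} has only {len(verified)} verified user(s): {verified}")
```

A check like this does not replace the third requirement. Emergency access procedures still have to be written down and rehearsed by people.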

Decision Authority Is Unclear

The plan says to “escalate to the CISO.” But the CISO doesn’t have authority to shut down a revenue-generating production system. That requires the CRO’s approval, and the CRO in turn needs the CEO’s approval if the revenue impact exceeds $100K. It’s now 3am and you’re trying to reach executives who haven’t rehearsed this scenario and don’t have the context to make a rapid decision.

Meanwhile, the attacker doesn’t need approval from anyone.

Decision authority during incidents must be pre-authorized, not negotiated in real time. The incident commander needs the explicit authority to take specific actions — including shutting down systems — without escalation delays. This authority must be documented, agreed upon by leadership, and understood by everyone involved.
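Pre-authorization is easier to agree on, and easier to follow at 3am, when it is written as an explicit table rather than a paragraph. The following is a sketch under stated assumptions: the action names, roles, and the table itself are hypothetical placeholders for whatever your leadership actually signs off on.

```python
# Hypothetical pre-authorization table, agreed with leadership ahead of time.
# Maps an action to the roles allowed to take it without further escalation.
PRE_AUTHORIZED = {
    "isolate_host": {"incident_commander", "on_call_engineer"},
    "disable_api_endpoint": {"incident_commander"},
    "shut_down_production_system": {"incident_commander"},
    "notify_customers": {"general_counsel"},
}


def can_act(role, action):
    """True if the role may take the action immediately, with no approval chain."""
    return role in PRE_AUTHORIZED.get(action, set())


def next_step(role, action):
    """Describe what the responder should do next for a given action."""
    if can_act(role, action):
        return f"{role} proceeds with {action}"
    approvers = sorted(PRE_AUTHORIZED.get(action, set()))
    if not approvers:
        return f"{action} has no pre-authorized owner: fix this before an incident"
    return f"{role} hands {action} to: {', '.join(approvers)}"


print(next_step("incident_commander", "shut_down_production_system"))
print(next_step("on_call_engineer", "disable_api_endpoint"))
print(next_step("on_call_engineer", "restore_from_backup"))
```

The third call is the interesting one. Any action that returns "no pre-authorized owner" is a decision that would otherwise be negotiated in real time.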

Communication Breaks Down

The plan has a template for customer notification. But Legal hasn’t reviewed it for compliance with your jurisdiction’s breach notification laws. PR hasn’t approved the language. The CEO wants to wordsmith it personally. And nobody has agreed on whether to send it now (transparent but potentially premature) or later (thorough but potentially distrusted).

Internal communication is equally problematic. The engineering team is working in a war room Slack channel. The executives are texting each other. Legal is sending emails. PR is drafting statements. Nobody has a unified view of what’s happening, what’s been done, and what remains.

Runbooks Are Missing or Stale

Even when the plan identifies the right actions, the detailed procedures for executing those actions are often missing. “Isolate the compromised workload” is a plan step. How to actually do it — which commands to run, which tool to use, what order to follow, how to verify isolation is complete — is a runbook. And that runbook either doesn’t exist, hasn’t been updated since the infrastructure changed, or is written for a tool the team no longer uses.
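Those three failure modes can be told apart with very little bookkeeping. As a rough sketch, assume each runbook records when it was last executed end to end and when its underlying tool or infrastructure last changed. The `Runbook` fields and sample entries below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Runbook:
    plan_step: str
    tool: str
    last_tested: Optional[date]  # None means it has never been executed end to end
    tool_changed: date           # assumed field: last change to the tool or infrastructure


def runbook_status(rb):
    """Classify a runbook as untested, stale, or current."""
    if rb.last_tested is None:
        return "missing or untested"
    if rb.last_tested < rb.tool_changed:
        return "stale"  # the infrastructure moved after the last successful test
    return "current"


# Hypothetical sample data for illustration only.
runbooks = [
    Runbook("Isolate the compromised workload", "network-segmentation", None, date(2024, 3, 1)),
    Runbook("Pull WAF logs", "waf-console", date(2023, 9, 15), date(2024, 2, 1)),
    Runbook("Rotate API credentials", "secrets-manager", date(2024, 5, 20), date(2024, 1, 10)),
]

for rb in runbooks:
    print(f"{runbook_status(rb):>20}: {rb.plan_step} ({rb.tool})")
```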

The Tabletop Exercise

A tabletop exercise is a simulation where key stakeholders walk through a hypothetical incident scenario, step by step, making decisions and coordinating responses — without touching real systems. It’s the fire drill equivalent for cybersecurity.

Format

Duration: 90-120 minutes. Shorter exercises skip critical phases. Longer exercises lose participant energy.

Facilitator: Someone outside the incident response chain. This can be an external consultant, a CISO from a partner company, or an internal security leader who won’t have a role in the exercise scenario. The facilitator drives the scenario, introduces complications, and ensures all participants engage.

Participants: 8-12 people across engineering, security, IT operations, legal, communications/PR, customer success, and executive leadership. Including executives is essential because many incident decisions require executive authority — and executives who haven’t practiced those decisions will hesitate under pressure.

Example Scenario

“At 2:15 PM on a Tuesday, your monitoring system detects unusual database query patterns. Investigation reveals that an API endpoint has been returning customer PII — names, emails, and encrypted passwords — in error responses to unauthenticated requests. The endpoint has been in production for three months. Your earliest access logs go back 90 days. You don’t know how many records may have been exposed or whether any malicious actor has harvested the data.”

Walk Through Each Phase

Detection: How did we find this? What monitoring triggered? Would we have found it without that specific monitoring? What if the monitoring was down?

Assessment: How do we determine the scope? Which logs do we need? Who has access? How quickly can we determine which records were exposed? What regulatory obligations are triggered?

Containment: Do we take the endpoint offline immediately? What services depend on it? Who authorizes the shutdown? What’s the customer impact? How do we communicate the downtime?

Investigation: How do we find out if anyone actually exploited the vulnerability? What forensic evidence do we preserve? Do we engage external incident response? When do we involve law enforcement?

Notification: Who notifies customers? When? What do we say? What are our legal obligations by jurisdiction? Who approves the notification text? How do we handle media inquiries?

Recovery: How do we fix the vulnerability, verify the fix, and restore the endpoint? How do we prevent similar vulnerabilities? What code review or security testing changes are needed?

Post-Incident: How do we conduct a blameless post-mortem? Who participates? What artifacts do we produce? How do we track remediation items?

Discover Every Gap

The exercise will reveal gaps. It always does. The contact list for the security team has three disconnected phone numbers. Legal hasn’t reviewed the breach notification template since the company expanded to the EU. The database team doesn’t have after-hours access to the production logging system. The CEO doesn’t have a clear policy on when to notify the board.

Every gap discovered in a tabletop exercise, and then fixed, is a gap that won’t cause confusion during a real incident. That’s the value proposition: spend 90 to 120 minutes on practice to avoid hours of chaos during a crisis.

Building a Quarterly Program

I recommend quarterly tabletop exercises, each with a different scenario category:

  • Q1: External breach — an attacker exfiltrates customer data through a compromised application
  • Q2: Ransomware — production systems are encrypted, backups are potentially compromised, the attacker demands payment
  • Q3: Insider threat — an employee is exfiltrating intellectual property before departing for a competitor
  • Q4: Supply chain attack — a compromised dependency in your production code introduces a backdoor

Each scenario should be tailored to your specific infrastructure, technology stack, and threat model. A generic scenario produces generic insights. A specific scenario reveals your specific weaknesses.

After Each Exercise

Update the incident response plan. Fix every gap discovered. Update every stale contact. Verify every tool access.

Verify runbooks. Actually execute the technical procedures documented in your runbooks — not during a real incident, but in a test environment. If the runbook says “run this command to isolate the workload,” verify that the command works.

Assign remediation owners. Every gap needs an owner and a deadline. Gaps without owners don’t get fixed. Track remediation completion and report to leadership.
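Tracking can start as something very small. The sketch below assumes a simple `Gap` record of my own invention, with an owner, a deadline, and a done flag, and sorts gaps into the buckets leadership will ask about. The owners and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Gap:
    description: str
    owner: Optional[str]
    deadline: Optional[date]
    done: bool = False


def remediation_report(gaps, today):
    """Group gaps into done, unowned, overdue, and open."""
    report = {"done": [], "unowned": [], "overdue": [], "open": []}
    for g in gaps:
        if g.done:
            report["done"].append(g.description)
        elif g.owner is None or g.deadline is None:
            report["unowned"].append(g.description)  # no owner or no deadline
        elif g.deadline < today:
            report["overdue"].append(g.description)
        else:
            report["open"].append(g.description)
    return report


# Hypothetical sample data for illustration only.
gaps = [
    Gap("Security contact list has disconnected numbers", "secops-lead", date(2024, 7, 1), done=True),
    Gap("Breach notification template not reviewed for EU", "legal", date(2024, 6, 15)),
    Gap("Database team lacks after-hours log access", None, None),
    Gap("No policy on when to notify the board", "ceo-office", date(2024, 8, 1)),
]

for bucket, items in remediation_report(gaps, today=date(2024, 6, 30)).items():
    print(f"{bucket}: {items}")
```

The "unowned" bucket is the one to watch. Anything that lands there is a gap that will still be open at the next exercise.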

Document lessons learned. Share the exercise findings broadly — not to embarrass anyone, but to build organizational awareness. The security team’s gaps are the whole company’s risk.

Then do it again next quarter. Because the exercise itself is the plan. The document is just documentation.


The Garnet Grid perspective: We facilitate tabletop exercises for organizations that want to test their incident response capabilities before they need them. Our exercises are tailored to your specific infrastructure, threat model, and regulatory environment.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
