You Have a SOC. You Have Runbooks. You're Still Not Ready for This
The Incident Nobody Trained For
You’ve got a SOC. You’ve got runbooks. You’ve done tabletop exercises for ransomware, for data breaches, for insider threats. You’ve got your incident response team on speed dial.
But what happens when your AI customer support chatbot tells a customer they’re entitled to a refund for a purchase they never made, cites a non-existent policy, and accidentally leaks another customer’s PII in the same conversation?
Is that in your playbook?
Why AI Incidents Are Different
Traditional incidents follow patterns. Malware spreads through known vectors. Attackers leave fingerprints. You contain, eradicate, recover, and you’re done.
AI incidents are messier.
The output isn’t deterministic. Your model might behave perfectly in testing and go sideways in production. The same prompt that worked yesterday might fail today, not because of an attack, but because the model’s behaviour drifted.
Attribution is hard. Was it a prompt injection? Poisoned training data? Or just the model having a bad day? Understanding the root cause matters, and it’s not always obvious.
The blast radius is different. One vulnerable API endpoint might expose one customer’s data. One misbehaving AI might expose thousands of conversations, generate harmful content for everyone, or make business decisions that cost real money.
The “fix” isn’t straightforward. You can’t just patch a model. Retraining takes time. Deploying a new version isn’t instant. And sometimes the issue is fundamental: you can’t just flip a switch and make it right.
A Framework for AI Incident Response
So how do you prepare? Here’s what I’ve been using, and what I’d recommend to security architects building this capability from scratch.
Phase 1: Detection and Triage
What to monitor:
- Model outputs that contain PII, secrets, or policy violations
- Unexpected changes in model behaviour (drift detection)
- Unusual spikes in certain types of queries
- User complaints about the AI “lying” or giving weird answers
The hard truth: You’re probably not going to catch this through traditional SIEM rules. You need output validation—automated checks on what the model is actually saying. Tools like Guardrails AI, LlamaGuard, or AWS Bedrock Guardrails can help classify and filter outputs in real-time.
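To make the idea concrete, here’s a minimal sketch of output validation: automated checks that run on every model response before it reaches a user. The patterns and function names are illustrative only; production guardrails (like the tools named above) use trained classifiers rather than a handful of regexes.

```python
import re

# Illustrative patterns only; real guardrails use trained classifiers,
# not a short list of regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "uk_mobile": re.compile(r"\b07\d{9}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def validate_output(text: str) -> list[str]:
    """Return the names of any policy violations found in a model response."""
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(text)]

def guarded_reply(text: str) -> str:
    """Block the response if validation fails; otherwise pass it through."""
    if validate_output(text):
        # In production: log the violation to your SIEM and alert on-call here.
        return "Sorry, I can't share that. A human agent will follow up."
    return text
```

The key design point is that the check sits on the output path, not the input path: it catches harmful responses regardless of whether the cause was an attack, drift, or a plain hallucination.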
Phase 2: Containment
This is where it gets interesting. With traditional software, you take the system offline. With AI, it’s rarely that simple:
- Can you take the AI offline without breaking everything? If it’s integrated into customer-facing systems, turning it off might cause more problems than the incident itself. And is it a single agent, or the orchestrating agent that other agents depend on?
- Can you switch to a fallback model? If you’ve got a less capable but safer model in reserve (like switching from GPT-4 to GPT-3.5), this might buy you time.
- Can you filter inputs? If you’re under a prompt injection attack, filtering malicious patterns might contain it without full shutdown.
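The three containment options above can be sketched as a single routing layer. Everything here is hypothetical (the injection patterns, the kill switch, the model callables); it’s a shape for the mechanism, not a production implementation.

```python
import re

# Crude examples of injection markers; real input filters use classifiers.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"reveal your system prompt", re.I),
]

def looks_like_injection(prompt: str) -> bool:
    return any(p.search(prompt) for p in INJECTION_PATTERNS)

def route(prompt: str, primary, fallback, kill_switch: bool = False) -> str:
    """Containment options in order of severity: full shutdown,
    input filtering, then degrading to a safer fallback model."""
    if kill_switch:
        return "This feature is temporarily unavailable."
    if looks_like_injection(prompt):
        return "Sorry, I can't help with that request."
    try:
        return primary(prompt)
    except Exception:
        # Primary model unhealthy or pulled mid-incident: degrade gracefully.
        return fallback(prompt)
```

Having this layer in place before an incident is the point: flipping `kill_switch` or adding a pattern is a config change, not an emergency deployment.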
Phase 3: Investigation
Ask these questions:
- What triggered it? Was it a specific prompt, a payload, or just model behaviour?
- What’s the scope? How many users saw the harmful output? How much data was leaked?
- Can you reproduce it? If you can, you’ve got something to test against.
- Is it adversarial or accidental? This matters for whether you’re dealing with an attack or a bug.
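The reproduction question lends itself to a simple harness: replay the suspect prompt repeatedly and measure how often the harmful behaviour recurs. This is a sketch under the assumption that you can call the model with the same prompt and a checker for the bad output; the names are illustrative.

```python
def reproduce(model, prompt: str, check, runs: int = 20) -> float:
    """Replay a suspect prompt and return the fraction of runs that
    reproduce the harmful behaviour, as judged by `check`.
    A rate near 1.0 suggests a deterministic trigger (e.g. injection);
    a low rate points at sampling noise or behavioural drift."""
    hits = sum(1 for _ in range(runs) if check(model(prompt)))
    return hits / runs
```

The reproduction rate also feeds the adversarial-vs-accidental question: a prompt that fires every time looks like a crafted trigger, while an intermittent failure looks more like ordinary model variance.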
Phase 4: Communication
This is where most organisations fail. You’ve got to communicate to:
- Customers — Be honest. “Our AI made a mistake” is better than silence. Example: “Our AI system provided incorrect information to approximately X customers between Y and Z. We’re investigating and have temporarily disabled the feature.”
- Regulators — Depending on what happened, you might have GDPR breach notification obligations (72-hour deadline under Article 33).
- Internal stakeholders — Make sure the business understands what happened and the fix plan.
What this costs: IBM’s 2024 Cost of a Data Breach Report put the global average at $4.88M. AI incidents can exceed this, especially when hallucinations lead to financial losses, regulatory fines, or customer churn.
Phase 5: Recovery and Learning
- Fix the root cause — Retrain, fine-tune, add guardrails, update input filtering.
- Update your detection — If you missed it, add a rule.
- Tabletop it — Run through the scenario again. Did your response work? What would you do differently?
The Opinionated Bit
Here’s what I see happening in the industry: everyone’s excited about deploying AI, but nobody wants to talk about what happens when it goes wrong.
I’ve sat in security architecture reviews where I’ve asked “what’s your incident response plan for this AI?” and gotten blank stares. At a fintech last year, I asked their head of security about their chatbot incident plan. He laughed and said, “We’ll just turn it off.” That chatbot had access to 2 million customer records.
That’s not acceptable. Not when the AI has access to customer data. Not when it’s making decisions that affect people’s money, their privacy, their trust in your organisation.
If you’re deploying AI in production, you need:
- Output validation — Automated checks on what the model is saying
- Fallback plans — What happens when the model fails
- Clear ownership — Who’s accountable when AI messes up
- Communication templates — Pre-drafted statements for different scenarios
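Pre-drafted templates can be as simple as parameterised strings, keyed to the example statement in the communication section above. A minimal sketch (the template text and function are illustrative; your legal and comms teams should sign off on the actual wording):

```python
CUSTOMER_TEMPLATE = (
    "Our AI system provided incorrect information to approximately "
    "{affected} customers between {start} and {end}. We're investigating "
    "and have temporarily disabled the feature."
)

def draft_statement(affected: int, start: str, end: str) -> str:
    # Pre-approved wording with only the facts filled in at incident time,
    # so nobody is writing customer comms from scratch under pressure.
    return CUSTOMER_TEMPLATE.format(affected=affected, start=start, end=end)
```

The value is speed: during an incident you fill in three facts instead of drafting, reviewing, and approving a statement from a blank page.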
If your incident response plan doesn’t have a section called “What to do when the AI lies to customers,” you have a gap. A big one.
The Bottom Line
AI incidents aren’t hypothetical. They’re happening. The question isn’t if you’ll have one, it’s how ready you are when it arrives.
The organisations that’ll do well with AI aren’t the ones with the most sophisticated models. They’re the ones who’ve thought through what happens when things go wrong and have a plan to handle it.
Build that plan now. Before you need it.
