The Security of AI: The Art of Incident Response

AI and LLMs are transforming how organisations build products and run operations. They’re also changing what “an incident” looks like. When a traditional system is compromised, you often get familiar signals: malware, lateral movement, suspicious binaries, noisy command-and-control (C2) traffic. With AI systems, the first sign can be quieter, more ambiguous, and far more dangerous to ignore.

Not a service outage. Not an obvious intrusion. Just a model that starts behaving differently.

Many security professionals feel comfortable with classic incident response but get uneasy when the discussion shifts to production AI. That unease is rational. Attackers don’t need to take your platform down to harm you. They can steer it. They can degrade it. They can manipulate it in ways that look like “business variance” rather than a cyber event — until the impact is real. The attack taxonomy is broader than most teams expect: data poisoning, model evasion, model extraction, and prompt injection are all distinct threat categories that require different detection and response approaches.

This post is about responding to incidents in AI systems: what to look for, how to reason about the signals, and how to organise a response when the system is technically healthy but operationally compromised.

What an AI incident looks like

Incidents in AI platforms often present as subtle deviations from expected behaviour. That’s the defining feature: the system still answers requests, still returns outputs, still meets uptime targets — and yet it starts producing outcomes that are measurably off.

A recommendation engine gradually shifts toward riskier options. A fraud model starts missing obvious fraud. A content system becomes easier to manipulate. A customer service agent starts giving confidently wrong answers in patterns that correlate with a specific input source. In regulated environments, this isn’t just a security issue; it’s a governance, liability, and potentially a regulatory notification issue too.

The hard part is that these symptoms don’t align neatly with traditional detection signatures. They look like drift, or bias, or a bad deploy, or “weird user behaviour”. That ambiguity is where attacker dwell time lives. Drift can be entirely natural — statistical distributions shift over time — or it can be adversarially induced. The response process needs to distinguish between the two, because the remediation is very different.

Detection is behavioural, not signature-based

For AI systems, detection starts with the discipline of baselines. If you don’t know what “normal” looks like for inputs, outputs, and model performance, you don’t know what “suspicious” looks like either.

Some of the most useful high-level signals are still familiar, just reframed:

  • Volume anomalies. Spikes in request rates can indicate probing, extraction attempts, or attempts to overwhelm or distract.
  • Input distribution shifts. If the shape of incoming data changes abruptly — or changes in a way that maps to a particular client, geography, or integration — it can indicate manipulation or targeted probing.
  • Output and decision anomalies. A model that suddenly produces higher-risk recommendations, higher false positives, or unusual confidence patterns needs investigation even if infrastructure metrics look fine.
  • Behavioural “near-misses”. Repeated similar requests with small variations can be a sign of systematic probing — especially in systems exposed via API. For LLM-based systems, this includes prompt injection attempts: inputs crafted to override system instructions or extract information the model shouldn’t disclose.

The key is automation. Human intuition doesn’t scale when the anomaly is “a slow drift over 48 hours” across thousands of decisions. The detection layer has to surface statistically meaningful deviations within hours, not days, and route them into investigation. A sophisticated attacker who understands your baselining approach can craft inputs that stay just inside normal thresholds — low-and-slow manipulation is the AI equivalent of living off the land.
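
To make the baselining idea concrete, input distribution shift can be scored with something as simple as the Population Stability Index (PSI), a standard drift metric in credit and fraud monitoring. The bucket edges, threshold, and sample data below are illustrative assumptions, not values from any real system:

```python
# Minimal sketch: score distribution shift between a baseline window and a
# current window of model inputs (or output scores) using PSI.
from collections import Counter
import math

def psi(baseline, current, edges=(0.25, 0.5, 0.75)):
    """Population Stability Index between two samples, bucketed by `edges`."""
    def fractions(sample):
        buckets = Counter()
        for x in sample:
            # assign x to the first bucket whose upper edge exceeds it
            idx = next((i for i, e in enumerate(edges) if x < e), len(edges))
            buckets[idx] += 1
        n = len(sample)
        # floor at a small epsilon so empty buckets don't blow up the log
        return [max(buckets[i] / n, 1e-6) for i in range(len(edges) + 1)]

    b, c = fractions(baseline), fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

baseline = [0.1, 0.2, 0.2, 0.3, 0.4, 0.4, 0.5, 0.6]
shifted  = [0.5, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9]  # drifted upward

score = psi(baseline, shifted)
# common rule of thumb: PSI > 0.25 is a significant shift worth investigating
if score > 0.25:
    print(f"distribution shift detected (PSI={score:.2f})")
```

Run on a schedule per segment (per tenant, per integration, per geography), a check like this turns "the inputs feel different" into an alert that can be routed into investigation.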

The hidden complexity: interdependencies

AI-powered systems are rarely “a model behind an API”. They’re systems-of-systems: pipelines, feature stores, retrieval layers, prompt templates, tool integrations, policy checks, caching layers, and feedback loops. If one component is compromised, the blast radius may show up somewhere else entirely.

This is why incident responders need a dependency map. Not a perfect one, but enough to answer basic questions under pressure. In practice, this means working with your ML engineering team to document the data lineage and component relationships — even a simple diagram maintained alongside your runbooks is better than nothing:

  • What upstream data sources can influence outputs?
  • What parts of the pipeline can change without a deployment?
  • What integrations can take actions on the model’s behalf?
  • Where do prompts, policies, retrieval sources, and tools intersect?

A single weak point in that graph can cascade into “the model is wrong everywhere”.
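
Even a crude machine-readable version of that graph pays off under pressure. The sketch below uses hypothetical component names; the point is that "what can a compromised component reach?" becomes a one-line query instead of a whiteboard session:

```python
# Toy dependency map for blast-radius questions. Edges point from a
# component to what it feeds; component names are illustrative.
from collections import deque

DEPENDENCIES = {
    "partner_data_feed": ["feature_store"],
    "feature_store":     ["credit_model"],
    "prompt_templates":  ["support_agent"],
    "retrieval_index":   ["support_agent"],
    "credit_model":      ["loan_workflow"],
    "support_agent":     ["ticket_actions"],
}

def blast_radius(component):
    """Everything downstream of `component`: what a compromise could reach."""
    seen, queue = set(), deque([component])
    while queue:
        for nxt in DEPENDENCIES.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# a poisoned partner feed reaches the feature store, the model,
# and the workflow that acts on the model's decisions
print(blast_radius("partner_data_feed"))
```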

Response: treat the model like production… because it is

When an incident is suspected, the goal is to move quickly without turning a hypothesis into self-inflicted downtime. A pragmatic AI incident response flow looks like this:

Containment first. If the system can take actions (approve loans, trigger workflows, send messages, modify records), containment is about preventing further harm, not proving root cause. That might mean quarantining an endpoint, disabling tool access, switching to a safe mode, or failing over to a simpler model or static rules.

Preserve evidence early. AI incidents are easy to “fix” in ways that destroy forensic value — retraining, redeploying, refreshing data, rolling back configs. Before any of that, capture what you can: model version identifiers, prompt templates, retrieval sources, feature set versions, pipeline runs, configuration states, and relevant logs. Immutable logging and model registry snapshots are your best friends here — if you don’t have them before an incident, you won’t have them during one.
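
A small helper that snapshots state identifiers before anyone touches the system makes this habit cheap. The field names below are illustrative assumptions; the content hash simply makes the record tamper-evident once it lands in append-only storage:

```python
# Evidence-capture sketch: record state identifiers before remediation,
# with a SHA-256 over the record so later tampering is detectable.
import hashlib
import json
import datetime

def capture_snapshot(state):
    record = dict(state)
    record["captured_at"] = datetime.datetime.now(datetime.timezone.utc).isoformat()
    payload = json.dumps(record, sort_keys=True).encode()
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record  # ship to append-only storage, not a mutable bucket

snapshot = capture_snapshot({
    "model_version": "credit-risk-2024-07-03",   # hypothetical identifiers
    "prompt_template_rev": "a1b2c3d",
    "feature_set": "fs-v12",
    "pipeline_run": "run-8841",
})
```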

Scope the blast radius. Is the anomaly isolated to one tenant, one integration, one geography, one data source? Does it correlate with a particular feature, prompt, or upstream feed? AI incidents often resolve faster when you find the boundary where “normal” becomes “weird”.
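
Boundary-finding often reduces to comparing anomaly rates across segments. A sketch, with a made-up event schema and threshold:

```python
# Scoping sketch: compare per-segment anomaly rates to find where
# "normal" becomes "weird". Events and the 0.5 threshold are illustrative.
from collections import defaultdict

def anomaly_rate_by(events, key):
    totals, flagged = defaultdict(int), defaultdict(int)
    for e in events:
        totals[e[key]] += 1
        flagged[e[key]] += e["anomalous"]
    return {k: flagged[k] / totals[k] for k in totals}

events = [
    {"source": "partner_api", "anomalous": 1},
    {"source": "partner_api", "anomalous": 1},
    {"source": "partner_api", "anomalous": 0},
    {"source": "web",         "anomalous": 0},
    {"source": "web",         "anomalous": 0},
    {"source": "mobile",      "anomalous": 0},
]

rates = anomaly_rate_by(events, "source")
suspects = [k for k, r in rates.items() if r > 0.5]
print(suspects)  # the anomaly clusters on one integration
```

Re-running the same cut by tenant, geography, or feature version quickly narrows the investigation to the boundary that matters.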

Root cause, but with humility. Sometimes it’s an attacker. Sometimes it’s drift. Sometimes it’s a broken upstream feed. Sometimes it’s a deployment mistake. The response process should handle all of those without biasing toward the most exciting answer.

Mitigation and hardening. Fix the immediate issue, then build the controls that would have caught it earlier next time: tighter access control to training and pipeline artefacts, stricter provenance controls for data sources, rate limits for probing behaviour, better output monitoring, and safer integration patterns for tool use.
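
Of those controls, per-client rate limiting is the simplest to show. A token bucket lets legitimate bursts through while throttling the sustained, systematic request patterns typical of probing and extraction; capacity and refill rate below are illustrative:

```python
# Hardening sketch: a per-client token-bucket rate limiter.
import time

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # top up proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_per_sec=1)
results = [bucket.allow() for _ in range(8)]
# the initial burst succeeds; sustained probing gets throttled
```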

Ownership matters. AI incident response often falls between the security team and the ML engineering team, with neither feeling fully accountable. Establish clear ownership before an incident happens. In most mature organisations, this means a joint response model where security leads the investigation and ML engineering leads the technical remediation — but someone needs to be the single accountable owner for coordination.

This is where a strong incident management platform and well-rehearsed playbooks help — not because AI is special, but because coordination under uncertainty is always the hardest part.

A concrete scenario (fintech, fictional)

Say a digital bank is using an ML system to assist credit decisions. The system begins recommending higher-risk loans to customers who historically score as low risk. There’s no outage. No obvious compromise. Just a change in decision quality that points directly at business impact.

A mature response would begin by limiting harm: isolate the decisioning pathway, move to a safer fallback, and restrict any automated approvals. Then the team would collect evidence across the pipeline: which model version is running, what data sources fed the decision, whether feature distributions changed, and whether the change correlates with a particular channel or integration.

The key is not the story; it’s the pattern. AI incidents often surface through business metrics first. If the security team waits for classic “security telemetry” to light up, the incident can run for days.

The mindset shift

AI incident response is still incident response: contain, investigate, eradicate, recover, learn. The difference is what counts as a “compromise” and what your telemetry needs to look like to detect it.

The art is learning to treat behavioural deviation as a security signal, not just a data science problem. In production, a model that can be quietly steered is not just inaccurate — it’s exploitable.

Further reading: MITRE ATLAS provides a structured knowledge base of adversarial tactics against AI systems, and the OWASP Machine Learning Security Top 10 is a practical starting point for understanding the most common AI-specific risks.