The Security of AI: The Art of Incident Response

AI and LLMs are transforming how organisations build products and run operations. They’re also changing what “an incident” looks like. When a traditional system is compromised, you often get familiar signals: malware, lateral movement, suspicious binaries, noisy C2 traffic. With AI systems, the first sign can be quieter, more ambiguous, and far more dangerous to ignore.

Not a service outage. Not an obvious intrusion. Just a model that starts behaving… differently.

I’ve spoken to a lot of security professionals who feel comfortable with classic incident response but get uneasy when the discussion shifts to production AI. That unease is rational. Attackers don’t need to take your platform down to harm you. They can steer it. They can degrade it. They can manipulate it in ways that look like “business variance” rather than a cyber event—until the impact is real.

This post is about responding to incidents in AI systems: what to look for, how to reason about the signals, and how to organise a response when the system is technically healthy but operationally compromised.

What an AI incident looks like

Incidents in AI platforms often present as subtle deviations from expected behaviour. That’s the defining feature: the system still answers requests, still returns outputs, still meets uptime targets—and yet it starts producing outcomes that are measurably off.

A recommendation engine gradually shifts toward riskier options. A fraud model starts missing obvious fraud. A content system becomes easier to manipulate. A customer service agent starts being “confidently wrong” in patterns that correlate with a specific input source. In regulated environments, this isn’t just a security issue; it’s a governance and liability issue too.

The hard part is that these symptoms don’t align neatly with traditional detection signatures. They look like drift, or bias, or a bad deploy, or “weird user behaviour”. That ambiguity is where attacker dwell time lives.

Detection is behavioural, not signature-based

For AI systems, detection starts with the discipline of baselines. If you don’t know what “normal” looks like for inputs, outputs, and model performance, you don’t know what “suspicious” looks like either.

Some of the most useful high-level signals are still familiar, just reframed:

  • Volume anomalies. Spikes in request rates can indicate probing, extraction attempts, or attempts to overwhelm and distract.
  • Input distribution shifts. If the shape of incoming data changes abruptly—or changes in a way that maps to a particular client, geography, or integration—it can indicate manipulation or targeted probing.
  • Output and decision anomalies. A model that suddenly produces higher-risk recommendations, a higher false-positive rate, or unusual confidence patterns needs investigation even if infrastructure metrics look fine.
  • Behavioural “near-misses”. Repeated similar requests with small variations can be a sign of systematic probing—especially in systems exposed via API.

The key is automation. Human intuition doesn’t scale when the anomaly is “a slow drift over 48 hours” across thousands of decisions. The detection layer has to be able to surface statistically meaningful deviations quickly and route them into investigation.
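
To make that concrete, here is a minimal sketch of one such automated check, assuming you log a numeric model output (or key feature) per request: compare a recent window against a known-good baseline with a two-sample Kolmogorov-Smirnov test. The function name, threshold, and sample values are illustrative, not a prescription.

```python
# A minimal sketch of baseline-vs-current drift detection.
# Assumes you log one numeric value (an output score or key feature) per
# request; names, thresholds, and sample values are illustrative.
from scipy.stats import ks_2samp

def detect_distribution_shift(baseline: list[float],
                              current: list[float],
                              p_threshold: float = 0.01) -> dict:
    """Flag when a recent window of values no longer looks like the baseline."""
    result = ks_2samp(baseline, current)
    return {
        "ks_statistic": result.statistic,
        "p_value": result.pvalue,
        # A very small p-value means the two samples are unlikely to come
        # from the same distribution, so route this window to investigation.
        "suspicious": result.pvalue < p_threshold,
    }

# Example: yesterday's risk scores vs. the last hour of risk scores.
baseline_scores = [0.12, 0.18, 0.15, 0.22, 0.19, 0.14, 0.17, 0.21]
recent_scores = [0.41, 0.38, 0.45, 0.39, 0.44, 0.40, 0.43, 0.37]
print(detect_distribution_shift(baseline_scores, recent_scores))
```

In practice a check like this runs continuously, per feature and per segment, and feeds an alerting pipeline rather than a print statement.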

The hidden complexity: interdependencies

AI-powered systems are rarely “a model behind an API”. They’re systems-of-systems: pipelines, feature stores, retrieval layers, prompt templates, tool integrations, policy checks, caching layers, and feedback loops. If one component is compromised, the blast radius may show up somewhere else entirely.

This is why incident responders need a dependency map. Not a perfect one, but enough to answer basic questions under pressure:

  • What upstream data sources can influence outputs?
  • What parts of the pipeline can change without a deployment?
  • What integrations can take actions on the model’s behalf?
  • Where do prompts, policies, retrieval sources, and tools intersect?

A single weak point in that graph can cascade into “the model is wrong everywhere”.
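
Even a rough, hand-maintained version of that map is worth keeping somewhere queryable. Here is a minimal sketch, with entirely hypothetical component names, of the kind of structure that lets you answer the first of those questions under pressure.

```python
# A minimal sketch of a dependency map for an AI pipeline, kept as plain
# data so it can live next to the runbook. All component names here are
# hypothetical; the point is answering "what can influence this output?"
DEPENDS_ON = {
    "credit_decision_api": ["scoring_model_v3", "policy_checks"],
    "scoring_model_v3":    ["feature_store", "prompt_templates"],
    "feature_store":       ["bureau_feed", "transactions_feed"],
    "policy_checks":       ["rules_config"],
    "prompt_templates":    [],
    "bureau_feed":         [],
    "transactions_feed":   [],
    "rules_config":        [],
}

def upstream_of(component: str, graph: dict[str, list[str]]) -> set[str]:
    """Everything that can influence `component`, directly or transitively."""
    seen: set[str] = set()
    stack = list(graph.get(component, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

# "If credit decisions look wrong, what could have changed without a deploy?"
print(upstream_of("credit_decision_api", DEPENDS_ON))
```

The inverse question (“what does this feed affect downstream?”) is the same traversal over the reversed graph.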

Response: treat the model like production… because it is

When an incident is suspected, the goal is to move quickly without turning a hypothesis into self-inflicted downtime. A pragmatic AI incident response flow looks like this:

Containment first. If the system can take actions (approve loans, trigger workflows, send messages, modify records), containment is about preventing further harm, not proving root cause. That might mean quarantining an endpoint, disabling tool access, switching to a safe mode, or failing over to a simpler model or static rules.
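
It helps enormously if those containment switches exist before the incident does. Below is a minimal sketch, assuming a feature-flag style wrapper around the decisioning call; the flag names, fallback rule, and approval threshold are hypothetical.

```python
# A minimal sketch of containment switches built into the decision path.
# Flag names, the fallback rule, and the approval threshold are hypothetical.
from dataclasses import dataclass

@dataclass
class ContainmentFlags:
    use_model: bool = True                 # False: fail over to static rules
    allow_automated_actions: bool = True   # False: recommend only, human executes

FLAGS = ContainmentFlags()

def decide_loan(application: dict, model_score_fn) -> dict:
    if FLAGS.use_model:
        risk_score = model_score_fn(application)
    else:
        # Safe mode: treat everything as high risk and escalate to a human.
        risk_score = 1.0

    decision = {
        "risk_score": risk_score,
        "approve": FLAGS.use_model and risk_score < 0.3,
    }
    if not FLAGS.allow_automated_actions:
        # Containment: the system may still recommend, but must not execute.
        decision["requires_manual_approval"] = True
    return decision

# During a suspected incident: contain first, prove root cause later.
FLAGS.use_model = False
FLAGS.allow_automated_actions = False
print(decide_loan({"customer_id": "c-123"}, model_score_fn=lambda app: 0.9))
```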

Preserve evidence early. AI incidents are easy to “fix” in ways that destroy forensic value—retraining, redeploying, refreshing data, rolling back configs. Before any of that, capture what you can: model version identifiers, prompt templates, retrieval sources, feature set versions, pipeline runs, configuration states, and relevant logs.
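
A small utility that snapshots this state before anyone touches the system pays for itself. Here is a minimal sketch, assuming the relevant state can be pulled into a dictionary; the field names and values are placeholders for whatever your registry, config store, and pipeline actually expose.

```python
# A minimal sketch of "preserve before you fix": write a timestamped,
# content-hashed snapshot of the state a retrain or rollback would destroy.
# Field names and values are placeholders.
import hashlib
import json
from datetime import datetime, timezone

def capture_evidence(state: dict, out_dir: str = ".") -> str:
    record = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "state": state,
    }
    payload = json.dumps(record, sort_keys=True, indent=2)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    filename = f"{out_dir}/evidence-{digest[:12]}.json"
    with open(filename, "w") as f:
        f.write(payload)
    # Store the digest in a separate system so later tampering is detectable.
    return filename

print(capture_evidence({
    "model_version": "scoring-model:3.4.1",
    "feature_set_version": "fs-2024-11-02",
    "prompt_template_id": "credit-assist-prompt-v7",
    "retrieval_sources": ["policy-docs", "product-kb"],
    "pipeline_run_id": "run-18342",
    "config_snapshot": {"decision_threshold": 0.3},
}))
```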

Scope the blast radius. Is the anomaly isolated to one tenant, one integration, one geography, one data source? Does it correlate with a particular feature, prompt, or upstream feed? AI incidents often resolve faster when you find the boundary where “normal” becomes “weird”.
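
Once the decision logs are in one place, this is often a plain aggregation. A minimal sketch, with illustrative records and field names, that shows where an anomaly rate concentrates:

```python
# A minimal sketch of blast-radius scoping: compare an anomaly rate across
# segments (tenant, integration, geography) to find where "normal" becomes
# "weird". The records and field names are illustrative.
from collections import defaultdict

def anomaly_rate_by(records: list[dict], segment_key: str) -> dict[str, float]:
    totals: dict[str, int] = defaultdict(int)
    anomalies: dict[str, int] = defaultdict(int)
    for r in records:
        seg = r[segment_key]
        totals[seg] += 1
        anomalies[seg] += 1 if r["anomalous"] else 0
    return {seg: anomalies[seg] / totals[seg] for seg in totals}

decisions = [
    {"tenant": "bank-a", "channel": "mobile_app", "anomalous": False},
    {"tenant": "bank-a", "channel": "partner_api", "anomalous": True},
    {"tenant": "bank-b", "channel": "partner_api", "anomalous": True},
    {"tenant": "bank-b", "channel": "mobile_app", "anomalous": False},
    {"tenant": "bank-b", "channel": "partner_api", "anomalous": True},
]

# If one channel carries nearly all of the anomalies, you have found a boundary.
print(anomaly_rate_by(decisions, "tenant"))
print(anomaly_rate_by(decisions, "channel"))
```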

Root cause, but with humility. Sometimes it’s an attacker. Sometimes it’s drift. Sometimes it’s a broken upstream feed. Sometimes it’s a deployment mistake. The response process should be able to handle all of those without biasing toward the most exciting answer.

Mitigation and hardening. Fix the immediate issue, then build the controls that would have caught it earlier next time: tighter access control to training and pipeline artefacts, stricter provenance controls for data sources, rate limits for probing behaviour, better output monitoring, and safer integration patterns for tool use.

This is where a strong incident management platform helps—not because AI is special, but because coordination under uncertainty is always the hardest part.
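
To make one of those hardening controls concrete before moving on: a basic sliding-window rate limit per API key both blunts volume-based probing and gives you a clean signal when a key starts hitting it. This is a minimal sketch with illustrative limits; in production this usually lives in the API gateway rather than application code.

```python
# A minimal sketch of a per-key sliding-window rate limit, one of the
# hardening controls mentioned above. Window size and limit are illustrative.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100
_recent: dict[str, deque] = defaultdict(deque)

def allow_request(api_key: str, now: float | None = None) -> bool:
    now = time.time() if now is None else now
    window = _recent[api_key]
    # Drop timestamps that have fallen out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_WINDOW:
        return False  # reject, and ideally flag the key for review
    window.append(now)
    return True

# Example: the 101st request inside a minute is rejected.
for i in range(101):
    allowed = allow_request("partner-key-123", now=1000.0 + i * 0.1)
print(allowed)  # False
```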

A concrete scenario (fintech, fictional)

Imagine a digital bank using an ML system to assist credit decisions. The system begins recommending higher-risk loans to customers who historically score as low risk. There’s no outage. No obvious compromise. Just a change in decision quality that points directly at business impact.

A mature response would begin by limiting harm: isolate the decisioning pathway, move to a safer fallback, and restrict any automated approvals. Then the team would collect evidence across the pipeline: which model version is running, what data sources fed the decision, whether feature distributions changed, and whether the change correlates with a particular channel or integration.

The key is not the story; it’s the pattern. AI incidents often surface through business metrics first. If the security team waits for classic “security telemetry” to light up, the incident can run for days.

The mindset shift

AI incident response is still incident response: contain, investigate, eradicate, recover, learn. The difference is what counts as a “compromise” and what your telemetry needs to look like to detect it.

The art is learning to treat behavioural deviation as a security signal, not just a data science problem. Because in production, a model that can be quietly steered is not just inaccurate—it’s exploitable.

Until the next post.