Mastering Threat Modelling for Next-Gen AI Workloads
The cybersecurity landscape has shifted in a way that’s easy to miss if you’re still looking for the usual suspects: misconfigurations, missing patches, careless access control. AI hasn’t replaced those problems. It has simply added a new layer where the failure modes aren’t always obvious, and where an attacker doesn’t need to “break in” if they can influence what the model learns or how it decides.
AI is no longer a skunkworks experiment running in a separate corner of the organisation. It’s being embedded directly into fraud detection, forecasting, customer support, quality assurance, routing and logistics, security analytics, and decision-making workflows that carry real commercial weight. That changes what “risk” looks like. It also changes what threat modelling needs to cover. The demand for threat modelling and risk assessments tailored to AI workloads is rising fast for a simple reason: traditional approaches, while foundational, often under-describe what can go wrong when the system’s behaviour is learned rather than explicitly coded.
Why traditional threat modelling falls short
Classic threat modelling frameworks are still useful, but AI systems don’t behave like conventional software. A typical application does what the code says. An AI system does what it learned, and what it learned is shaped by data, labels, feedback loops, training pipelines, and sometimes opaque third-party components. The most important trust boundaries aren’t always at the API layer; they’re upstream in the data supply chain, inside training environments, and in the mechanisms that connect model outputs to downstream actions.
That difference matters because it creates attack surfaces that look more like manipulation than intrusion. In an AI-enabled workflow, the objective might not be to get shell access. It might be to make the model confidently wrong, predictably biased, quietly leaky, or reliably exploitable when presented with a certain trigger.
Adversarial attacks: when inputs lie
Adversarial attacks are the best-known AI risk, partly because they’re easy to demonstrate and unnervingly effective. The pattern is simple: tweak the input in a way that a human would barely notice, and the model misclassifies it with high confidence. In the real world that could mean subtly altering images so a system interprets a stop sign incorrectly, or manipulating audio so a voice interface behaves in unexpected ways.
From a threat modelling perspective, the key question is not “can the model be fooled?” because many models can. The key question is where your critical input channels are, how an attacker can reach them, and what the impact is when the model makes the wrong call. If a misclassification leads to a human review, that’s one level of impact. If it triggers an automated decision at scale, that’s another.
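To make the mechanics concrete, here is a minimal sketch in the spirit of the fast gradient sign method, assuming a PyTorch image classifier that outputs logits; the function name and epsilon value are illustrative rather than a reference to any particular system:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, label, epsilon=0.01):
    """Return a copy of image batch x nudged to increase the model's loss.

    Assumes `model` returns logits and `label` holds the true class indices.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # One fixed-size step per pixel in the direction that raises the loss,
    # clamped back to a valid pixel range so the change stays subtle.
    perturbed = x_adv + epsilon * x_adv.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```

The detail worth noticing is the budget: the perturbation is small enough to survive casual human review, yet large enough to flip the decision.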
Model poisoning: the slow compromise
If adversarial attacks are about lying to a model at inference time, model poisoning is about corrupting what the model becomes. This is an integrity attack on the learning process itself, usually introduced during training. An attacker injects malicious or misleading examples into the dataset so the model learns the wrong patterns, develops blind spots, or becomes biased in a way that benefits the attacker later.
It’s particularly insidious because it doesn’t always show up as an immediate performance collapse. The model can appear healthy while being strategically compromised. Threat modelling here needs to treat data provenance as a first-class concern: where the data comes from, how it’s collected, who can alter it, how labels are created, where it’s stored, and what controls exist to detect anomalous patterns before they become “truth” inside the model.
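As a hedged illustration of what treating provenance as a first-class concern can look like, the sketch below imagines a gate in front of a training run; the approved hash, baseline label frequencies, and drift threshold are assumptions for the example, not recommended values:

```python
import hashlib
from collections import Counter

APPROVED_SHA256 = "..."                     # recorded when the dataset was last reviewed
BASELINE_LABEL_FREQ = {"fraud": 0.02, "legit": 0.98}
MAX_DRIFT = 0.05                            # absolute shift that forces human review

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def provenance_findings(path: str, labels: list[str]) -> list[str]:
    findings = []
    if sha256_of(path) != APPROVED_SHA256:
        findings.append("dataset artefact does not match the approved hash")
    counts = Counter(labels)
    total = max(sum(counts.values()), 1)
    for label, baseline in BASELINE_LABEL_FREQ.items():
        observed = counts.get(label, 0) / total
        if abs(observed - baseline) > MAX_DRIFT:
            findings.append(f"label '{label}' shifted from {baseline:.2%} to {observed:.2%}")
    return findings                          # anything here should block the training run
```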
Prompt injection: the new code injection
Large language models created a new kind of attack surface that feels familiar to anyone who has lived through the era of SQL injection: prompt injection. Instead of exploiting a parser, the attacker exploits the model’s tendency to follow instruction-like input. A cleverly crafted prompt can push an LLM into bypassing guardrails, revealing sensitive information, or doing things it was never intended to do—especially when the model is connected to tools, data sources, or systems that can take action.
Threat modelling for prompt injection is about understanding capabilities and integrations. If the LLM has retrieval access, you need to think about what can be exposed through generated outputs. If it can call tools, you need to think about how an attacker can coerce it into taking unauthorised actions. The risk is not just “bad words in, bad words out”; it’s uncontrolled execution paths and data exfiltration through a system designed to be helpful.
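A common mitigation pattern is to treat the model’s output as untrusted input and authorise every requested action outside the model. The sketch below is illustrative rather than tied to any specific agent framework; the ToolCall shape, tool names, and limits are assumptions:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    arguments: dict

ALLOWED_TOOLS = {"search_kb", "create_ticket"}        # deliberately no destructive actions
MAX_ARG_LENGTH = 2_000

def authorise(call: ToolCall, caller_permissions: set[str]) -> bool:
    if call.name not in ALLOWED_TOOLS:
        return False                                   # model requested an unregistered tool
    if call.name not in caller_permissions:
        return False                                   # model cannot exceed the caller's rights
    if any(len(str(v)) > MAX_ARG_LENGTH for v in call.arguments.values()):
        return False                                   # crude guard against smuggled payloads
    return True
```

The design point is that the check lives outside the model: no injected instruction, however persuasive, can widen the allow-list or escalate beyond the caller’s own permissions.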
Data leakage: models that remember too much
There’s also a quieter risk that tends to surface only when someone asks the privacy question properly: models can leak information about the data they were trained on. Membership inference attacks attempt to determine whether a specific person’s data was included in training. Model inversion attacks attempt to reconstruct aspects of training data from outputs. Neither attack needs to degrade the model’s performance. They exploit the fact that models can encode traces of their training data, especially when that data is sensitive and the training regime isn’t designed with privacy in mind.
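The intuition behind membership inference can be shown with a toy loss-threshold check; real attacks use shadow models and calibrated statistics, but the underlying signal is the same:

```python
import numpy as np

def likely_member(candidate_loss: float, losses_on_nonmembers: np.ndarray) -> bool:
    # Models often fit training examples more tightly than unseen data, so an
    # unusually low loss on a record is weak evidence it was in the training set.
    threshold = np.percentile(losses_on_nonmembers, 5)
    return candidate_loss < threshold
```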
Threat modelling here needs to connect the sensitivity of training data to the exposure of model outputs. It also needs to acknowledge that “we didn’t centralise the data” is not automatically the same as “we protected privacy”. Techniques like differential privacy can help, but they introduce trade-offs in utility that should be considered explicitly rather than bolted on late.
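To make that trade-off tangible, here is a minimal Laplace-mechanism sketch for releasing a single count; the function name and parameters are illustrative:

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    # Laplace mechanism: noise scales with sensitivity / epsilon, so a smaller
    # privacy budget (stronger guarantee) means a noisier, less useful answer.
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# dp_count(1_000, epsilon=0.1) is routinely off by tens of records;
# dp_count(1_000, epsilon=5.0) usually lands within one or two.
```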
Evolving the practice: what changes in 2025
AI threat modelling works best when it’s treated as part of the lifecycle rather than a one-off workshop. Models change. Data drifts. Pipelines get refactored. Retrieval sources grow. Tool integrations expand. A threat model that was accurate at deployment can be obsolete after the next training run.
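In practice that means wiring recurring checks to the assumptions the threat model depends on. As one hedged example, a two-sample Kolmogorov–Smirnov test (ks_2samp in SciPy) on a critical input feature can reopen the review when the data shifts, rather than waiting for the next workshop; the threshold here is an assumption:

```python
import numpy as np
from scipy.stats import ks_2samp

def input_drift_detected(baseline: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
    # A significant shift in a critical feature means the data-flow assumptions
    # in the threat model may no longer hold and the review should be reopened.
    result = ks_2samp(baseline, live)
    return result.pvalue < alpha
```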
This is also one of the rare places in security where cross-functional work isn’t optional. Threat modelling an AI system without data scientists and ML engineers is like threat modelling Kubernetes without speaking to the platform team. Security needs to understand the ML pipeline, and ML teams need to understand how adversaries think. The overlap is where the real risks—and the real mitigations—live.
It’s worth saying plainly: the hard part isn’t listing threats. The hard part is translating them into credible business impact and then designing controls that fit how models are built, trained, deployed, and monitored. AI risk, done well, still comes down to the same fundamentals: trust boundaries, integrity, availability, confidentiality, and the reality that systems behave exactly as designed, not as intended.
For security architects, this is both a challenge and an opportunity. AI is becoming a core business dependency. Threat modelling needs to evolve with it, because the cost of being surprised by learned behaviour is far higher than the cost of planning for it.
