When AI Agents Get Privileges

The first wave of generative AI security was obsessed with content. Hallucinations. Toxicity. Brand risk. What the model might say if you asked it the wrong thing, the wrong way.

The second wave is about something more dangerous: capability.

Once an LLM can call tools—open tickets, query internal systems, send emails, push code, trigger workflows—it stops being a chat interface and starts looking like a new kind of privileged workload. Not malicious by design. Not even unreliable in the way people assume. Just connected to real systems with real permissions. And that’s enough.

An agent doesn’t need to “hack” anything in the classic sense. It needs to be steered. The attacker’s job becomes persuasion-by-proxy: slip instructions into a document, a webpage, a support ticket, a code comment, a PDF. The agent reads it as “context”, the model treats it as “instructions”, and suddenly the confused deputy problem isn’t academic—it’s in production.

This is why the old comfort of “the model is hosted by a reputable vendor” doesn’t help much. The model is not the boundary. The boundary is whatever the model can reach.

So the security architecture has to move one layer down: away from arguing about prompts and toward controlling actions. The model should propose. Something else should decide. That “something else” is an enforcement layer that sits between the agent and the tools, applying policy to every tool call: what can be called, by whom, with which arguments, against which resources, under which conditions.

The rest is hygiene that becomes suddenly non-negotiable. Short-lived credentials instead of long-lived tokens that get copied into notebooks. Narrow scopes instead of “temporary” broad permissions that become permanent. Step-up approvals for destructive actions, because the point of automation is speed—and speed is exactly what you don’t want when the decision is wrong.

The part most teams underestimate is detection. Agent misuse looks like business activity because it is business activity, just misdirected. If you don’t baseline tool usage, you won’t see the early shape of an incident: an agent touching unusual datasets, calling unfamiliar tools, hammering an API with small variations, or quietly shifting what it does at the edges.

Agentic systems are coming whether organisations are ready or not. The ones that survive the transition will treat agents like what they are: principals in the system, capable of action, requiring governance. The others will keep arguing about “safe prompts” while the real risk sits in the credentials.