Agentic AI Security: When Your AI Stops Asking and Starts Doing
The chatbot gave you a bad answer. Annoying, maybe. Dangerous? Not really.
Now imagine this: you ask your AI assistant to book a flight. It reads your email, finds your passport number, accesses your corporate card, makes the reservation, and emails you the confirmation. All in thirty seconds. All without asking.
That’s an AI agent. And it represents the biggest shift in computer security since the cloud.
The Threat Model Flip
Traditional AI security focused on outputs. Could the AI generate harmful content? Could it reveal training data? Could it be jailbroken into saying something embarrassing? These were problems of persuasion—tricking an AI into convincing a human to do something bad.
Agentic AI flips that entirely. Now the question is what the AI can do directly, without any human involvement at all.
A chatbot with a vulnerability is a PR problem. An agent with the same vulnerability is a financial loss, a data breach, and a compliance violation—all at once.
What Makes Agents Different
Several architectural changes make agents fundamentally riskier than chatbots.
The first is tool access. Agents don’t just generate text—they make API calls, write files, execute code, and move money. When an agent can book flights, push code, or query databases, its permission scope becomes your attack surface. GitHub Copilot and Cursor agents can already write, edit, and push code to repositories, which is why several organisations have restricted their deployment after realising the blast radius of a compromised coding agent with production access.
The second is memory and state. Agents remember context across sessions, accumulating sensitive information over time. If an attacker poisons that memory, every future interaction is compromised. This is no longer theoretical—researchers have demonstrated how persistent agent contexts can be manipulated gradually, a technique sometimes called “slow compromise” where the agent is steered toward harmful outcomes one conversation at a time.
The third is autonomy gradients. “Human in the loop” sounds reassuring until you realise most implementations are a rubber-stamp exercise. Users develop approval fatigue. They trust the agent’s framing. They click approve without reading. The autonomy was sold as a feature; the oversight became theatre.
The fourth is action chaining. Agents combine multiple tool calls into a single workflow—read email, extract credentials, call payment API, route funds. Each individual step might look normal. The composite action is a heist.
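The danger of action chaining is that a reviewer watching individual tool calls sees nothing wrong. A minimal sketch of one countermeasure, checking whether a known high-risk composition appears as an ordered subsequence of an agent's tool calls (the tool names and patterns here are invented for illustration, not drawn from any real framework):

```python
# Illustrative high-risk compositions: each step alone looks benign,
# but the ordered combination matches an exfiltration pattern.
RISKY_CHAINS = [
    ("read_email", "extract_credentials", "payment_api"),
]

def chain_is_risky(calls):
    """True if any risky pattern occurs as an ordered subsequence of calls."""
    for pattern in RISKY_CHAINS:
        it = iter(calls)
        # `step in it` advances the iterator, so this enforces ordering
        if all(step in it for step in pattern):
            return True
    return False
```

Note that order matters: an agent that touches the payment API before reading email does not match the pattern, which is exactly the point of composite-action analysis over per-call allowlists.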
The News Is Already Writing Itself
This isn’t hypothetical. The incidents are starting to stack up.
In late 2025, security researchers at Palo Alto Networks’ Unit 42 developed an entire framework for characterising attacks against agentic AI systems, acknowledging that the attack surface had grown too large to ignore. Around the same time, Salesforce’s AI agents made headlines when researchers demonstrated they could be forced to leak sensitive data through carefully crafted queries—a reminder that even enterprise-grade agent implementations carry significant risk.
The threat intelligence community has taken note. Earlier in 2025, experts warned that AI agents can leak company data through simple web searches, exploiting the fact that agents often have privileged access to corporate systems while browsing the internet. The Register reported that AI browsers remain wide open to attack via prompt injection, a vulnerability that becomes far more serious when the injected AI can take actions rather than just generating text.
Perhaps most alarming was Anthropic’s disclosure that they had disrupted what they called “the first documented case of a large-scale AI cyberattack executed without substantial human intervention”—a Chinese cyber espionage campaign where AI automated roughly 90 percent of the attack workflow. The attackers used Claude AI to automate vulnerability research, payload generation, and attack execution at scale. This isn’t science fiction. This is 2026.
How Attacks Actually Work
Understanding the attack vectors is essential for building proper defences.
Prompt injection through external data is the most immediate threat. The agent reads a webpage, email, or document that contains hidden instructions. With a chatbot, those instructions might produce a rude response. With an agent, they trigger a booking, a transfer, a code push. Recent research into indirect prompt injection has shown that simply visiting a malicious webpage can compromise an agent’s entire context and cause it to exfiltrate data or perform unwanted actions.
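One common first line of defence is screening untrusted content before it ever enters the agent's context. This is a crude heuristic sketch, not a complete defence (the patterns are examples I've chosen, and real injections routinely evade keyword filters), but it illustrates where the check belongs in the pipeline:

```python
import re

# Example instruction-like phrases to flag in fetched webpages, emails,
# and documents BEFORE they reach the agent's context window.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"you are now",
    r"disregard your system prompt",
]

def looks_injected(untrusted_text):
    """Heuristic flag for instruction-like content in untrusted input."""
    lowered = untrusted_text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```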
Memory poisoning exploits the agent’s persistent context. An agent with memory can be manipulated slowly: poisoned feedback nudges its goals off course over time, and the credentials and secrets it accumulates become a high-value target in their own right. Once an attacker controls what the agent remembers, they control what the agent does.
Tool compromise is the supply chain angle your team needs to worry about. Your agent uses a third-party API to book travel. That API gets breached—or worse, a popular agent plugin or framework is compromised, and every organisation using it inherits the vulnerability. Security teams are increasingly concerned about agent dependencies, especially as the ecosystem of agent tools and plugins explodes.
Privilege escalation through toolchains is elegant in its simplicity. An agent with access to your calendar, email, and payment system is one compromised session away from a full-scale account takeover. The agent doesn’t need to be hacked in a traditional sense—it just needs to be fed misleading context that causes it to chain together legitimate actions into an illegitimate outcome.
What To Do About It
This isn’t a call to ban agents—they’re too valuable for that. But the security controls need to evolve, and they need to evolve now.
Start with least privilege for agent identities. Agents should have separate service accounts, not human credentials. If the agent only needs to read calendar events, it shouldn’t have write access to payments. This sounds obvious, but in practice, organisations are deploying agents with far more access than they would ever grant a human intern.
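At its simplest, least privilege for agents is a deny-by-default allowlist per agent identity. A sketch of the idea, with invented agent IDs and scope strings (a real deployment would back this with the organisation's IAM system rather than an in-process dictionary):

```python
# Per-agent permission allowlists. Agent IDs and scope names are
# illustrative, not from any real product.
AGENT_SCOPES = {
    "travel-booker": {"calendar:read", "flights:book"},
    "code-reviewer": {"repo:read"},
}

def authorise(agent_id, permission):
    """Deny by default: unknown agents and unlisted permissions both fail."""
    return permission in AGENT_SCOPES.get(agent_id, set())
```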
Implement action-specific approval gates. Not every action needs human approval—rate limiting small transactions is reasonable. But wire transfers, code deployments, and PII access should always require explicit human authorisation. The key word is “explicit.” If your approval process can be bypassed with a single click, it will be.
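The structural point is that approval should be a hard requirement in code, not a UI convention that fatigue can erode. A minimal sketch, assuming invented action names, where high-risk actions simply cannot execute without an explicit approver callback:

```python
# Illustrative high-risk action names; the real list comes from policy.
HIGH_RISK_ACTIONS = {"wire_transfer", "deploy_to_prod", "export_pii"}

def run_action(action, params, approver=None):
    """Execute an agent action; high-risk actions hard-fail without approval."""
    if action in HIGH_RISK_ACTIONS:
        if approver is None or not approver(action, params):
            raise PermissionError(f"'{action}' requires explicit human approval")
    return ("executed", action)  # stand-in for the real dispatcher
```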
Audit the audit trails. Standard logs don’t capture agent decision chains. You need to record what context the agent saw, what tools it called, and what it concluded—not just the final action. When something goes wrong, you’ll need to reconstruct the agent’s reasoning, not just its outputs.
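A sketch of what such a record might contain, hedged accordingly: field names are my own, and a real system would ship entries to an append-only store rather than an in-memory list. The key idea is capturing a digest of the context the agent saw alongside the tool call and its stated conclusion:

```python
import hashlib, json, time

def record_step(trail, context, tool, args, conclusion):
    """Append one structured record: what the agent saw, called, and concluded."""
    entry = {
        "ts": time.time(),
        "context_sha256": hashlib.sha256(context.encode()).hexdigest(),
        "tool": tool,
        "args": args,
        "conclusion": conclusion,
    }
    trail.append(entry)
    return json.dumps(entry)  # e.g. forwarded to an append-only log store
```

Hashing the context rather than storing it verbatim keeps sensitive material out of the log while still letting investigators prove which context window produced which action.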
Sandbox and isolate. Run agents in restricted environments. Limit network access. Assume compromise and contain the blast radius. The agent will inevitably encounter malicious content—the question is whether that content can reach your core systems.
Monitor for behavioural anomalies. Traditional security tools flag known-bad patterns. Agents generate novel action sequences that might all look legitimate in isolation. You need context-aware monitoring that understands what the agent is trying to accomplish and can flag unusual deviations.
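As a deliberately simple illustration of the principle, a bigram model over historical tool-call sequences can flag transitions the agent has never made before. Production systems need far richer context than this, but the sketch shows why sequence-level baselines catch what per-call rules miss:

```python
from collections import Counter

class SequenceMonitor:
    """Flag tool-call transitions absent from an agent's historical baseline.

    A toy bigram model: real monitoring would weight frequencies and
    incorporate arguments, timing, and context.
    """
    def __init__(self, history):
        self.baseline = Counter(
            (a, b) for seq in history for a, b in zip(seq, seq[1:])
        )

    def novel_transitions(self, seq):
        return [(a, b) for a, b in zip(seq, seq[1:])
                if (a, b) not in self.baseline]
```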
Treat memory as a high-value asset. Encrypt agent memory at rest. Restrict access. Audit what the agent is accumulating. If an attacker can’t steal the data directly, they’ll try to steal the memory where it’s stored.
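One way to make that concrete is gating every memory access behind an authorisation check and auditing what accumulates. This sketch deliberately omits the encryption-at-rest and key-management pieces (which belong in the platform layer, not application code) and uses an invented token scheme purely for illustration:

```python
import hashlib

class AgentMemory:
    """Sketch: gate agent memory behind a token check and audit every access.

    Encryption at rest and key management are deliberately omitted here;
    the point is deny-by-default access plus an accumulation audit trail.
    """
    def __init__(self, token_hash):
        self._token_hash = token_hash
        self._store = {}
        self.audit = []  # who touched what, for later review

    def _check(self, token):
        if hashlib.sha256(token.encode()).hexdigest() != self._token_hash:
            raise PermissionError("memory access denied")

    def put(self, key, value, token):
        self._check(token)
        self._store[key] = value
        self.audit.append(("put", key))

    def get(self, key, token):
        self._check(token)
        self.audit.append(("get", key))
        return self._store[key]
```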
The Bigger Picture
Agentic AI isn’t coming—it’s here. Every organisation is evaluating autonomous agents for productivity gains, and security teams are being asked to approve deployments they barely understand.
The uncomfortable truth is that the security community is playing catch-up. We spent years building guardrails for chatbots. Now we need an entirely new framework for systems that act without asking.
The agents are already here. Make sure your threat model arrived first.
