An AI assistant that hallucinates is embarrassing. An AI assistant that executes is operational risk.

That distinction is now driving the most important security debate in enterprise software. Agentic systems—AI that can plan, call tools, write files, trigger workflows, and persist memory—are crossing the boundary from conversation into governance. For CIOs, CISOs, and compliance teams, the question is no longer whether the model is “accurate.” It is whether the organisation can prove what the agent is, what it is allowed to do, and what it actually did.

Over the past year, vendors have marketed “AI agents” as productivity infrastructure: automated customer support triage, finance reconciliation, DevOps co-pilots, HR assistants, procurement bots. In parallel, security researchers have demonstrated that the same capabilities—tool use, long-term state, and delegated authority—create failure modes that traditional chatbot-era controls were never built to handle. The most worrying attacks do not require model takeover. They require only that an agent behaves “helpfully” in the wrong direction.

Three policy currents are converging. First, frameworks such as the NIST AI Risk Management Framework ask organisations to treat AI as a lifecycle risk-management problem, not a procurement decision. Second, the EU’s AI rulebook hardens expectations around logging, oversight, and cybersecurity for high-risk systems, with Article 12 mandating record-keeping and Article 14 mandating human oversight. Third, industry threat taxonomies such as the OWASP Top 10 for LLM Applications are crystallising common failure patterns—prompt injection, insecure output handling, data poisoning, and supply chain vulnerabilities—into something security teams can operationalise.

From chat to action: agency versus autonomy

Enterprises often collapse two concepts into one: “the agent can do things.” Security architecture cannot afford that imprecision. Agency describes capability—the system can select actions, call tools, and alter its environment. Autonomy describes governance—the system can do so with reduced human supervision, including initiating sequences of actions to reach a goal.

A system may have agency without meaningful autonomy: for example, an assistant that drafts a command but requires an operator to click “approve” before execution. Conversely, a system can be highly autonomous with limited agency: a narrow workflow bot that triggers pre-approved steps on a schedule. In practice, the highest-risk deployments combine both: broad tool access plus freedom to decide when and how to use it.

That combination reorders the security model. Chatbots are mostly reactive and frequently stateless. Agents are goal-driven, stateful, and integrated. The attack surface expands accordingly: the model is no longer the only control plane. The toolchain becomes the control plane.

The new attack primitives: memory, tools, and delegated authority

The defining feature of agentic systems is not intelligence. It is reach. Tool integrations—email, ticketing systems, CRMs, cloud consoles, CI/CD pipelines, databases—turn a language model into a workflow operator. Each connector creates a new entry point, and each permission scope becomes a blast-radius decision.

Attackers have adapted quickly. Prompt injection, once a nuisance that produced odd answers, now functions as an execution-control technique. The OWASP taxonomy frames this shift clearly: a manipulated input can cause an agent to take unintended actions, or to mishandle its outputs in ways that become downstream vulnerabilities.

Persistent memory raises the stakes further. A proof-of-concept published by Palo Alto Networks’ Unit 42 demonstrated indirect prompt injection that poisons long-term agent memory, using an agent that browses external content and stores “helpful” reminders for future sessions. The security problem is structural: the boundary between untrusted content and trusted state collapses. A poisoned memory does not look like an exploit at runtime; it looks like “context.”

Then there is the human safeguard that organisations rely on most: approvals. Security teams often assume that a human-in-the-loop gate neutralises agent risk. Researchers have shown that this assumption is fragile. OWASP documents a technique called HITL Dialog Forging (Lies-in-the-Loop), in which the approval interface itself can be manipulated—through padding, truncation, formatting tricks, or context pollution—so a reviewer authorises a dangerous action that appears benign. Checkmarx’s write-up of Lies-in-the-Loop attacks is a reminder that “a human clicked approve” is not the same as “a human understood what they approved.”

The final primitive is delegation. Agents often act “on behalf of” a user, a department, or a service account. That creates classic confused-deputy dynamics: a low-trust input can induce a high-trust actor to execute a harmful instruction using legitimate credentials. In a multi-agent environment, this risk compounds. If agents treat peer outputs as trusted inputs, compromise can cascade through the workflow graph at machine speed.

A secure verification layer: what “verifiable agency” looks like

Most enterprise deployments still authenticate agents the way they authenticate scripts: API keys, OAuth tokens, and shared service accounts. That model fails in an agentic world because it answers the wrong question. It tells you that some credential was presented. It does not prove which agent instance is acting, under which delegated authority, within which task boundaries.

A credible verification layer has three parts.

First, cryptographic identity for non-human actors. Each agent needs a unique machine identity and an attestation mechanism that binds the running workload to that identity. In practice, that means short-lived credentials, workload identity federation where possible, and signed provenance metadata that records the agent’s owner, version, allowed tools, and operational scope. The objective is auditability: when an agent touches a system, the log should say which agent did it, not which shared token happened to be available.
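To make that concrete, here is a minimal sketch of a signed, short-lived identity manifest for an agent instance. Everything here is illustrative: the function names, the 15-minute TTL, and the symmetric HMAC key are assumptions for the sketch; a production system would use asymmetric keys issued by a workload-identity service rather than a shared secret.

```python
import hashlib
import hmac
import json
import time

# Demo-only symmetric key. In production this would be replaced by keys
# minted per agent instance through a workload-identity system.
SIGNING_KEY = b"demo-only-secret"


def issue_agent_credential(agent_id: str, version: str,
                           allowed_tools: list[str],
                           ttl_seconds: int = 900) -> dict:
    """Mint a short-lived, signed identity manifest for one agent instance."""
    manifest = {
        "agent_id": agent_id,
        "version": version,
        "allowed_tools": sorted(allowed_tools),
        "expires_at": int(time.time()) + ttl_seconds,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload,
                                     hashlib.sha256).hexdigest()
    return manifest


def verify_agent_credential(manifest: dict) -> bool:
    """Check signature and expiry before honouring any action."""
    claimed = manifest.get("signature", "")
    unsigned = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(claimed, expected)
            and manifest["expires_at"] > time.time())
```

The point of the sketch is the failure mode it prevents: if anyone (including the agent itself) widens `allowed_tools` after issuance, verification fails, because the scope is bound into the signature rather than stored alongside it.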

Second, policy enforcement at the action boundary. Organisations should treat tool calls as privileged operations, not as model output. Build an “action gateway” that inspects every proposed tool invocation before execution: validate parameters against allow-listed schemas, require additional approvals for high-impact classes (payments, deletions, permission changes), and deny network or filesystem access by default. This is the same security logic that made modern API gateways and service meshes valuable—applied to agent actions.
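An action gateway can be surprisingly small. The sketch below assumes a hypothetical tool registry and three verdicts (`execute`, `needs_approval`, `deny`); the tool names and impact tiers are invented for illustration, not taken from any real product.

```python
from typing import Any

# Hypothetical allow-list: tool name -> required parameters and impact tier.
# Anything not listed is denied by default.
TOOL_POLICY: dict[str, dict[str, Any]] = {
    "ticket.comment": {"params": {"ticket_id", "body"}, "impact": "low"},
    "invoice.delete": {"params": {"invoice_id"}, "impact": "high"},
}


def gate_tool_call(tool: str, params: dict,
                   human_approved: bool = False) -> str:
    """Return 'execute', 'needs_approval', or 'deny' for a proposed call."""
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        return "deny"              # default-deny: unknown tools never run
    if set(params) != policy["params"]:
        return "deny"              # reject missing or unexpected parameters
    if policy["impact"] == "high" and not human_approved:
        return "needs_approval"    # high-impact classes require a human gate
    return "execute"
```

Note that the gateway never inspects the model's reasoning, only the proposed action: that keeps the policy decision deterministic and auditable regardless of how the agent arrived at the call.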

Third, isolation and receipts. If an agent can run code or manipulate files, it should do so in sandboxes with sharply constrained egress. Record tamper-evident “receipts” for every action: input source, model output, policy decision, tool call, tool response, and the final result. Article 12-style logging is not bureaucracy; it is the only way to reconstruct why the system did what it did when something goes wrong.
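"Tamper-evident" has a simple mechanical meaning: each receipt's hash covers the previous receipt, so editing any entry breaks every hash after it. A minimal hash-chain sketch (the receipt fields and helper names are assumptions for illustration):

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel hash for the first receipt in the chain


def append_receipt(chain: list[dict], event: dict) -> dict:
    """Append a receipt whose hash covers the previous receipt's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"prev": prev_hash, "event": event}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return body


def chain_is_intact(chain: list[dict]) -> bool:
    """Recompute every hash; any edit to a past receipt breaks the chain."""
    prev_hash = GENESIS
    for receipt in chain:
        body = {"prev": receipt["prev"], "event": receipt["event"]}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if receipt["prev"] != prev_hash or receipt["hash"] != expected:
            return False
        prev_hash = receipt["hash"]
    return True
```

In practice each `event` would carry the full context listed above: input source, model output, policy decision, tool call, tool response, and result.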

One practical design pattern is to split the agent into two planes: a reasoning plane that proposes actions, and an execution plane that enforces policy, identity verification, and isolation. The model remains useful, but it no longer directly touches production systems. That separation is what turns “agentic AI” from a clever demo into governable infrastructure.
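The two-plane split can be expressed as an interface contract: the reasoning plane emits proposal objects and holds no credentials; only the execution plane can produce effects. A sketch under those assumptions (class and function names are invented for illustration):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProposedAction:
    """What the reasoning plane emits: a proposal, never a direct effect."""
    tool: str
    params: dict = field(default_factory=dict)
    rationale: str = ""


def execution_plane(proposal: ProposedAction,
                    allowed_tools: set[str]) -> str:
    """The only component holding credentials; enforces scope before acting."""
    if proposal.tool not in allowed_tools:
        return f"rejected: {proposal.tool} outside agent scope"
    # A real execution plane would verify agent identity, log a receipt,
    # and run the tool in a sandbox here; this sketch simulates approval.
    return f"executed {proposal.tool}"
```

Because the model only ever produces `ProposedAction` values, swapping models, adding policies, or revoking tool scopes requires no change to the reasoning side.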

Governance that scales: compliance without theatre

Agent governance fails when it becomes performative: a checklist that says “human oversight” while approvals are rubber-stamped, or “logging enabled” while logs are unusable in an incident review. Regulated sectors will be forced into more disciplined practice because the legal standard is moving. The EU’s AI framework explicitly ties high-risk systems to traceability and oversight obligations, while NIST’s AI RMF treats governance as an organisational function that persists across the AI lifecycle.

There is also a commercial reason to get this right. Enterprises increasingly run agents in customer-facing workflows, operational back offices, and developer tooling. When an agent misroutes invoices, exfiltrates sensitive data, or executes destructive commands, the organisation cannot hide behind the claim that “the model made a mistake.” In legal and operational terms, the agent behaves like a digital insider. The liability follows the credential.

A counterpoint deserves attention: aggressive controls can degrade usefulness. If every tool call requires approvals, productivity collapses. If memory is severely restricted, the agent loses continuity. If sandboxing blocks common workflows, users will route around the system, creating shadow deployments that are worse than the sanctioned one. The goal is not maximal restriction. It is measurable risk reduction tied to the impact of actions.

That is why tiered autonomy matters. Low-impact agents can operate with broad latitude in read-only environments. High-impact agents should face stronger verification, narrower permissions, and mandatory stop points. The organisation should be able to dial autonomy up or down as confidence grows—or as threat conditions change.
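A tiered-autonomy dial can be encoded as a small policy table. The tier definitions and impact labels below are assumptions for the sketch, not a standard; the useful property is that turning the dial means editing one table, not redeploying agents.

```python
# Hypothetical autonomy tiers: what an agent may do without a stop point.
# Lowering a tier under elevated threat conditions is a one-line change.
AUTONOMY_TIERS = {
    0: {"may_write": False, "max_impact": "low",    "approval_above": "low"},
    1: {"may_write": True,  "max_impact": "medium", "approval_above": "low"},
    2: {"may_write": True,  "max_impact": "high",   "approval_above": "medium"},
}

IMPACT_ORDER = ["low", "medium", "high"]


def decide(tier: int, action_impact: str, is_write: bool) -> str:
    """Map (agent tier, action impact) to allow / needs_approval / deny."""
    rules = AUTONOMY_TIERS[tier]
    if is_write and not rules["may_write"]:
        return "deny"  # tier-0 agents are read-only
    if IMPACT_ORDER.index(action_impact) > IMPACT_ORDER.index(rules["max_impact"]):
        return "deny"  # beyond this tier's ceiling regardless of approval
    if IMPACT_ORDER.index(action_impact) > IMPACT_ORDER.index(rules["approval_above"]):
        return "needs_approval"  # within ceiling, but requires a stop point
    return "allow"
```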

What to watch next

Security will shift from model-centric controls to runtime controls. The most important innovations will not be smarter prompts. They will be better action gateways, better identity attestation for non-human entities, and better forensic-grade logging that can survive scrutiny from regulators, auditors, and incident responders.

Expect a near-term arms race around memory. Persistent state is simultaneously the feature that makes agents valuable and the mechanism that allows subtle compromise to persist. Enterprises will increasingly demand memory partitioning, write filters that detect instruction-like content, and rollback capabilities that allow “state recovery” after poisoning events—treating memory more like a database with integrity constraints than like a chat transcript.
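A write filter of the kind described above can start as crude pattern screening on untrusted sources. The patterns below are naive, illustrative heuristics, and the function names are invented; a real filter would combine classifiers, provenance checks, and human review rather than regexes alone.

```python
import re

# Illustrative heuristics only: phrases that suggest instruction-like
# content is being smuggled into long-term memory as a "helpful" note.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\bignore (all|previous|prior) instructions\b", re.I),
    re.compile(r"\balways (send|forward|exfiltrate)\b", re.I),
    re.compile(r"\bfrom now on\b", re.I),
]


def filter_memory_write(candidate: str,
                        source_trusted: bool) -> tuple[bool, str]:
    """Return (allowed, reason); untrusted sources get strict screening."""
    if not source_trusted:
        for pattern in SUSPICIOUS_PATTERNS:
            if pattern.search(candidate):
                return False, f"instruction-like content: {pattern.pattern!r}"
    return True, "ok"
```

The asymmetry is deliberate: the same sentence can be a legitimate operator preference from a trusted source and a poisoning attempt from a browsed web page, which is why the filter keys on provenance, not just content.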

For a deeper look at identity risk and non-human credentials, see our internal briefing on non-human identities and delegated authority. For policy context, our explainer on what the EU AI framework demands operationally outlines the compliance implications for agent deployments.

The enterprise problem is no longer whether AI can reason. It is whether the enterprise can verify what AI is allowed to do before it does it.