Experts Warn Against Using LLMs as Final Security Judges
- Security expert Brian Hall argues LLMs should not serve as final decision-makers for AI agent actions.
- Using models as security judges is risky because they share the same vulnerabilities as the agent they monitor.
- Hall advocates for deterministic, rule-based authorization to ensure consistent, auditable security for production environments.
Brian Hall, a member of the AARM security group, argues that developers should not use large language models (LLMs) to make final authorization decisions for AI agents. A current industry trend is to use an 'LLM-judge' to review tool calls before execution, essentially placing a second model to monitor the first. Hall contends this design is fundamentally flawed because the judge remains susceptible to the same attacks as the agent: prompt injection (crafting input to manipulate model behavior) and social engineering. An attacker who can deceive the agent is likely to deceive the judge as well, since both rely on similar probabilistic inference mechanisms that are inherently open to influence.
Beyond security vulnerabilities, Hall highlights the non-deterministic nature of LLMs as a significant liability for mission-critical operations. Models typically generate output by sampling (drawing each token at random from a probability distribution, so the same input can produce different outputs), meaning a security decision could change between two identical actions. This inconsistency makes auditing and debugging far harder, and undermines system reliability, compared with traditional rule-based access controls that return the same 'allow' or 'deny' for the same request every time. Such deterministic rules provide a reliable audit trail and predictable behavior that language models cannot guarantee.
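The kind of deterministic check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tool names, the glob-style rule format, and the `authorize` function are hypothetical and are not taken from Hall's work or from any particular product.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Rule:
    pattern: str  # glob pattern matched against the tool name
    effect: str   # "allow" or "deny"


@dataclass(frozen=True)
class Decision:
    tool: str
    effect: str
    matched: Optional[str]  # pattern that decided the outcome, None if default


# Hypothetical policy, evaluated top to bottom; the first match wins.
POLICY = (
    Rule("db.drop_*", "deny"),
    Rule("db.read_*", "allow"),
    Rule("payments.*", "deny"),
)


def authorize(tool: str, policy: tuple = POLICY) -> Decision:
    """Return the same Decision for the same tool name, every time."""
    for rule in policy:
        if fnmatchcase(tool, rule.pattern):
            return Decision(tool, rule.effect, rule.pattern)
    # Nothing matched: deny by default and record that no rule applied.
    return Decision(tool, "deny", None)


if __name__ == "__main__":
    for name in ("db.read_users", "db.drop_table", "shell.exec"):
        print(authorize(name))
```

Because each decision records which rule produced it, a log of `Decision` values can serve as an audit trail, and replaying the same request against the same policy always reproduces the same entry.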
Hall emphasizes that this is not an argument to exclude models from security architectures entirely. LLMs excel at detecting anomalies, flagging sensitive text, and identifying suspicious patterns across sequences of calls. He therefore proposes a layered security strategy in which models are restricted to advisory roles, such as flagging potentially risky actions for manual review or secondary system analysis. The final, authoritative decision to execute sensitive actions, such as dropping a production database or moving financial assets, must reside in a deterministic, rule-based system. This separation ensures that the final gatekeeper is 'boring' by design, providing stable and transparent enforcement that an agent cannot talk its way past. Hall has developed an open-source project called Faramesh to implement this philosophy, ensuring that permit/deny decisions occur outside the model's inference path.
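A minimal sketch of this layering, assuming a hypothetical rule table and a simple keyword check standing in for the model, might look like the following. It does not reflect Faramesh's actual interface; every name here is invented for illustration. The point it shows is that the advisor can only escalate an allowed action to review and is never asked about a denied one.

```python
from typing import Callable, Dict

# An advisor takes (tool, args) and returns True if the call looks risky.
Advisor = Callable[[str, dict], bool]

# Hypothetical static policy. Tools that are not listed are denied.
RULES: Dict[str, str] = {
    "db.read": "allow",
    "email.send": "allow",
    "db.drop": "deny",
    "funds.transfer": "deny",
}


def keyword_advisor(tool: str, args: dict) -> bool:
    """Stand-in for a model call: flags arguments that mention 'password'."""
    return "password" in str(args).lower()


def gate(tool: str, args: dict, advisor: Advisor = keyword_advisor) -> str:
    """Return 'allow', 'deny', or 'review' for a proposed tool call."""
    effect = RULES.get(tool, "deny")
    if effect == "deny":
        # The rule is final: the advisor is not consulted and cannot override.
        return "deny"
    try:
        flagged = advisor(tool, args)
    except Exception:
        # If the advisor fails, send the call to a human rather than allow it.
        flagged = True
    return "review" if flagged else "allow"


if __name__ == "__main__":
    print(gate("email.send", {"body": "quarterly report attached"}))
    print(gate("email.send", {"body": "the admin password is hunter2"}))
    print(gate("db.drop", {"table": "users"}, advisor=lambda t, a: False))
```

In this arrangement a compromised or mistaken advisor can at worst create extra review work or miss a flag on an action the rules already permit; it cannot turn a 'deny' into an 'allow'.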