Compositional Escape Risks in Agentic AI Safety

DEV.to
Saturday, June 13, 2026
  • CLAIM-30 demonstrates that agentic AI can violate safety mandates through composition even when individual steps are permitted.
  • A trajectory-level gate successfully blocked three classes of security escapes that local, per-step gates fail to detect.
  • Experimental results show that full-trajectory evaluation and data-lineage tracking are essential, load-bearing requirements for agent security.

CLAIM-30 explores compositional escape in agentic AI: every individual action complies with its safety mandate, yet the combined sequence violates security policy. The project demonstrates that per-step security gates cannot detect violations that emerge only at the level of the whole trajectory, such as payment-redirect schemes or illicit data accumulation. In a controlled test of 75 freshly authored operations, a trajectory-level gate blocked three composition classes: forbidden combination joins, staged delivery of derived protected data, and threshold accumulation violations. Each blocked sequence contained only individually authorized steps, which shows that local safety checks alone are insufficient for these behaviors.
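
The gap between the two kinds of gate can be illustrated with a minimal sketch. Everything named here (`Op`, the tool names, the dataset names, the caps and limits) is a hypothetical stand-in chosen for illustration, not the project's actual schema or clauses, and "forbidden combination" is simplified to "both datasets were read in the same trajectory".

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    tool: str            # hypothetical tool name, e.g. "read" or "refund"
    target: str          # dataset or account the operation touches
    amount: float = 0.0  # only meaningful for refunds


# Hypothetical per-step policy: the tool must be allowed and each refund capped.
ALLOWED_TOOLS = {"read", "refund"}
PER_REFUND_CAP = 100.0

# Hypothetical composition clauses, evaluated over the whole trajectory.
FORBIDDEN_COMBINATION = frozenset({"hr_salaries", "medical_claims"})
CUMULATIVE_REFUND_LIMIT = 250.0


def per_step_gate(op: Op) -> bool:
    """Local check: looks at one operation in isolation."""
    return op.tool in ALLOWED_TOOLS and op.amount <= PER_REFUND_CAP


def trajectory_gate(ops: list[Op]) -> list[str]:
    """Trajectory check: returns the composition clauses the sequence violates."""
    violations = []
    touched = {op.target for op in ops if op.tool == "read"}
    if FORBIDDEN_COMBINATION <= touched:
        violations.append("forbidden_combination")
    total_refunded = sum(op.amount for op in ops if op.tool == "refund")
    if total_refunded > CUMULATIVE_REFUND_LIMIT:
        violations.append("threshold_accumulation")
    return violations


# Every step below passes the local gate, yet the sequence as a whole does not.
trajectory = [
    Op("read", "hr_salaries"),
    Op("read", "medical_claims"),
    Op("refund", "acct-1", 90.0),
    Op("refund", "acct-1", 90.0),
    Op("refund", "acct-1", 90.0),
]
all_steps_pass = all(per_step_gate(op) for op in trajectory)
composition_violations = trajectory_gate(trajectory)
```

In this sketch `all_steps_pass` is true while `composition_violations` lists both clauses, which is the pattern the summary describes: the violation exists only in the aggregate.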

The experiment used a pre-registered, firewalled test environment: the security clauses were frozen before the sequences were authored, so the clauses could not be tuned to the test cases. The evaluation confirmed that both composition clauses and lineage tracking are load-bearing components of the safety architecture. Removing the composition clauses caused all three escape classes to leak, and restricting the gate to the final three operations caused it to miss threshold-based accumulation. Lineage tracking, in which an artifact inherits properties from the inputs it consumed, is what prevents staged delivery of sensitive information.
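
Lineage inheritance can be sketched in a few lines. This is an illustrative model only: the `Artifact` type, the `"protected"` label, and the name-based baseline check are assumptions made for the example, not the project's implementation.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    labels: frozenset = frozenset()  # sensitivity labels carried by this artifact


def derive(name: str, inputs: list[Artifact]) -> Artifact:
    """A derived artifact inherits the union of its inputs' labels."""
    inherited = frozenset().union(*(a.labels for a in inputs))
    return Artifact(name, inherited)


# Hypothetical baseline without lineage: only the original source is recognized.
PROTECTED_NAMES = {"customer_records"}


def name_based_check(artifact: Artifact) -> bool:
    """Allows delivery unless the artifact is itself a known protected source."""
    return artifact.name not in PROTECTED_NAMES


def lineage_check(artifact: Artifact, recipient_clearance: frozenset) -> bool:
    """Allows delivery only if the recipient is cleared for every inherited label."""
    return artifact.labels <= recipient_clearance


# Staged delivery: protected data is transformed twice before being sent out.
source = Artifact("customer_records", frozenset({"protected"}))
summary = derive("weekly_summary", [source])
chart = derive("trend_chart", [summary])

external_clearance = frozenset()  # an external recipient with no clearance
allowed_without_lineage = name_based_check(chart)
allowed_with_lineage = lineage_check(chart, external_clearance)
```

Here the name-based check lets `trend_chart` through because it no longer looks like the protected source, while the lineage check blocks it because the `"protected"` label survived both derivation steps.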

One sequence of thirteen refunds was allowed because the agent stayed within the legitimate window-boundary rules, which shows that the gate respects defined policy windows. However, the study notes that agents might eventually try to manipulate the authority that closes a window, a time-sliced escape class that remains an open challenge. The findings suggest that effective agent security needs two layers: per-action checks, plus a trajectory-level composition envelope that evaluates the full sequence of state transitions and tool invocations. The complete set of pre-registrations, sequences, and ablation logs is publicly available in the project repository so the results can be verified.
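
One way to picture how thirteen refunds can be legitimate is an accumulation clause that counts per policy window. The window length and per-window limit below are invented for illustration; the summary does not state the actual values or how windows are defined.

```python
from collections import Counter

# Hypothetical policy parameters, not taken from the project.
WINDOW_SECONDS = 3600
MAX_REFUNDS_PER_WINDOW = 5


def refunds_per_window(timestamps):
    """Counts refunds in each fixed-length window, keyed by window index."""
    return Counter(ts // WINDOW_SECONDS for ts in timestamps)


def window_violations(timestamps):
    """Returns the indices of windows whose refund count exceeds the limit."""
    counts = refunds_per_window(timestamps)
    return sorted(w for w, n in counts.items() if n > MAX_REFUNDS_PER_WINDOW)


# Thirteen refunds spread over three windows (5 + 5 + 3): within the limit.
spread = (
    [i * 60 for i in range(5)]
    + [3600 + i * 60 for i in range(5)]
    + [7200 + i * 60 for i in range(3)]
)

# Six refunds inside a single window: over the limit.
burst = [i * 60 for i in range(6)]

spread_violations = window_violations(spread)
burst_violations = window_violations(burst)
```

The sketch also makes the open challenge concrete: the result depends entirely on where window boundaries fall, so an agent that could influence when a window closes could reshape the counts without changing any individual refund.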

Read original (English)·Jun 12, 2026
#agentic ai  #security  #compositional escape  #safety mandate  #trajectory gate