New Framework Positions Code as AI Agent Infrastructure

• Researchers introduce a unified "Code as Agent Harness" framework for managing AI agent infrastructure.
• The framework categorizes agent systems into interface, mechanisms, and scaling layers to improve reliability.
• The survey outlines open challenges such as cross-agent state consistency and human oversight in safety-critical environments.

A research survey titled "Code as Agent Harness," published on May 20, 2026, examines the evolving role of code in AI systems. Large language models (LLMs) were previously used primarily to generate software, but the paper argues that code now functions as an operational substrate for agentic systems, meaning AI models that can carry out goal-oriented tasks autonomously. Its "agent harness" framework organizes this infrastructure into three layers: interface, mechanisms, and scaling.

The interface layer connects agents to reasoning, action, and environment modeling. The mechanisms layer integrates planning, memory, and tool use to support long-horizon execution and feedback-driven control for reliable, adaptive operations. Finally, the scaling layer covers the transition from single-agent setups to multi-agent environments, where shared code artifacts facilitate coordination, review, and verification. The survey documents applications across fields including GUI and OS automation, scientific discovery, personalization, and enterprise workflows.
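
To make the three layers concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not code from the survey: the names `Tool`, `Harness`, `run_plan`, and `review`, along with the toy `add` tool, are hypothetical, and a fixed list of steps stands in for an LLM planner.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Tool:
    """Interface layer (hypothetical): a named, executable action."""
    name: str
    run: Callable[[int, int], int]


@dataclass
class Harness:
    """Mechanisms layer (hypothetical): tool use, memory, and feedback."""
    tools: Dict[str, Tool]
    memory: List[str] = field(default_factory=list)  # inspectable step log

    def step(self, tool_name: str, a: int, b: int) -> Optional[int]:
        tool = self.tools.get(tool_name)
        if tool is None:
            # Feedback: record the failure instead of crashing the run.
            self.memory.append(f"error: unknown tool {tool_name}")
            return None
        result = tool.run(a, b)
        self.memory.append(f"{tool_name}({a}, {b}) -> {result}")
        return result

    def run_plan(self, plan: List[Tuple[str, int, int]]) -> List[Optional[int]]:
        # A fixed plan stands in for an LLM planner (assumption).
        return [self.step(name, a, b) for name, a, b in plan]


def review(memory: List[str]) -> bool:
    """Scaling layer (hypothetical): a second agent checks the shared log."""
    return not any(entry.startswith("error") for entry in memory)


harness = Harness(tools={"add": Tool("add", lambda a, b: a + b)})
results = harness.run_plan([("add", 1, 2), ("mul", 2, 3)])
approved = review(harness.memory)
```

In this toy run the second step names a tool that was never registered, so the log records an error and the reviewing function rejects the run. The point is only that a shared, code-level artifact (the log) gives a second agent something to verify.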

The authors outline open challenges for harness engineering: evaluation metrics that go beyond final task success, verification under incomplete feedback, and improvements that do not introduce regressions. They also identify the need to keep shared state consistent across multiple agents and to incorporate human oversight for safety-critical actions. The framework aims to provide a unified roadmap toward building executable, verifiable, and stateful AI systems.
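
Two of these challenges, human oversight and evaluation beyond final task success, can be sketched in a few lines. The following is an assumption-laden illustration rather than anything the survey specifies: `Action`, `execute_with_oversight`, the `approve` callback, and the `critical` flag are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    name: str
    critical: bool  # hypothetical flag marking a safety-critical action


def execute_with_oversight(
    actions: List[Action], approve: Callable[[Action], bool]
) -> Dict[str, int]:
    """Run actions in order; safety-critical ones require approval first.

    Returns step-level counts, so a run can be judged on more than
    whether the final task succeeded.
    """
    trace = {"executed": 0, "blocked": 0}
    for action in actions:
        if action.critical and not approve(action):
            trace["blocked"] += 1  # the human reviewer declined this step
            continue
        trace["executed"] += 1  # stand-in for actually performing the action
    return trace


plan = [
    Action("read_file", critical=False),
    Action("delete_db", critical=True),
    Action("send_report", critical=False),
]
# A callback that denies everything stands in for a human reviewer.
trace = execute_with_oversight(plan, approve=lambda action: False)
```

Here the `approve` callback would, in a real system, prompt a person; the lambda that always declines is only a placeholder. The returned counts are one simple example of a metric recorded per step rather than per task.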
