IBM Releases CUGA Agent Harness for Enterprise Apps
- IBM released CUGA, an open-source agent harness that automates orchestration for enterprise-level agentic applications.
- The framework manages complex agent state and reflection, achieving #1 rankings on the AppWorld and WebArena benchmarks.
- CUGA includes built-in governance, such as intent guards and tool approvals, to enable production-ready agent deployment.
IBM Research released CUGA (Configurable Generalist Agent), an open-source agent harness designed to streamline the development of enterprise-grade agentic applications. CUGA acts as a management layer that handles core orchestration tasks, including planning, state maintenance, tool execution, and self-correction, so developers can focus on defining the tool list and system prompts for their application. A reflection step handles errors, with the aim of making multi-step tasks more reliable. The architecture has ranked #1 on the AppWorld benchmark (between 07/25 and 02/26) and on WebArena (between 02/25 and 09/25).
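The orchestration loop described above can be pictured as a plan, execute, reflect cycle. The sketch below is a minimal illustration under stated assumptions, not CUGA's API: `AgentState`, `run_agent`, and `reflect` are hypothetical names, the plan is supplied up front as (tool, argument) pairs rather than generated by a model, and reflection is reduced to a string check that triggers a retry. In a real harness the reflection step would typically ask the model to diagnose the failure and revise the step.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool type: any callable that takes and returns a string.
Tool = Callable[[str], str]


@dataclass
class AgentState:
    """Illustrative state kept across steps (not CUGA's actual class)."""
    task: str
    history: list[tuple[str, str]] = field(default_factory=list)  # (tool, result)
    errors: int = 0


def reflect(result: str) -> bool:
    """Toy reflection check: accept any result that is not an error message."""
    return not result.startswith("error")


def run_agent(task: str, plan: list[tuple[str, str]], tools: dict[str, Tool],
              max_retries: int = 2) -> AgentState:
    """Execute each planned (tool_name, argument) step, reflecting after each call.

    A step that fails reflection is retried up to max_retries times; if every
    attempt fails, the loop stops instead of continuing from a bad state.
    """
    state = AgentState(task=task)
    for tool_name, arg in plan:
        for _ in range(max_retries + 1):
            try:
                result = tools[tool_name](arg)
            except Exception as exc:  # record the failure so reflection can see it
                result = f"error: {exc}"
            if reflect(result):
                state.history.append((tool_name, result))
                break
            state.errors += 1
        else:
            state.history.append((tool_name, "failed"))
            return state
    return state


# Usage: a tool that times out once, then succeeds.
calls = {"n": 0}


def flaky_lookup(query: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("transient")
    return f"found {query}"


state = run_agent("find invoice", [("lookup", "invoice 42")], {"lookup": flaky_lookup})
print(state.history)  # [('lookup', 'found invoice 42')]
print(state.errors)   # 1
```

Retrying the same call only helps with transient failures such as the simulated timeout; the value of a model-driven reflection step is that it can change the approach when a plain retry would fail again.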
To demonstrate the harness, IBM published two dozen single-file applications in the cuga-apps repository, ranging from a cloud architecture advisor to lead-generation systems. These apps serve as templates for common patterns such as RAG (retrieval-augmented generation) and multi-agent delegation. The framework natively supports interchangeable tools defined through OpenAPI, MCP (Model Context Protocol), or LangChain functions. Developers can choose a reasoning mode (Fast, Balanced, or Accurate) and a code execution environment such as Docker or E2B without changing the agent's logic.
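One way to get interchangeable tools and swappable settings is to have the agent depend only on a uniform tool interface plus a configuration object. The following sketch assumes that design; `FunctionTool`, `HttpOperationTool`, `AgentConfig`, `STEP_BUDGET`, and the per-mode step counts are hypothetical, and the HTTP transport is a fake function so nothing touches the network. It does not reproduce CUGA's configuration format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Protocol


class ReasoningMode(Enum):
    FAST = "fast"
    BALANCED = "balanced"
    ACCURATE = "accurate"


class Sandbox(Enum):
    DOCKER = "docker"
    E2B = "e2b"


class Tool(Protocol):
    """Uniform interface the agent sees, regardless of where a tool came from."""
    name: str

    def invoke(self, arg: str) -> str: ...


@dataclass
class FunctionTool:
    """Wraps a plain Python callable (stand-in for a LangChain function)."""
    name: str
    fn: Callable[[str], str]

    def invoke(self, arg: str) -> str:
        return self.fn(arg)


@dataclass
class HttpOperationTool:
    """Stand-in for an OpenAPI operation; the transport is injected, so no real HTTP call is made."""
    name: str
    method: str
    path: str
    transport: Callable[[str, str, str], str]

    def invoke(self, arg: str) -> str:
        return self.transport(self.method, self.path, arg)


@dataclass(frozen=True)
class AgentConfig:
    mode: ReasoningMode = ReasoningMode.BALANCED
    sandbox: Sandbox = Sandbox.DOCKER  # where generated code would run; unused in this sketch


# Hypothetical step budgets: more deliberate modes are allowed more steps.
STEP_BUDGET = {ReasoningMode.FAST: 3, ReasoningMode.BALANCED: 8, ReasoningMode.ACCURATE: 20}


def run(tools: list[Tool], config: AgentConfig, calls: list[tuple[str, str]]) -> list[str]:
    """Agent logic depends only on the Tool interface and the config."""
    registry = {tool.name: tool for tool in tools}
    return [registry[name].invoke(arg) for name, arg in calls[: STEP_BUDGET[config.mode]]]


def fake_http(method: str, path: str, arg: str) -> str:
    return f"{method} {path}?q={arg}"


tools = [FunctionTool("upper", str.upper), HttpOperationTool("search", "GET", "/leads", fake_http)]
fast = AgentConfig(mode=ReasoningMode.FAST, sandbox=Sandbox.E2B)
print(run(tools, fast, [("upper", "acme"), ("search", "acme")]))
# ['ACME', 'GET /leads?q=acme']
```

Switching from Fast to Accurate, or from Docker to E2B, changes only the `AgentConfig` value while `run` stays untouched. The `sandbox` field is carried but not used here, since the sketch executes no generated code.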
For production environments, CUGA builds a policy-driven governance system directly into the runtime. It supports six policy types, including Intent Guards, Tool Approvals, and Output Formatters; policies are stored in a local configuration folder and versioned alongside the code. Policies use a sqlite-vec database to match user intent semantically, so agent behavior can be controlled without manual overrides. As applications grow more complex, the framework supports a supervisor pattern in which a central agent delegates subtasks to specialized agents, each with its own isolated context. Another feature, ALTK-Evolve, enables on-the-job learning: agents refine their operational skills based on past execution data, which reduces the need for repeated prompting on recurring tasks.
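The semantic matching behind intent guards, together with a tool-approval check, can be illustrated with a similarity search over vectors. sqlite-vec is a SQLite extension for vector search; to keep this sketch dependency-free, it substitutes an in-memory cosine similarity over bag-of-words counts for real embeddings stored in sqlite-vec. `IntentGuard`, `ToolApproval`, the 0.5 threshold, and the example policies are hypothetical and are not CUGA's policy schema.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class IntentGuard:
    name: str
    example: str            # representative phrasing of the blocked intent
    threshold: float = 0.5  # hypothetical similarity cutoff
    response: str = "This request is not allowed."


@dataclass
class ToolApproval:
    tool: str  # tool that requires human sign-off before it runs


def check_intent(user_input: str, guards: list[IntentGuard]) -> Optional[IntentGuard]:
    """Return the best-matching guard whose similarity clears its threshold, else None."""
    query = embed(user_input)
    best, best_score = None, 0.0
    for guard in guards:
        score = cosine(query, embed(guard.example))
        if score >= guard.threshold and score > best_score:
            best, best_score = guard, score
    return best


def needs_approval(tool_name: str, approvals: list[ToolApproval]) -> bool:
    return any(a.tool == tool_name for a in approvals)


guards = [IntentGuard("no-payroll", "show me employee salary and payroll data")]
approvals = [ToolApproval("delete_record")]

hit = check_intent("Can you show the payroll salary data?", guards)
print(hit.name)                                             # no-payroll
print(check_intent("What is the weather today?", guards))  # None
print(needs_approval("delete_record", approvals))           # True
```

The first request is worded differently from the guard's example yet still matches (similarity 4/7), because the score depends on overlapping terms rather than an exact string match; real embeddings extend this to paraphrases that share no words at all.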