Salesforce Research Unveils Breakthroughs in Autonomous AI Agents
- Salesforce presents 21 papers at ICLR 2026 focusing on enterprise-grade autonomous AI agent reliability.
- New multi-agent frameworks achieve over 60% success on OS tasks by combining GUI and programmatic control.
- Research identifies 'echoing'—a phenomenon where AI agents abandon roles to mirror their conversational partners.
At the Fourteenth International Conference on Learning Representations (ICLR 2026) in Rio de Janeiro, Salesforce AI Research has underscored a pivotal shift in the artificial intelligence landscape: the move from simple chatbots to autonomous, reliable agents. This year, their team is presenting 21 accepted papers, marking a clear pivot toward 'Agentic AI'—systems designed not just to process information, but to execute multi-step workflows in complex, real-world digital environments.
One of the most compelling findings from this research collection is the discovery of 'echoing.' In a study of over 2,500 interactions, researchers found that when autonomous agents communicate with one another, they often abandon their assigned operational roles, opting instead to mirror the linguistic style and stance of their conversational partner. More concerning still, 93% of these 'echoed' conversations were flagged as successful by standard metrics, suggesting a dangerous blind spot in how agent behavior is currently evaluated. This highlights an urgent need for more nuanced safety frameworks in multi-agent environments.
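To make the failure mode concrete, here is a minimal, hypothetical sketch of how one might flag echoing in a transcript. The paper's actual methodology is not described here; this simply uses vocabulary overlap between consecutive turns as a crude proxy for one agent mirroring the other, and all function names are illustrative.

```python
# Hypothetical proxy for "echoing": measure how much each turn's wording
# mirrors the previous turn's. High average overlap across a dialogue
# suggests agents are converging on each other rather than playing roles.

def vocab_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two utterances."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def echoing_score(turns: list[str]) -> float:
    """Average overlap between each turn and the one before it."""
    if len(turns) < 2:
        return 0.0
    return sum(vocab_overlap(a, b)
               for a, b in zip(turns, turns[1:])) / (len(turns) - 1)

# Agents that converge on the same phrasing score high...
mirrored = [
    "we should escalate this ticket now",
    "yes we should escalate this ticket now",
    "agreed escalate this ticket now",
]
# ...while agents holding distinct roles score low.
distinct = [
    "we should escalate this ticket now",
    "the refund policy forbids that action",
    "please compile the quarterly report",
]
assert echoing_score(mirrored) > echoing_score(distinct)
```

Note that a task-success metric would look only at the final outcome, which is exactly why 93% of echoed conversations still passed: a drift signal like this has to be measured separately.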
Beyond safety, the research highlights major strides in practical utility, particularly through GUI (Graphical User Interface) agents. Computers were originally designed for human hands and eyes, not for algorithmic manipulation. The team introduced systems like GTA1 and CoAct-1, which allow AI agents to navigate desktop environments by proposing multiple candidate actions and delegating suitable subtasks to programmatic execution. By combining GUI interactions with coding capabilities, CoAct-1 achieved a 60.76% success rate on the complex OSWorld benchmark, significantly reducing the average number of steps required to complete tasks compared to previous architectures.
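The hybrid pattern described above can be sketched as an orchestrator that routes each subtask either to a GUI controller or to a code executor. This is a minimal illustration of the idea, not CoAct-1's actual architecture or API; the class and method names here are invented for the example.

```python
# Illustrative sketch: route subtasks to a GUI agent or a code agent,
# since a single script can often replace dozens of click/type actions.
from dataclasses import dataclass

@dataclass
class Subtask:
    description: str
    kind: str  # "gui" for point-and-click work, "code" for scriptable work

class GuiAgent:
    def run(self, task: Subtask) -> str:
        # A real system would ground the instruction to on-screen elements
        # and emit click/type events step by step.
        return f"[gui] clicked through: {task.description}"

class CodeAgent:
    def run(self, task: Subtask) -> str:
        # A real system would generate and execute a script, completing
        # in one step what might take many GUI actions.
        return f"[code] scripted: {task.description}"

class Orchestrator:
    def __init__(self):
        self.agents = {"gui": GuiAgent(), "code": CodeAgent()}

    def execute(self, plan: list[Subtask]) -> list[str]:
        return [self.agents[t.kind].run(t) for t in plan]

plan = [
    Subtask("open the display settings dialog", "gui"),
    Subtask("batch-rename 500 files in ~/exports", "code"),
]
logs = Orchestrator().execute(plan)
```

The step-count reduction reported for CoAct-1 falls out of this division of labor: whenever the planner recognizes a scriptable subtask, the coding path collapses a long GUI trajectory into a single execution.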
The research also tackles the 'reasoning' bottleneck. Methods like 'Elastic Reasoning' and 'HyRea' propose separating the thinking process from the actual solution, allowing models to budget their 'thought time' more efficiently. This approach not only maintains accuracy but also slashes token usage by approximately 40%, making these agents not just smarter, but drastically more efficient for enterprise scale. These papers suggest that the next frontier of AI isn't just about raw power—it is about reliability, efficiency, and the ability to operate autonomously within the constraints of modern software platforms.
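The budgeting idea can be illustrated with a short sketch: give the model one token budget for its reasoning trace and a separate one for the final answer, cutting the thinking phase off when its budget is spent rather than letting it run unbounded. This is an assumption-laden toy, not the Elastic Reasoning or HyRea implementation; `generate` stands in for any LLM decoding call, and the `<think>`/`<answer>` tags are illustrative.

```python
# Hypothetical sketch of budgeted reasoning: separate caps for the
# "thinking" phase and the answer phase, so thought time is bounded.

def solve_with_budget(generate, prompt, think_budget=128, answer_budget=64):
    # Phase 1: reasoning, hard-capped at think_budget tokens.
    thoughts = generate(prompt + "\n<think>", max_tokens=think_budget)
    # Phase 2: force the model to commit to an answer with what it has.
    answer = generate(prompt + "\n<think>" + thoughts + "</think>\n<answer>",
                      max_tokens=answer_budget)
    return thoughts, answer

# Toy stand-in "model" (just truncates canned text) to show the cap working:
def fake_generate(prompt, max_tokens):
    canned = "42" if "<answer>" in prompt else "step " * 500
    return " ".join(canned.split()[:max_tokens])

thoughts, answer = solve_with_budget(fake_generate, "What is 6*7?")
assert len(thoughts.split()) <= 128  # reasoning never exceeds its budget
```

The roughly 40% token saving cited above comes from exactly this kind of cap: most of an unconstrained trace is spent past the point where the answer is already determined, so bounding the thinking phase trades little accuracy for large savings.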