Newer Anthropic Models Struggle With Third-Party Tool Schemas

•Anthropic's new Opus 4.8 and Sonnet 5 models show a regression in tool-calling schema adherence.
•The models invent fields that the schema does not define inside the nested edits[] array, causing the third-party coding tool Pi to reject the calls.
•Training optimized for Claude Code's native tools may be hurting compatibility with external third-party coding harnesses.

On July 4, 2026, Simon Willison reported an issue in which Anthropic's newer models, specifically Opus 4.8 and Sonnet 5, produce malformed tool calls when used with the coding agent Pi. The code edits themselves are often correct, but the models frequently insert invented fields, not defined by the tool's schema, into the nested edits[] array. Pi then rejects the tool calls and asks the model to retry. This is a regression: older models in the same family do not show the problem with the same tool schemas.
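To make the failure mode concrete, here is a minimal Python sketch of the kind of strict validation that turns an invented nested field into a rejected call. Pi's actual schema is not reproduced here: the field names path, edits, oldText and newText, the function validate_edit_call, and the invented replaceAll field are all hypothetical, chosen only to illustrate how a harness can reject an otherwise correct edit.

```python
from typing import Any

# Hypothetical schema for illustration only; Pi's real field names may differ.
TOP_LEVEL_FIELDS = {"path", "edits"}
EDIT_FIELDS = {"oldText", "newText"}


def validate_edit_call(args: dict[str, Any]) -> list[str]:
    """Strictly validate an edit tool call; an empty list means the call is accepted."""
    errors: list[str] = []
    # Reject any top-level key the schema does not define.
    for key in args:
        if key not in TOP_LEVEL_FIELDS:
            errors.append(f"unexpected top-level field: {key!r}")
    if not isinstance(args.get("path"), str):
        errors.append("'path' must be a string")
    edits = args.get("edits")
    if not isinstance(edits, list) or not edits:
        errors.append("'edits' must be a non-empty array")
        return errors
    for i, edit in enumerate(edits):
        if not isinstance(edit, dict):
            errors.append(f"edits[{i}] must be an object")
            continue
        # This is the check that invented fields in the nested edits[] array trip over.
        for key in edit:
            if key not in EDIT_FIELDS:
                errors.append(f"edits[{i}]: unexpected field {key!r}")
        # Required fields must be present and be strings.
        for key in sorted(EDIT_FIELDS):
            if key not in edit:
                errors.append(f"edits[{i}]: missing field {key!r}")
            elif not isinstance(edit[key], str):
                errors.append(f"edits[{i}]: {key!r} must be a string")
    return errors


good = {"path": "app.py", "edits": [{"oldText": "x = 1", "newText": "x = 2"}]}
# 'replaceAll' stands in for an invented field that the schema does not allow.
bad = {"path": "app.py",
       "edits": [{"oldText": "x = 1", "newText": "x = 2", "replaceAll": False}]}

print(validate_edit_call(good))  # []
print(validate_edit_call(bad))   # ["edits[0]: unexpected field 'replaceAll'"]
```

Note that the edit in bad is otherwise correct; only the extra key causes the rejection. A more tolerant harness could drop unknown keys and proceed instead, trading strictness for fewer retries.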

Armin Ronacher, who identified the issue, suggests that it stems from recent training, likely involving reinforcement learning, that favors the edit tools native to Claude Code. In his view, because the newer models are heavily optimized for Claude Code's own 'search and replace' edit tool, they struggle to follow the schemas of third-party coding harnesses. This divergence poses a growing challenge for harness developers, who may need to ship several redundant edit tools to stay compatible across model versions. OpenAI's Codex, by comparison, uses an 'apply_patch' tool, another example of a model family trained around its own designated edit tool and of how that training shapes performance in agentic workflows.
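For readers unfamiliar with the 'search and replace' style of edit tool, here is a minimal Python sketch, assuming an exact-match rule in which each oldText must occur exactly once in the current file contents. It is not Claude Code's implementation; the function apply_search_replace, the EditError class, and the oldText/newText field names are hypothetical.

```python
class EditError(ValueError):
    """Raised when a search-and-replace edit cannot be applied unambiguously."""


def apply_search_replace(text: str, edits: list[dict[str, str]]) -> str:
    """Apply edits in order; each oldText must match exactly once in the current text."""
    for i, edit in enumerate(edits):
        old, new = edit["oldText"], edit["newText"]
        # An empty search string would match everywhere, so refuse it.
        if not old:
            raise EditError(f"edits[{i}]: oldText must be non-empty")
        count = text.count(old)
        if count == 0:
            raise EditError(f"edits[{i}]: oldText not found")
        if count > 1:
            raise EditError(f"edits[{i}]: oldText matches {count} times; include more context")
        # Exactly one match: replace it and feed the result to the next edit.
        text = text.replace(old, new, 1)
    return text


source = "def add(a, b):\n    return a - b\n"
fixed = apply_search_replace(source, [{"oldText": "a - b", "newText": "a + b"}])
print(fixed)
```

The uniqueness rule is a design choice in this sketch: it forces the model to quote enough surrounding context to pin down one location. A harness supporting several model families might register a tool like this alongside a patch-based one, such as the 'apply_patch' format used by Codex, whose syntax is not reproduced here.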
