Study Quantifies Token Usage in Multi-Agent Software Engineering

• Researchers analyzed token consumption patterns in LLM-based multi-agent systems during software development tasks.
• The study found that the code review stage consumes 59.4% of all tokens in ChatDev workflows.
• Input tokens represent an average of 53.9% of total consumption, highlighting significant operational inefficiencies.

A research paper submitted on January 20, 2026, examines token consumption in multi-agent systems designed for software engineering. By analyzing 30 development tasks performed by the ChatDev framework, which uses the GPT-5 reasoning model, researchers quantified how tokens are distributed across stages including design, coding, code completion, code review, testing, and documentation. The study aims to provide a standardized evaluation framework to help practitioners predict operational costs and improve workflow efficiency within the Software Development Life Cycle (SDLC).

The analysis reveals that the iterative code review stage is responsible for the largest portion of token usage, accounting for an average of 59.4% of total consumption. Furthermore, the findings indicate that input tokens consistently make up the majority of resource usage, representing an average of 53.9% of tokens consumed. These results suggest that the primary financial and computational costs of agentic software engineering stem from automated refinement and verification processes rather than initial code generation. The researchers recommend developing more token-efficient collaboration protocols to address these systemic inefficiencies.
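
To make the two headline metrics concrete, here is a minimal sketch of how a per-stage token share and an input-token share could be computed from a usage log for a single run. The log format, the stage labels, the `summarize` helper, and every number in it are hypothetical and are not taken from the paper. The sketch also pools all calls from one run, whereas the reported figures are averages across 30 tasks, so it illustrates the arithmetic only, not the study's method.

```python
from collections import defaultdict

# Hypothetical usage log: one (stage, input_tokens, output_tokens) tuple per LLM call.
# These values are invented for illustration and do NOT come from the paper.
calls = [
    ("design", 1200, 800),
    ("coding", 3000, 4000),
    ("code_review", 9000, 6000),
    ("code_review", 5000, 3000),
    ("testing", 2000, 1000),
    ("documentation", 1500, 1500),
]


def summarize(calls):
    """Return (per-stage share of all tokens, input-token share of all tokens)."""
    per_stage = defaultdict(int)
    total_input = 0
    total = 0
    for stage, n_in, n_out in calls:
        per_stage[stage] += n_in + n_out  # a stage's cost counts both directions
        total_input += n_in
        total += n_in + n_out
    if total == 0:
        raise ValueError("no tokens recorded")
    stage_share = {stage: tokens / total for stage, tokens in per_stage.items()}
    return stage_share, total_input / total


stage_share, input_share = summarize(calls)

# Print stages from most to least expensive, then the input-token share.
for stage, share in sorted(stage_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:<14} {share:6.1%}")
print(f"{'input share':<14} {input_share:6.1%}")
```

Multiplying the input and output totals by a provider's per-token prices would turn the same tallies into a cost estimate, which is the kind of prediction the study says its framework is meant to support.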
