TMAS Framework Scales Test-Time Compute via Multi-Agent Synergy
- TMAS framework scales test-time compute by coordinating multi-agent collaboration and structured information flow.
- Hierarchical memories use experience and guideline banks to reuse conclusions and avoid redundant reasoning.
- A hybrid reward reinforcement learning scheme improves scaling stability and iterative reasoning performance.
Researchers introduced TMAS on May 11, a multi-agent framework designed to scale test-time compute for large language models. The system shifts from independent reasoning rollouts to a collaborative process in which specialized agents share information across trajectories and refinement iterations. By organizing inference as structured multi-agent synergy, TMAS aims to overcome current limitations in balancing exploration and exploitation in reasoning tasks.
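The contrast with independent rollouts can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the agent interface, the shared-context representation, and the loop structure are all assumptions made for the sake of the example.

```python
# Illustrative sketch: instead of N independent rollouts, agents share a
# context that persists across trajectories and refinement iterations.
# All names here are hypothetical; the actual TMAS interfaces are not
# specified in this summary.

def collaborative_rollouts(agents, task, n_iterations):
    shared_context = []  # information exchanged across trajectories
    candidates = []
    for _ in range(n_iterations):
        round_outputs = []
        for agent in agents:
            # each agent conditions on the task plus what others found so far
            round_outputs.append(agent(task, shared_context))
        shared_context.extend(round_outputs)  # flow info into the next iteration
        candidates.extend(round_outputs)
    return candidates
```

The key difference from standard best-of-N sampling is that later iterations condition on earlier outputs, so trajectories are no longer independent.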
To manage information flow, the framework employs hierarchical memories consisting of an experience bank and a guideline bank. The experience bank stores reliable intermediate conclusions and local feedback for future reuse, while the guideline bank tracks high-level strategies to prevent redundant reasoning patterns. This architectural approach allows the system to explicitly decide which information remains useful for subsequent computational steps.
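The two-tier memory described above can be pictured as a pair of banks exposed to each reasoning step. The class and method names below are assumptions for illustration; the summary does not specify the actual data structures or API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the hierarchical memory: an experience bank for
# reliable intermediate conclusions plus local feedback, and a guideline
# bank tracking high-level strategies already explored.

@dataclass
class HierarchicalMemory:
    experience_bank: list = field(default_factory=list)
    guideline_bank: list = field(default_factory=list)

    def store_experience(self, conclusion, feedback):
        # keep conclusions judged reliable enough to reuse later
        self.experience_bank.append({"conclusion": conclusion, "feedback": feedback})

    def store_guideline(self, strategy):
        # record strategies so later agents avoid redundant reasoning paths
        if strategy not in self.guideline_bank:
            self.guideline_bank.append(strategy)

    def retrieve(self):
        # expose both banks to the next computational step
        return {"experiences": self.experience_bank, "guidelines": self.guideline_bank}
```

Separating the two banks mirrors the division of labor in the text: the experience bank supplies reusable facts, while the guideline bank filters out strategies that have already been tried.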
The researchers also developed a hybrid reward reinforcement learning scheme to align the multi-agent process. This training method preserves basic reasoning capabilities while improving the utilization of stored experiences and encouraging the discovery of novel solution strategies. Experimental results indicate that TMAS achieves stronger iterative scaling compared to existing test-time scaling baselines, providing increased stability during inference. The project's code and data were released via GitHub on May 12.
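The summary names three training objectives: preserving base reasoning ability, rewarding use of stored experiences, and rewarding novel strategies. One plausible form of such a hybrid reward is a weighted combination; the linear form, the signal names, and the weights below are all assumptions, not details from the paper.

```python
def hybrid_reward(correctness, reuse_score, novelty_score,
                  w_task=1.0, w_reuse=0.3, w_novel=0.2):
    """Illustrative hybrid reward combining three signals:
    - correctness: task reward preserving basic reasoning capability
    - reuse_score: how well stored experiences were utilized
    - novelty_score: credit for discovering new solution strategies
    The weights and linear combination are hypothetical."""
    return w_task * correctness + w_reuse * reuse_score + w_novel * novelty_score
```

Keeping the task-reward weight dominant reflects the stated goal of improving experience reuse and novelty without degrading core reasoning performance.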