Hybrid AI Optimizers Outperform Pure LLM Agents

arXiv
Wednesday, June 10, 2026
  • Classical HPO algorithms consistently outperform pure LLM agents in hyperparameter tuning tasks.
  • Researchers developed Centaur, a hybrid optimizer combining classical interpretable states with LLM domain knowledge.
  • Centaur achieved superior results using a 0.8B-parameter model, suggesting LLMs function best as complementary tools.

A research study published on March 25, 2026, by Fabio Ferreira and colleagues examines whether LLM agents can outperform classical hyperparameter optimization (HPO) algorithms. The researchers used the autoresearch repository, a framework in which AI agents tune hyperparameters by modifying training code, to compare methods on a small language model under a fixed compute budget. Classical methods such as CMA-ES (Covariance Matrix Adaptation Evolution Strategy) and TPE (Tree-structured Parzen Estimator) consistently outperformed the LLM-based agents. According to the study, LLMs struggle to keep track of the optimization state across trials, whereas the classical algorithms are better at avoiding out-of-memory errors. Letting LLMs edit the source code directly narrows the performance gap but does not close it, even with frontier models such as Claude Opus 4.6 and Gemini 3.1 Pro Preview.
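
To make the classical baselines more concrete, here is a minimal one-dimensional sketch of the idea behind TPE. It is written for illustration only and is not taken from the paper or from any HPO library. It assumes a hypothetical objective `toy_val_loss` over log10 of the learning rate and a fixed kernel `bandwidth`; real TPE implementations choose bandwidths adaptively and handle many parameter types. After a few random warm-up trials, past trials are split into a "good" fraction `gamma` and the rest, each group gets a Parzen (Gaussian kernel) density, and the next trial is the candidate that maximizes the ratio l(x)/g(x). The full trial history is kept as explicit state, which is the kind of cross-trial bookkeeping the study reports LLM agents have trouble with.

```python
import math
import random


def toy_val_loss(log_lr: float) -> float:
    """Hypothetical objective: smallest at log10(learning rate) = -3."""
    return (log_lr + 3.0) ** 2 + 0.1


def parzen_density(x: float, points: list[float], bandwidth: float) -> float:
    """Gaussian kernel density estimate over `points` (a Parzen estimator)."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)


def tpe_minimize(objective, low, high, n_trials=40, n_startup=10, gamma=0.25,
                 n_candidates=24, bandwidth=0.3, seed=0):
    """Simplified 1-D TPE; returns the best (x, loss) pair seen."""
    assert n_startup >= 2, "need at least one 'good' and one 'bad' trial"
    rng = random.Random(seed)
    history: list[tuple[float, float]] = []  # explicit state: every (x, loss) so far
    for t in range(n_trials):
        if t < n_startup:
            x = rng.uniform(low, high)  # random warm-up trials
        else:
            ranked = sorted(history, key=lambda h: h[1])
            n_good = min(max(1, int(gamma * len(ranked))), len(ranked) - 1)
            good = [h[0] for h in ranked[:n_good]]  # samples behind l(x)
            bad = [h[0] for h in ranked[n_good:]]   # samples behind g(x)
            # Draw candidates from l(x) (a kernel around a random good point),
            # clipped to the search range.
            candidates = [
                min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                for _ in range(n_candidates)
            ]
            # Pick the candidate with the largest density ratio l(x) / g(x).
            x = max(candidates,
                    key=lambda c: parzen_density(c, good, bandwidth)
                    / (parzen_density(c, bad, bandwidth) + 1e-12))
        history.append((x, objective(x)))
    return min(history, key=lambda h: h[1])
```

For example, `tpe_minimize(toy_val_loss, -6.0, -1.0)` returns the best `(log_lr, loss)` pair it found; the search should concentrate near `log_lr = -3`, the minimum of the toy objective.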

To address these limitations, the authors introduced Centaur, a hybrid optimizer that combines the interpretable internal state of classical optimizers (such as the mean vector, step-size, and covariance matrix) with the domain knowledge of LLMs. In the experiments, Centaur outperformed both pure LLM agents and the classical methods, and even a 0.8B-parameter model was sufficient to achieve these gains. The researchers conclude that LLMs are currently most effective as a complement to classical optimizers rather than as a replacement for them. The study also analyzes search diversity, scaling of the LLM up to frontier models, and the effect of varying the fraction of trials proposed by the LLM within Centaur.
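
The sketch below illustrates the general shape of such a hybrid loop under clearly stated assumptions; it is not Centaur's implementation, which this summary does not describe in detail. The classical part is a simplified evolution strategy with a mean vector, a fixed step-size `sigma`, and a diagonal covariance, far simpler than real CMA-ES (no evolution paths, no step-size adaptation). Each trial is proposed by the LLM with probability `llm_fraction`, mirroring the "fraction of LLM-proposed trials" setting, and the optimizer's state is serialized into text by `describe_state` so a model could reason about it. `placeholder_llm_propose` is a hypothetical stand-in for a real model call that simply returns the current mean, and `toy_loss` is a made-up two-dimensional objective.

```python
import random


def toy_loss(x: list[float]) -> float:
    """Hypothetical 2-D objective (e.g. log10 lr, dropout); minimum at (-3, 0.1)."""
    return (x[0] + 3.0) ** 2 + 10.0 * (x[1] - 0.1) ** 2


def describe_state(mean: list[float], sigma: float, diag_var: list[float]) -> str:
    """Serialize the optimizer's interpretable state into prompt text."""
    return (f"mean={[round(m, 3) for m in mean]}, step_size={sigma:.3f}, "
            f"diag_covariance={[round(v, 3) for v in diag_var]}")


def placeholder_llm_propose(prompt: str, mean: list[float]) -> list[float]:
    """HYPOTHETICAL stand-in for an LLM call. A real system would send `prompt`
    to a model and parse its proposal; this stub just returns the current mean."""
    return list(mean)


def hybrid_search(objective, mean, sigma=0.5, generations=30, pop=8, mu=4,
                  llm_fraction=0.25, var_lr=0.3, seed=0):
    """Simplified diagonal ES mixed with LLM proposals; returns best (loss, x)."""
    rng = random.Random(seed)
    dim = len(mean)
    diag_var = [1.0] * dim
    best = (objective(mean), list(mean))
    for _ in range(generations):
        trials = []
        for _ in range(pop):
            if rng.random() < llm_fraction:
                # LLM-proposed trial, conditioned on the serialized state.
                x = placeholder_llm_propose(describe_state(mean, sigma, diag_var), mean)
            else:
                # Classical trial: sample from N(mean, sigma^2 * diag(diag_var)).
                x = [m + sigma * v ** 0.5 * rng.gauss(0.0, 1.0)
                     for m, v in zip(mean, diag_var)]
            trials.append((objective(x), x))
        trials.sort(key=lambda t: t[0])
        best = min(best, trials[0], key=lambda t: t[0])
        elite = [x for _, x in trials[:mu]]
        # Diagonal covariance update from the elite steps (measured from the old mean).
        steps = [[(x[d] - mean[d]) / sigma for d in range(dim)] for x in elite]
        diag_var = [(1 - var_lr) * diag_var[d]
                    + var_lr * sum(s[d] ** 2 for s in steps) / mu
                    for d in range(dim)]
        # Move the mean to the average of the elite trials.
        mean = [sum(x[d] for x in elite) / mu for d in range(dim)]
    return best
```

Calling `hybrid_search(toy_loss, [-5.0, 0.5])` returns the best `(loss, hyperparameters)` pair found. With this placeholder, `llm_fraction=1.0` never leaves the starting point, so any benefit from LLM-proposed trials depends entirely on the quality of the model's proposals.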

Read original (English)·Mar 1, 2026
#hyperparameter optimization#autoresearch#llm#cma es#tpe#centaur#optimization algorithms