Hybrid AI Optimizers Outperform Pure LLM Agents

• Classical HPO algorithms consistently outperform pure LLM agents in hyperparameter tuning tasks.
• Researchers developed Centaur, a hybrid optimizer that combines the interpretable internal state of classical optimizers with LLM domain knowledge.
• Centaur achieved superior results even with a 0.8B-parameter model, suggesting LLMs work best as complementary tools.

A study published on March 25, 2026, by Fabio Ferreira and colleagues asks whether LLM agents can surpass classical hyperparameter optimization (HPO) algorithms. The researchers used the autoresearch repository, a framework that lets AI agents tune hyperparameters by modifying training code, to compare methods on a small language model under a fixed compute budget. Classical methods such as CMA-ES (Covariance Matrix Adaptation Evolution Strategy) and TPE (Tree-structured Parzen Estimator) consistently outperformed LLM-based agents. According to the study, LLMs struggle to track optimization state across trials, whereas classical algorithms are better at avoiding out-of-memory errors. Letting LLMs edit source code directly narrows the gap but does not close it, even with frontier models such as Claude Opus 4.6 and Gemini 3.1 Pro Preview.

To address these limitations, the authors introduced Centaur, a hybrid system that combines the interpretable internal state of classical optimizers (such as the mean vector, step-size, and covariance matrix) with the domain knowledge of LLMs. In the experiment, Centaur outperformed both pure LLM approaches and classical methods, and even a 0.8B-parameter model was sufficient to achieve superior results. The researchers conclude that LLMs are currently most effective as a complement to classical optimizers rather than a replacement for them. The study also analyzes search diversity, model scaling up to frontier models, and the effect of varying the fraction of LLM-proposed trials within Centaur.
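
The hybrid idea can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a simplified evolution strategy with a diagonal covariance in place of full CMA-ES, a made-up quadratic `toy_loss` standing in for a training run, and a hypothetical `llm_propose` stub where a real system would prompt a model with the optimizer state. All names and parameters, including `ESState`, `hybrid_optimize`, and `llm_fraction`, are illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class ESState:
    """Interpretable search state: mean vector, step-size, diagonal covariance."""
    mean: list
    sigma: float
    var: list  # diagonal entries only; full CMA-ES keeps a full matrix


def toy_loss(x):
    """Stand-in for a validation loss; the optimum (1.0, -2.0) is made up."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def es_sample(state, rng):
    """Draw one candidate from N(mean, sigma^2 * diag(var))."""
    return [m + state.sigma * v ** 0.5 * rng.gauss(0.0, 1.0)
            for m, v in zip(state.mean, state.var)]


def llm_propose(state, history, rng):
    """Hypothetical LLM call. A real version would put `state` and `history`
    into a prompt and parse the reply; this stub steps from the mean halfway
    toward the best trial so far, plus a little jitter."""
    if not history:
        return es_sample(state, rng)
    best_x = min(history, key=lambda t: t[0])[1]
    return [0.5 * (m + b) + 0.1 * state.sigma * rng.gauss(0.0, 1.0)
            for m, b in zip(state.mean, best_x)]


def es_update(state, generation, lr=0.3, shrink=0.9):
    """Move the mean to the elite average and adapt the diagonal variances."""
    ranked = sorted(generation, key=lambda t: t[0])  # lower loss is better
    elite = [x for _, x, _ in ranked[:max(1, len(ranked) // 2)]]
    old_mean, dim = state.mean, len(state.mean)
    state.mean = [sum(x[i] for x in elite) / len(elite) for i in range(dim)]
    for i in range(dim):
        spread = sum((x[i] - old_mean[i]) ** 2 for x in elite) / len(elite)
        state.var[i] = (1 - lr) * state.var[i] + lr * spread / state.sigma ** 2
    state.sigma *= shrink  # fixed decay schedule, a simplification


def hybrid_optimize(loss_fn, state, budget=40, popsize=8, llm_fraction=0.25, seed=0):
    """Run `budget` trials; a fixed share of each generation comes from the LLM stub."""
    rng = random.Random(seed)
    history = []  # (loss, candidate, source) for every trial
    n_llm = round(llm_fraction * popsize)
    while len(history) < budget:
        generation = []
        for k in range(min(popsize, budget - len(history))):
            source = "llm" if k < n_llm else "es"
            if source == "llm":
                x = llm_propose(state, history, rng)
            else:
                x = es_sample(state, rng)
            generation.append((loss_fn(x), x, source))
        history.extend(generation)
        es_update(state, generation)  # LLM trials also feed the classical update
    best_loss, best_x, _ = min(history, key=lambda t: t[0])
    return best_loss, best_x, history


if __name__ == "__main__":
    state = ESState(mean=[0.0, 0.0], sigma=1.0, var=[1.0, 1.0])
    best_loss, best_x, history = hybrid_optimize(toy_loss, state)
    print(best_loss, best_x)
```

In this sketch the classical optimizer keeps ownership of the search state, and the LLM stub only contributes a share of the candidates, which are then scored and folded into the same update. Changing `llm_fraction` is one way to explore the kind of trade-off the study describes when it varies the fraction of LLM-proposed trials.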
