LLM Battle Royale Reveals Alignment Tax Effects
- Grok 4.1 Fast won 13 of 30 games at a cost of $0.97 per win
- Claude Sonnet 4.6 won 5 games but cost $26.78 per win due to cooperative behavior
- Eleven LLMs participated in a 30-game battle royale to evaluate agentic performance and alignment tax
Eleven LLMs were dropped into 30 matches of a 2D battle royale simulation to test their competitive performance and agentic behavior, according to an analysis by Jacky Liang (Dev Rel Lead at OpenRouter). The models played the game directly (reasoning through moves, using tools, and maintaining their own memory) with no human intervention in their decisions. The simulation tracked wins, kills, and cost per win, with scoring based on the Apex Legends competitive format.
Grok 4.1 Fast emerged as the most successful model, winning 13 of the 30 games at a cost of $0.97 per win. Despite having fewer kills than other contestants, it maintained high placement by employing a consistent car-ramming strategy and strict tactical rules, such as only firing when hit probability exceeded 90%. Conversely, Claude Sonnet 4.6, which won 5 matches at $26.78 per win, demonstrated a tendency toward cooperation, frequently attempting to form truces and sharing its location with opponents.
The experiment highlighted distinct performance differences among the models. GPT 5.4 led in raw combat effectiveness, securing 38 kills and coming in second on the leaderboard, yet it was the most expensive winner at $61.44 per win. Three models (GPT 5.4-mini, DeepSeek 4 Flash, and Kimi K2.6) failed to win a single game despite a combined expenditure of $57.15. The findings suggest that models often pay an alignment tax: training for helpfulness and safety behaviors, such as Anthropic's Constitutional AI, can hinder performance in zero-sum competitive environments.
The results indicate that benchmark performance does not always correlate with task-specific success. While Grok excelled by doubling down on effective strategies without hesitation, other models were hindered by collaborative instincts instilled through reinforcement learning from human feedback (RLHF), a training technique in which human feedback shapes model behavior. The cost-efficiency metrics showed that models scoring high on industry leaderboards do not necessarily provide the best value for specific agentic tasks: cheaper, less-aligned models generated more points per dollar in this battle royale context.
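The cost-per-win figures quoted above can be read as a simple ratio. The sketch below assumes cost per win is total spend divided by wins; the source does not spell out the formula, and the function names are hypothetical. The implied totals are back-computed from the reported per-win figures rather than reported in the analysis, so they are approximate if those figures were rounded.

```python
from typing import Optional


def cost_per_win(total_cost_usd: float, wins: int) -> Optional[float]:
    """Dollars spent per win; None when a model never won (ratio undefined)."""
    if wins == 0:
        return None
    return total_cost_usd / wins


def implied_total_cost(cost_per_win_usd: float, wins: int) -> float:
    """Back out total spend from a reported per-win cost (assumed formula)."""
    return round(cost_per_win_usd * wins, 2)


# Reported in the article: (wins, cost per win in USD)
reported = {
    "Grok 4.1 Fast": (13, 0.97),
    "Claude Sonnet 4.6": (5, 26.78),
}

for model, (wins, per_win) in reported.items():
    total = implied_total_cost(per_win, wins)
    print(f"{model}: {wins} wins, about ${total:.2f} implied total spend")

# The three winless models spent a combined $57.15, so no per-win cost exists.
print(cost_per_win(57.15, 0))
```

Under this assumption, the winless models show why cost per win alone is an incomplete metric: their spend is real, but the ratio is undefined, which is one reason a points-per-dollar measure is also useful.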