| Rank | Model | Score |
|---|---|---|
| #1 | OpenAI GPT-5.4 Nano | 96.9% |
| #2 | Gemini 2.5 Flash Lite | 96.7% |
| #3 | OpenAI GPT-5.4 Mini | 94.5% |
| #4 | OpenAI GPT-4.1 | 94.4% |
| #5 | DeepSeek DeepSeek V3.2 | 93.7% |
| #6 | Arcee AI Trinity Large Thinking | 93.1% |
| #7 | Gemini 2.5 Pro | 93.0% |
| #8 | OpenAI GPT-5.4 | 93.0% |
| #9 | Gemma 4 31B | 92.6% |
| #10 | Meta Llama 4 Scout | 92.3% |
| #11 | Gemini 2.5 Flash | 92.2% |
| #12 | Gemini 3.1 Flash Lite | 91.8% |
| #13 | Meta Llama 4 Maverick | 91.8% |
| #14 | OpenAI GPT-5.4 Pro | 91.7% |
| #15 | OpenAI GPT-4o Mini TTS | 90.4% |
| #16 | Anthropic Claude Haiku 4.5 | 90.2% |
| #17 | Z.ai GLM-5 | 89.9% |
| #18 | Z.ai GLM-5.1 | 89.9% |
| #19 | Anthropic Claude Sonnet 4 | 89.7% |
| #20 | Gemini 3.1 Pro | 89.6% |
| #21 | OpenAI GPT-5 Nano | 89.5% |
| #22 | Anthropic Claude Sonnet 4.6 | 89.4% |
| #23 | Anthropic Claude Opus 4.5 | 89.1% |
| #24 | Anthropic Claude Opus 4.1 | 88.2% |
| #25 | Anthropic Claude Opus 4 | 88.0% |
| #26 | Anthropic Claude Opus 4.7 | 88.0% |
| #27 | Anthropic Claude Sonnet 4.5 | 88.0% |
| #28 | Anthropic Claude Opus 4.6 | 87.8% |
| #29 | OpenAI GPT-5 Mini | 87.1% |
| #30 | Gemini 3 Flash | 86.5% |
| #31 | OpenAI GPT OSS 120B | 85.8% |
| #32 | Moonshot AI Kimi K2.5 | 85.8% |
| #33 | OpenAI GPT-5 | 85.3% |
| #34 | Grok Grok 4.1 Fast | 82.2% |
| #35 | Grok Grok 4.1 Fast (Reasoning) | 80.8% |