Hallucination rate as measured by Vectara HHEM: how often an LLM introduces hallucinations when summarizing a document. Lower is better. Models are listed from the highest rate to the lowest.
Source: Vectara HHEM

| Rank | Model | Hallucination rate |
|---|---|---|
| #1 | Grok Grok 4.1 Fast (Reasoning) | 19.2% |
| #2 | Grok Grok 4.1 Fast | 17.8% |
| #3 | OpenAI GPT-5 | 14.7% |
| #4 | OpenAI GPT OSS 120B | 14.2% |
| #5 | Moonshot AI Kimi K2.5 | 14.2% |
| #6 | Gemini 3 Flash | 13.5% |
| #7 | OpenAI GPT-5 Mini | 12.9% |
| #8 | Anthropic Claude Opus 4.6 | 12.2% |
| #9 | Anthropic Claude Opus 4 | 12.0% |
| #10 | Anthropic Claude Opus 4.7 | 12.0% |
| #11 | Anthropic Claude Sonnet 4.5 | 12.0% |
| #12 | Anthropic Claude Opus 4.1 | 11.8% |
| #13 | Anthropic Claude Opus 4.5 | 10.9% |
| #14 | Anthropic Claude Sonnet 4.6 | 10.6% |
| #15 | OpenAI GPT-5 Nano | 10.5% |
| #16 | Gemini 3.1 Pro | 10.4% |
| #17 | Anthropic Claude Sonnet 4 | 10.3% |
| #18 | Z.ai GLM-5 | 10.1% |
| #19 | Z.ai GLM-5.1 | 10.1% |
| #20 | Anthropic Claude Haiku 4.5 | 9.8% |
| #21 | OpenAI GPT-4o Mini TTS | 9.6% |
| #22 | OpenAI GPT-5.4 Pro | 8.3% |
| #23 | Gemini 3.1 Flash Lite | 8.2% |
| #24 | Meta Llama 4 Maverick | 8.2% |
| #25 | Gemini 2.5 Flash | 7.8% |
| #26 | Meta Llama 4 Scout | 7.7% |
| #27 | Gemma 4 31B | 7.4% |
| #28 | Gemini 2.5 Pro | 7.0% |
| #29 | OpenAI GPT-5.4 | 7.0% |
| #30 | Arcee AI Trinity Large Thinking | 6.9% |
| #31 | DeepSeek DeepSeek V3.2 | 6.3% |
| #32 | OpenAI GPT-4.1 | 5.6% |
| #33 | OpenAI GPT-5.4 Mini | 5.5% |
| #34 | Gemini 2.5 Flash Lite | 3.3% |
| #35 | OpenAI GPT-5.4 Nano | 3.1% |
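
The percentages above are shares of summaries judged to contain a hallucination. A minimal sketch of that arithmetic follows. It assumes each summary has already been given a factual-consistency score between 0 and 1, and that scores below a cutoff count as hallucinated. The function name `hallucination_rate`, the 0.5 cutoff, and the sample scores are illustrative assumptions, not values taken from the leaderboard or from Vectara's evaluation pipeline.

```python
from typing import Sequence


def hallucination_rate(consistency_scores: Sequence[float],
                       threshold: float = 0.5) -> float:
    """Return the percentage of summaries scored below `threshold`.

    Assumption: each score is a factual-consistency score in [0, 1],
    where lower means the summary is less supported by its source.
    The 0.5 default is an illustrative cutoff, not an official one.
    """
    if not consistency_scores:
        raise ValueError("need at least one score")
    # Count summaries flagged as hallucinated under the assumed cutoff.
    flagged = sum(1 for score in consistency_scores if score < threshold)
    return 100.0 * flagged / len(consistency_scores)


# Hypothetical scores for eight summaries produced by one model.
scores = [0.91, 0.12, 0.78, 0.45, 0.99, 0.83, 0.67, 0.95]
print(f"{hallucination_rate(scores):.1f}%")  # prints "25.0%"
```

Two of the eight sample scores fall below the cutoff, so the sketch reports 25.0%. Moving the cutoff changes the rate, which is one reason figures from different evaluation setups are not directly comparable.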