AI Model Comparison

ERNIE 4.5 VL 424B A47B is a multimodal vision-language mixture-of-experts (MoE) model developed by Baidu. It accepts text and image inputs and produces text output. It activates 47B of its 424B total parameters per token and has a 131K-token context window. Built on a heterogeneous MoE architecture jointly pre-trained on text and vision, it applies modality-isolated routing so that learning one modality does not hinder learning the other. The model supports both thinking and non-thinking modes. In non-thinking mode it excels at visual perception, document and chart understanding, and visual knowledge. In thinking mode it retains those perception strengths and adds stronger multimodal reasoning, narrowing the gap to OpenAI-o1, or even surpassing it, on reasoning-centric benchmarks such as MathVista, MMMU, and VisualPuzzle. Post-trained with SFT, DPO, UPO, and RLVR, it supports English and Chinese and is released under the Apache 2.0 license.
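The idea of modality-isolated routing described above can be illustrated with a minimal sketch. This is not Baidu's implementation: the pool layout (`EXPERT_POOLS`), the expert count, `TOP_K`, and the function names are all hypothetical, and the sketch omits anything the real architecture may add, such as experts shared across modalities. It only shows a router that restricts each token to the experts reserved for its modality, plus the active-parameter share implied by the 47B and 424B figures.

```python
import math
import random

# Hypothetical expert pools: indices 0-3 serve text tokens, 4-7 serve image tokens.
# The real model's expert counts and layout are not given in this document.
EXPERT_POOLS = {"text": [0, 1, 2, 3], "image": [4, 5, 6, 7]}
TOP_K = 2  # illustrative value, not the model's actual setting


def route(router_logits, modality, top_k=TOP_K):
    """Pick top_k experts for one token, restricted to its modality's pool.

    Returns a list of (expert_index, gate_weight) pairs whose weights sum to 1.
    """
    pool = EXPERT_POOLS[modality]
    # Rank only the experts in this modality's pool; others are never eligible.
    chosen = sorted(pool, key=lambda e: router_logits[e], reverse=True)[:top_k]
    # Softmax over the chosen experts' logits (max-subtracted for stability).
    peak = max(router_logits[e] for e in chosen)
    exps = [math.exp(router_logits[e] - peak) for e in chosen]
    total = sum(exps)
    return [(e, x / total) for e, x in zip(chosen, exps)]


def active_fraction(active_params_b, total_params_b):
    """Share of the model's parameters used for each token."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(8)]
    print("text token  ->", route(logits, "text"))
    print("image token ->", route(logits, "image"))
    # 47B active out of 424B total is roughly 11% of the parameters per token.
    print(f"active share: {active_fraction(47, 424):.1%}")
```

Because a text token can never select an image expert in this sketch (and vice versa), updates driven by one modality do not touch the other modality's experts, which is the intuition behind the isolation claim.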

Author

Baidu

Release Date

2025-07-01

Knowledge Cutoff

—

License

Open Model (Apache 2.0)

I/O Format

Text and image input, text output

Context Length

131K

API I/O (1M)

—

How to Use

—

Output Speed

—

Arena Overall

—

Intelligence Index

—

Coding Index

—

Math Index

—

LiveBench

—

ForecastBench

—

GPQA Diamond

—

HLE

—

MMLU-Pro

—

AIME 2025

—

MATH-500

—

LB Reasoning

—

LB Math

—

LB Data Analysis

—

LiveCodeBench

—

LB Coding

—

LB Agentic

—

TAU2

—

TerminalBench

—

SciCode

—

IFBench

—

AA-LCR

—

Hallucination (HHEM)

—

Factual Consistency (HHEM)

—

LB Language

—

LB Instruction Following

—
