Baidu

ERNIE 4.5 VL 424B A47B

Name: Baidu ERNIE 4.5 VL 424B A47B
Author: Baidu

2025-07-01

ERNIE 4.5 VL 424B A47B is a multimodal vision-language Mixture-of-Experts (MoE) model developed by Baidu. It accepts text and image inputs and produces text output, activating 47B of its 424B total parameters per token, with a 131K-token context window. Built on a heterogeneous MoE architecture jointly pre-trained on text and vision, it uses modality-isolated routing so that one modality does not hinder the learning of another. The model supports both thinking and non-thinking modes. In non-thinking mode it excels at visual perception, document and chart understanding, and visual knowledge. In thinking mode it retains those perception strengths and adds stronger multimodal reasoning, narrowing the gap to OpenAI-o1, or even surpassing it, on reasoning-centric benchmarks such as MathVista, MMMU, and VisualPuzzle. Post-trained with SFT, DPO, UPO, and RLVR, it supports English and Chinese and is released under the Apache 2.0 license.
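
To make the sparse-activation and routing ideas in the description concrete, here is a minimal Python sketch. It is a toy illustration, not Baidu's implementation: the expert pool names, the pool sizes, the `top_k` value, and the softmax gating are all assumptions chosen for readability. Only the 47B and 424B figures come from the description.

```python
import math

TOTAL_PARAMS_B = 424   # total parameters in billions (from the description)
ACTIVE_PARAMS_B = 47   # parameters activated per token, in billions


def active_fraction(active_b=ACTIVE_PARAMS_B, total_b=TOTAL_PARAMS_B):
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


# Hypothetical expert pools: names and counts are illustrative only.
EXPERT_POOLS = {
    "text": ["text_0", "text_1", "text_2", "text_3"],
    "vision": ["vision_0", "vision_1", "vision_2", "vision_3"],
}


def route_token(modality, router_scores, top_k=2):
    """Select top_k experts from the token's own modality pool.

    Returns a dict mapping expert name to a softmax-normalized gate weight.
    A token never reaches the other modality's experts, which is the
    "isolation" this sketch is meant to show.
    """
    pool = EXPERT_POOLS[modality]
    if len(router_scores) != len(pool):
        raise ValueError("need one router score per expert in the pool")
    # Indices of the highest-scoring experts, best first.
    top = sorted(range(len(pool)), key=lambda i: router_scores[i], reverse=True)[:top_k]
    # Numerically stable softmax over the selected scores only.
    peak = max(router_scores[i] for i in top)
    exps = [math.exp(router_scores[i] - peak) for i in top]
    total = sum(exps)
    return {pool[i]: e / total for i, e in zip(top, exps)}


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.1%}")
    print(route_token("vision", [0.1, 2.0, 0.5, 1.0]))
```

With the figures from the description, the active fraction works out to roughly 11% of the total parameters per token. In the sketch, a vision token can only ever be assigned to experts from the vision pool, whatever its router scores are.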

Vision | Open model | Apache 2.0

Training cutoff date

Not disclosed

The date on which this AI finished training. It may not know about events or information that came afterward.

Input format → Output format

The types of information you can pass to this AI and the types of output it can generate.

Context capacity

131K

The maximum amount of content the AI can read and process in a single request. The larger the number, the longer the documents or conversations it can handle.
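
As a rough illustration of how this limit constrains a request, the sketch below checks whether a prompt plus a reserved output budget fits in the window. It assumes "131K" means 131,072 tokens, which the page does not confirm, and it takes token counts as plain integers that would come from a tokenizer in practice. How image inputs count toward the limit is not covered here.

```python
CONTEXT_WINDOW = 131_072  # assumption: "131K" read as 2**17 tokens


def fits_in_context(prompt_tokens, reserved_output_tokens, window=CONTEXT_WINDOW):
    """Return True if the prompt and the reserved output fit in the window."""
    if prompt_tokens < 0 or reserved_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + reserved_output_tokens <= window


def remaining_budget(prompt_tokens, window=CONTEXT_WINDOW):
    """Tokens left for output after the prompt, never below zero."""
    return max(0, window - prompt_tokens)


if __name__ == "__main__":
    print(fits_in_context(120_000, 8_000))
    print(remaining_budget(131_000))
```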

Individual cost (per million tokens)

—

The cost incurred when you integrate and use this AI directly. It is shown in US dollars per 1 million units (tokens) of text.
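
For reference, a per-million-token price converts to a request cost with simple proportional arithmetic. The sketch below shows that calculation. No price is listed for this model, so the rate used in the example is a made-up placeholder, and the function name is hypothetical.

```python
def request_cost_usd(tokens, price_per_million_usd):
    """Cost of processing `tokens` tokens at a USD price per 1M tokens."""
    if tokens < 0 or price_per_million_usd < 0:
        raise ValueError("tokens and price must be non-negative")
    return tokens / 1_000_000 * price_per_million_usd


if __name__ == "__main__":
    # Placeholder rate of 2.00 USD per 1M tokens, for illustration only.
    print(request_cost_usd(250_000, 2.00))
```

Providers often price input and output tokens separately. In that case the same formula applies to each side and the two results are added.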