ERNIE 5.0 is Baidu's next-generation flagship foundation model with 2.4 trillion parameters, employing a unified autoregressive architecture for native multimodal processing across text, images, audio, and video. Unlike most competitors that fuse separate models for each modality, ERNIE 5.0 trains all modalities jointly within a single framework, enabling both understanding and generation without fragmented pipelines. Its ultra-sparse Mixture-of-Experts structure activates less than 3% of parameters per token, delivering high efficiency while maintaining frontier-level capabilities in reasoning, creative writing, and agent planning.
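To make the sparsity figure concrete, the sketch below does two things. It first works out what "less than 3% of 2.4 trillion parameters" implies as an upper bound on parameters used per token. It then shows a generic top-k expert router of the kind commonly used in Mixture-of-Experts layers. The router is an illustrative assumption, not ERNIE 5.0's actual routing scheme: the function names, the 128-expert count, and `k=2` are all hypothetical and chosen only to show how a small fraction of experts ends up active for a given token.

```python
import math
import random

TOTAL_PARAMS = 2_400_000_000_000  # 2.4 trillion, as stated in the text
ACTIVE_PERCENT_BOUND = 3          # "less than 3%", as stated in the text


def active_param_upper_bound(total_params: int, percent: int) -> int:
    """Upper bound on parameters used per token, in exact integer arithmetic."""
    return total_params * percent // 100


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Generic top-k gating (hypothetical, not ERNIE-specific).

    Selects the k experts with the highest router scores and returns a
    softmax over just those k scores as mixing weights.
    """
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract for numerical stability
    exp_scores = {i: math.exp(router_logits[i] - peak) for i in chosen}
    norm = sum(exp_scores.values())
    return {i: score / norm for i, score in exp_scores.items()}


if __name__ == "__main__":
    # 3% of 2.4 trillion: fewer than 72 billion parameters per token.
    print(active_param_upper_bound(TOTAL_PARAMS, ACTIVE_PERCENT_BOUND))  # 72000000000

    # Toy router: 128 experts and k=2 are illustrative values only.
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(128)]
    weights = top_k_route(logits, k=2)
    print(len(weights) / len(logits))  # 0.015625, i.e. about 1.6% of experts active
```

In this toy setup only the selected experts would run for the token, which is why compute per token tracks the active parameter count rather than the total. Real models also have always-active components such as attention and embeddings, so the expert fraction and the parameter fraction are not the same number.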