OpenBMB Releases MiniCPM-V 4.6 1.3B Multimodal Model
- MiniCPM-V 4.6 1.3B Instruct scores 13 on the Artificial Analysis Intelligence Index.
- The model achieves 38% on MMMU-Pro, the highest for sub-2B parameter open weights models.
- Its dense architecture requires just 5.4M output tokens to complete the Intelligence Index benchmarks, highlighting high efficiency.
On May 11, 2026, OpenBMB released MiniCPM-V 4.6 1.3B Instruct, a vision-language model supporting text, image, and video inputs. Developed jointly by Tsinghua University's NLP Lab and ModelBest Inc., the 1.3B parameter dense model is licensed under Apache 2.0 and available on Hugging Face.
The model achieved a score of 13 on the Artificial Analysis Intelligence Index, outperforming Qwen3.5 0.8B (10) while trailing Qwen3.5 2B (15). It set a new benchmark for its size class, recording 38% on the MMMU-Pro visual reasoning task—the highest for any open weights model under 2B parameters. It also demonstrated high token efficiency, using only 5.4M output tokens to run the Intelligence Index, approximately 19x fewer than Qwen3.5 0.8B (101M) and 43x fewer than Qwen3.5 2B (233M).
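The efficiency comparison above reduces to simple ratios of output-token counts. A quick sketch reproducing the cited multipliers (figures taken directly from the text; rounding matches the article's):

```python
# Output tokens each model used to run the Intelligence Index,
# as reported in the article (values in tokens).
tokens = {
    "MiniCPM-V 4.6 1.3B": 5.4e6,
    "Qwen3.5 0.8B": 101e6,
    "Qwen3.5 2B": 233e6,
}

baseline = tokens["MiniCPM-V 4.6 1.3B"]
for name, used in tokens.items():
    if name != "MiniCPM-V 4.6 1.3B":
        # 101M / 5.4M ≈ 19x, 233M / 5.4M ≈ 43x
        print(f"{name}: {used / baseline:.0f}x more output tokens")
```

Running this prints `Qwen3.5 0.8B: 19x more output tokens` and `Qwen3.5 2B: 43x more output tokens`, matching the multipliers quoted above.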
Despite these gains, the model shows limited knowledge recall, scoring -85 on the AA-Omniscience benchmark. This result is consistent with other sub-2B non-reasoning models such as Exaone 4.0 1.2B (-83) and Qwen3.5 0.8B (-89). The model operates with a 262K context window and BF16 precision.