Llama 4 Scout is Meta's efficient multimodal language model, a mixture-of-experts design with 16 experts that activates 17 billion of its 109 billion total parameters per token. It supports native multimodal input (text and image) across 12 languages with a 10-million-token context window, one of the longest available, and uses early fusion to integrate modalities seamlessly. Designed for efficient local or commercial deployment, it is instruction-tuned for multilingual chat, captioning, and image-understanding tasks, and is released under the Llama 4 Community License.
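The mixture-of-experts idea behind this efficiency can be illustrated with a toy sketch: a gating network scores all experts per input, but only the top-scoring one(s) actually run, so most parameters stay idle on any given token. This is a minimal illustrative example in NumPy, not Meta's implementation; the layer sizes, gating scheme, and function names here are invented for demonstration.

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=1):
    """Toy mixture-of-experts layer: run only the top_k highest-scoring experts."""
    logits = x @ gate_w                       # one gating score per expert
    top = np.argsort(logits)[-top_k:]         # indices of the top_k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected experts only
    out = sum(w * experts[i](x) for w, i in zip(weights, top))
    return out, top

rng = np.random.default_rng(0)
d = 8                                         # toy hidden size (real models are far larger)
num_experts = 16                              # mirrors Scout's 16-expert layout
expert_mats = [rng.standard_normal((d, d)) for _ in range(num_experts)]
experts = [lambda x, W=W: x @ W for W in expert_mats]
gate_w = rng.standard_normal((d, num_experts))

x = rng.standard_normal(d)
y, used = moe_forward(x, experts, gate_w, top_k=1)
# only 1 of the 16 experts computed anything for this input
print(len(used), y.shape)
```

Because only `top_k` experts run per token, compute scales with the active parameter count (17B in Scout's case) rather than the full 109B, which is what makes the model practical to serve.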