Apple Introduces Core AI for On-Device Generative AI

• Apple launched Core AI to enable on-device generative AI on Apple Silicon hardware
• The framework supports models from 3B parameters up to 70B-parameter reasoning LLMs
• Core AI offers unified hardware access and optimizations like AOT compilation for faster performance

At WWDC 26, Apple introduced Core AI, a new framework for running large language models and other generative AI workloads entirely on-device. It serves as the successor to Core ML and is built into the foundation of Apple Intelligence. It supports a wide range of models, from compact 3B-parameter vision models to LLMs with up to 70B parameters, across iPhone, iPad, Mac, and Apple Vision Pro. A unified API for hardware access lets workloads run across the CPU, GPU, and Neural Engine.
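To put the 3B-to-70B range in perspective, the storage needed for weights alone is roughly the parameter count times the bits per weight, divided by eight. The sketch below works through that arithmetic in Python. The bit widths (16, 8, 4) are illustrative assumptions rather than figures from Apple, and the estimate ignores activations and any other runtime memory.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> float:
    """Approximate storage for model weights alone, in bytes."""
    return num_params * bits_per_weight / 8


# Parameter counts taken from the range quoted in the article.
MODEL_SIZES = {"3B": 3_000_000_000, "70B": 70_000_000_000}

for name, params in MODEL_SIZES.items():
    # Bit widths are assumed for illustration only.
    for bits in (16, 8, 4):
        gb = weight_bytes(params, bits) / 1e9
        print(f"{name} model at {bits}-bit weights: ~{gb:.1f} GB")
```

Under these assumptions, a 70B-parameter model needs about 140 GB for 16-bit weights but about 35 GB at 4 bits per weight, which is why compression matters for on-device deployment.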

Core AI features a memory-safe Swift API that supports zero-copy data paths and gives developers fine-grained control over inference memory. It also includes ahead-of-time (AOT) compilation, which moves compilation work off the device so that models load almost instantly. Developers can convert existing PyTorch models into the Core AI format using the Core AI PyTorch tool, or author new models from built-in composite operations. The framework also supports deployment optimizations such as quantization and palettization, which shrink a model's size on disk and its runtime memory footprint while lowering inference latency and power consumption.
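Palettization, in general terms, replaces each weight with a short index into a small lookup table (the "palette") of shared values. The following is a minimal, framework-independent sketch of the idea in plain Python, using 1-D k-means to pick the palette. It does not use or represent the Core AI API; the function names `palettize` and `depalettize` are hypothetical, and the random weights are stand-in data.

```python
import random


def palettize(weights, n_bits=4, iters=20):
    """Cluster weights into 2**n_bits shared values using 1-D k-means.

    Returns (palette, indices): the lookup table and one index per weight.
    """
    k = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    # Start with centroids spread evenly across the weight range.
    palette = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def assign():
        # Map every weight to the index of its nearest palette entry.
        return [min(range(k), key=lambda j: abs(w - palette[j])) for w in weights]

    for _ in range(iters):
        indices = assign()
        for j in range(k):
            members = [w for w, i in zip(weights, indices) if i == j]
            if members:  # leave empty clusters where they are
                palette[j] = sum(members) / len(members)
    return palette, assign()


def depalettize(palette, indices):
    """Reconstruct approximate weights from the palette and indices."""
    return [palette[i] for i in indices]


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in weights
palette, indices = palettize(weights, n_bits=4)
restored = depalettize(palette, indices)

original_bits = len(weights) * 32                    # float32 storage
packed_bits = len(weights) * 4 + len(palette) * 32   # indices + lookup table
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{original_bits} bits -> {packed_bits} bits, max error {max_err:.3f}")
```

The trade-off is visible in the last lines: storage drops because each weight is now a 4-bit index, while the reconstruction error depends on how well 16 shared values cover the weight distribution.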

Models loaded into Core AI are automatically specialized for the specific hardware and OS version, and the specialized result is stored in a model cache to speed up subsequent runs. Developers can manage these cached resources through dedicated options or share them across application groups. Within the existing Apple ecosystem, the company recommends Core ML for classic non-neural tasks, Core AI for neural networks and transformers, and MLX for experimentation with custom model weights. The announcement marks a strategic shift toward privacy-focused, zero-cloud-cost generative AI applications that operate without server dependencies.
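The caching behavior described above can be pictured as a lookup keyed on the model, the hardware, and the OS version: the expensive specialization step runs once per combination and is reused afterwards. The Python sketch below illustrates that pattern only. The class `SpecializationCache`, its methods, and the device and OS strings are all hypothetical and do not reflect how Core AI actually implements or exposes its cache.

```python
import hashlib


class SpecializationCache:
    """Hypothetical cache of specialized models, keyed by model + hardware + OS."""

    def __init__(self):
        self._entries = {}
        self.specialize_count = 0  # how many times the slow path ran

    @staticmethod
    def key(model_bytes: bytes, hardware: str, os_version: str):
        # Hash the model so that any change to its contents yields a new key.
        digest = hashlib.sha256(model_bytes).hexdigest()
        return (digest, hardware, os_version)

    def load(self, model_bytes: bytes, hardware: str, os_version: str) -> str:
        k = self.key(model_bytes, hardware, os_version)
        if k not in self._entries:
            # Stand-in for the expensive hardware-specific specialization step.
            self.specialize_count += 1
            self._entries[k] = f"specialized[{hardware}/{os_version}]:{k[0][:8]}"
        return self._entries[k]


cache = SpecializationCache()
model = b"example-model-weights"         # placeholder model contents
cache.load(model, "device-a", "os-1.0")  # first load: specializes
cache.load(model, "device-a", "os-1.0")  # second load: served from the cache
cache.load(model, "device-a", "os-1.1")  # new OS version: specializes again
print(cache.specialize_count)
```

Keying on the OS version as well as the hardware means an OS update invalidates the old entry naturally, at the cost of one slower load after each update.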
