Cloudflare Acquires Ensemble AI Team to Scale Infrastructure
- Cloudflare acquired the team from Ensemble AI on June 15, 2026, to bolster its AI infrastructure capabilities.
- The acquisition integrates proprietary technologies like NdLinear and NdLinear-LoRA to enhance model compression and inference efficiency.
- Cloudflare aims to lower inference costs and GPU overhead to help developers scale AI workloads across its global network.
Cloudflare announced on June 15, 2026, the acquisition of the team from Ensemble AI, a San Francisco-based company founded in 2023. The incoming team will join Cloudflare's Workers AI division to accelerate development of AI infrastructure, specifically focusing on making large models faster, smaller, and more cost-effective to serve. Ensemble AI specialized in model compression and efficient inference techniques designed to reduce memory, compute, and deployment overhead for large language models and multimodal architectures.
The Ensemble AI team contributes expertise in architecture-level model building blocks, an approach that goes beyond hardware-specific optimization or quantization. A primary innovation is NdLinear, a drop-in replacement for the standard linear layers in transformer models. Whereas conventional linear layers flatten their inputs, NdLinear operates directly on multidimensional activations, preserving meaningful axes such as heads, channels, and spatial dimensions while reducing parameter count. The team also developed NdLinear-LoRA, an efficient adaptation method that reduces the number of trainable parameters needed to fine-tune large models.
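To give a rough sense of why keeping axes separate can shrink a layer, the sketch below compares parameter counts only. It assumes, purely for illustration, that a structure-preserving layer applies one small weight matrix per axis instead of a single dense matrix over the flattened input; the article does not describe NdLinear's actual formulation, so this is not Ensemble AI's implementation. The LoRA count uses the standard low-rank adapter formula, not NdLinear-LoRA itself, and all function names and shapes here are hypothetical.

```python
from math import prod


def dense_params(in_shape, out_shape):
    """Weights plus biases for a linear layer over the flattened input."""
    n_in, n_out = prod(in_shape), prod(out_shape)
    return n_in * n_out + n_out


def per_axis_params(in_shape, out_shape):
    """Weights plus biases when each axis gets its own small linear map.

    Assumption for illustration only: one (in_i x out_i) matrix and one
    bias vector per axis. This is not a confirmed NdLinear design.
    """
    if len(in_shape) != len(out_shape):
        raise ValueError("input and output must have the same number of axes")
    return sum(i * o + o for i, o in zip(in_shape, out_shape))


def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters of a standard LoRA adapter (two low-rank factors)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    shape = (8, 16, 16)  # hypothetical (heads, height, width) activation
    print("dense over flattened input:", dense_params(shape, shape))
    print("one map per axis:          ", per_axis_params(shape, shape))
    print("full 4096x4096 weight:     ", 4096 * 4096)
    print("LoRA adapter at rank 8:    ", lora_trainable_params(4096, 4096, 8))
```

Under these assumptions the per-axis layer needs a few hundred parameters where the flattened dense layer needs several million, which is the kind of gap that translates into lower memory and serving cost. Real savings depend on the actual layer design and on how much accuracy is retained.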
Cloudflare intends to integrate these techniques into its existing AI stack, which includes the inference engine Infire and a tensor compression technology known as Unweight. By combining these methods with its global network, the company aims to improve the economics of inference, a major barrier for developers scaling AI applications such as agents, personalized models, and reinforcement learning. The efficiency gains are intended to let developers deploy AI workloads with lower memory requirements and less operational complexity across Cloudflare’s serverless platform.
This acquisition follows a series of recent infrastructure investments by Cloudflare aimed at expanding its developer-focused AI capabilities. The combined team will prioritize improving GPU utilization and scalable deployment patterns so that AI services remain accessible and affordable as customer workloads grow. Developers already using Cloudflare Workers AI are expected to gain access to these improvements, allowing them to experiment with different model sizes and deployment patterns without being constrained by cost or structural limitations.