Program-as-Weights Compiles Natural Language into Neural Programs

• Researchers developed Program-as-Weights (PAW) to compile natural-language specifications into compact neural artifacts that execute locally.
• A 0.6B Qwen3 interpreter using PAW adapters matches the performance of prompting a 32B model directly while using roughly one-fiftieth of the inference memory.
• The system runs locally at 30 tokens/s on a MacBook M3, removing reliance on external APIs.

Researchers introduced Program-as-Weights (PAW), a programming paradigm that compiles natural-language specifications into compact neural artifacts for local execution. The approach targets tasks such as log analysis or data ranking, where relying on large language model APIs raises problems of locality, reproducibility, and cost. PAW uses a 4B compiler trained on FuzzyBench, a new 10M-example dataset, to emit parameter-efficient adapters for a 0.6B Qwen3 interpreter. This configuration matches the performance of prompting a 32B model directly while using roughly one-fiftieth of the inference memory. On a MacBook M3, the system reaches an inference speed of 30 tokens/s.
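As a rough sanity check on the one-fiftieth figure, the parameter counts alone give a similar ratio. The sketch below is a back-of-envelope estimate, not the researchers' measurement: it assumes both models store weights at 16-bit precision and ignores adapter weights, activations, and the KV cache.

```python
# Back-of-envelope weight-memory estimate.
# Assumption (not from the source): both models use 16-bit weights,
# and adapter, activation, and KV-cache memory are ignored.
BYTES_PER_PARAM = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the approximate memory needed for the weights, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


small = weight_memory_gb(0.6e9)  # 0.6B interpreter
large = weight_memory_gb(32e9)   # 32B model prompted directly
ratio = large / small

print(f"{small:.1f} GB vs {large:.1f} GB, ratio ~{ratio:.0f}x")
```

Under these assumptions the ratio comes out near 53, which is in line with the reported "roughly one-fiftieth".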

By reframing the model from a per-input problem solver into a tool builder, PAW lets developers invoke the compiler once to create a reusable artifact. Every later application of the compiled function then runs locally and cheaply. Demonstrated use cases include an 'Alien Taboo' game, a 3D avatar director, and a website content assistant. Users can integrate these functions into Python workflows with a single compilation command, so routine function application needs no external API calls. The research team also provided integration paths for coding agents, allowing automated systems to create, compile, and deploy these neural programs using standard skill management tools.
