Inside the British Lab Hunting for AI Dangers
- •British AI Security Institute researchers successfully coaxed an AI chatbot into generating instructions for creating a biological weapon.
- •The government-backed institute, funded with £360 million, has identified severe safety vulnerabilities in major models including Claude and Gemini.
- •The institute is serving as a global blueprint for governments seeking to conduct independent safety vetting of advanced AI technologies.
British government experts at the AI Security Institute in London are testing leading artificial intelligence models for catastrophic risks, including the generation of instructions for bioweapons and the execution of complex cyberattacks. In one recent test, a team led by computer scientist Xander Davies successfully prompted an AI chatbot to provide a step-by-step recipe for making anthrax after bombarding the system with thousands of automated questions. The institute, which employs roughly 100 people from intelligence agencies, academia, and the tech sector, has identified major safety vulnerabilities in every major model tested, including Google's Gemini and Anthropic's Claude.
Established in 2023 following a summit at Bletchley Park, the institute is funded by £360 million (approximately US$480 million) from the British government. Its testing methodology is becoming a global model, with similar organizations emerging in countries including Japan, Singapore, France, India, Canada, and Australia. The institute holds the unique position of being the only non-American government entity granted access to test Anthropic's Mythos model for cybersecurity flaws prior to its release in April. Despite these efforts, the organization lacks regulatory power and remains constrained by limited access to internal model training data.
The institute focuses on critical threats, such as chemical and biological weapons, cyber-offensive capabilities, and the manipulation of human behavior. Recent findings indicate that some AI models can complete complex, 32-step corporate network attacks significantly faster than a skilled human hacker, who might typically require 20 hours for such a task. Additionally, researchers are investigating whether models possess deceptive capabilities or awareness by recognizing when they are being subjected to safety testing. While the institute struggles to compete with the multimillion-dollar salaries offered by private AI companies, leadership sees the mission as an essential "tour of duty" to ensure democratic institutions can effectively address the rapid technological acceleration identified by industry leaders Sam Altman, Dario Amodei, and Demis Hassabis in 2023.