Testing AI's Cyber-Attack Capabilities: UK AISI Evaluates GPT-5.5's Performance
- The UK's AISI evaluated GPT-5.5's cyber-attack capabilities and confirmed it can complete a complex network intrusion
- GPT-5.5 succeeded in 2 of 10 attempts at a 32-step corporate-network attack simulation
- No current model can complete an industrial control system attack simulation
The rapid advancement of frontier AI models has brought both unprecedented productivity gains and significant security concerns. A critical new report from the UK’s AI Security Institute (AISI) underscores this dual reality, detailing the cyber-attack capabilities of OpenAI’s latest flagship model, GPT-5.5. By utilizing advanced cyber-range environments, researchers are beginning to quantify just how capable these systems are at executing complex, multi-stage cyber-attacks that were previously the exclusive domain of human professionals.
The AISI assessment centered on a grueling 32-step simulation dubbed "The Last Ones," a virtual environment designed to mimic a corporate network takeover. This isn't a simple test of coding skills; it involves reconnaissance, credential theft, lateral movement between Active Directory forests, and even navigating CI/CD supply chains. The fact that GPT-5.5 (succeeding in two of ten attempts), along with Anthropic's Claude Mythos Preview, completed this end-to-end simulation represents a watershed moment for AI agentic capabilities. It suggests that these models have evolved beyond mere chatbots into sophisticated agents capable of autonomous, long-horizon task planning.
While the results are undeniably impressive from a technical standpoint, the AISI was careful to contextualize these findings. The simulation took place within a controlled, research-oriented sandbox, free from the active, real-time defenses, network monitoring tools, and sophisticated threat hunters one would encounter in a live, protected enterprise environment. The models, while highly skilled at logical execution, have not yet demonstrated the ability to evade active cyber defenses or industrial-grade security operations. Furthermore, the models notably failed in a separate simulation targeting industrial control systems, suggesting that specialized knowledge in operational technology (OT) remains a significant barrier for current-generation LLMs.
Perhaps most intriguing is the observed trend that model performance continues to scale with computational investment. The AISI noted that they have yet to see a performance plateau; as models grow in intelligence and autonomous reasoning, their capacity for offensive cyber operations increases as a secondary effect. This is not necessarily an intentional design feature, but rather an emergent capability tied to better coding, logic, and multi-step reasoning skills. As these capabilities become more widespread, the focus for policymakers and corporate security teams is shifting from prohibition to adaptation.
The implications of this report extend far beyond the laboratory. As these "frontier" models become more accessible, the same capabilities that allow them to perform sophisticated cyber-attacks can, and likely will, be harnessed by security professionals for defensive purposes. Red teaming, automated penetration testing, and rapid vulnerability mitigation are about to get a massive upgrade. The AISI, working in conjunction with bodies like the National Cyber Security Centre, is emphasizing the need for organizations to prepare for a world where AI-powered threats are a standard operational reality, necessitating a fundamental rethink of cybersecurity postures and digital infrastructure hygiene.
Ultimately, the AISI assessment serves as a reality check for both the AI industry and the broader business community. We are entering an era where AI-driven cyber-attack tools are no longer speculative fiction, but tangible, quantifiable threats. While current models like GPT-5.5 have specific limitations, the trajectory of their development is clear. For university students and aspiring technologists, the lesson is paramount: understanding the intersection of AI safety, autonomous agency, and offensive security will be one of the most critical skill sets of the coming decade.