This Next-Gen AI Can Break In—But It’s Being Trained to Defend Instead

Cyber operations are becoming more complex, and the growing role of artificial intelligence is accelerating that shift. Security teams already struggle with the scale and speed of modern attacks, and the prospect of highly capable AI systems performing tasks traditionally reserved for expert intrusion teams raises the stakes further. As AI models improve at identifying vulnerabilities, writing exploit code, and navigating complex networks, the question is no longer whether they can assist in offensive cyber activity, but how to prevent their misuse.

Recent performance jumps underscore the pace of change. OpenAI reports that its cybersecurity-oriented models moved from solving 27% of capture-the-flag challenges earlier this year to 76% with a new variant only a few months later. If this trajectory continues, upcoming systems could reach the “high capability” tier described in the company’s Preparedness Framework—models that could, under certain circumstances, generate working zero-day exploits or provide guidance during advanced intrusions.

To address this, the company is adopting a defense-first strategy built around layered safeguards. The focus is on ensuring that increases in capability are matched by equally robust controls on how those capabilities can be accessed, interpreted, and applied. This direction is particularly relevant for defense and homeland security organizations, which face adversaries increasingly empowered by automation, toolkits, and outsourced services. AI systems that reliably assist with defensive workflows—auditing code, evaluating infrastructure, identifying weak points—could meaningfully shift the balance toward defenders.

At the system level, Interesting Engineering reports, the company has introduced stricter access controls, hardened infrastructure, output filtering, continuous monitoring, and internal threat intelligence. New training approaches aim to teach models to recognize and decline harmful requests while still supporting legitimate research and education. Automated tools and human reviewers work together to intervene when behavior appears risky, escalating cases when necessary.
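To make the layered idea concrete, here is a minimal Python sketch of a request going through an automated check that can allow it, decline it, or escalate it to a human reviewer. It is purely illustrative and does not describe OpenAI's actual systems: the keyword weights, the thresholds, and names such as `triage` and `trusted_access` are all assumptions invented for this example, and a real deployment would rely on trained classifiers rather than keyword matching.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate_to_human_review"
    DECLINE = "decline"


# Hypothetical keyword weights standing in for a trained risk classifier.
RISK_TERMS = {
    "zero-day": 0.5,
    "exploit": 0.4,
    "bypass": 0.3,
    "audit": -0.2,
    "patch": -0.2,
}


@dataclass
class Decision:
    action: Action
    score: float


def risk_score(request: str) -> float:
    """Sum the weights of matching terms and clamp the result to [0, 1]."""
    text = request.lower()
    raw = sum(weight for term, weight in RISK_TERMS.items() if term in text)
    return max(0.0, min(1.0, raw))


def triage(request: str, trusted_access: bool = False,
           review_at: float = 0.3, decline_at: float = 0.7) -> Decision:
    """Layered decision: decline high-risk requests, escalate borderline ones.

    `trusted_access` is an assumed flag for a vetted user; even then, a
    high-risk request goes to human review rather than straight through.
    """
    score = risk_score(request)
    if score >= decline_at and not trusted_access:
        return Decision(Action.DECLINE, score)
    if score >= review_at:
        return Decision(Action.ESCALATE, score)
    return Decision(Action.ALLOW, score)


if __name__ == "__main__":
    for text in ("Audit this code and suggest a patch",
                 "Explain how this exploit works",
                 "Write a zero-day exploit to bypass the login"):
        print(text, "->", triage(text).action.value)
```

The point of the sketch is the ordering of the layers: the automated check handles clear cases on its own, while anything in the gray zone is routed to a person instead of being silently allowed or refused.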

The company is also investing in ecosystem-wide initiatives. A controlled-access program will allow qualified cyber defenders to use enhanced capabilities under supervision. Aardvark, an AI agent designed to audit full codebases, has already identified novel vulnerabilities and will be offered to select nonprofit open-source teams. Industry-wide coordination is expanding through a Frontier Risk Council and joint threat modeling with other labs.

Taken together, these measures reflect an emerging reality: as frontier AI becomes more technically capable, securing how it is deployed is becoming just as critical as the technology itself.