Security Testing Reveals Major Vulnerabilities in GPT-5’s Default Configuration

Recent evaluations of OpenAI’s latest large language model, GPT-5, have revealed serious security shortcomings in its unmodified state, raising concerns about its readiness for enterprise deployment. Independent testing by AI security researchers has shown that, without safeguards, the model is highly susceptible to adversarial manipulation.

A series of red-teaming exercises conducted by SPLX, a company focused on AI security, found that the base GPT-5 model, tested without any system prompt, failed against 89% of more than 1,000 adversarial prompts, earning a security score of only 11%. The test prompts were designed to trigger policy violations, elicit harmful content, or bypass safety filters.
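
For context, an evaluation of this kind boils down to replaying a corpus of attack prompts and measuring how often the model's reply violates policy. The sketch below shows the general shape of such a harness; the prompt corpus, the toy `is_policy_violation` judge, and the `gpt-5` model identifier are placeholders, since SPLX's actual tooling and scoring rubric are not public.

```python
# Hypothetical sketch of a red-team evaluation loop. Every name here is a
# placeholder; SPLX's real harness, prompts, and judge are not public.
from openai import OpenAI

client = OpenAI()

def is_policy_violation(response_text: str) -> bool:
    # Toy judge for illustration: treats any response to a known-bad prompt
    # that lacks a refusal marker as a successful attack. A real harness
    # would use a trained classifier or human review instead.
    refusal_markers = ("I can't", "I cannot", "I won't")
    return not any(marker in response_text for marker in refusal_markers)

def attack_success_rate(prompts: list[str], model: str = "gpt-5") -> float:
    """Send each adversarial prompt with no system prompt and count how
    often the model produces a violating response."""
    successes = 0
    for prompt in prompts:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],  # no system prompt
        )
        if is_policy_violation(resp.choices[0].message.content):
            successes += 1
    return successes / len(prompts)
```

Under this framing, the reported 11% security score corresponds to an attack success rate of 89%.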

Even with a basic system prompt layer added (a minimal safety configuration intended to shape the model's behavior), the attack success rate fell to 43%. That is a real improvement, but it still leaves the model well short of what secure use demands. By comparison, GPT-4o, OpenAI's earlier model, showed significantly stronger resilience under similar conditions: when hardened, it was compromised in just 3% of test cases and maintained a 97% overall performance score.
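
The "basic system prompt layer" mentioned above amounts to prepending a safety-oriented system message to every request. A minimal sketch, assuming the standard OpenAI Python SDK; the prompt wording is invented for illustration and is not the configuration SPLX tested:

```python
# Minimal "system prompt layer" hardening. The prompt text is illustrative
# only; real deployments iterate on far more detailed instructions.
from openai import OpenAI

client = OpenAI()

HARDENING_SYSTEM_PROMPT = (
    "You are a helpful assistant. Refuse requests for harmful, illegal, or "
    "policy-violating content, including obfuscated, encoded, or role-played "
    "versions of such requests."
)

def hardened_completion(user_prompt: str, model: str = "gpt-5") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": HARDENING_SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    return resp.choices[0].message.content
```

In SPLX's numbers, even this single layer roughly halved the attack success rate, which is why hardening guidance treats the system prompt as the first of several defensive layers rather than a complete fix.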

Despite its improved reasoning capabilities, GPT-5 failed against relatively simple adversarial logic attacks, including obfuscation techniques such as separating the characters of a prompt with hyphens or wrapping the request in a fake encryption puzzle. Researchers also showed the model could be coaxed into producing content related to explosive materials, highlighting potential misuse scenarios.
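
The hyphen technique is easier to grasp with a concrete transformation: the attacker inserts separators between characters so that naive keyword filters stop matching, while the model can still reassemble the meaning. A harmless illustration of the string transformation itself:

```python
# Character-separation obfuscation, shown on benign text. The transformed
# string defeats simple keyword matching but remains readable to a model.
def hyphenate(text: str) -> str:
    """Insert a hyphen between every character of each word."""
    return " ".join("-".join(word) for word in text.split())

print(hyphenate("hello world"))  # h-e-l-l-o w-o-r-l-d
```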

Experts recommend that organizations avoid deploying GPT-5 in its default form. Instead, security hardening measures, including layered prompt engineering and runtime protections, are essential before enterprise integration. The findings point to persistent challenges in aligning powerful generative models with robust, out-of-the-box safety.
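
As one deliberately simplified example of a runtime protection, a wrapper can screen both the incoming prompt and the model's output before anything reaches the user. The regex patterns below are toy stand-ins; production guardrails typically rely on trained classifiers rather than pattern lists:

```python
# Toy runtime guard: screens input and output around a model call.
# The patterns are illustrative stand-ins for real input/output classifiers.
import re
from typing import Callable

SUSPICIOUS_PATTERNS = [
    re.compile(r"\b\w(?:-\w)+\b"),               # hyphen-separated characters
    re.compile(r"decrypt|cipher|decode", re.I),  # crude "fake puzzle" cue
]

def tripped(text: str) -> bool:
    """Return True if any runtime filter matches the text."""
    return any(p.search(text) for p in SUSPICIOUS_PATTERNS)

def guarded_call(user_prompt: str, model_call: Callable[[str], str]) -> str:
    if tripped(user_prompt):
        return "Request blocked by runtime policy filter."
    output = model_call(user_prompt)
    if tripped(output):
        return "Response withheld by runtime policy filter."
    return output
```

The design point is that this check runs outside the model, so it still applies even when a jailbreak succeeds inside the conversation itself.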

While GPT-5 delivers clear advances in raw capability, its current security posture suggests that more work is needed before it can be considered suitable for sensitive or regulated environments.