Chatbots Can Be “Corrupted” and Even Turn Against Other Chatbots

Jan 3, 2024


Researchers from Singapore managed to trick three chatbots (ChatGPT, Google Bard, and Microsoft Bing) into breaking the rules, then turned them against each other.

A research team at Nanyang Technological University (NTU) in Singapore compromised multiple chatbots, making them produce content that violates their own guidelines, the university reported. According to Cybernews, this process is known as “jailbreaking”: hackers exploit flaws in a software system to make it do something its developers deliberately restricted it from doing.

After “jailbreaking” the chatbots, the researchers reportedly used a database of prompts that had already proven successful at jailbreaking chatbots to train a large language model capable of generating further jailbreak prompts for use against other chatbots.

Liu Yi, co-author of the study, explained: “Training a large language model with jailbreak prompts makes it possible to automate the generation of these prompts, achieving a much higher success rate than existing methods. In effect, we are attacking chatbots by using them against themselves.”

So, despite the restrictions developers put in place to prevent their chatbots from generating violent, unethical, or criminal content, the AI can still be “outwitted,” as Liu Yang, lead author of the study, puts it.

Liu Yang explains that despite their benefits, AI chatbots remain vulnerable to jailbreak attacks. They can be compromised by malicious actors who exploit vulnerabilities to force chatbots to generate outputs that violate established rules.

Moreover, according to the researchers, a jailbreaking large language model can keep adapting and creating new jailbreak prompts even after developers patch their models, which essentially allows hackers to beat the developers at their own game with their own tools.
