ChatGPT Proved to Easily Leak Private Data

Dec 6, 2023

Using OpenAI’s products? It turns out your personal data may not be as safe as you believed. Google researchers recently released a study in which they found that they could use certain keywords to trick ChatGPT into revealing training data that was not intended for disclosure.

The massive growth in ChatGPT usage rests in part on its collection of over 300 billion chunks of data scraped from various online sources. Although OpenAI has taken steps to protect privacy, everyday chats and postings leave a massive pool of data, much of it personal, that was never intended for widespread distribution.

The researchers said in their paper that they were able to extract over 10,000 unique verbatim memorized training examples using only $200 worth of queries to ChatGPT, adding: “Our extrapolation to larger budgets suggests that dedicated adversaries could extract far more data.”

They elaborated that they could obtain names, phone numbers, and addresses of individuals and companies by feeding ChatGPT absurd commands that force a malfunction.

According to Tech Xplore, the researchers would request that ChatGPT repeat the word “poem” ad infinitum, which forced the model to reach beyond its training procedures, “fall back on its original language modeling objective,” and tap into restricted details in its training data. They reached a similar result by requesting infinite repetition of the word “company,” and managed to retrieve the email address and phone number of an American law firm.
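To make the idea concrete, here is a minimal sketch of how a reviewer might inspect a saved response to such a prompt. It is not the researchers' method: the function names, the sample string, and the simplified email pattern are all assumptions made for illustration, and no model is queried. The code counts how many times the requested word is repeated before the output drifts into other text, then scans that remaining text for email-like strings.

```python
import re
from typing import List, Tuple

# Simplified email pattern (an assumption for illustration; not exhaustive).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def split_at_divergence(output: str, word: str) -> Tuple[int, str]:
    """Return (number of leading repeats of `word`, the text that follows)."""
    tokens = output.split()
    count = 0
    for tok in tokens:
        # Ignore surrounding punctuation and case when comparing.
        if tok.strip(".,;:!?\"'").lower() == word.lower():
            count += 1
        else:
            break
    return count, " ".join(tokens[count:])


def find_emails(text: str) -> List[str]:
    """Return every email-like substring found in `text`."""
    return EMAIL_RE.findall(text)


# Hypothetical saved response; the address is a placeholder, not real data.
sample = "poem poem poem poem Contact Jane Doe at jane.doe@example.com for details"
repeats, tail = split_at_divergence(sample, "poem")
print(repeats)            # number of repeats before the output changed
print(find_emails(tail))  # email-like strings in the text that followed
```

A check like this only flags candidate strings; confirming that a string is actually memorized training data would require comparing it against known source text.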

In response to potential unauthorized data disclosures, some companies placed restrictions on employee usage of large language models earlier this year. Rising concerns about data breaches caused OpenAI to add a feature that turns off chat history, adding a layer of protection to sensitive data. The problem is that such data is still retained for 30 days before being permanently deleted.

In conclusion, the researchers termed their findings “worrying” and said their report should serve as “a cautionary tale for those training future models,” warning that users “should not train and deploy LLMs for any privacy-sensitive applications without extreme safeguards.”
