DeepSeek Allegedly Used OpenAI’s Data to Train its Model

Feb 4, 2025


DeepSeek, a China-based AI startup, has raised eyebrows in the AI community with the release of its R1 open-source model, a product that has quickly gained traction in the West. However, the company now faces scrutiny from Microsoft and OpenAI, which are investigating whether DeepSeek breached OpenAI’s terms of service in the development of R1.

The R1 model, which is comparable to OpenAI’s GPT-4o, has made waves due to its impressive performance despite being developed with minimal resources. DeepSeek reportedly spent just $6 million to build R1, a fraction of the hundreds of billions that OpenAI and other Western companies have invested in similar technology. This resource-efficient approach has positioned DeepSeek as a disruptive force in the AI space.

However, OpenAI claims that DeepSeek may have violated its terms of service by improperly using its data to train R1. According to the Financial Times, the company alleges that DeepSeek may have engaged in a technique known as “distillation,” in which a smaller model is trained on the outputs of a larger one, such as OpenAI’s, to improve its performance. According to a source at OpenAI, there is evidence suggesting that DeepSeek used OpenAI’s proprietary data without authorization. In a recent Instagram reel, a creator claims to have received replies from DeepSeek in which the model identified itself as GPT-4, which, if accurate, would be consistent with OpenAI’s claims.

Further raising concerns, Bloomberg reports that Microsoft security researchers observed individuals possibly linked to DeepSeek extracting large amounts of data through OpenAI’s API. Such activity could suggest that DeepSeek bypassed the restrictions set by OpenAI, allowing it to gather data more freely.

Despite these allegations, DeepSeek’s R1 model continues to perform exceptionally well. As the investigation unfolds, it is clear that the battle for AI dominance is intensifying, with new players challenging established giants through innovative and cost-effective approaches.
