A new study by researchers from Meta, Google, Nvidia, and Cornell sheds light on how AI language models actually store and use information, and the findings challenge some common assumptions. Despite their powerful text generation capabilities, large language models (LLMs) store only about 3.6 bits of information per parameter, according to the research. Since 2^3.6 ≈ 12, that is barely enough to distinguish 12 options per parameter, far too little to retain exact words or full sentences verbatim.
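As a rough back-of-the-envelope check, the 3.6-bit figure can be turned into concrete numbers (the 1-billion-parameter model size below is an illustrative assumption, not a figure from the study):

```python
# Back-of-the-envelope arithmetic for the 3.6-bits-per-parameter finding.
# The 1-billion-parameter model size is an illustrative assumption.
bits_per_param = 3.6
distinct_values = 2 ** bits_per_param          # ~12.1 distinguishable states per parameter
n_params = 1_000_000_000                       # hypothetical 1B-parameter model
total_bits = bits_per_param * n_params         # ~3.6 billion bits of capacity
total_megabytes = total_bits / 8 / 1_000_000   # ~450 MB of raw storage

print(f"{distinct_values:.1f} distinct values per parameter")
print(f"~{total_megabytes:.0f} MB of total memorization capacity")
```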
Rather than memorizing content directly, these models operate by learning statistical patterns and reconstructing responses from vast, distributed micro-fragments of data. The result: AI doesn’t retrieve or recall text like a database. Instead, it generates plausible language based on learned correlations between words, concepts, and contexts.
This helps clarify a long-standing debate—whether LLMs simply regurgitate training data. The study’s findings suggest that the answer is mostly no. Words, phrases, and ideas are encoded across countless parameters, meaning any given output is not pulled from memory, but rebuilt from learned structures.
Interestingly, the more data a model is trained on, the less likely it is to retain any one specific piece of information. As the model’s knowledge base grows, individual data points are diluted across a wider network, decreasing the chance of exact memorization. This is a critical detail in current discussions around data privacy and intellectual property rights.
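A minimal sketch makes the dilution effect concrete: if total capacity is fixed at 3.6 bits per parameter, then spreading it over more training examples leaves fewer bits for each one (model size and dataset sizes below are illustrative assumptions):

```python
# Fixed total capacity spread across a growing training set:
# the more examples, the fewer bits are available to memorize each one.
bits_per_param = 3.6
n_params = 1_000_000_000                         # hypothetical 1B-parameter model
total_capacity_bits = bits_per_param * n_params

for n_examples in (1_000_000, 100_000_000, 10_000_000_000):
    bits_per_example = total_capacity_bits / n_examples
    print(f"{n_examples:>14,} examples -> at most {bits_per_example:,.2f} bits each")
```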
To test memorization, researchers trained models on meaningless, patternless data—effectively forcing them to memorize. Even in these cases, the models couldn’t exceed the 3.6-bit-per-parameter limit, reinforcing the distributed nature of their storage system.
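The logic of such a measurement can be sketched roughly as follows (the function name and numbers are hypothetical; this is a conceptual illustration, not the paper's code): because random data contains no patterns to generalize from, any reduction in the bits needed to describe it after training must come from memorization, and dividing that reduction by the parameter count gives a bits-per-parameter estimate.

```python
def memorized_bits_per_parameter(raw_entropy_bits: float,
                                 model_description_bits: float,
                                 n_params: int) -> float:
    """Estimate bits of random training data memorized per parameter.

    raw_entropy_bits:        bits needed to describe the random dataset outright
    model_description_bits:  bits needed to describe it with the model's help
                             (e.g., its total cross-entropy on that data)
    """
    memorized = max(raw_entropy_bits - model_description_bits, 0.0)
    return memorized / n_params

# Hypothetical numbers: a 10M-parameter model trained on random strings
# totalling 100M bits of entropy, which it can re-encode in 64M bits.
print(memorized_bits_per_parameter(100e6, 64e6, 10_000_000))  # -> 3.6
```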
According to Cybernews, this approach to memory has significant implications. For privacy advocates, it means unique or personal data is less likely to be reproduced verbatim by large models. For content creators and legal experts, it reframes the question of how derivative or original AI-generated content really is.
Ultimately, the study deepens public understanding of how generative AI systems function—and may help shape future regulation, safety standards, and ethical guidelines for AI training and deployment.