New Compression Technique Promises More Efficient Large Language Models for Mobile Devices
Researchers from Princeton and Stanford Engineering have developed a technique to compress large language models (LLMs), a move that could dramatically reduce the cost, energy consumption, and privacy risks associated with running these AI systems. The team’s algorithm, known as CALDERA (Calibration Aware Low precision DEcomposition with low-Rank Adaptation), promises to make LLMs more accessible and efficient, enabling them to run on everyday devices like smartphones and laptops.

LLMs such as ChatGPT have transformed how we handle everyday tasks, but that convenience comes at a cost. They typically require users to send their requests to centralized servers, a process that can be slow, expensive, and energy-intensive. By compressing these massive models, CALDERA aims to reduce the computational resources needed, allowing the models to run locally on devices without sacrificing performance.

According to TechXplore, the novel algorithm works by applying two key techniques: low-precision and low-rank. Low-precision reduces the number of bits used to store data, speeding up processing and improving energy efficiency. Low-rank eliminates redundancies within the model’s weight matrices. By combining these approaches, CALDERA achieves a higher level of compression than either method alone, making LLMs smaller and more efficient while maintaining accuracy.
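The combination of the two techniques can be illustrated with a minimal sketch: quantize a weight matrix to a low bit width, then use a truncated SVD to capture the quantization residual with a low-rank correction. This is an assumption-laden toy example (uniform quantization, plain SVD, NumPy), not CALDERA's actual calibration-aware algorithm, but it shows why the combined approximation beats quantization alone.

```python
import numpy as np

def quantize(W, bits=4):
    # Uniform quantization to the given bit width.
    # (Illustrative only; CALDERA uses a calibration-aware scheme.)
    levels = 2 ** bits
    w_min, w_max = W.min(), W.max()
    scale = (w_max - w_min) / (levels - 1)
    return np.round((W - w_min) / scale) * scale + w_min

def low_precision_plus_low_rank(W, bits=4, rank=32):
    # Step 1 (low-precision): store the bulk of the matrix in few bits.
    Q = quantize(W, bits)
    # Step 2 (low-rank): approximate the quantization residual W - Q
    # with a truncated SVD, kept in higher precision.
    U, s, Vt = np.linalg.svd(W - Q, full_matrices=False)
    L = U[:, :rank] * s[:rank]   # left factor, scaled by singular values
    R = Vt[:rank, :]             # right factor
    return Q, L, R

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
Q, L, R = low_precision_plus_low_rank(W)

# Relative reconstruction error, with and without the low-rank correction.
err_quant_only = np.linalg.norm(W - Q) / np.linalg.norm(W)
err_combined = np.linalg.norm(W - (Q + L @ R)) / np.linalg.norm(W)
print(f"quantization only: {err_quant_only:.4f}")
print(f"quantized + low-rank: {err_combined:.4f}")
```

Because the truncated SVD is the best low-rank approximation of the residual, the combined error is always at most the quantization-only error, which mirrors the article's point that the two techniques together compress further than either alone.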

The researchers tested CALDERA on Meta’s open-source Llama 2 and Llama 3 models and found that it could improve accuracy by up to 5% compared to existing compression techniques. The method also showed promising results in several benchmark tasks, such as logical reasoning and answering questions about physical processes.

The ability to run compressed LLMs locally on devices also offers significant privacy benefits. Users can fine-tune models for specific tasks without sharing sensitive data with third-party servers, reducing the risk of data breaches. However, the team warns that running these models on mobile devices could drain battery life, suggesting that CALDERA is best used in combination with other energy-saving techniques.

As the demand for more efficient AI models grows, CALDERA could be a game-changer, enabling powerful AI applications on personal devices while reducing costs and improving privacy.