New Compression Technique Promises More Efficient Large Language Models for Mobile Devices
Researchers from Princeton and Stanford Engineering have developed a technique to compress large language models (LLMs), a move that could dramatically reduce the cost, energy consumption, and privacy risks associated with running these AI systems. The team’s algorithm, known as CALDERA (Calibration Aware Low precision DEcomposition with low-Rank Adaptation), promises to make LLMs more accessible and efficient, enabling them to run on everyday devices like smartphones and laptops.

LLMs such as ChatGPT have transformed how we handle everyday tasks, but that convenience comes at a cost. They typically require users to send their requests to centralized servers, a process that can be slow, expensive, and energy-intensive. By compressing these massive models, CALDERA aims to reduce the computational resources needed, allowing the models to run locally on devices without sacrificing performance.

According to TechXplore, the novel algorithm works by applying two key techniques: low-precision and low-rank. Low-precision reduces the number of bits used to store data, speeding up processing and improving energy efficiency. Low-rank eliminates redundancies within the model’s weight matrices. By combining these approaches, CALDERA achieves a higher level of compression than either method alone, making LLMs smaller and more efficient while maintaining accuracy.
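The combination of the two techniques can be illustrated with a minimal sketch: quantize a weight matrix to a low bit width, then use a truncated SVD to capture the quantization residual with a low-rank correction. This is an assumption-laden toy example (uniform quantization, plain SVD, NumPy), not CALDERA's actual calibration-aware algorithm, but it shows why the combined approximation beats quantization alone.

```python
import numpy as np

def quantize(W, bits=4):
    # Uniform quantization to the given bit width.
    # (Illustrative only; CALDERA uses a calibration-aware scheme.)
    levels = 2 ** bits
    w_min, w_max = W.min(), W.max()
    scale = (w_max - w_min) / (levels - 1)
    return np.round((W - w_min) / scale) * scale + w_min

def low_precision_plus_low_rank(W, bits=4, rank=32):
    # Step 1 (low-precision): store the bulk of the matrix in few bits.
    Q = quantize(W, bits)
    # Step 2 (low-rank): approximate the quantization residual W - Q
    # with a truncated SVD, kept in higher precision.
    U, s, Vt = np.linalg.svd(W - Q, full_matrices=False)
    L = U[:, :rank] * s[:rank]   # left factor, scaled by singular values
    R = Vt[:rank, :]             # right factor
    return Q, L, R

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
Q, L, R = low_precision_plus_low_rank(W)

# Relative reconstruction error, with and without the low-rank correction.
err_quant_only = np.linalg.norm(W - Q) / np.linalg.norm(W)
err_combined = np.linalg.norm(W - (Q + L @ R)) / np.linalg.norm(W)
print(f"quantization only: {err_quant_only:.4f}")
print(f"quantized + low-rank: {err_combined:.4f}")
```

Because the truncated SVD is the best low-rank approximation of the residual, the combined error is always at most the quantization-only error, which mirrors the article's point that the two techniques together compress further than either alone.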

The researchers tested CALDERA on Meta’s open-source Llama 2 and Llama 3 models and found that it could improve accuracy by up to 5% compared to existing compression techniques. The method also showed promising results in several benchmark tasks, such as logical reasoning and answering questions about physical processes.

The ability to run compressed LLMs locally on devices also offers significant privacy benefits. Users can fine-tune models for specific tasks without sharing sensitive data with third-party servers, reducing the risk of data breaches. However, the team warns that running these models on mobile devices could drain battery life, suggesting that CALDERA is best used in combination with other energy-saving techniques.

As the demand for more efficient AI models grows, CALDERA could be a game-changer, enabling powerful AI applications on personal devices while reducing costs and improving privacy.