
In a startling revelation, researchers at Lasso Security found that developer platforms like HuggingFace and GitHub, which are essential to the development of AI technologies, are leaving top-level organization accounts (such as those of Google, Meta, and Microsoft) exposed to threat actors.

As part of the investigation, Lasso Security inspected hundreds of application programming interfaces (APIs, which allow computing applications to 'talk' to each other) on both platforms and found LLMs to be particularly vulnerable, especially Meta's large language model "Llama".

The researchers stated that their investigation revealed a significant breach in the supply chain infrastructure, exposing high-profile accounts including Meta's. "The ramifications of this breach are far-reaching, as we successfully attained full access, both read and write permissions, to Meta Llama2, BigScience Workshop, and EleutherAI."

Among the compromised parties are models with millions of downloads, all left susceptible to exploitation by malicious actors. The researchers say the gravity of the situation cannot be overstated: with control over an organization whose models have millions of downloads, attackers could manipulate existing models and turn them into malicious entities. They conclude that injecting these corrupted models with malware "could affect millions of users who rely on these foundational models for their applications."

According to Cybernews, the key vulnerability lies in HuggingFace API tokens: they grant significant access to organizations' resources, and their exploitation could lead to data breaches, the spread of malicious models, and much more.
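Exposed tokens of this kind are typically found by scanning public code for strings matching the token format. As a minimal illustration, the sketch below looks for strings with the "hf_" prefix that HuggingFace user access tokens use; the exact length and character set assumed here are for illustration only and do not reflect Lasso Security's actual methodology.

```python
import re

# HuggingFace user access tokens conventionally begin with "hf_";
# the 30+ alphanumeric tail assumed here is illustrative, not an
# official specification.
HF_TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{30,}")

def find_exposed_tokens(text: str) -> list[str]:
    """Return any substrings of `text` that look like HuggingFace tokens."""
    return HF_TOKEN_PATTERN.findall(text)

# Example: a token accidentally committed in a config snippet.
snippet = 'api_token = "hf_' + "x" * 34 + '"'
print(find_exposed_tokens(snippet))  # one candidate token found
```

A real scanner would walk repositories and datasets rather than a single string, but the detection principle is the same: a secret with a recognizable prefix committed to a public platform is trivially discoverable.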

Lasso Security's investigation of the platforms was intended to help developers secure LLMs against potential threats, and upon completing it the researchers stated that immediate measures must be taken. "The implications extend beyond mere model manipulation," the researchers state, adding that exploiting the discovered vulnerabilities also opens the door to training data poisoning: by tampering with trusted datasets, attackers could compromise the integrity of machine learning models, leading to widespread consequences.

The researchers say they reached out to all affected parties, who promptly responded by revoking or deleting the exposed API tokens, in many cases on the same day as the disclosure.

Going forward, to prevent such occurrences and guard against data poisoning or theft, Lasso Security recommends that tokens used in LLM development be strictly classified, with cybersecurity solutions put in place that are tailored specifically to these models.