Microsoft AI Research Team Leaks 38TB of Private Data

Microsoft’s AI research team has accidentally exposed sensitive private data while publishing open-source training data on GitHub.
The GitHub repository in question, named “robust-models-transfer”, allegedly contained a shared link configured to grant permissions on the entire storage account, so that not only the open-source models but also a very large amount of private data was accessible to third parties.
According to Cybernews, the 38TB of additional data that was accidentally exposed included the personal computer backups of Microsoft employees, which contained passwords to Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees.
The researchers note that an attacker could have viewed, deleted, or overwritten every file in the storage account: “An attacker could have injected malicious code into all the AI models in this storage account, and every user who trusts Microsoft’s GitHub repository would’ve been infected by it.”
According to Wiz, the storage account wasn’t directly exposed to the public. That is because the Microsoft developers used an Azure mechanism called “SAS tokens,” which allows users to create a shareable link granting access to an Azure Storage account’s data, even though upon inspection the storage account itself still appears completely private.
A “Shared Access Signature”, or SAS, is a token that grants access to Azure Storage data, with an access level that can be customized by whoever creates it.
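To illustrate how much the scope of an SAS token can vary, the sketch below uses the azure-storage-blob Python SDK to generate two shareable URLs: an account-level token with broad read/write/delete/list permissions and a very long expiry (the kind of over-permissive link described above), and a short-lived, read-only token scoped to a single container. The account name, key placeholder, and container name are hypothetical and for illustration only.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import (
    generate_account_sas,
    generate_container_sas,
    ResourceTypes,
    AccountSasPermissions,
    ContainerSasPermissions,
)

# Hypothetical credentials for illustration only.
ACCOUNT_NAME = "exampleresearchstore"
ACCOUNT_KEY = "<storage-account-key>"

# Overly broad token: read/write/delete/list on every container and blob
# in the account, valid for roughly ten years. Anyone holding the link
# effectively holds the whole storage account.
broad_sas = generate_account_sas(
    account_name=ACCOUNT_NAME,
    account_key=ACCOUNT_KEY,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, delete=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=3650),
)

# Narrow token: read-only listing of a single (hypothetical) container,
# expiring after seven days.
scoped_sas = generate_container_sas(
    account_name=ACCOUNT_NAME,
    container_name="robust-models",
    account_key=ACCOUNT_KEY,
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=7),
)

print(f"https://{ACCOUNT_NAME}.blob.core.windows.net/?{broad_sas}")
print(f"https://{ACCOUNT_NAME}.blob.core.windows.net/robust-models?{scoped_sas}")
```

Both URLs look like ordinary shareable links, and neither changes the account’s public-access setting, which is why the underlying storage account can still appear private while the broad token quietly exposes everything in it.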
Wiz concluded by saying that “besides the risk of accidental exposure, the service’s pitfalls make it an effective tool for attackers seeking to maintain persistency on compromised storage accounts.”
Microsoft reportedly invalidated the token in July 2023.