Microsoft AI researchers unintentionally exposed vast amounts of sensitive data, including private keys and passwords, when they published a public training data bucket on GitHub.
Discovery by Wiz
Cloud security firm Wiz stumbled upon a GitHub repository owned by Microsoft’s artificial intelligence division while investigating accidental cloud data exposure. The repository contained open-source code and AI models for image recognition, with users asked to download the models from an Azure Storage URL. However, Wiz discovered that this URL granted access to the entire storage, leading to the inadvertent exposure of additional private data.
The exposed data encompassed 38 terabytes of confidential information, including personal backup copies of two Microsoft employees’ personal computers. Additionally, the vault contained sensitive personal data such as Microsoft service passwords, private keys, and over 30,000 internal Microsoft Teams messages from hundreds of employees.
Misconfiguration and SAS Tokens
Wiz determined that the URL used to store this data in 2020 was incorrectly configured, granting full control rights instead of read-only access. This misconfiguration allowed potential unauthorized access, including the ability to delete, replace, or insert malicious content. The access was facilitated through a shared access signature (SAS) permission token included in the URL, a mechanism commonly used in Azure for providing access to storage account data.
Mitigation and Impact Assessment
Wiz promptly reported its findings to Microsoft on June 22, leading to the revocation of the SAS token on June 24. Microsoft concluded its investigation into the potential organizational impact on August 16.
According to Microsoft’s Security Response Center, no customer data was exposed, and no other internal services were compromised due to this issue. Microsoft also noted that Wiz’s research has expanded GitHub’s Secret Spanning service, which monitors public open-source changes for credentials and other plaintext secrets, including SAS tokens with overly permissive expiration dates or privileges.
In a statement, Wiz’s co-founder and CTO, Ami Luttwak, highlighted the challenges of handling vast amounts of data in AI development and the importance of stringent security measures, particularly when collaborating on open-source projects. The incident serves as a reminder of the need for vigilance in safeguarding sensitive data in the era of AI, notes NIX SOLUTIONS.