A post on X started the 'rumor mill' about Microsoft using data from Word and Excel to train LLMs
Published: Nov 29, 2024
Key Takeaways:
Microsoft has been accused of using customer data from its Microsoft 365 applications, including Word and Excel, to train its AI models. The controversy arose from the “Connected Experiences” feature, which is enabled by default in Office apps and allows for Internet-based functionality such as document co-authoring. Critics argue that this feature could be used to scrape user content for AI training without clear disclosure.
Microsoft has strongly denied these allegations, stating that the Connected Experiences feature is not used to train AI models but to enable Internet-required functions. The company emphasized that it does not use customer data from Microsoft 365 applications to train large language models (LLMs). Despite the denial, concerns about privacy and data usage persist among users, highlighting the ongoing tension between innovation and user privacy.
A post by NixCraft on X (formerly Twitter) started the ‘rumor mill’, asserting that “Microsoft Office, like many companies in recent months, has slyly turned on an ‘opt-out’ feature that scrapes your Word and Excel documents to train its internal AI systems. This setting is turned on by default, and you have to manually uncheck a box to opt out.”
The poster went a step further and added a ‘call to action’ of sorts: “If you are a writer who uses MS Word to write any proprietary content (blog posts, novels, books, or any work you intend to protect with copyright and/or sell), you’re going to want to turn this feature off immediately.”
Microsoft has repeatedly and vehemently denied these claims, responding: “These claims are untrue. Microsoft does not use customer data from Microsoft 365 consumer and commercial applications to train foundational large language models.” The company also clarified that the applications’ “Connected Experiences” enable features such as co-authoring and cloud storage.
The company has clarified that its AI models are trained on a diverse dataset of publicly available text and code. Microsoft remains dedicated to upholding user privacy and ensuring that customer data is used solely for its intended purpose. This denial comes amidst growing concerns about the potential misuse of user data for AI training, and Microsoft’s statement aims to alleviate these concerns and reaffirm its commitment to data privacy.