Microsoft has notified its enterprise customers about a major incident in which security logs for several of its cloud services were partially lost over a period of more than two weeks. The company confirmed that a bug impacted key Microsoft security products, including Microsoft Entra and Sentinel.
Logging involves recording events, activities, and transactions within a system, such as account sign-ins or failed login attempts. Enterprise admins rely on these logs to investigate security incidents and analyze system performance. When logs are missing, it becomes difficult for customers to track security breaches and unauthorized access, or to diagnose and resolve performance issues.
Earlier this month, Microsoft notified affected customers that a bug in its internal monitoring agents led to inconsistent log data collection between September 2 and September 19. The company has confirmed that there is no evidence that the issue was caused by a security breach.
“A bug in one of Microsoft’s internal monitoring agents resulted in a malfunction in some of the agents when uploading log data to our internal logging platform. This resulted in partially incomplete log data for the affected Microsoft services. This issue did not impact the uptime of any customer-facing services or resources – it only affected the collection of log events. Additionally, this issue is not related to any security compromise,” Microsoft explained.
Microsoft first detected the issue on September 5 and implemented a temporary workaround: periodically restarting the affected agent or server to restore log collection. While this workaround improved the completeness of the logs, some users may have noticed delays in log delivery or increased latency.
According to Microsoft, the logging outage affected a subset of its cloud services, including Microsoft Entra, Sentinel, Defender for Cloud, Azure Virtual Desktop, and Microsoft Purview. As a result, customers may have missed critical security log entries or events during the affected period.
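Admins who want to check whether their own exported logs have holes in the affected window could start with something like the following. This is a minimal sketch, not a Microsoft tool: `find_log_gaps` and its parameters are hypothetical names, and it assumes the event timestamps have already been exported and parsed into Python `datetime` objects. What counts as a suspicious silence (`max_silence`) depends on how busy the tenant normally is.

```python
from datetime import datetime, timedelta


def find_log_gaps(timestamps, window_start, window_end, max_silence):
    """Return (gap_start, gap_end) pairs where no event was seen
    for longer than max_silence inside the given window.

    Hypothetical helper for illustration; not part of any Microsoft SDK.
    """
    # Keep only events inside the window, in chronological order.
    events = sorted(t for t in timestamps if window_start <= t <= window_end)

    # Treat the window edges as boundaries so that silence at the very
    # start or end of the window is also reported.
    points = [window_start, *events, window_end]

    gaps = []
    for earlier, later in zip(points, points[1:]):
        if later - earlier > max_silence:
            gaps.append((earlier, later))
    return gaps
```

Calling it with the start and end of the affected period and, for example, `max_silence=timedelta(hours=2)` returns the stretches in which no events arrived. A long silence is only a hint that entries may be missing, since quiet periods can also be legitimate.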
In a statement to TechCrunch, Microsoft confirmed that it has resolved the logging outage across all affected cloud services by rolling back the change that caused it. “We have mitigated the issue by rolling back a service change. We have communicated to all impacted customers and will provide support as needed,” said Microsoft’s Corporate Vice President John Sheehan.
To prevent similar incidents in the future, Microsoft has taken several steps, including updating its monitoring agent. The company also plans to introduce operational health monitoring and address gaps in the end-to-end testing of its logging platform next month.
Microsoft also notes that all critical services that collect and provide log data will transition to a centralized system for monitoring and managing this information. This change is set to be rolled out to enterprise customers in November.