Anyone running on-premises servers received a real headache when the Meltdown and Spectre vulnerabilities became known. Careful sizing plans, disaster recovery configurations, and datacenter failover arrangements all went out the door with the realization that applying the fixes released by CPU and O/S vendors could impact server performance. It wasn’t a nice period.
Since the original reports of the vulnerabilities and the first release of patches (albeit including some buggy BIOS fixes), we have a better picture of the impact. The bad news is that software vendors are unable to be in any way precise about the effect that any specific customer can expect after applying the mitigations.
You might criticize advice like that released by Microsoft for Exchange because of the fuzziness of statements like “Exchange Server is one of those workloads that may experience a significant decrease if KVAS is enabled,” but this is all they can really say.
Every customer environment is different in terms of server hardware, BIOS levels, patches, software revisions, and mix of Microsoft and third-party software running on the servers. Throw in variations in CPU and I/O load and you have an impossible matrix of possibilities. The advice to test in a lab environment before deploying fixes into production is both sensible and the only possible course. (For more on this topic, read Paul Cunningham’s article.)
It is not just on-premises Exchange servers that incur a performance penalty. Every Windows server running today might be affected, including SharePoint farms, hypervisor hosts, and front-end web servers. In short, it’s a mess.
One thing this situation did for me is confirm my strongly held view that essential servers like mailbox servers deserve the comfort of real hardware. Virtualization is great technology that is well suited for other purposes, but Meltdown underscores the added complexity that a hypervisor brings to operations. It’s just another layer to worry about, patch, and manage that we can do without when problems arise.
Perhaps for this reason, Microsoft does not use virtual servers for Office 365. To simplify operations, Microsoft uses physical servers and reinstalls software from scratch at regular intervals to refresh their configurations. Even so, Microsoft would have to cope with a potentially huge performance impact rippling across its Office 365 datacenters if it applies the mitigations to the millions of servers that support Exchange Online (and Outlook.com), SharePoint Online, Teams, Yammer, OneDrive for Business, and all the other apps running in the service.
I do not know what Microsoft has done to protect the Office 365 infrastructure, but their task is somewhat easier than for those running on-premises environments. First, cloud servers are created using a cookie-cutter pattern with little or no variation. Second, Microsoft refreshes server configurations regularly so that servers run up-to-date software. These factors make it easier to deploy fixes and be sure that the fixes are in place.
Third, Microsoft knows exactly what software runs on each server. There are none of the customized configurations you find in an on-premises environment because few or no third-party products run inside Office 365. Reducing the number of programs running on a server and eliminating programs that come from other vendors means that the potential attack surface is smaller.
Finally, to exploit the Meltdown vulnerabilities, an attacker must first penetrate the external defences to get inside Office 365 and access data held on servers. Given the investment Microsoft has made in security and the record they have in protecting their datacenters, it seems that easier targets exist for attackers to probe. In this respect, although Meltdown allows rogue programs to access data belonging to other programs and the operating system, and could potentially be exploited on something like an Exchange Online mailbox server that hosts databases for several tenants, perhaps data is safer in the cloud than it is on-premises.
Given the size of the Office 365 infrastructure and the points made above, it could be that Microsoft made the business decision not to deploy any mitigation for Meltdown inside Office 365 until they have more information about the performance ramifications. After all, would you rush to update over a million servers? Waiting might be a very smart decision – and one that does not seem to put Office 365 customers at risk. I’m certainly not worried.
I might be plain wrong about how Meltdown affects the Office 365 infrastructure, but I bet that Office 365 tenants are glad that it is Microsoft that must figure out how to maintain performance and security across the service. Transferring responsibility for analyzing, understanding, and responding to threat over to Microsoft
Follow Tony on Twitter @12Knocksinna.