Azure recently experienced a nasty outage, lasting several hours. It seems to have been centered around DNS, but had wide side effects.
Microsoft’s cloud platform took something of a beating. By the sound of it, a DDoS could have been to blame. Or not.
Redmond isn’t saying. In today’s IT Newspro, we avoid the potholes in the cloudy superhighway.
What’s the craic? Mary Jo Foley speaks to a Global DNS outage:
The status page said the outage started at…7:48 am ET. [By] 11 am ET [the] page is reporting that most of the downed services are back, if not coming back.
Microsoft is identifying the preliminary root cause as a “spike in networking traffic.” … The DNS issues were “self-healed by the Azure platform.”
[But by] 4 pm ET, Microsoft still seems to be having OneDrive connectivity issues.
That doesn’t sound so good. Alexander J Martin dreams up a colorful metaphor—Azure is on fire:
Customers using Azure DNS in multiple regions experienced difficulties…due to the mysterious issues affecting Microsoft’s cloud computing and infrastructure platform. … Azure proudly advertises itself as a global network…using Anycast routing to provide “outstanding performance and availability.”
Where did the problem start? Peter Gothard has a hint—Azure borkage in central US leads to global woes:
The fault affected API management, web apps, Service Bus and SQL database services in the central US region. … Customers noticed confusion with Microsoft’s messages, as Azure Twitter feeds and status pages seemed to disagree on the speed of recovery.
Is this part of a pattern? rufflow wishes Microsoft would quit fiddling with it:
Microsoft and reliability these days. … Login issues with Visual Studio team services…part of Skype offline, then SharePoint issues, then OneDrive issues…
Every time I log on to my Office 365 admin account there are new features, layouts or things are moved. It would be nice if they rolled out new stuff fully instead of living in a SharePoint house that is in constant renovation.
Worrisome, no? Especially as hobblegum says this shouldn’t happen:
The “datacenter pair” design is supposed to prevent this. They are not supposed to update the code or configs in both centers at the same time, to reduce the risk of both locations in a geo-redundant solution going down at the same time. These multi-region outages show that the regions are still too tightly coupled.
Thankfully for Microsoft, Google also had issues this week. Here’s Caroline Donnelly, with Microsoft and Google cloud users suffer service outages:
Rival service providers both experience technical difficulties. … Google Apps for Work users [were] unable to use the service for 90 minutes.
I bet this will make great fodder for the anti-cloud Luddites. Take ma1010, for example:
Someday perhaps I’ll understand why people and businesses want to put their own data on computers that belong to some corporation in some distant location which depend on the Internet to work at all.
If something breaks on your in-house IT and you’re the IT guy, you can do something about fixing it. If something breaks in the cloud, all you can do is whine about it. And wait.
Meanwhile, life goes on. Last word goes to juhunter:
Well this throws a wrench in trying to convince my boss how awesome Azure is.
More great links from Petri, IT Unity, Thurrott and abroad:
Main image credit: Federal Highway Administration (public domain)