Should Azure Stack Be On Limited Hardware?

Microsoft announced that Azure Stack (MAS) would be released in mid-2017 via a set of hardware partners on highly tested and certified systems, naming HPE, Dell, and Lenovo. This means that Microsoft’s new private cloud solution will only be available via a few partners, on very specific hardware sets. Is this a good or a bad thing? I’ll discuss that in this opinion post.

Microsoft Azure Stack

A lot of people were surprised when Microsoft announced a new private cloud solution called Azure Stack at Microsoft Ignite in 2015. Microsoft had already released Windows Azure Pack (WAPack), which was an Azure skin sitting on top of System Center. Administrators and tenants used the WAPack portals, based on the old Azure management portal, and System Center moved the pieces to make things happen. MAS is something different; it doesn’t require System Center, although Microsoft encourages the use of System Center for infrastructure and service management. MAS features the new Azure Portal interface and is based on Azure Resource Manager (ARM); together these offer an Azure-consistent experience, more so when coupled with the storage and networking capabilities of Windows Server 2016.
MAS created a lot of excitement. Plenty of potential customers looked at MAS and wondered if this was going to be the future of managing Hyper-V deployments, and whether it would let them live the promise of 3-clouds-in-one (Azure, hosting partner, and customer site). That was all before Microsoft started to reveal how it would sell MAS to customers through the channel.
Microsoft announced in July that it would be selling MAS exclusively through Lenovo, HPE, and Dell (to start with; others, such as DataON, have announced plans for 2017) on a subset of highly tested hardware, mimicking the approach that was taken with CPS Standard. A MAS solution will be a packaged one, deployed quickly as a turnkey installation.
The reaction, as seen by the comments on that announcement post, was somewhat negative. Why would Microsoft do this? It’s not as if CPS has been a big seller – they don’t exactly talk about it much anymore. So what’s the deal?

Speed of Deployment

Let me be clear here: WAPack, which is what CPS was, was a nightmarish mish-mash of pieces that no mortal human could possibly deploy into production in anything less than weeks or months. I expected that I’d never even see a real-world deployment in person, so I ignored WAPack … and I’ve yet to see an installation. I suspect that MAS will be a little better, but the Technical Preview 1 requirements hinted at how big the installation will be when the solution reaches general availability.
One of Microsoft’s arguments for a packaged hardware and software solution is that a system integrator can deliver it, power it up, and have you running in a very short amount of time. Customer feedback, Microsoft claims, is that they want to spend less time deploying infrastructure, and more time deploying and/or selling services. I get that; we’ve all heard of complex and large IT project horror stories that overrun by millions of dollars and seemingly never end.
The counterargument is that the very customers that Microsoft is targeting with MAS, hosting companies and large enterprises, have the IT skills to succeed at deploying complex systems. I’ve worked in hosting and I’ve seen how good those people can be. But I’ve also seen in person how bad IT staff in large enterprises can be, from management all the way down to the lowly operators. Don’t go thinking that hiring a consulting company will solve those issues; far too many consulting companies, from the big international ones to the mom & pop shops, believe that training on the job is OK, and often talk and sell themselves into currents that are stronger than their ability to swim in.

I completely understand Microsoft’s desire to speed up and simplify the deployment of a complex system for customers, and to remove the risks that are inherent in complex projects.

Reliable Systems

Except for bad patches that Microsoft has released, every single problem I have seen in the Microsoft virtualization stack, the foundation of MAS, has been because of poor hardware choices by designers, or defective hardware. I’ve seen all sorts in the world of Hyper-V:

  • Trying to team 14 x 1 GbE NICs per host, instead of using 2 x 10 GbE in a converged network design
  • Running clustered Hyper-V hosts with iSCSI storage using just a single 1 GbE NIC configured per host … in production!
  • Making silly mistakes with CSV design that exceeded the capabilities of the SAN
  • Emulex taking endless months to sort out bugs in their firmware/drivers that badly affected blade server customers

That’s just a sample of the mayhem, and I’ve heard much worse, but I cannot divulge those secrets! This is what has driven Microsoft back into the world of Windows Server 2003 Datacenter edition. Back when Datacenter first came out, you could only purchase it from an OEM, such as Dell or HP, on pre-tested hardware that was not just HCL tested, but stress tested to levels beyond the normal certification; this was to ensure that your high-end systems ran smoothly and would be free of hardware issues. Microsoft did the same with Failover Clustering. Back before Windows Server 2008 (W2008), you could only purchase a cluster on a set of pre-tested hardware.
This all sounds fantastic, until you talk to someone who priced up a cluster back in the 2003 era. An HP Windows cluster was made up of a couple of DL380s and a SAN. Sold alone, the solution might cost X. Sold as a Windows cluster, it might cost 1.5X. It was the exact same hardware, except it came with an additional sticker to say that samples of these devices were tested for clustering sometime in the past.
This is the world we are going to return to. I feel pretty sure that the handful of top-tier OEMs that are selling MAS will inflate the prices of their hardware.

Restricted Availability

But let’s imagine that at some point at the end of 2017, I want to start learning MAS. How will I do that? It’s only available if I give an OEM, say, $200,000 (I am guessing). That’s a bit much to learn a solution so I can get a job.
Microsoft is going to need partners to hire people who know how to deploy MAS, and customers are going to require people to operate MAS. Where exactly are those people going to come from? Are they expected to become experts by reading some blog posts and watching some videos on the web? I hope Microsoft doesn’t choke the market for its private cloud solution!

What Do You Think?

Honestly, I am torn. On one hand, I think Microsoft is right to stress quality and speed of deployment. On the other hand, the old-school hardware companies have a history of jacking up prices and being rather poor at promoting Microsoft’s on-prem cloud solutions; it’s in the interest of salespeople to keep selling what they know (VMware, servers, and SANs). Let us know what you think by commenting below.