Why Blade Servers are the Wrong Choice

When you’re starting a new deployment of physical servers, probably with the intention of using them as virtualization hosts, you face a basic choice: do you go with blade servers or rack servers? The issue is nearly as divisive as putting pineapple on a pizza.

Blade server fans will argue that blades are the only choice. Hardware resellers see large deals. And then there are those of us who have used blades in the past and understand the — how do I say this politely? — challenges. In this opinion post, I will explain why I think blade servers are the wrong choice for your data center. This post is written from the perspective of a Hyper-V engineer, but I think those working with other technologies may find that my reasoning spans hypervisors and operating systems.

What is a Blade Server?

Blade servers were an invention from a time when virtualization was still in its infancy. Deploying physical servers, with one operating system per physical machine, was the norm. As we now know, those machines were underutilized and consumed lots of space. Hardware manufacturers came up with a new solution.

A chassis provides a collection of shared resources including cooling, power, networking and storage connectivity. This chassis is populated with a number of blade servers. The total number of servers contained within the chassis varies depending on the height of the blade or the manufacturer of the solution.

For example, an HP C7000 chassis can hold 16 half-height blade servers (aka blades) or 8 full-height blades. A typical 42U rack can contain 4 of these chassis, for a total of 64 (4 x 16) half-height blade servers. This is superior to the maximum of 38 x 1U rack servers plus 2 x 48-port top-of-rack switches that one might get with a traditional deployment.
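The density claim is simple arithmetic; here is a quick back-of-the-envelope sketch in PowerShell, purely for illustration, using the figures above:

```powershell
# Blade density in a 42U rack, using the C7000 figures above.
$chassisPerRack       = 4     # four 10U C7000 chassis fit in a 42U rack
$halfHeightPerChassis = 16
$blades = $chassisPerRack * $halfHeightPerChassis   # 4 x 16 = 64

# The traditional alternative from the text: 38 x 1U rack servers
# plus 2 x 48-port top-of-rack switches in the same rack.
$rackServers = 38

"Blades per rack: $blades; rack servers per rack: $rackServers"
```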

A blade contains processor and RAM just as you would expect with a traditional rack server. After that things can be quite different. For example, some blade servers are configured to load their operating system from a SAN, while others will have just enough on-board disk capacity for an operating system. There are no traditional network cards or ports on the back of a blade.

Instead, there is a single interface that plugs into a backplane in the blade chassis. This interface provides power from the chassis to the blade, allows network connectivity, and with additional cards added to the blade, can provide additional networking and/or storage connectivity via the blade chassis.

Racks full of half-height HP ProLiant BL460 G7 blade servers (Source: hp.com)

Storage is often provided by a storage area network (SAN), which the blades connect to via special appliances in the back of the chassis. Alternatively, you can insert special storage blades into a chassis, but this all depends on the blade manufacturer.

Connectivity to the physical network is also provided by special switches in the back of the blade chassis. However, blades within a single chassis can communicate with each other by a very fast interconnect on the backplane. This can make Live Migration or vMotion within a chassis very fast.

Why Blade Servers are the Wrong Choice

Now that you have some idea of what a blade is, without getting caught up in manufacturer specifics, I will spend some time telling you why you’d be crazy to invest in this hardware platform.

Blade Server Vendor Lock-In

I like the idea of using a single set of hardware that has been tightly tested together. But recent events make me think that this testing is more marketing than reality. With rack servers I can choose:

  • Servers from vendor A
  • Network cards from vendor B
  • Storage connection from vendor C
  • Switches from vendor D
  • Storage from vendor E

Compare that to a blade solution:

  • Blade chassis from vendor A
  • Server from vendor A
  • Network cards from vendor A
  • Storage connection from vendor A
  • Switches from vendor A
  • Storage probably from vendor A, but maybe from vendor B

With choice comes the power of negotiation and that leads to better pricing. You might get a nice bid price on your initial installation, but once you’re invested in the ecosystem, you have no choice but to continue spending money with that manufacturer. You are locked in, no matter what the pricing or performance might become.

Blade Server Pricing

I remember when a previous employer was shopping for a new server installation. We were convinced by a manufacturer that blades were cheaper to buy. And you know what? The salesman was right; when we looked at the list price of a blade server it was cheaper than the equivalent 1U rack server. But this was deceptive.

The salesperson didn’t stress the cost of the shared components in the chassis, such as the chassis itself, the PSUs, and the connection modules. At that time, the HP Flex-10 Virtual Connect module cost around $18,000, and you needed 2 of them to provide 10 GbE networking to up to 16 blade servers, each with 2 NICs. If any blade required an additional 2 NICs, then you would have needed 4 x Virtual Connect modules!

If I was installing a rack of hosts now, I’m pretty sure that I could install a pair of 10 GbE or Infiniband switches at the top of a rack for a lot less than the cost of 8 (2 modules per chassis) or 16 (4 modules per chassis) connectivity modules.
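To put rough numbers on that comparison, here is a hypothetical cost sketch. The $18,000 Virtual Connect figure comes from the text above; the top-of-rack switch price is an assumed placeholder, not a quote:

```powershell
# Connectivity cost for one 42U rack: blades vs rack servers.
$vcModulePrice  = 18000        # HP Flex-10 Virtual Connect (figure from the text)
$chassisPerRack = 4

$bladeCostBasic = 2 * $chassisPerRack * $vcModulePrice   # 8 modules  = $144,000
$bladeCostExtra = 4 * $chassisPerRack * $vcModulePrice   # 16 modules = $288,000

$torSwitchPrice = 10000        # ASSUMED price for a 10 GbE top-of-rack switch
$rackCost       = 2 * $torSwitchPrice                    # $20,000

"Blade, 2 modules/chassis: $bladeCostBasic"
"Blade, 4 modules/chassis: $bladeCostExtra"
"Rack,  2 ToR switches:    $rackCost"
```

Even if the assumed switch price were doubled or tripled, the gap between the two approaches would remain wide.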

Blade Server Flexibility

I like using best-of-breed components. Let me give you an example. If I was building Hyper-V hosts with the intention of using SMB 3.0 storage, then my specification might be as follows for my Scale-Out File Server nodes:

  • Dell R720 with dual Intel processors
  • Chelsio (iWARP/RDMA) or Mellanox (Infiniband/RDMA) networking
  • LSI SAS external interfaces

I recently helped a company through the specification process for Hyper-V on SMB 3.0. This company had invested in Cisco UCS blade servers. This solution doesn’t use network cards as we know them. The customer consulted with Cisco and found that they could not use either iWARP or Infiniband networking. They would have to use legacy 10 GbE instead. And when it came to the SOFS nodes, they would have to invest in rack servers, breaking from their standard, to insert supported SAS external interfaces.

That customer could have easily deployed the Infiniband networking that they lusted after if they had installed rack servers, assuming that they had the required free PCI expansion capacity.

Blade Servers and Service Level Agreements

Those who have studied evolution know that over-specialization of a species can lead to extinction when the environment changes. Most of those who have installed Windows Server 2012 R2 on a blade server have discovered that vendor lock-in has led to exactly such a specialization. Emulex NICs are present in almost every manufacturer’s blade servers. That would be fine if Emulex NICs were stable. However, those who have deployed WS2012 R2 Hyper-V on these servers have discovered bugs in these NICs’ firmware and/or drivers.

The reaction of the blade manufacturers and Emulex has been worse than lax or sluggish. It has been atrocious, leaving many customers with unstable installations. The brands affected include HP and Dell, and I have heard that Hitachi (they sell blade servers too) might be in the same basket.

In the case of rack servers, we Hyper-V veterans know to avoid all things Broadcom. We can pick and choose components as we see fit.

Blade Servers and Converged Networking

Blade servers brought us the ability to converge networking through devices such as the HP Flex-10 connectivity module. With this device, we could carve up 10 GbE networking into multiple 1 GbE NICs and Fibre Channel over Ethernet (FCoE) SAN connectivity. That sounded pretty cool. Except:

  • Microsoft does not support these kinds of solutions: It’s not written anywhere publicly that I can find, but this is indeed the case as has been confirmed to me and others by networking program managers in the Windows Server group.
  • WS2012 Introduced Software-Defined Converged Networking: We can do converged networking/fabrics using a few PowerShell cmdlets or by using logical networks in System Center Virtual Machine Manager. This doesn’t require $18,000 networking devices and will work with any NIC on the Windows Server hardware compatibility list. I can deploy the same solution across hardware from different manufacturers. And I can easily change the design and QoS implementations after the server/host is deployed.
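To illustrate that second point: converging a WS2012 R2 Hyper-V host really does come down to a few cmdlets. This is a minimal sketch, assuming a host with an existing NIC team named “HostTeam”; the vNIC names and QoS weights are illustrative, not prescriptive:

```powershell
# Create a virtual switch with software QoS on top of an existing NIC team.
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "HostTeam" `
    -MinimumBandwidthMode Weight -AllowManagementOS $false

# Carve out host (management OS) virtual NICs for each traffic class.
Add-VMNetworkAdapter -ManagementOS -Name "Management"    -SwitchName "ConvergedSwitch"
Add-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -SwitchName "ConvergedSwitch"
Add-VMNetworkAdapter -ManagementOS -Name "Cluster"       -SwitchName "ConvergedSwitch"

# Assign minimum bandwidth weights - and unlike a hardware-carved solution,
# these can be changed at any time after deployment.
Set-VMNetworkAdapter -ManagementOS -Name "Management"    -MinimumBandwidthWeight 10
Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -MinimumBandwidthWeight 30
Set-VMNetworkAdapter -ManagementOS -Name "Cluster"       -MinimumBandwidthWeight 10
```

The same script runs unchanged on any server with NICs on the Windows Server hardware compatibility list, regardless of manufacturer.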

It’s About the Virtual Machines and Services, Dummy!

The last cry of the blade salesperson is that they can get up to 64 blades into a 42U rack and I can never do that with rack servers. That is true … to a certain extent. But to be quite honest, I really do not care. And for those of you who do, I’d beg you to leave 2006, when you dreamed of 10:1 virtual machine to host ratios, and join the rest of us in 2014, when we’re focused on virtual machines and services instead of physical servers. You see, I care about how many services I can run in a rack. If I ran a true data center, then I’d care about how many services I could run on a collection of racks, copying the fault domain concept of Microsoft’s Azure.

It’s been a few years since I ran a spreadsheet on this subject, but the last time I did, I found that the cheapest Hyper-V host to buy and own over 3 years (assuming that I could fill it with workloads) was a 5U server. I won’t fit too many of those into a rack, but I guess I wouldn’t need too many switch ports either!

Maybe you go another way and install half-U servers with 10 GbE or Infiniband converged networking. That can give you maybe 80 servers in a rack … assuming that you have the capability to push enough electricity into that rack!

Blade servers were a solution for a time when we needed more lower capacity physical servers with complex connectivity. However, the times have changed. Now we want fewer physical servers, with simpler and customizable storage and networking. Flexibility and cost-effectiveness are critical, and in my opinion, this is why blade servers are the wrong choice.

Editor’s Note: Aidan isn’t alone in this opinion: Petri IT Knowledgebase contributor Brian Suhr presents a similar argument about rack servers over blade servers in a VMware-focused post that asks if traditional rack mount servers are winning back enterprise customers.

8 responses to “Why Blade Servers are the Wrong Choice”

  1. Well,

    To be honest, I am not convinced. I have used HP Blades since 2010 and I love them.

    Let me make some points:

    Blade Server Vendor lock in.

    It is true that you have everything from one vendor, and in some cases this might be beneficial. Have you ever tried to open a support call for a server with hardware from multiple vendors inside it? It becomes classic ping-pong: it is not our hardware, it is vendor B. Vendor B says it is caused by vendor C.

    With one vendor and one support number you have a single point of contact, and you don’t care whether it is a branded HP/Brocade FC switch.

    Another point is about working configurations. Have you ever tested a configuration with multiple vendors inside a server? I bet that HP/Cisco/you name it tested it much better than you will ever be capable of doing.

    Blade server pricing.
    Virtual Connect – in my current config with Flex-10 I have more than 8 virtual network cards per server in one chassis. No need to buy additional VC modules.

    Blade Servers and Converged Networking

    If it is not written anywhere by Microsoft that it is not supported, then it is supported. I would not accept such an explanation from a vendor.
    Of course you can use fancy scripts and workarounds. Is it easier to support and use? I bet not. I am a big fan of the KISS rule.

    Don’t forget other benefits of blade servers:
    – less cooling required
    – less cabling
    – better rack density
    – less space utilization
    – fewer pdu required

  2. I would disagree as well.

    I use Cisco UCS, and anyone who is deploying multiple servers day in and day out, or reconfiguring/changing things around quite often, will always appreciate the flexibility of UCS. Only with UCS have I been able to remotely, from home, add an extra HBA to my rackmount server with a simple server profile change because the latest and greatest VMware admin configuration required it… with any other rackmount I need to do it the old-school way…

    That’s not to say a blade will always fit your needs; in our case we also have some of the C-Series rackmounts for those purposes, but they also fit seamlessly into the UCS system. For power/space/cooling and management I would take UCS blades any day over the cumbersome rackmount design.

    I agree that your choice of hardware from vendors A, B, and C can be an advantage, but it can also be a disadvantage in its own right. Blade systems also give you access to support programs such as FlexPod or VCE, which in their own right are a great way to ensure that you have support and that your configurations are solid.

  3. Emulex NICs don’t exist in or work with Dell 12G blade hardware.

    We have a mix of blades and rack servers but since Windows Server 2012 we’ve stopped purchasing blade servers in favour of rack servers because of some of the reasons stated in this article.

  4. Maybe the problem here is the lack of experience working with blades or dealing with the wrong vendor.
    We have 3 clusters running Hyper-V on our network, 2 of them deployed with rack servers and Windows 2012 and the latest one is running on Dell Blades M820 with Windows 2012 R2, each of those with:
    1TB of RAM
    2 x 300GB HD
    4 x Emulex 8GB HBA cards (we decided to test this, as the Qlogic we used for the past 4 years on the other clusters, sucks).
    And an amazing:

    8 x 10GB Broadcom ethernet cards.
    2 x 1GB Intel ethernet cards.
    4 x Intel Xeon E5-4650L (32 cores total)

    This is being fed, in terms of networking and SAN connectivity, with:
    2 x Brocade M5424 8GB switches
    2 x F10 MXL (10GB switch) “We could have picked Cisco switches here, but the F10 provides us with better overall connectivity”

    So, I’m seeing there Dell, Brocade, Intel, Broadcom, F10 (we could have added Cisco as well).
    Where is the Vendor lock in?
    BTW as Wojciech said, I have to call just 1 number to have support for all that.

    The only difference I noticed between my clusters running on rack servers and the one running on blades is that the latter was more difficult in the initial configuration. After that, I still have the same limitations for using Hyper-V, but I can’t blame the blades for any of the problems, as I’m suffering the same things on the rack servers.

    • I’m going to go with lack of experience and/or lack of research.

      We use IBM BladeCenters – a rock-solid platform that holds a good resale value, and which supports over 40% of the top-500 supercomputer clusters (HP = ~25%, Dell = ~6%).

      We’ve deployed Cisco, Brocade, Nortel, and IBM-branded 10GigE and 1GigE switches. There are also copper pass-through modules available, if we couldn’t handle a choice from the above four big-name vendors.

      Brocade, Cisco, and QLogic make Fibre Channel switches for BladeCenters.

      There’s also a Voltaire infiniband switch, and a couple of IBM branded SAS switch options as well.

      All blade NICs are Broadcom, which is the same thing you’ll find in the rackmount systems.

      Support has been rock-solid, even with less than 24/7 support contracts. Best of all, there’s minimal finger-pointing if something breaks; IBM is well prepared to diagnose issues and/or conflicts amongst installed modules. This translates to rapid support turnaround times, and minimal service interruptions.

      Despite all the required interconnects to ensure true dual-path redundancy for storage, power, cooling, networking and OOBM…the cabinets…oh my, our cabinets have such a clean and idiot-proofed look.

  5. We provide support for many companies where HP or Cisco blades are present and have never had hardware incompatibility. But… VMware is used as the hypervisor. So maybe the problem is in Hyper-V, not the blades themselves?

  6. I came to the same conclusions recently. Had an old HP C7000 enclosure fully populated, but it was time for some new stuff. We weighed the options: blades are way more expensive until you start filling them up, but even full they cost more money and are less flexible. Went with Dell rack mounts instead and had my choice of vendors’ NICs & HBAs, and still only have 1 number to call for support (though obviously not as extensive a selection through Dell, the ones they sell are tested, and there should be enough options there to satisfy most needs). And when you’re only running one or two chassis, I don’t care how redundant they say it is, there’s still a single point of failure in there somewhere and I don’t like that.

  7. Fully agree on this. I learned this through one incident. We had a recent incident where two nodes started behaving oddly: frequent network drops and sluggishness. We logged a case with HP, had multiple remote sessions with the support team, and sent many log files as per their recommendation.

    The first action was to update the server to the latest firmware. Since I had two servers with the same issue and these servers were in the pre-production stage, I did the firmware update on one server. The issue was not fixed, which confirmed that the issue was not with the firmware.

    Then the support team concluded that this was most likely due to a faulty LOM. They sent a replacement for one server to try. We got the replacement in a few hours and the engineer came onsite. We did the replacement – no luck.

    Then I suggested to the onsite engineer that we test the connectivity by disconnecting one Flex module at a time. Since we have two Flex modules and all the network traffic goes out through a teamed interface, I was confident trying this out. After a few tests, we identified that whenever the traffic passed through the second Flex module, we had issues. The engineer reported this to support, and they sent a replacement Flex module.

    And the funny thing is, before we replaced the Flex module, he asked me if we could try shutting down the Flex module and re-inserting it. We tried that, and after that we never faced the issue. It’s been almost six months since this happened.

    Flex and converged networks are good. However, without the right support, it took us 5 days to identify that the issue was with the Flex module. Because the onsite engineer was really helpful and listened to our ideas to test through a different approach, we sorted it out. Otherwise, this could have run on for a few weeks.

Aidan Finn, Microsoft Most Valuable Professional (MVP), has been working in IT since 1996. He has worked as a consultant and administrator for the likes of Innofactor Norway, Amdahl DMR, Fujitsu, Barclays and Hypo Real Estate Bank International where he dealt with large and complex IT infrastructures and MicroWarehouse Ltd. where he worked with Microsoft partners in the small/medium business space.