In part one of this series, I discussed how Microsoft (or you) could use SANs or Storage Spaces with JBODs to implement a Scale-Out File Server (SOFS). In this second part, I will look at disks, networking, servers, and the concept of Storage Bricks in the decision making process of designing a SOFS.
When Storage Spaces was released with Windows Server 2012, all I ever heard from people I briefed was “Does it support tiered storage?” It didn’t then, but Windows Server 2012 R2 does now. The SSDs are not the kind you put in a PC or laptop, not even the “enterprise” kind. A SOFS requires dual-channel SAS disks, and the kind of SSDs that support those don’t cost hundreds of dollars – they cost thousands of dollars, just like the ones in a SAN do.
In the small/medium enterprise space, I don’t expect to see many tiered Storage Spaces implementations. Instead, I expect to see lots of two-way mirroring with 10K or 15K HDD traditional spinning disks. Companies of this size rarely need the peak performance offered by SSDs.
Where tiered storage really can be of benefit is when you need lots of capacity. Consider this: You can put in 200 x 1 TB 10K drives at a medium price, or you can put in 50 x 4 TB 7.2K drives. Those 50 slower drives will be a lot cheaper to purchase and power. What if we supplemented those 50 drives with 8 x SSDs? The heap map optimization process of Storage Spaces would promote the hot blocks to the SSD tier, and we would have Write-Back Cache to absorb brief spikes in write activity. That’s what Microsoft envisioned with tiered Storage Spaces: to mix the best of big, slow, and economical with small, fast, and expensive.
Some JBOD manufacturers will sell you disks and test them for you. Not all disks are made equal, however, and things happen during shipping. Test the disks before you implement your Storage Spaces. It’s also a good idea to check the firmware level on the disks. This can impact performance, especially with Windows Server MPIO, where the Round Robin (RR) algorithm has been found by some of us to perform poorly compared to the Failover Only (FOO) one.
Although you can do SMB 3.0 over 1 GbE networking, I never see Microsoft talk about this. They don’t appear to consider this speed of networking for their data center storage protocol when blogging, writing, or presenting. The minimum you will normally see is unenhanced 10 GbE networking. Most real-world implementations (see the Build team whitepaper above) are in this speed.
A SOFS with lots of connecting Hyper-V hosts will place pressure on the CPUs of those SOFS nodes. Don’t forget that the SMB 3.0 networking of the hosts could consume precious CPU from the hosts! Remote Direct Memory Access (RDMA) offloads most of that processing to alleviate the pressure that could have been caused by SMB 3.0 storage networking. There are three options:
Chelsio is the only brand still making iWARP NICs. Mellanox make NICs for RoCE and InfiniBand, with chipsets appearing in various server brands as add-on cards. It is expected that people implement RDMA will go for either iWARP or Infiniband. RoCE is not an option for those who want a simple, or even just a difficult, way to go. And my impression is that InfiniBand is not much more expensive than RoCE. In the presentations that I have attended, the networking is always either plan 10 GbE, iWARP, or Infiniband.
As for the server networking, that’s a great big old “it depends.” I have seen two designs in Microsoft presentations. The first of those designs is one that I first blogged about on my own site in June of last year. In that design, a pair of rNICs (NICs capable of RDMA) is placed into the SOFS cluster nodes and Hyper-V hosts. Some converged networking is used to merge cluster and SMB 3.0 networking on those NICs. Management is done on the physical NICs.
A second option, and one I’m not sold on, is to use converged virtual NICs for the SMB 3.0 storage communications. There are a few considerations with this option:
I have heard a possibility of using more than just two virtual NICs to get more parallel channels of networking for SMB Multichannel (single channel across multiple virtual NICs). I really hate that idea in the real world because it adds a lot of complexity.
I recommend keeping the network design simple for SMB 3.0 storage and using the design that I blogged about last year, whether you are using RDMA or not with 10 GbE or faster networking.
Microsoft always tries to remain neutral when it comes to servers. My suggestion is that you always use 2U workhorse servers. This is because they are easily maintained and expanded. Note that you might need many expansion cards:
Make sure that the expansion cards meet the speed requirements of the cards.
Regarding memory, note that you cannot use CSV Cache with tiered storage spaces, so there’s no point in loading the SOFS nodes with RAM if you are going to mix SSDs with HDDs. On the other hand, if you are using just a single tier, then add some RAM to improve the read performance of the SOFS’s CSVs with CSV Cache.
It is possible to build a single SOFS cluster that contains isolated storage footprints. For example, we could build a SOFS cluster using an existing SAN and some servers. Then we could add some more servers, but instead of connecting them to the SAN, we would add some JBOD trays. It would be a single cluster, a single active/active (actually, active/active/active/active) File Server for Application Data role or namespace, but with CSVs and file shares that do not span storage systems. The below diagram depicts how we could implement such a design with two of these “storage bricks.”
Remember that in this design, the SOFS cluster is still restricted to eight nodes (a support limitation by Microsoft). That implies that you could have 4 x 2-node storage bricks or 2 x 4-node storage bricks in a single cluster.