Last Update: Sep 04, 2024 | Published: Dec 28, 2015
Few people seem to consider the role of data disks in Azure virtual machines, and fewer still consider the limits on disk count, size, or performance. In this article, I will explain the role of data disks and offer some advice on how to size disks for capacity and IOPS.
When you deploy a new Azure virtual machine from the Azure Marketplace, you get a machine that has two disks: the operating system disk (the C: drive) and a temporary disk.
Where do you store data? The temporary disk is a total no-no. You might look at the mostly unused 127 GB C: drive and think that it looks like a good option, but let me cut you off right there: never store data on the C: drive. Even the best practices for domain controllers on Azure insist that you do not store data on the C: drive. Any veteran Hyper-V administrator who knows their craft will tell you to deploy data disks; it’s not a big deal to do so, and it gives you a more stable platform to build upon.
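There are three data disk maximums that you need to consider: the size of a single disk, the performance of a single disk, and the number of disks that a virtual machine can support.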
A data disk can be up to 1023 GB. This does not mean that we are limited to volumes of 1023 GB. How big are the disks in your servers? Have you ever been limited to volumes of 300 GB because that was the size of the disks you purchased from Dell or HPE? We can overcome this disk limit to create larger volumes.
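Azure caps the maximum performance of an individual data disk, and this limit depends on the type of disk deployed. For example, a Standard Storage data disk attached to anything other than a Basic A-Series virtual machine offers up to 500 IOPS.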
Just as with disks in a physical server, the performance of a volume is not limited by the performance of a single disk. We can build up performance of a volume by aggregating the throughput of multiple disks.
If I can aggregate capacity or performance of data disks to meet the requirements of a data volume, then how many disks can I have? That depends on the spec of virtual machine that you decide to deploy. You can check the maximum number of data disks on Microsoft’s Sizes For Virtual Machine page to determine what an A2 (4 data disks) or a D4 (16) can support. In truth, if you are working with data sets with capacity or performance profiles that exceed the capabilities of a single data disk, then the selection of a virtual machine spec will be partly driven by data disk requirements.
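To make that selection step concrete, here is a minimal sketch of filtering virtual machine specs by the number of data disks they support. The table is an assumption for illustration only: it holds just the sizes named in this article (A2 with 4 disks, D4 with 16, and D13, which I have assumed to be 16 based on the 16-disk example later on), and the function name is hypothetical. Check Microsoft's sizes page for real, current figures.

```python
# Illustrative table only: values are the ones quoted in this article,
# not an authoritative or current list of Azure limits.
MAX_DATA_DISKS = {"A2": 4, "D4": 16, "D13": 16}


def sizes_supporting(required_disks, table=MAX_DATA_DISKS):
    """Return the VM sizes that can attach at least `required_disks` data disks."""
    return sorted(name for name, limit in table.items() if limit >= required_disks)


if __name__ == "__main__":
    print(sizes_supporting(4))   # sizes that can take four data disks
    print(sizes_supporting(16))  # sizes that can take sixteen data disks
```

In practice you would filter on price, CPU, and RAM as well; the point is simply that the data disk count becomes one more constraint on the choice.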
As I have mentioned already, aggregating data disks is something that you have probably already done on physical servers, be they application/data servers or virtualization hosts. Adding more disks offers:
We can use similar techniques with Azure virtual machines to add capacity and performance to data volumes.
We know that most virtual machine specifications support at least two data disks, and usually many more. This allows us to at least double the capacity or performance of a volume stored on data disks.
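The method that we will employ is to determine how many data disks are required, deploy them, and then aggregate them into a single volume.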
Remember that we are not concerned with fault tolerance because this is handled at the fabric layer by Microsoft.
Determining the number of required disks is a simple matter of division. Let’s assume that I need a volume that can support 2,000 IOPS on standard storage. I’ve decided that I’m not using a Basic A-Series virtual machine, so that means my data disks will offer up to 500 IOPS each. 2,000 / 500 = 4, so I need to use four data disks. This means I need to choose a virtual machine spec that can handle at least four data disks.
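The division above can be sketched in a few lines of Python. The function name is mine, and the 500 IOPS default is the Standard Storage figure used in this example; treat both as assumptions and swap in the cap for the disk type you actually deploy.

```python
import math


def disks_for_iops(required_iops, iops_per_disk=500):
    """Number of data disks needed to reach a target IOPS figure.

    iops_per_disk defaults to the 500 IOPS per-disk cap used in the
    article's Standard Storage example (an assumption for other disk types).
    """
    # Round up: a fraction of a disk still means one more whole disk.
    return math.ceil(required_iops / iops_per_disk)


if __name__ == "__main__":
    print(disks_for_iops(2000))  # the 2,000 IOPS example from the text
```

Rounding up matters as soon as the target is not an exact multiple of the per-disk cap; 2,100 IOPS would need five disks, not four.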
What if I need 7 TB of Standard Storage? Each disk can offer me up to 1023 GB of storage. We can do some simple math to determine the number of required data disks: (1024 * 7) / 1023 ≈ 7.007. Seven disks would leave us just short of 7 TB, so we round up for minimum capacity and free space, and we know that we require eight data disks.
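Here is the same capacity calculation as a small Python sketch. The function and constant names are hypothetical; the 1023 GB maximum and the 1024 GB per TB conversion are the figures used in the paragraph above.

```python
import math

GB_PER_TB = 1024    # conversion used in the article's arithmetic
MAX_DISK_GB = 1023  # maximum data disk size quoted in the article


def disks_for_capacity(required_tb, max_disk_gb=MAX_DISK_GB):
    """Number of maximum-size data disks needed to hold `required_tb` terabytes."""
    required_gb = required_tb * GB_PER_TB
    return math.ceil(required_gb / max_disk_gb)


if __name__ == "__main__":
    disks = disks_for_capacity(7)
    headroom_gb = disks * MAX_DISK_GB - 7 * GB_PER_TB
    print(disks, headroom_gb)  # disk count and spare GB beyond the 7 TB target
```

Because each disk is 1 GB short of a full terabyte, even a 1 TB requirement needs two disks under this arithmetic.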
Let’s mix it up a bit. I need a 500 GB volume on a D-Series machine on Standard Storage with 8,000 IOPS. A single data disk can offer me more than enough capacity for 500 GB, but it’s limited to 500 IOPS, which is not nearly enough. If we divide 8,000 by 500 we can tell that we need 16 data disks. So I can deploy 16 x 32 GB drives to create a single 512 GB volume. This design will require either a D4 or a D13 virtual machine … it might be a good time to look at Premium Storage to see if a DS-Series machine would be cheaper.
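When a volume has both a capacity and an IOPS requirement, the disk count is whichever requirement demands more disks, and the capacity is then spread evenly across them. The sketch below is one way to express that, assuming the Standard Storage figures from this article (500 IOPS and 1023 GB per disk); the function name and return shape are my own.

```python
import math


def plan_volume(capacity_gb, iops, iops_per_disk=500, max_disk_gb=1023):
    """Return (disk_count, gb_per_disk, total_gb) for a striped data volume.

    Defaults are the per-disk limits quoted in the article for Standard
    Storage; they are assumptions for any other disk type.
    """
    disks_for_iops = math.ceil(iops / iops_per_disk)
    disks_for_capacity = math.ceil(capacity_gb / max_disk_gb)
    # The stricter of the two requirements decides the disk count.
    disks = max(disks_for_iops, disks_for_capacity)
    # Spread the capacity evenly, rounding each disk up to a whole GB.
    gb_per_disk = math.ceil(capacity_gb / disks)
    return disks, gb_per_disk, disks * gb_per_disk


if __name__ == "__main__":
    print(plan_volume(500, 8000))  # the 500 GB / 8,000 IOPS example
```

For the example above this gives 16 disks of 32 GB each, for a 512 GB volume, which matches the design I described.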
Deploying additional data disks is a simple enough operation. What’s more interesting is how you aggregate those disks.
If you are using Windows Server 2012 (WS2012) or Windows Server 2012 R2 (WS2012 R2) the best solution to use is Storage Spaces:
If you want more than one volume, then create one disk pool for each volume.
If you must deploy a legacy OS, such as Windows Server 2008 R2 (W2008 R2), then you can fall back on disk striping with dynamic disks, which is an inferior solution to Storage Spaces.