Coming Soon: GET:IT Endpoint Management 1-Day Conference on September 28th at 9:30 AM ET

The Role of Azure Virtual Machine Data Disks, Plus Tips on Sizing Disks for Capacity


Few seem to consider the role of data disks with Azure virtual machines, and fewer ever consider maximum amounts, sizes, or performance limitations. In this article, I will explain the role of data disks and offer some advice on how to size disks for capacity and IOPS.

The Role of Azure Virtual Machine Data Disks

When you deploy a new Azure virtual machine from the Azure Marketplace, you get a machine that has two disks:

  • OS disk: Windows machines get a 127 GB C: drive.
  • Temporary disk: This is a variable-size disk, confusingly labeled as “disk size” on the pricing site, that is intended for non-persistent data. The drive is local to the Azure host and is not guaranteed to persist over time. On Windows machines, the paging file lives on this D: drive, and some DBAs might choose to store TempDB or caching databases there (see D- and G-Series virtual machines).

Where do you store data? The temporary disk is a total no-no. And you might look at the mostly unused 127 GB C: drive and think it looks like a good option, but let me cut you off right there: never store data on the C: drive. Even the best practices for domain controllers on Azure insist that you do not store data on the C: drive. Any veteran Hyper-V administrator who knows their craft will tell you to deploy data disks; it’s not a big deal to do so, and it gives you a more stable platform to build upon.


Data Disk Maximums

There are three data disk maximums that you need to consider:

  • Maximum size: How big can a data disk be?
  • Maximum performance: How fast will the data disk be?
  • Maximum number: How many data disks can a virtual machine spec support?

Maximum Size

A data disk can be up to 1023 GB. This does not mean that we are limited to volumes of 1023 GB. How big are the disks in your servers? Have you ever been limited to volumes of 300 GB because that was the size of the disks you purchased from Dell or HPE? We can overcome this per-disk limit to create larger volumes.

Maximum Performance

Azure caps the maximum performance of an individual data disk, and this limitation depends on the type of disk deployed:

  • Basic A-Series VM: A data disk on this kind of virtual machine is limited to 300 IOPS.
  • All of Standard Storage: If you deploy a HDD-based data disk on any other specification of virtual machine, then it is limited to 500 IOPS.
  • Premium Storage: SSD-based storage is rated in MB/second, and the performance limit is dictated by the size of the data disk: is it a P10, P20, or P30? Also note that the spec of the DS- or GS-Series virtual machine might support a lower MB/second transfer rate than the data disks.
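The Standard Storage caps above are simple enough to express as a small lookup, which is handy when scripting capacity planning. This is a minimal sketch; the tier names are my own labels, and only the two flat IOPS figures from the list above are modeled (Premium Storage is excluded because its limit varies by disk size).

```python
# Per-data-disk IOPS caps for Standard Storage, as described above.
# Tier names here are illustrative labels, not official Azure SKU names.
STANDARD_DISK_IOPS = {
    "basic_a_series": 300,  # data disk on a Basic A-Series VM
    "standard": 500,        # HDD-based data disk on any other VM spec
}

def max_disk_iops(vm_tier: str) -> int:
    """Return the per-data-disk IOPS cap for a Standard Storage data disk."""
    return STANDARD_DISK_IOPS[vm_tier]

print(max_disk_iops("basic_a_series"))  # 300
print(max_disk_iops("standard"))        # 500
```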

Just as with disks in a physical server, the performance of a volume is not limited by the performance of a single disk. We can build up performance of a volume by aggregating the throughput of multiple disks.

Maximum Number

If I can aggregate capacity or performance of data disks to meet the requirements of a data volume, then how many disks can I have? That depends on the spec of virtual machine that you decide to deploy. You can check the maximum number of data disks on Microsoft’s Sizes for Virtual Machines page to determine what an A2 (4 data disks) or a D4 (16) can support. In truth, if you are working with data sets with capacity or performance profiles that exceed the capabilities of a single data disk, then the selection of a virtual machine spec will be partly driven by data disk requirements.

Aggregating Disks

As I have mentioned already, aggregating data disks is something that you have probably already done on physical servers, be they application/data servers or virtualization hosts. Adding more disks offers:

  • Fault tolerance: We’re not so concerned with this because we get fault tolerance from the Azure fabric via disk resiliency and replication … hopefully supplemented by Azure Backup for IaaS VMs.
  • More capacity: Adding more physical disks increases capacity. With 3 x 300 GB disks, I could deploy a 600 GB volume on RAID5 or a 900 GB volume on RAID0.
  • More performance: Let’s say that I am using 300 GB SAS disks that offer 200 IOPS each, and I need to deploy a database log file that can support 2,000 IOPS. To meet that performance requirement, I must build a RAID0 LUN with at least 10 disks (10 * 200 IOPS = 2,000 IOPS).
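The performance calculation in that last bullet is just a ceiling division, which can be sketched as:

```python
import math

def disks_for_iops(required_iops: int, iops_per_disk: int) -> int:
    """Minimum number of striped (RAID0) disks needed to reach a target IOPS."""
    return math.ceil(required_iops / iops_per_disk)

# The database log file example: 2,000 IOPS on 200 IOPS SAS disks.
print(disks_for_iops(2000, 200))  # 10
```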

We can use similar techniques with Azure virtual machines to add capacity and performance to data volumes.

Aggregating Disks in Azure

We know that most virtual machine specifications support at least two data disks, but usually many more. This allows us to at least double the capacity or performance of a volume stored in data disks.

The method that we will employ is:

  1. Calculate how many disks are required.
  2. Deploy the number of data disks.
  3. Deploy a disk aggregation technology within the guest OS of the virtual machine.

Remember that we are not concerned with fault tolerance because this is handled at the fabric layer by Microsoft.

How Many Disks Are Required?

Determining the number of required disks is a simple matter of division. Let’s assume that I need a volume that can support 2,000 IOPS on standard storage. I’ve decided that I’m not using a Basic A-Series virtual machine, so that means my data disks will offer up to 500 IOPS each. 2,000 / 500 = 4, so I need to use four data disks. This means I need to choose a virtual machine spec that can handle at least four data disks.

What if I need 7 TB of Standard Storage? Each disk can offer me up to 1023 GB of storage. We can do some simple math to determine the number of required data disks: (1024 * 7) / 1023 = 7.006. We can round that up for minimum capacity and free space, and we know that we require eight data disks.

Let’s mix it up a bit. I need a 500 GB volume on a D-Series machine on Standard Storage with 8,000 IOPS. A single data disk can offer me more than enough capacity for 500 GB, but it’s limited to 500 IOPS, which is not nearly enough. If we divide 8,000 by 500 we can tell that we need 16 data disks. So I can deploy 16 x 32 GB drives to create a single 512 GB volume. This design will require either a D4 or a D13 virtual machine … it might be a good time to look at Premium Storage to see if a DS-Series machine would be cheaper.
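The three worked examples above follow the same pattern: compute the disk count needed for capacity, compute the disk count needed for IOPS, and take whichever is larger. A minimal sketch, assuming the Standard Storage limits quoted earlier (1023 GB and 500 IOPS per data disk):

```python
import math

def required_data_disks(capacity_gb: float, iops: int,
                        disk_gb: int = 1023, disk_iops: int = 500) -> int:
    """Data disks needed to satisfy both capacity and IOPS on Standard Storage.

    Defaults reflect the per-disk limits discussed in this article:
    1023 GB and 500 IOPS per Standard Storage data disk.
    """
    by_capacity = math.ceil(capacity_gb / disk_gb)
    by_iops = math.ceil(iops / disk_iops)
    return max(by_capacity, by_iops)

print(required_data_disks(100, 2000))     # 4  (the 2,000 IOPS example)
print(required_data_disks(1024 * 7, 0))   # 8  (the 7 TB capacity example)
print(required_data_disks(500, 8000))     # 16 (500 GB at 8,000 IOPS)
```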

How to Aggregate Data Disks?

Deploying additional data disks is a simple enough operation. What’s more interesting is how you aggregate those disks.

If you are using Windows Server 2012 (WS2012) or Windows Server 2012 R2 (WS2012 R2), the best solution is Storage Spaces:

  1. Create a disk pool from the raw data disks.
  2. Create a virtual disk, with one column per data disk, no fault tolerance, and an interleave appropriate for the data, using all of the capacity of the disk pool.
  3. Format the virtual disk with a single volume that has an allocation unit size appropriate for the data.

If you want more than one volume, then create one disk pool for each volume.

If you must deploy a legacy OS, such as Windows Server 2008 R2 (W2008 R2), then you can use the inferior (to Storage Spaces) solution of disk striping with dynamic disks.

Aidan Finn, Microsoft Most Valuable Professional (MVP), has been working in IT since 1996. He has worked as a consultant and administrator for the likes of Innofactor Norway, Amdahl DMR, Fujitsu, Barclays, and Hypo Real Estate Bank International, where he dealt with large and complex IT infrastructures, and MicroWarehouse Ltd., where he worked with Microsoft partners in the small/medium business space.