While working with customers using Microsoft’s Azure Stack HCI solution, I often hear concerns about the security, identity management, and Internet access aspects of this platform. In this article, I’m going to detail a methodology allowing IT pros to make their Azure Stack HCI environment more secure by creating a fabric domain and network.
Customers often want to apply security and accessibility rules to their servers that don’t work with hybrid services and systems such as Azure Stack HCI. In most cases, those rules include Internet access through proxy servers, delegated control in Active Directory, or IP-based firewall rule sets. The purpose of the methodology I’m going to describe in this article is to keep Azure Stack HCI separated from the rest of the application services hosted by an organization.
Microsoft’s hyperconverged infrastructure (HCI) cluster solution can be pretty complex (and costly) to implement, but a fabric domain and network can increase security between your fabric domain and user applications services.
A fabric domain and network is an encapsulated part of your network with a separate Windows Server Active Directory or Lightweight Directory Access Protocol (LDAP) domain that’s only used for virtualization and hosting purposes. Such a network normally contains one or more virtual local area networks (VLANs) and a separate domain for the fabric machines, as shown below.
This type of architecture is compatible with a virtualized environment where your workload, which may include application servers, has no need to be in the same network or domain as your underlying and hosting infrastructure.
As an IT pro, the only thing you normally need to do is to correctly set up VLAN tagging on your physical switches to connect your workloads to the correct application networks and domains.
With a proper fabric domain and network in place, you can now start to set up your firewalls and domain to suit the needs of your fabric. If you’re using Azure Stack HCI, this means that you can fulfill the specific firewall and connectivity requirements to connect your network to Azure Arc, or use constrained delegation for the hosts, cluster, and Windows Admin Center.
Moreover, you can start to harden your Azure Stack HCI environment without the fear of interrupting other workloads. If you use applications such as Microsoft’s Azure Kubernetes Service, you may also be able to run the master node and worker nodes in the fabric and hide them behind a load balancer or other gateways to improve security.
To build a fabric domain and network for Azure Stack HCI, you’ll first need to have the following components ready.
Depending on your requirements, you may also need an additional jump host or routing connection between your applications and fabric network, but we will be discussing that later.
From a cabling perspective, the most confusing part for network administrators or experts not used to software-defined networking is that your compute workloads and your host fabric can share the same physical server ports. The separation only happens by using VLAN tagging.
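To make the idea of shared ports with VLAN separation more concrete, here is a minimal Python sketch. It only models the logic and does not configure any switch. The VLAN IDs, network names, and the port name are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical VLAN IDs and names, chosen only for illustration.
FABRIC_VLANS = {10: "fabric-management", 20: "fabric-storage"}
COMPUTE_VLANS = {100: "app-network-a", 110: "app-network-b"}


@dataclass(frozen=True)
class TrunkPort:
    """A physical server port that carries several tagged VLANs."""
    name: str
    allowed_vlans: frozenset


def classify(port: TrunkPort, vlan_id: int) -> str:
    """Return the logical network a tagged frame on this port belongs to."""
    if vlan_id not in port.allowed_vlans:
        return "dropped"  # tag is not allowed on this trunk
    if vlan_id in FABRIC_VLANS:
        return "fabric"
    if vlan_id in COMPUTE_VLANS:
        return "compute"
    return "unknown"


# One physical port carries both fabric and compute traffic.
port = TrunkPort("node1-nic1", frozenset({10, 20, 100, 110}))
for vlan in (10, 100, 999):
    print(vlan, classify(port, vlan))
```

The point of the sketch is that the same port object serves both networks, and only the tag decides where a frame ends up.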
You can use the same switches to cable your fabric network and your compute network, as shown below. You’ll also be able to add additional components such as domain controllers, gateways, and Windows Admin Center to your environment.
The resulting architecture should look like the schema below:
The impact of this architecture on your hardware expenses should be rather small. It’s also a good security investment as you can increase the reliability and security of your fabric and application environment.
For Azure Stack HCI management, you’ll need a management IP network per location, especially if you are using stretched clustering. The management network also must have access to Azure to register Azure Stack HCI and Azure Arc components.
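As a rough illustration of planning one management network per location, the following Python sketch carves subnets out of a larger range with the standard `ipaddress` module. The address range, the site names, the /24 size, and the convention of using the first host address as the gateway are all assumptions for this example, not requirements of Azure Stack HCI.

```python
import ipaddress

# Hypothetical range reserved for fabric management; replace with your own.
FABRIC_SUPERNET = ipaddress.ip_network("10.10.0.0/22")
# Hypothetical site names, e.g. the two sites of a stretched cluster.
LOCATIONS = ["site-a", "site-b"]


def plan_management_networks(supernet, locations, prefix=24):
    """Assign one management subnet per location from the supernet."""
    subnets = supernet.subnets(new_prefix=prefix)
    plan = {}
    for location in locations:
        try:
            plan[location] = next(subnets)
        except StopIteration:
            raise ValueError("supernet too small for all locations") from None
    return plan


plan = plan_management_networks(FABRIC_SUPERNET, LOCATIONS)
for location, network in plan.items():
    hosts = list(network.hosts())
    # Assumption: the first usable address serves as the gateway.
    print(f"{location}: {network} gateway={hosts[0]}")
```

Keeping every location inside one supernet makes it easier to write a single firewall rule for outbound access from the fabric management networks to Azure.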
The architecture and IP schema for a single cluster connecting the fabric network could look like the one below:
I only explained the management integration here. If needed, the Microsoft Learn website has more information about additional network requirements for Azure Stack HCI storage, stretched clustering, and software-defined networking. Please follow the links below.
For the creation of your Active Directory structure, you are mostly free to choose what is most convenient for you. The only things I would recommend are to create a separate structure for your servers and users, and to create separate admin accounts for your fabric admins.
Normally, I would recommend a structure like the one below where domain controllers always stay in their default organizational unit in Windows Server Active Directory. There are other branches for your different locations, user account organization units, clusters, and other services.
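One way to picture such a structure is to write it down as a nested tree and derive the distinguished names from it. The Python sketch below does only that and does not talk to Active Directory. The domain name and all OU names are hypothetical. Domain controllers are left out of the tree on purpose because they stay in their default organizational unit.

```python
# Hypothetical domain and OU names, used only to illustrate the layout.
DOMAIN_DN = "DC=fabric,DC=example,DC=com"

OU_TREE = {
    "Locations": {
        "SiteA": {"Clusters": {}, "Services": {}},
        "SiteB": {"Clusters": {}, "Services": {}},
    },
    "Accounts": {"FabricAdmins": {}, "ServiceAccounts": {}},
}


def ou_distinguished_names(tree, parent_dn):
    """Walk the nested dict and return the DN of every OU, parents first."""
    names = []
    for ou, children in tree.items():
        dn = f"OU={ou},{parent_dn}"
        names.append(dn)
        names.extend(ou_distinguished_names(children, dn))
    return names


ou_dns = ou_distinguished_names(OU_TREE, DOMAIN_DN)
for dn in ou_dns:
    print(dn)
```

Because parents are listed before their children, the resulting list could be fed in order to whatever tooling you use to create the organizational units.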
There are two access scenarios for a fabric environment. The most common one is that you allow routing between your regular environment and the fabric network. In that case, you may use the same core firewall and router, or have a router in place which can bridge the gap between both.
This type of environment makes access easy, but if you don’t use additional firewall and routing policies between those networks, you will have reduced security. Even so, if your user network is compromised and an attacker manages to connect to your fabric, a multi-tenant deployment still leaves you with better security.
With an air-gapped deployment, the fabric network has zero access to the actual user and application network. Normally, you would either use jump hosts to connect to the environment or have dedicated admin systems able to fulfill management tasks.
While an air-gapped environment is much more secure, it does complicate things for administrative staff. They will need to connect to the jump systems to connect to the cluster and fulfill their management tasks. For monitoring purposes, you may also need to build additional interfaces or gateways to send monitoring data from the fabric to monitoring systems and services.
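The difference between the two access scenarios can be summarized as a small default-deny flow table. The Python sketch below is only a thinking aid, not a firewall configuration. The zone names are hypothetical, and the outbound path from the fabric to Azure is included in both tables on the assumption that the management network still needs it for registration.

```python
# Hypothetical zone names; each tuple is an allowed (source, destination) flow.
ROUTED = {
    ("user", "fabric"),       # routed through the core firewall or router
    ("fabric", "azure"),      # outbound for Azure registration and Azure Arc
}
AIR_GAPPED = {
    ("jump-host", "fabric"),  # admins reach the fabric only via jump hosts
    ("fabric", "azure"),      # outbound for Azure registration and Azure Arc
}


def is_allowed(policy, source, destination):
    """Default deny: a flow passes only if the policy lists it explicitly."""
    return (source, destination) in policy


for name, policy in (("routed", ROUTED), ("air-gapped", AIR_GAPPED)):
    print(name, "user -> fabric:", is_allowed(policy, "user", "fabric"))
```

In the routed scenario, the single `("user", "fabric")` entry is exactly the path that additional firewall and routing policies should narrow down.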
From a cost perspective, you should normally not expect any additional costs, unless you want to go fully encapsulated. Hardware that could drive additional costs includes additional physical firewalls, but you could also use virtual network appliances instead.
Moreover, you may require additional domain controllers that must be hosted outside of the Azure Stack HCI cluster. That could lead to additional hardware costs, or you could also virtualize them on another cluster.
You may also face other costs depending on your needs. I personally prefer to run everything virtualized, especially if you’re using more than one cluster. Another option is to run the central Windows Admin Center and domain controller on Azure, but that requires a constant VPN connection to your cluster locations.
As you can see, a fabric domain and network for your Azure Stack HCI environment makes sense in many different cases. It adds more complexity when building it up in the first place, but in the long run, it will increase security between your actual fabric domain and your customer and user application services.
In modern scenarios, your fabric domain often doesn’t need to be connected to the actual workload it’s hosting, so you can keep it separated. That’s a best practice you may not find in official Microsoft documentation, but from my work with midsize to large customers, I’ve come to appreciate how much it simplifies an Azure Stack HCI architecture. It also helps to reduce change and approval processes.
When building a fabric domain, especially when using virtualized application servers for Windows Server Active Directory and Windows Admin Center, please make sure to read the required documentation on Microsoft Learn. After creating a cluster, you can use Windows Admin Center from Azure for your day-to-day operations, and you can also virtualize your domain controllers on Azure.