Making the Exchange – Azure Active Directory Connection More Reliable
Basic Software Engineering to Make a Service Run Better
Were you puzzled by Microsoft’s September 9 blog post titled Exchange Online Improvements to Accelerate Replication of Changes to Azure Active Directory? If so, join the party because you weren’t the only one. Unlike most posts from Microsoft, this epistle didn’t cover a glitzy new feature, dazzling new functionality, or even predict that Exchange Online would do something terribly interesting soon. Instead, it talked about plain old software engineering of a type that usually happens well under the radar.
The change was also announced in Office 365 notification MC190021 and is associated with Office 365 Roadmap item 55023.
The Nature of Office 365
Office 365 is composed of a loosely-connected set of services. It might seem that Azure Active Directory is the master, but that’s only true for account and group management. Under the surface, Office 365 runs a seething mass of directory synchronization operations to keep Azure Active Directory aligned with the app directories (like those used by Teams, Yammer, SharePoint Online, and Exchange Online) and on-premises Active Directory (for hybrid deployments). The app directories hold information needed for the smooth running of an application. For example, Exchange Online holds details needed to route email to users and groups. Without precise, fast, and reliable synchronization, the loosely-coupled nature of Office 365 would come apart at the seams.
The Slow Route Between Exchange Online and Azure Active Directory
What the post tells us is that Microsoft realized that the synchronization between Azure Active Directory and Exchange Online was gradually slowing down. They don’t say why, but it’s possible that the slowness is a function of the growing size of Office 365, the number of objects, the distribution of Office 365 across multiple datacenter regions, the advent of multi-geo tenants, or the way that Exchange Online uses forests to organize its work.
For whatever reason, synchronization between Exchange Online and Azure Active Directory wasn’t working as smoothly or as quickly as it once did. The obvious sign of the problem was that changes made in the Exchange Online directory were slower to show up in Azure Active Directory. The nature of federated synchronization is that some delay is always possible. My tolerance for problems is high and I wasn’t worried about the way things work, but clearly enough friction existed for Microsoft to redesign the way Exchange Online and Azure Active Directory handled updates.
It’s important to know that not every directory update executed within Exchange Online is synchronized with other Office 365 apps. Many updates concern mail attributes, like proxy email addresses, and stay within Exchange Online. But when synchronization is needed, Azure Active Directory needs to be updated fast.
For instance, let’s assume that you run the PowerShell Add-UnifiedGroupLinks cmdlet to update the membership of an Office 365 group. The change is made in Exchange Online and is then replicated to Azure Active Directory, which then kicks off a synchronization process to replicate the update to Teams, Yammer, and so on. Clearly, if the change is delayed getting to Azure Active Directory, it will be delayed reaching the other apps too.
Dual Writes Solve the Problem
The change removes the need for synchronization between Exchange Online and Azure Active Directory by executing dual writes. In other words, updates happen simultaneously to both Exchange Online and Azure Active Directory, meaning that if the update completes, Azure Active Directory has the changed information immediately.
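To make the difference concrete, here is a minimal Python sketch of the two models. It is a toy, not Microsoft’s implementation: the Directory, SyncedWriter, and DualWriter classes are hypothetical names invented for illustration, and the choice to check both directories before writing anything is an assumption (the post doesn’t describe how a partial failure is handled).

```python
class Directory:
    """A toy in-memory directory (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.available = True

    def write(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.data[key] = value


class SyncedWriter:
    """Old model: write to Exchange Online now, replicate to Azure AD later."""

    def __init__(self, exo, aad):
        self.exo, self.aad = exo, aad
        self.pending = []  # changes waiting for the next synchronization cycle

    def update(self, key, value):
        self.exo.write(key, value)
        self.pending.append((key, value))

    def run_sync_cycle(self):
        # Drain the queue in order; a change stays queued if the write fails
        while self.pending:
            self.aad.write(*self.pending[0])
            self.pending.pop(0)


class DualWriter:
    """New model: an update completes only if both directories accept it."""

    def __init__(self, exo, aad):
        self.exo, self.aad = exo, aad

    def update(self, key, value):
        # Assumption: check both up front so a failure changes neither side
        if not (self.exo.available and self.aad.available):
            raise ConnectionError("dual write failed: a directory is unavailable")
        self.exo.write(key, value)
        self.aad.write(key, value)


if __name__ == "__main__":
    exo, aad = Directory("Exchange Online"), Directory("Azure AD")
    old = SyncedWriter(exo, aad)
    old.update("Sales Group", ["alice"])
    print("Before sync, Azure AD has the change:", "Sales Group" in aad.data)
    old.run_sync_cycle()
    print("After sync, Azure AD has the change:", "Sales Group" in aad.data)
```

In the old model the caller gets a success as soon as Exchange Online accepts the change, and Azure Active Directory catches up on the next cycle. In the dual-write model there is no catching up to do, but the caller now depends on both directories being reachable.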
You might wonder why Microsoft is only removing the need for synchronization between Exchange Online and Azure Active Directory and not dealing with the other directories. Well, apart from the complexity involved in rippling updates across a series of directories at one time, the fact is that the relationship between Exchange Online and Azure Active Directory is the most important directory link in Office 365. The two directories share updates all the time because Exchange Online is the heaviest-used workload and almost every Office 365 account uses Exchange Online. SharePoint Online is the second-highest workload, but it needs far fewer directory updates than Exchange does.
The Write Downside
All of this is goodness and you can’t argue with the benefit of updating the master directory as quickly as possible, but it comes with a downside that needs to be factored into PowerShell scripts or EWS or Graph-based programs. For a dual write to work, both directories must be available. If you’re running a script, you’re probably connected to Exchange Online and the write to that directory will be successful. But you could hit a transient condition where Azure Active Directory is temporarily unavailable (aka, a “network glitch”). A call will fail if it can update Exchange but can’t reach Azure Active Directory. This didn’t happen previously because Exchange would accept the update and then synchronize with Azure Active Directory.
Microsoft says that the change in behavior should be transparent and that there’s no need to change scripts. Well, that is unless your code hits an UnableToWriteToAadException condition and fails without you noticing that a problem occurred. Cautious programmers will review their error handling to make sure that it can cope with the new condition. If a problem happens, the right thing to do is to retry the transaction.
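The retry advice can be sketched in a few lines of Python. This is an illustration of the pattern under stated assumptions, not code for any real Exchange Online client: the UnableToWriteToAad class is a hypothetical stand-in for the UnableToWriteToAadException condition, retry_on_aad_failure is an invented helper, and the attempt count and exponential backoff delays are arbitrary choices rather than Microsoft guidance.

```python
import time


class UnableToWriteToAad(Exception):
    """Hypothetical stand-in for the UnableToWriteToAadException condition."""


def retry_on_aad_failure(operation, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run operation(), retrying when the Azure AD write fails.

    Waits base_delay, then 2x, then 4x (and so on) between attempts.
    Re-raises the last error if every attempt fails, so the caller
    always finds out that the update did not go through.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except UnableToWriteToAad:
            if attempt == attempts:
                raise  # out of attempts: surface the failure
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_update():
        # Simulate a transient glitch: fail twice, then succeed
        calls["count"] += 1
        if calls["count"] < 3:
            raise UnableToWriteToAad("Azure AD temporarily unreachable")
        return "updated"

    print(retry_on_aad_failure(flaky_update, base_delay=0.1))
```

The important design choice is the final raise: a script that swallows the error after its last attempt recreates exactly the silent failure described above. In PowerShell, the equivalent would be a try/catch block inside a loop around the cmdlet call.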
Microsoft says that they will roll the new approach out across Office 365 from October 2019 and it should be available worldwide by February 2020 (except in the GCC cloud).
Office 365 is obviously doing something good to be able to grow to circa 200 million active users. What’s nice about this post is that it’s a reminder that software engineering needs to continually evolve to take account of new stresses and strains. Sometimes I think we lose sight of that because we’re blinded by the blizzard of new features released across Office 365. Hurrah for belt-and-braces updates! Long may they continue.