Microsoft Details its Latest Efforts to Reduce Teams Outages

Microsoft has been doing significant work to make its Teams app more reliable and less susceptible to outages. These investments complement ongoing efforts to make Microsoft Teams more battery-efficient when users are in video meetings.

First of all, Microsoft wants to improve the resiliency of the app and make it as fault-tolerant as possible during normal operations. On the technical side, Microsoft says that it has worked on a built-in automatic detection and mitigation system to improve the app’s reliability.

Microsoft engineers have implemented granular fault isolation measures to reduce the impact of an outage. They have also designed safe change management strategies to minimize the risk that continuous changes pose to users.

In addition to these cloud principles, Microsoft Teams moved to an active-active architecture. This means that the app now uses a traffic manager to route traffic to the most appropriate path if a failure occurs. Microsoft has also been working on the identification and removal of all single points of failure that could cause Teams to experience an outage.
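
To make the routing idea concrete, here is a minimal sketch of how a traffic manager in an active-active setup might pick a path. It is an illustration under stated assumptions, not Microsoft's implementation: the `Region` type, the `route` function, the region names, and the latency figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """Hypothetical serving region; all fields are illustrative."""
    name: str
    latency_ms: float
    healthy: bool = True


def route(regions: List[Region]) -> Region:
    """Return the healthy region with the lowest latency.

    In an active-active setup every region serves traffic, so a failure
    only removes one candidate rather than forcing a cold standby to start.
    """
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.latency_ms)


regions = [Region("region-a", 20.0), Region("region-b", 35.0)]
primary = route(regions).name      # both regions healthy
regions[0].healthy = False         # simulate a failure in region-a
fallback = route(regions).name     # traffic shifts to the remaining region
```

Because both regions are already serving users, the failover in this sketch is just a change in the selection result, which is the property an active-active design aims for.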

Microsoft Teams reliability principles

Microsoft also explained that it has started using different deployment rings to ensure that any fault caused by a change affects the minimum number of users. “The basic idea is that when we deploy a change, configuration, or code, we gradually deploy and validate our changes with a small set of users and then expand to a higher ring once metrics meet their targets, feedback has been gathered, and gates have been passed,” the company explained.
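
The ring-based rollout described in the quote can be sketched roughly as follows. This is a simplified illustration, not Microsoft's pipeline: the ring names, the `deploy_through_rings` function, and the `gate_passes` callback (standing in for metrics, feedback, and gate checks) are assumptions made for the example.

```python
from typing import Callable, List, Sequence


def deploy_through_rings(
    rings: Sequence[str],
    gate_passes: Callable[[str], bool],
) -> List[str]:
    """Deploy ring by ring and stop expanding once a gate fails.

    Returns the rings that received the change. A failed gate means the
    change never reaches the later, larger rings.
    """
    deployed: List[str] = []
    for ring in rings:
        deployed.append(ring)        # roll the change out to this ring
        if not gate_passes(ring):    # validate before expanding further
            break
    return deployed


# Hypothetical rings, ordered from smallest to largest audience.
rings = ["ring0_internal", "ring1_early_adopters", "ring2_everyone"]

# Simulate a regression that is caught in the second ring.
reached = deploy_through_rings(rings, lambda ring: ring != "ring1_early_adopters")
```

In this run the change stops at the second ring, so the largest audience never sees the faulty build, which is the point of limiting a change's blast radius.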

The automated deployment pipeline, which is built around the principle of reducing the blast radius of a change, helps to ensure that all new features and improvements are validated via A/B testing before they become generally available.

Microsoft Teams ensures fault tolerance through replication and redundancy

Lastly, Microsoft highlighted that it ensures fault tolerance by replicating data across its global datacenters. As part of Microsoft’s Security Development Lifecycle, the company has also been leveraging third-party testers and auditors to validate compliance requirements and find bugs via the bug bounty program.

The Redmond giant believes that Microsoft Teams is one of the key tools to enable the future of hybrid work, as it helps employees to work efficiently and productively. Microsoft noted it’s committed to improving the calling and meeting experiences and encourages users to continue providing feedback on the Microsoft Teams feedback portal.