Optimizing Microsoft Teams' Performance and Availability

By Marc Kokje
10 min read

Summary

The home office has seen significant growth due to COVID-19 and the widespread social distancing and shelter-in-place orders. We are now working more, or almost exclusively, remotely, and it is questionable whether this trend will change again after the crisis.


Note: Microsoft Teams has been using the "worldaz.tr.teams.microsoft.com" domain since 2021, and the "world.tr.teams.microsoft.com" domain may be deprecated at some point in the future.


Many companies have recognized that teleworking has become an integral part of the “new” world of work, which also brings with it new challenges and questions, such as:

  • How efficiently can users use the provided applications?
  • What is the application availability for remote users in private home office environments?
  • To what extent does the IT department have to adapt to help users in home offices with problems?
  • What about future projects in terms of plan, build and run, and what needs to be considered now?
  • How will new applications or services behave in the current infrastructure, and do the architectures need to be adapted?
  • How do project rollouts work, and what problems might arise during migrations?
  • Does the operations team have all the information, SOPs and visibility needed to ensure the operation of new applications or services?

Digital Experience for Your Employees Matters

Ensuring a seamless digital experience for users today means delivering access to business-critical SaaS applications and quickly identifying any problems that arise. As remote working becomes the new normal, chat and video conferencing solutions like Microsoft Teams are becoming essential to maintaining worker productivity and collaboration. In fact, figures published by Microsoft show “a 775% increase in Teams calling and meeting monthly users in a one-month period in Italy, where social distancing or shelter-in-place orders have been enforced.” While the adoption of these services is meant to make workers’ lives easier, IT teams are left wondering how they can ensure the availability and assess the performance of Microsoft Teams for their employees, no matter where they are located, and what they can do to troubleshoot and resolve any issues that arise.

With the ThousandEyes Endpoint Agent, IT teams have several options. This agent is part of ThousandEyes’ digital experience monitoring, and it provides IT teams with insight into end users’ underlying infrastructure as well as a complete end-to-end view of the network path, including the ISP and SaaS infrastructures involved, which is essential for targeted troubleshooting. The Endpoint Agent is, therefore, the “window” to the respective user to ensure the best possible user experience and to provide targeted support in the event of an error.

Figure 1: Access to a website and the information determined by the Endpoint Agent

Figure 2: Path Visualization—The path of the clients to Microsoft including all responsibilities (ISP/CSP)

This information can, of course, also be used for future project planning, while teams are increasingly dealing with the respective phases of plan, build and run. For instance, we’ve used this approach for an international customer to test, in advance, whether the availability and user experience of a planned service met the requirements for regional users. The rollout and operation of the solution could also be optimized because it became clear which service provider was providing the best availability and which requirements had to be met. This accelerated the 21-day rollout and significantly reduced the number of support tickets.

Performance and Availability of Microsoft Teams

Microsoft Teams is a very popular collaboration and productivity service used by many of our customers, so we wanted to share how you can monitor and optimize the performance of this service. When looking at Microsoft Teams, specifically, we wanted to focus on two components that are crucial:

  • The Teams Edge Node, which manages the Teams client and the call signaling and can be reached via teams.microsoft.com
  • The Teams Transport Relay, whose function covers the handling of the entire media part, including which transport relay will be used, and is available via world.tr.teams.microsoft.com.

Microsoft has identified a number of metric thresholds that it recommends to ensure the optimal experience for the end user. Using these as a "baseline" reference allows you to validate each component's performance.

Figure 3: Network Performance Requirements provided by Microsoft
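
As a rough illustration of using such thresholds as a baseline, the sketch below compares one measurement against a set of limits. The numeric limits are assumptions based on commonly cited Microsoft guidance for the client-to-edge segment (round-trip latency under 100 ms, packet loss under 1%, jitter under 30 ms); they are not read from Figure 3 and should be checked against Microsoft's current documentation. The names `BASELINE` and `check_against_baseline` are hypothetical and not part of any ThousandEyes or Microsoft API.

```python
# Assumed baseline limits; verify against Microsoft's current guidance.
BASELINE = {
    "rtt_ms": 100.0,    # round-trip latency
    "loss_pct": 1.0,    # packet loss
    "jitter_ms": 30.0,  # jitter
}


def check_against_baseline(sample, baseline=BASELINE):
    """Return the metrics in `sample` that exceed their baseline limit.

    `sample` maps metric names to measured values. Metrics that are
    missing from the sample, or that have no baseline entry, are ignored.
    """
    violations = {}
    for metric, limit in baseline.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            violations[metric] = (value, limit)
    return violations


if __name__ == "__main__":
    # Hypothetical measurement from a home-office user.
    measured = {"rtt_ms": 142.0, "loss_pct": 0.4, "jitter_ms": 35.5}
    for metric, (value, limit) in check_against_baseline(measured).items():
        print(f"{metric}: {value} exceeds baseline {limit}")
```

Keeping the limits in a plain dictionary makes it easy to swap in different values per component, for example one set for the Edge Node and another for the Transport Relay.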

In addition, it should be considered that call routing has also changed because users are now accessing Teams from their home offices, rather than from within the office locations.

Figure 4: Teams Traffic Routing via Cloud

For this example, we’ve configured tests in customer environments that focus on these two components.

Teams Edge Node – teams.microsoft.com

Using an automated HTTP server test, we are able to validate the reachability and performance of the Teams Edge node. Values such as availability, response time and throughput provide a direct overview of the availability and functionality of the service. You can quickly recognize anomalies and focus on user systems with problems. The installed Endpoint Agent behaves exactly as if the user were accessing teams.microsoft.com.
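
To make the idea of an HTTP reachability check more concrete, here is a minimal sketch using only the Python standard library. It is a simplified stand-in, not how the Endpoint Agent is implemented: it measures only availability and response time, and the function names `probe` and `summarize` as well as the rule that any status below 500 counts as "available" are assumptions made for this example.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Issue one HTTP GET and return (ok, elapsed_seconds).

    `ok` is True when a response with a status below 500 is received.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            ok = response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, so it is reachable; 4xx still counts as up.
        ok = err.code < 500
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # DNS failure, refused connection, timeout and similar errors.
        ok = False
    return ok, time.monotonic() - start


def summarize(results):
    """Return availability in percent and mean response time of successes."""
    if not results:
        return {"availability_pct": 0.0, "mean_response_s": None}
    ok_times = [elapsed for ok, elapsed in results if ok]
    availability = 100.0 * len(ok_times) / len(results)
    mean = sum(ok_times) / len(ok_times) if ok_times else None
    return {"availability_pct": availability, "mean_response_s": mean}
```

Calling `probe("https://teams.microsoft.com")` at a regular interval and passing the collected results to `summarize` gives a basic availability and response-time view for one user system. A DNS error like the one in Figure 5 would show up here as a failed probe.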

Figure 5: DNS error for two users in North America trying to connect to teams.microsoft.com

As with all ThousandEyes tests, you not only have the HTTP server layer, but thanks to our correlated visibility, you can also access other layers, such as the network. There, the individual metrics for loss, latency, jitter and TCP connection failures, as well as system information like CPU and memory usage, are displayed in detail. With our path visualization, you also get a complete end-to-end overview and can quickly and easily recognize which part of the chain has a problem and who is responsible for it. This could be the home router or Wi-Fi, the service provider, and so on.

Figure 6: Users in Japan with recurring packet loss when accessing teams.microsoft.com

Teams Transport Relay – world.tr.teams.microsoft.com

In this example, we use a network test to determine the availability of the transport relay and its service quality. The respective Endpoint Agents on the users' systems run the test automatically on a schedule and thus provide the important values in detail, including the end-to-end path visualization.
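
The values such a network test reports can be approximated with a small sketch. The code below is not the ThousandEyes implementation: it uses a TCP connect to port 443 as a stand-in probe (an assumption chosen because it is easy to run without special privileges), and it defines jitter as the mean absolute difference between consecutive round-trip times, which is one common definition among several. The names `tcp_rtt_ms` and `network_metrics` are hypothetical.

```python
import socket
import time


def tcp_rtt_ms(host, port=443, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers DNS errors, timeouts and refused connections.
        return None


def network_metrics(rtts_ms):
    """Compute loss, mean latency and jitter from round-trip samples.

    A sample of None stands for a lost probe. Jitter is taken here as the
    mean absolute difference between consecutive received samples.
    """
    received = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(received)
    loss_pct = 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0
    latency = sum(received) / len(received) if received else None
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = sum(deltas) / len(deltas) if deltas else None
    return {"loss_pct": loss_pct, "latency_ms": latency, "jitter_ms": jitter}
```

For example, collecting `[tcp_rtt_ms("world.tr.teams.microsoft.com") for _ in range(20)]` and passing the list to `network_metrics` yields loss, latency and jitter for one user system. The hostname can be replaced with whichever transport relay domain is current.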

Figure 7: Transport relay network test with the respective report options

So, what has changed since lockdowns began, and how did it affect Microsoft Teams?

In the following report, you can see that the packet loss during office hours increased up to 17%—a behavior that we did not see prior to COVID-19 and the lockdown.
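
A comparison like this can be sketched by splitting loss samples into office hours and the rest of the day. The example below is only an illustration with hypothetical names: `peak_loss_by_period` is not a ThousandEyes function, the office-hours window of 09:00 to 17:00 on weekdays is an assumption, and timestamps are assumed to be in the user's local time.

```python
from datetime import datetime


def peak_loss_by_period(samples, start_hour=9, end_hour=17):
    """Return the highest loss seen inside and outside office hours.

    `samples` is an iterable of (timestamp, loss_pct) pairs. A timestamp
    counts as office hours on Monday to Friday when
    start_hour <= hour < end_hour.
    """
    peaks = {"office_hours": 0.0, "off_hours": 0.0}
    for ts, loss_pct in samples:
        in_office = ts.weekday() < 5 and start_hour <= ts.hour < end_hour
        key = "office_hours" if in_office else "off_hours"
        peaks[key] = max(peaks[key], loss_pct)
    return peaks


if __name__ == "__main__":
    # Hypothetical samples; the values are made up for illustration.
    samples = [
        (datetime(2020, 4, 6, 10, 0), 17.0),  # Monday morning
        (datetime(2020, 4, 6, 22, 0), 1.5),   # Monday evening
        (datetime(2020, 4, 5, 11, 0), 0.5),   # Sunday
    ]
    print(peak_loss_by_period(samples))
```

Running the same comparison on data from before and after a change in working patterns shows whether peaks are tied to working hours.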

Figure 8: Packet loss from different regions, especially during normal office hours

Using the path visualization alongside our correlated visibility, we can quickly identify where this packet loss is occurring.

Figure 9: Packet loss within the ISP network

Conclusion

Digital user experience is not only extremely important in today's world, but it is also becoming increasingly important in the long term. Not only will COVID-19 and its subsequent lockdowns continue to have lasting effects on enterprises and their employees, but ongoing trends toward digital transformation and the use of cloud-based services like SaaS, SASE, PaaS, IaaS, etc. will ensure that digital experience remains essential.

IT teams can provide better service with ThousandEyes because they have the information at hand to help remote users. This information can also be made available to service providers, using the share feature, so that issues can be resolved in a more collaborative manner. This leads to a much shorter mean time to resolution (MTTR) and helps avoid "ping-pong tickets."

ThousandEyes is the strategic network and cloud intelligence platform for all three phases of plan, build and run and already supports many customers on their way to digital transformation. Reach out today to book a personalized demo and discuss your digital experience plans.
