Network Troubleshooting

Network Troubleshooting Public Cloud Applications

What is Network Troubleshooting?

Network troubleshooting is the set of measures and processes used to identify, diagnose and resolve problems across computer networks. It is a systematic process, carried out primarily by network engineers or administrators, to repair network infrastructure and to restore or optimize network services. It is generally needed to recover or re-establish network or Internet connectivity and to resolve application issues spanning key network elements all the way to the end user.

Some of the processes within network troubleshooting include but are not limited to:

  • Identifying the exact issue or problem
  • Recreating the problem if possible
  • Localizing and isolating the cause
  • Formulating a network troubleshooting plan for solving the problem
  • Implementing the network troubleshooting plan
  • Testing to verify that the problem has been resolved
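The "localizing and isolating" step above can be partly scripted. The following is a minimal sketch, not a definitive tool: it assumes the problem can be narrowed by running probes from the lowest layer upward and reporting the first one that fails. The names first_failing_layer, can_resolve, can_connect and build_checks are hypothetical and chosen for illustration; only the Python standard library socket module is used.

```python
import socket
from typing import Callable, List, Optional, Tuple

# A check is a (label, probe) pair; the probe returns True when the layer looks healthy.
Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run probes in order and return the label of the first that fails, else None."""
    for name, probe in checks:
        try:
            healthy = probe()
        except OSError:
            # Socket-level errors (timeouts, refusals, resolver failures) count as a failure.
            healthy = False
        if not healthy:
            return name
    return None


def can_resolve(host: str) -> bool:
    """True if the local resolver returns at least one address for host."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port completes within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_checks(host: str, port: int) -> List[Check]:
    """Assumed ordering for illustration: name resolution first, then TCP reachability."""
    return [
        ("name resolution", lambda: can_resolve(host)),
        ("TCP connect", lambda: can_connect(host, port)),
    ]
```

Calling first_failing_layer(build_checks(host, port)) for a service of interest returns None when every probe passes, or the label of the layer where investigation should start. Real plans would add further probes (for example an HTTP request) and repeat them after the fix to verify resolution.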

Network troubleshooting can be a manual or automated task. When using automated tools, network monitoring can be done in combination with network diagnostic software. Tracking down the cause of problems on the network is both an art and a science. It requires institutional knowledge of a particular organization's network infrastructure and use patterns, conceptual education, technical skills, and competence in using hardware and software utilities and tools. After many years of experience, network engineers develop honed troubleshooting skills, awareness of common pitfalls and resolution techniques, and efficiency in combining analysis tools to find and resolve the root cause of network and application issues.

Network Troubleshooting for VoIP

One of the most challenging troubleshooting use cases is finding the root cause of packet loss and latency in real-time applications such as VoIP communications. VoIP traffic is highly susceptible to congestion, loss, latency and jitter across a network. It is a frequent source of user complaints, and troubleshooting it often requires deep dives that consume considerable staff time. This is especially true as VoIP services move towards cloud-based Unified Communications as a Service (UCaaS), where control plane signalling is handled by a cloud service while data plane traffic flows directly between peer devices.
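To make "loss" and "jitter" concrete, the sketch below computes both from a captured RTP stream. The jitter estimate follows the running interarrival jitter formula defined in RFC 3550: J = J + (|D| - J) / 16, where D is the change in transit time between consecutive packets. The RtpPacket type and its field names are assumptions made for this example, timestamps are assumed to be already converted to milliseconds, and sequence-number wraparound is deliberately ignored.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RtpPacket:
    seq: int        # RTP sequence number (wraparound not handled in this sketch)
    sent_ms: float  # sender timestamp, assumed converted to milliseconds
    recv_ms: float  # local arrival time in milliseconds


def interarrival_jitter(packets: Iterable[RtpPacket]) -> float:
    """Running jitter estimate in ms, using the RFC 3550 smoothing gain of 1/16."""
    jitter = 0.0
    prev = None
    for pkt in packets:
        if prev is not None:
            # D: difference in spacing at the receiver versus spacing at the sender.
            d = (pkt.recv_ms - prev.recv_ms) - (pkt.sent_ms - prev.sent_ms)
            jitter += (abs(d) - jitter) / 16.0
        prev = pkt
    return jitter


def loss_fraction(packets: List[RtpPacket]) -> float:
    """Fraction of packets missing between the lowest and highest sequence number seen."""
    seqs = {p.seq for p in packets}
    if not seqs:
        return 0.0
    expected = max(seqs) - min(seqs) + 1
    return (expected - len(seqs)) / expected
```

A stream that arrives with constant delay yields zero jitter regardless of how large that delay is, which is why jitter and latency have to be tracked as separate metrics.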

Network Troubleshooting for Internet and Cloud

As many enterprise applications move into the public cloud, the monitoring metrics required for performance analysis and root cause determination are often missing at some network layers. This is primarily due to access restrictions on third-party Internet networks and service infrastructures, which prevent traditional monitoring instrumentation that assumes ownership or administrative control. As a result, the end-to-end and hop-by-hop network performance of application delivery becomes a 'black box', severely limiting efforts to troubleshoot problems. These problems are endemic to SaaS and PaaS platform deployments. Corporate IT network engineers generally have no real visibility into the behavior and performance of transit networks, security services, and SaaS or PaaS networks, infrastructure and software. Public cloud hosting, and particularly hybrid on-premises/public cloud deployment, has created many hurdles for troubleshooting network performance.

Multiple vendors have delivered troubleshooting tools that chart the path of an event across the data center. Some have also delivered a level of visibility into transport performance external to the data center. Once the traffic hits the Internet, however, few existing products can provide detailed network path visibility, including network, routing and application layer metrics.

Ideal Network Troubleshooting Solution Characteristics

In order to help address these challenges, the ideal network troubleshooting solution needs to support these capabilities:

  • Because applications increasingly rely on the cloud, it is important to monitor Internet paths via a presence "in the Internet". This involves testing from dedicated points-of-presence, including in public clouds.
  • Since cloud-based traffic depends on transport by multiple transit providers, it is critical to identify the network operator for a given hop of the external application path. This extends troubleshooting beyond the customer's private network and shows whether the problem lies with, for example, Level 3 or Cable and Wireless, which facilitates fast problem resolution.
  • Synthetic application tests such as HTTP, page load, transaction, SIP and RTP tests should be integrated with Internet-aware network troubleshooting tools to speed problem resolution.
  • The network analysis solution needs to correlate the different service delivery layers, from the application to network paths to routing, so that root cause analysis can be effective across multiple network visibility data sets.
  • Analysis software needs to create a detailed, "hop by hop" view of the traffic path(s) between the client and server. These paths can be very complex and fast-changing once the transaction exits the data center and crosses the Internet.
  • The solution should support interactive sharing of detailed network monitoring data, so that end customers can record and share live data with partners or service providers for troubleshooting purposes.
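As an illustration of the hop-by-hop view and operator attribution described in the list above, the sketch below analyzes a path that has already been measured. The Hop type, its fields, and the operator labels are hypothetical; a real tool would populate the operator from routing or registry data. The loss heuristic is a common rule of thumb rather than a guarantee: loss reported at an intermediate hop is treated as meaningful only if it persists through to the final hop, since routers often rate-limit the replies used for path probing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    ttl: int          # position of the hop in the path
    address: str      # hop IP address
    operator: str     # network operator label (hypothetical, e.g. derived from the hop's AS)
    rtt_ms: float     # average round-trip time to this hop
    loss_pct: float   # percentage of probes to this hop that went unanswered


def onset_of_persistent_loss(hops: List[Hop], threshold_pct: float = 1.0) -> Optional[Hop]:
    """Earliest hop from which loss stays at or above the threshold through the last hop."""
    onset = None
    for hop in reversed(hops):
        if hop.loss_pct >= threshold_pct:
            onset = hop
        else:
            break  # a clean hop downstream means earlier loss did not persist
    return onset


def largest_latency_step(hops: List[Hop]) -> Optional[Hop]:
    """Hop whose RTT rises the most over the previous hop, or None if RTT never rises."""
    best, best_delta = None, 0.0
    for prev, cur in zip(hops, hops[1:]):
        delta = cur.rtt_ms - prev.rtt_ms
        if delta > best_delta:
            best, best_delta = cur, delta
    return best


# Made-up sample path using documentation-only address ranges.
path = [
    Hop(1, "192.0.2.1", "Enterprise LAN", 1.0, 0.0),
    Hop(2, "192.0.2.254", "Enterprise LAN", 2.0, 30.0),  # isolated loss, likely rate limiting
    Hop(3, "198.51.100.7", "Transit A", 12.0, 0.0),
    Hop(4, "198.51.100.9", "Transit A", 95.0, 5.0),
    Hop(5, "203.0.113.20", "SaaS provider", 97.0, 5.0),
]
suspect = onset_of_persistent_loss(path)
step = largest_latency_step(path)
```

In this invented sample, both functions point at the hop with TTL 4, which is labeled with the operator "Transit A", while the isolated 30% loss at hop 2 is ignored because hop 3 downstream is clean.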