In a previous blog, Leveraging Modern Visibility Tools to Tame Your IT Infrastructure, we used lessons from the historic wild west to manage complex government IT networks. In this blog, we’ll move back into the present to discuss ways to ensure the performance of your cloud infrastructure, platforms and services.
A recent GAO report, Cloud Computing: Agencies Have Increased Usage and Realized Benefits, but Cost and Savings Data Need to Be Better Tracked, details several benefits of cloud services, including expanded service delivery and improved information sharing. Using cloud services helps reduce IT costs, as agencies avoid paying for resources needed to build, operate and maintain the services themselves. Cloud services also give agencies the flexibility to add services and capacity when needed.
Despite these advantages, the GAO report revealed that many government agencies lack the tools, policies and processes to manage cloud services. While agency staff excel at deployment, maintenance and help desk tasks, they can struggle to track public cloud performance and isolate issues between service providers such as Google, Amazon, Microsoft and IBM. This results in increased costs, as staff waste precious resources trying to determine the source of each issue.
Managing cloud performance starts with setting appropriate cloud SLAs. Enforcing them is the harder part: most IT teams lack visibility into the cloud provider performance metrics that affect the delivery of their services, so it becomes nearly impossible to hold vendors accountable for meeting expected performance standards.
Digital experience monitoring solutions can help bridge that gap. They create a window into an agency’s entire IT ecosystem, both internal and external. Paired with procurement best practices, these solutions can help ensure the cost-effective performance of agency IT infrastructure by measuring service response times for internal and third-party users.
Know Your Service Level Agreements
Defining each cloud service provider's responsibility is the key to managing diverse IT networks. A Service Level Agreement (SLA) is the foundation of each service provider engagement. Review and track your SLAs as a part of communication and maintenance with each of your cloud service providers. Develop a shared understanding of how each service provider impacts your agency network operations and security posture. This makes it much easier to isolate and address issues as they occur.
Ensure your provider is meeting its performance requirements by setting service baselines for each SLA. Baselines simplify review, assessment and performance reporting of each service provider. In its white paper, Best Practices for Effective Cloud Computing Services Procurement within the Federal Government, the GSA recommends using specific, measurable indicators. Metrics like Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs) are good examples.
An excellent starting point is the Cloud Performance Benchmark report, a comparative study of cloud vendor performance that we released in the fall of 2019. It measures and compares network performance (in terms of loss, latency and jitter) across the top five cloud providers: AWS, GCP, Azure, IBM and Alibaba Cloud. The report shows that not all cloud providers are equal, and that performance depends on the specific needs and geographic regions an organization cares about. So just because you’re “in the cloud” doesn’t mean you’ll have a consistent experience.
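To make the idea of a measurable baseline concrete, here is a minimal Python sketch that summarizes a series of probe round-trip times into loss, latency and jitter, then compares them to an SLA baseline. The Baseline class, the threshold values and the function names are assumptions made for illustration; they do not come from the GSA white paper, the benchmark report or any ThousandEyes product. Jitter is computed here as the mean absolute difference between consecutive replies, which is one common simplification.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Baseline:
    """Hypothetical SLA baseline; thresholds are illustrative only."""
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


def summarize(rtts_ms: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarize probe results; None marks a probe that got no reply."""
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    # With no replies at all, latency is treated as unbounded.
    latency_ms = mean(replies) if replies else float("inf")
    # Simplified jitter: mean absolute difference between consecutive replies.
    diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter_ms = mean(diffs) if diffs else 0.0
    return {"loss_pct": loss_pct, "latency_ms": latency_ms, "jitter_ms": jitter_ms}


def violations(summary: Dict[str, float], baseline: Baseline) -> List[str]:
    """Return the names of the metrics that exceed the baseline."""
    limits = [
        ("loss_pct", baseline.max_loss_pct),
        ("latency_ms", baseline.max_latency_ms),
        ("jitter_ms", baseline.max_jitter_ms),
    ]
    return [name for name, limit in limits if summary[name] > limit]


# Five probes to a hypothetical cloud endpoint; the third one was lost.
samples = [20.0, 22.0, None, 21.0, 25.0]
baseline = Baseline(max_loss_pct=1.0, max_latency_ms=50.0, max_jitter_ms=5.0)
report = summarize(samples)
print(report, violations(report, baseline))
```

Running a check like this on a schedule for each provider and region gives you a record you can set against the SLA, rather than relying on the provider's own reporting.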
Understand the Root Cause of an Issue
Without good visibility, it is next to impossible to isolate performance issues between your agency, cloud service providers and end-users. Agencies that rely on a hybrid-cloud or multi-cloud strategy, in particular, can find it difficult to determine which service is causing an issue due to the complexity of their deployment. When something goes wrong, it’s typical for IT teams to spend a tremendous amount of time troubleshooting in “war rooms,” with lots of finger-pointing, but no definitive data proving where the problem actually lies.
To stay ahead of this, create a continuous lifecycle approach to monitoring your entire network. Incorporate regular monitoring as part of your deployment of new applications and cloud components. Periodic review helps your IT teams discover issues before they can impact users. It also helps your IT teams develop a big-picture view of your network and the interaction of each service provider.
Arm your teams with solutions that provide the evidence they need to escalate the issue to the correct service provider. When performance issues occur, there’s no guessing who is at fault, and your staff can quickly isolate and address the source of the problem.
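As a rough illustration of what that evidence can look like, the Python sketch below compares measured latency for each segment of a path against a recorded baseline and names the segment that deviates most. The segment names, the numbers and the 1.5x tolerance are hypothetical, and real root-cause analysis draws on far more than a single metric; treat this as a sketch of the reasoning, not a description of how any particular monitoring product works.

```python
from typing import Dict, Optional


def worst_segment(
    measured_ms: Dict[str, float],
    baseline_ms: Dict[str, float],
    tolerance: float = 1.5,
) -> Optional[str]:
    """Return the segment whose latency most exceeds its baseline.

    A segment is only considered if its measured latency is more than
    `tolerance` times its baseline. Returns None if no segment qualifies.
    """
    worst, worst_ratio = None, tolerance
    for name, value in measured_ms.items():
        base = baseline_ms.get(name)
        if not base:
            continue  # no usable baseline recorded, so this segment cannot be judged
        ratio = value / base
        if ratio > worst_ratio:
            worst, worst_ratio = name, ratio
    return worst


# Hypothetical per-segment latencies in milliseconds.
baseline = {"agency_lan": 2.0, "isp": 10.0, "cloud_provider": 15.0}
measured = {"agency_lan": 2.2, "isp": 11.0, "cloud_provider": 60.0}
print(worst_segment(measured, baseline))
```

Here the cloud provider segment is running at four times its baseline while the other segments are close to normal, which is the kind of data point that supports escalating to that provider instead of debating it in a war room.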
Identify Areas for Improvement
It’s essential to identify your agency's desired cloud strategy outcomes to ensure you stay on track. Document your goals and objectives, along with your fundamental reasons for moving to the cloud. Once you know where you want to go, it is much easier to assess your current state and develop a plan to keep meeting your service goals.
Your IT teams can identify areas for improvement once they understand your current network functionality. Using solutions that provide a perspective from every user (whether a customer or agency staff member), your team can understand how cloud performance affects digital experience delivery. Combined with application- and network-layer visibility, that perspective gives your team a full-picture overview of your agency’s entire enterprise network. Your IT teams can use that insight to work with your service providers, optimizing service delivery from all perspectives.
ThousandEyes solutions provide the tools each agency needs to manage cloud components that it does not own or control. By identifying the root cause of network problems and enabling you to share your findings, they help you respond to issues before they can impact your agency.