Scaling Cloud Networking at DigitalOcean

Summary

In this post from ThousandEyes Connect New York, we’ll discuss the presentation by Luca Salvatore, Network Engineering Manager at DigitalOcean. Luca heads the network team at the cloud Infrastructure-as-a-Service (IaaS) provider based in New York City.

Figure 1: Luca Salvatore presenting at ThousandEyes Connect New York.

During his talk, Luca describes the challenges of managing networks at DigitalOcean and how his team has used ThousandEyes to streamline the process of responding to customer complaints and troubleshooting network problems.

The Evolution of Networking at DigitalOcean

Founded in 2011, DigitalOcean is a virtual private server provider with data centers around the world. Luca describes DigitalOcean’s rapid growth: “In early 2012, we had a few hundred customers running 1000 to 2000 VMs. Today, we have 11 data centers, 12,000 physical servers, and we’re hosting over half a million VMs (Virtual Machines). Next year, we plan on opening another two or three data centers.”

Luca leads a very lean team of five network engineers who manage DigitalOcean’s substantial network infrastructure: “We’re primarily a Juniper shop, so we use MX routers, AX switches and QFX switches. Our latest design is using all QFX switches; we build 40Gb down to the servers—we have 40Gb top of rack, a 40Gb spine aggregation layer, a 40Gb core, and we buy tons of transit from all the Tier 1 providers. We’re also really big on peering, so in every single one of our locations we’re on an IX (Internet exchange) point, sometimes two or three, depending on the location.”

DigitalOcean has also begun implementing software-defined networking (SDN), one of the biggest buzzwords of the last few years. “What that really means for us at DigitalOcean is separating the control plane from the data plane. The data plane is where the action happens—where things are forwarded very quickly. And to scale a data center and half a million Droplets (which is what we call our VMs), you really need a control plane that knows where everything is and is able to talk to and build the tunnels between your hypervisors and data centers. Our next set of data centers will be our first set of fully software-defined networks.”

Major Challenges

One of DigitalOcean’s biggest challenges is scaling. “In our early days, we built gigantic Layer 2 networks. It gets scary—some of the biggest EX9200 switches that we use have a maximum ARP (Address Resolution Protocol) entry count of 256,000 ARPs, and when you run a data center that has 100,000 or 200,000 VMs, you start to see some strange issues pop up on these switches. We’ve really pushed a lot of the Juniper gear to levels it hasn’t been pushed before.”

Another challenge has been deployment. “When you need to build data centers and add racks quickly, you need to have a very automated process. We have a system using Juniper’s Zero Touch Provisioning that allows us to plug a switch into the network, and it pretty much builds itself. We can build out 20 or 30 racks in the time it takes a switch to download its version of JunOS, which is 5 minutes.”

As a cloud provider, Internet stability is also a major concern. “The Internet is an unstable place, and you really need tools that can give you insight into what’s going on. Obviously, ThousandEyes is a great tool for that.”

Latency and packet loss are among the most important performance metrics for DigitalOcean. “What we find is customers are actually pretty aware of latency and packet loss. We get tickets logged all the time from customers who notice a 20 or 30 millisecond increase to their Droplet.”

Lastly, Luca discusses DDoS attacks: “Denial of Service attacks are a constant problem; right now, I’m sure we have a DDoS attack happening—it’s nonstop. Most of the attacks, mainly coming out of China, are from people’s home devices that have been compromised. And it really doesn’t take much for a few thousand compromised machines to spin up a lot of traffic. Places, like CloudFlare, getting 400 to 500Gbps DDoS attacks are actually not that uncommon these days.”

The network team at DigitalOcean certainly has a lot on their plate. In the next section, Luca explains how ThousandEyes has helped with his team’s daunting workload.

Why ThousandEyes?

So what kind of solution was DigitalOcean looking for? “We needed to know what our network looked like from outside our network. Before we used ThousandEyes, we used Pingdom; it was okay for what it did, but it really didn’t give us much visibility into how our network looked from a customer’s perspective. So we did a trial of ThousandEyes and everyone was pretty much on board from the first day.”

Luca also describes how his team uses ThousandEyes: “We started monitoring every single data center that we have. We choose about 30 to 40 Cloud Agents and point them at a Droplet in every data center. Then we start to track latency and packet loss. We also run a couple of Enterprise Agents in three of our locations so we can track bandwidth. The main thing we’re looking at is, how does the customer see us? Latency and packet loss are really good things to know from a customer perspective.”

Interestingly, DigitalOcean also trains their support team with ThousandEyes. “Every new support member that comes on board is trained to use ThousandEyes. This allows us to give our customers a really high level of support. If we get a big flood of tickets because some transit carriers had an outage, the support guys can log into ThousandEyes and see what’s going on; then they don’t have to ask the network engineers. It means that they can respond to the customer with some really good information on the first response. We don’t have to go back and ask, ‘Can you give us a traceroute? Can you give us some pings?’ We can actually say, ‘Yes, we’ve got a problem, we’re troubleshooting it,’ and then update our status page. It makes our support a lot quicker, and it means there are fewer interrupts for the network team.”

The team also uses alerting features on ThousandEyes, which are integrated with an on-call system, PagerDuty. For example, “if a location has 20% packet loss for more than 10 minutes, it’ll trigger an alert to a network engineer who can troubleshoot the issue. Typically it’s really easy to see where our problem is, and it’s easier to fix that problem once you’ve identified it.”

Luca also takes advantage of the reporting function: “We use it for our monthly KPI meetings. I present a lot of graphs out of ThousandEyes—I just take a screenshot and put it onto our internal slide decks, and the executives love it, so it makes my life easier. I don’t have to go digging for data.”

Examples from Real-Life Network Monitoring

Though the issue isn’t in the network for the majority of events, sometimes it unfortunately is. Luca shows data from an outage caused by a DDoS attack in DigitalOcean’s Frankfurt location. The four nodes circled in red are the links connecting DigitalOcean’s edge routers to the rest of the world. The DDoS attack was roughly 100Gbps; all links were maxed out at 100%. DigitalOcean does have some remote blackholing, but it unfortunately didn’t kick in during the outage. “You can see that every location we were monitoring from was reporting a lot of packet loss. It was easy for us to identify that there was a problem with our routers.”

Figure 2: The four nodes circled in red were routers maxed out by a 100Gbps DDoS attack. All monitored locations reported substantial packet loss (red nodes).

Sometimes, the issues are less obvious. Luca shows a screenshot from a Comcast outage in New York City: “You can see there are are a bunch of green Cloud Agents getting through to the endpoint, which is a Droplet hosted in our New York City data center. There were a bunch of things that were working fine, but there were also a lot of Comcast customers who were complaining. Our support team logged on, saw this, and were able to say, ‘It looks like a Comcast problem; we’ll get our network engineers to jump on that.’ We just shut down our Comcast link in New York City and the problem went away.”

Figure 3: DigitalOcean’s support team used this data to diagnose the issue as a Comcast outage (red nodes) in New York City.

And, as a final example, sometimes the data looks impossible to interpret. Luca shows a problem his team encountered last week with one of their peers, Telia, that was performing software upgrades on their core infrastructure. Telia had forgotten to down a BGP session somewhere.

Figure 4: Sometimes the data induces an initial reaction of, “How is this so screwed up?”

But Luca maintains that the answer is still clearly shown in the data: “It’s actually really easy to see if you just drill down: we can see that the route terminates at this router in Telia. It didn’t have a route back to DigitalOcean, so it’s dropping 100% of traffic. We just downed our link and the problem went away.”

Figure 5: But after drilling down into the data, the issue is clear: the route terminates at one particular router in Telia, which is dropping 100% of traffic.

Final Thoughts

Luca summarizes his talk with a few final thoughts, reiterating some of his main points about managing networks at DigitalOcean.

“Cloud computing has been a buzzword in the industry for 5 or 10 years now; it’s definitely here to stay. Companies like DigitalOcean are proof of that. You don’t get to hosting half a million VMs without it being something that is going forward. It’s going to be huge.”
“Automation is definitely necessary, especially when you’ve got two and a half thousand network devices. If I want to roll out a change to every single one of them, we use Juniper’s PyEZ Python library. It takes about 20 minutes to push out a change to a few thousand devices.”
“Customers are sensitive to problems on the Internet, and they’re actually aware of what latency and packet loss are, so you need to have a good answer when they experience issues. Tools like ThousandEyes really help with figuring out exactly what’s happening. A lot of customers send us traceroutes and expect us to have a good answer—without these types of tools, we really can’t.”
“Visibility is key. Being a company that lives on the Internet, we really need to know the state of the Internet in all of our regions. ThousandEyes is great for that.”
“Having your support team trained on all the tools and knowing how to respond to a customer is key.”
Share the tools: “All of our monitoring tools are shared with everyone in our company; there are dashboards up all over our office, including some of the ThousandEyes customized dashboards.”

For more details from Luca’s comprehensive talk, check out the video of the full presentation below.

Published On

Young Xu

Product Marketing Manager

Categories:

Events News

Share This!

Back to ThousandEyes Blog

related blogs

Assuring the First Mile: How Wireless Active Testing Powered Cisco Live Amsterdam 2026

Cisco Live Amsterdam is always a whirlwind of innovation, but the 2026 event at the RAI Amsterdam reached a new level of complexity. With over 20,000 attendees, hundreds of live demos in the World of Solutions, and a massive surge in AI-driven mobile applications, the pressure on the event’s Wi-Fi infrastructure was a constant reality. At Cisco Live, the network is always under the spotlight and simply must perform. This is a standard requirement, with the bar rising year after year as attendees demand a flawless experience on a network designed by the industry’s leading vendor and the best engineers in the field.

By Federico Lovison, Mike Hicks, Hatam Shukur 01 Apr 2026

Cisco Live Endpoint Agent Enterprise Networks Network Monitoring

Many Eyes on Black Hat Europe: How ThousandEyes Delivered Full-venue Visibility

Explore the innovative uses of Cisco ThousandEyes at Black Hat Europe 2025, where real-time data, visual dashboards, and integrations enabled rapid incident resolution and optimized the overall network experience.

By Roberto D'Amato, Susheela Francis 13 Feb 2026

Enterprise Networks Network Monitoring

ThousandEyes and the Black Hat USA 2025 Experience: A Network Perspective

Explore the innovative uses of Cisco ThousandEyes at Black Hat USA 2025, where real-time data, automated alerts, and integrations enabled rapid incident resolution and optimized the overall network experience.

By Mauro Caballero, Daniel Gaona Campos 11 Dec 2025

Alerts Enterprise Agent Integrations Network Monitoring

How ThousandEyes Became a Black Hat NOC Essential

Learn how ThousandEyes boosts NOC visibility and enables seamless connectivity, solving network issues for an exceptional Black Hat attendee experience.

By Shimei Cridlig 20 May 2025

End-user Experience Network Monitoring Security

ThousandEyes Achieves C5 for Cloud in Germany

ThousandEyes adds C5 attestation to its security portfolio, enabling trusted cloud operations for German customers.

By Jillian Murphy 24 Apr 2025

Cloud Public Sector Security

Introducing Broadband ISP Cloud Agents

The delivery of every website and web application relies on a set of dependencies that include Internet providers, DNS and, increasingly, CDNs and other 3rd party services. Failure or degraded performance of any of these services can dramatically impact application delivery. Proactive, continuous monitoring of network and service dependencies is critical to ensuring users of a website or Internet-facing application have a high-quality digital experience.

By Angelique Medina 07 Mar 2018

ThousandEyes Charms the Audience at Networking Field Day 12

On August 12th, 2016 ThousandEyes dazzled the geek squad at Networking Field Day 12 in Santa Clara, California. ThousandEyes was one of the nine companies presenting at the three day event to a curated group of well known network industry experts and thought leaders including Ethan Banks of Packet Pushers, Justin Cohen and Brandon Mangold. Check out the delegates’ reaction to ThousandEyes and NFD12.

By Archana Kesavan 16 Aug 2016

Monitoring the Remote Workforce with Endpoint Agents at AIM Specialty Health

Yesterday we launched a brand new product and our newest innovation, the Endpoint Agent. Extending the periphery of network intelligence all the way to the end user, the Endpoint Agent provides real user perspective on critical applications and the impact of the underlying network on application performance.

By Archana Kesavan 11 Aug 2016

End-to-End Visibility for the Borderless Enterprise

It started with recognizing that there was an Internet-sized hole in visibility that was preventing companies from delivering flawless experiences to users. That was ten years ago, and now ThousandEyes is central to many F500 and G2000 companies' global operations. Just over six months ago, we officially became part of Cisco. I am thrilled to share some key milestones we've hit in realizing our vision of empowering every Cisco customer to see every network—even those they don't own or control—as if they were their own.

By Mohit Lad 30 Mar 2021

Application Experience Enterprise Networks

ThousandEyes Announces Integration with Cisco Catalyst 9000 Access Switching Series

Today, we’re excited to announce that ThousandEyes is joining forces with the Cisco Catalyst 9000 family of switches to bring Internet and Cloud Intelligence to tens of thousands of organizations around the world, enabling them to see and manage networks and applications that they rely on but do not directly own or control. By bundling ThousandEyes with its most widely deployed switches for campus and branch environments, Cisco is solving a critical visibility gap for enterprises, who are increasingly leveraging cloud, Internet and SaaS to deliver digital services to its employees and internal systems.

By Angelique Medina 30 Mar 2021

Cisco Enterprise Networks

News

Scaling Cloud Networking at DigitalOcean

Summary

The Evolution of Networking at DigitalOcean

Major Challenges

Why ThousandEyes?

Examples from Real-Life Network Monitoring

Final Thoughts

Stay Connected

Subscribe to the Internet and Cloud Intelligence Blog!

Stay Connected

Subscribe to the Internet and Cloud Intelligence Blog!

related blogs

Summary

The Evolution of Networking at DigitalOcean

Major Challenges

Why ThousandEyes?

Examples from Real-Life Network Monitoring

Final Thoughts

Stay Connected

Subscribe to the Internet and Cloud Intelligence Blog!

Stay Connected

Subscribe to the Internet and Cloud Intelligence Blog!

related blogs

Upgrade your browser to view our website properly.

Subscribe to the ThousandEyes Blog

Stay connected with blog updates and outage reports delivered while they're still fresh.