On January 13th, Powerball announced the winners of the largest lottery jackpot in U.S. history. Nearly $1.5B was up for grabs, and with more than 635 million tickets sold, punters scrambled to check the winning numbers as they were announced at 8pm PT. There were three big winners. The Powerball website, however, turned out to be the loser for the night.
We’ll dig into the infrastructure behind the fraying Powerball website and how it performed over the night. You can follow along with all of this data using this share link: https://fuzee.share.thousandeyes.com
Mega Meltdown
The Powerball website that serves up jackpot estimates and winning numbers usually sees a page load time of 650ms. This is pretty fast, helped along by the sparse content on the page, as you’ll see in Figure 1.
However, as the lottery drawing approached, page load times spiked to 5 seconds at 8pm and reached over 10 seconds within minutes after the drawing. Figure 2 shows the page load time for the Powerball website, with a dramatic increase around the 8pm drawing.
You’ll also see that the vast majority of users were not able to fully load the Powerball website at all. Figure 3 shows web server availability, which cratered in all but 2 of the 28 cities that we tested.
Looking Behind the Curtain
So how did Powerball, which knew it had a record-breaking lottery on its hands, end up with such dismal performance? Let’s dig into the infrastructure and network behind Powerball. The website is hosted in a data center in Kansas City, run by the Multi-State Lottery Association (MUSL). Figure 4 shows the typical network paths into the data center via the ISP Cogent Communications.
At 7:05pm, MUSL turned on routes to Microsoft Azure, directing traffic from approximately half of the cities observed to Microsoft’s cloud data centers. Figure 5 shows traffic from 5 cities flowing through Microsoft’s network (green nodes).
But as the 8pm drawing approached, the network was under strain from all of the traffic. Packet loss increased, as can be seen in Figure 6, reaching over 90% at the time the winning numbers were released.
At 8:05pm, MUSL again spread the love to another provider, this time Verizon’s Edgecast CDN. Figure 7 shows the network path topology just after the winning numbers were announced. Paths taken to Microsoft’s data center make up the cluster on top (168.61.218.73), Edgecast in the middle (72.21.91.39) and the MUSL data center on the bottom (104.219.253.10).
At 10:10pm, after the traffic finally died down and application and network metrics had returned to normal levels, MUSL reverted to routing traffic to its own data center through upstream ISP Cogent.
Scaling Lessons Learned
So what could MUSL have done better? As the winning numbers were announced, the Powerball website simply wasn’t equipped to handle the massive amounts of traffic it received.
For sites that have spiky but predictable traffic, here are a few options:
- Use a CDN to serve traffic around the clock. This costs more but delivers the best customer experience.
- Flip on a CDN service well before known traffic peaks. MUSL did this with Edgecast, but not until the drawing itself, and DNS changes can take a while to propagate.
- Diversify with multiple data centers and upstream ISPs. MUSL had only one data center and one upstream ISP, Cogent Communications. If Cogent or that single data center goes down, MUSL’s service goes with it.
- Within the data center, add more load-balanced network paths and web servers to reduce the performance impact of traffic spikes.
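To make the DNS timing point concrete, here is a minimal sketch of how you might work out the latest safe moment to switch DNS to a CDN ahead of a known peak. It assumes resolvers honor the record's TTL, which is not always true in practice. The function name, the one-hour TTL and the 30-minute safety margin are all hypothetical values chosen for illustration; they are not MUSL's actual settings.

```python
from datetime import datetime, timedelta


def latest_cutover_time(event_start, ttl_seconds, safety_margin=timedelta(minutes=30)):
    """Return the latest time to publish a DNS change before a traffic peak.

    A resolver that cached the old record just before the change may keep
    serving it for up to ttl_seconds, so the change must be published at
    least one full TTL (plus a margin) before the event starts.
    """
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative")
    return event_start - timedelta(seconds=ttl_seconds) - safety_margin


# The drawing was at 8pm on January 13th (2016); the TTL below is an assumption.
drawing = datetime(2016, 1, 13, 20, 0)
assumed_ttl = 3600  # hypothetical 1-hour TTL on the site's DNS record

print(latest_cutover_time(drawing, assumed_ttl))
```

Under these assumed values the change would need to be live by 6:30pm. Lowering the TTL a day or more in advance shortens that lead time, at the cost of more DNS queries.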
The odds of winning this Powerball jackpot were 1 in 292 million. Winning the lottery may be a shot in the dark, but when it comes to web performance, you can count on a return if you properly prepare for your network’s next big event.