Lights Out for Smart Lights
On June 1st, 2018, Philips debuted a new service for its Hue smart-home lighting product, but things quickly went dark when the cloud-based API endpoint that remote apps and services communicate with went down for four hours. As The Register put it in its snarkily titled article, “Smart bulbs turn dumb: Lights out for Philips as Hue API goes dark”:
“On the same day that the company launched its new service—where its lights will respond automatically to streaming music and games—the system died for anyone trying to activate the hardware while outside their house, or using voice control. In-home control was unaffected.”
The Philips Hue API blackout isn’t the only recent IoT outage. Just two weeks earlier, on May 17th, The Register reported a similar three-hour outage involving Nest smart-home devices, in which users across the U.S., Canada and the Netherlands couldn’t use their mobile apps to open their doors, view their home monitoring feeds or program their thermostats.
These two outages can tell us a few essential things about customer-facing APIs.
APIs are Key to Digital Experience
Digital experience is mostly associated with web applications. Web service APIs have historically been treated as a matter for developers to think about: the stuff that distributed modern applications are made of. Today, however, REST and JSON APIs are on the front line of digital experience, and they underpin much of the modern end-user experience. IoT is a great example, but not the only one. In the UK and the European Union, regulations mandate the availability of external-facing financial services APIs for individuals and fintech companies to use. Check out our blog post on monitoring open banking APIs for more info on that front.
The API Matrix is Here
Modern apps and IoT ecosystems are very complicated beasts that require many APIs to “fire” with very low latency. How many API calls does it take to turn on a light bulb? The answer can be found below in a picture of the Philips Hue architecture from a Google Next conference presentation. By the way, we love the developer code names; “Client Eastwood” is a favorite for sure.
No API is an Island
For each API call to work correctly, many different elements along the entire service delivery path need to work well, including the client app or service making the GET or POST requests, DNS, underlying servers and IaaS instances, authentication services and their infrastructure, and Internet access and transit providers. Oh, and don’t forget about BGP routing, which makes all of this dependent on an implicit chain of trust in global prefix announcements. When you consider the dependencies of each API call, and then the number of internal and external API calls needed to complete a single interaction, the complexity gets pretty mind-boggling. Most of the time it should all work fine, but what about when it doesn’t? Remember that most of the underlying infrastructure and networks are external to the API service itself. That’s what can make troubleshooting outages and performance issues such a time-consuming and challenging job. A single connectivity outage affecting a single cloud provider region can cause havoc, as seen in the May 31st AWS US-East-2 outage that cut off communications from that region to external CDN, hosting, CRM and payment gateway providers. See our blog post on that outage to learn more.
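The compounding effect is easy to underestimate, so here is a back-of-the-envelope sketch in Python. It assumes that every dependency on the path must work for the call to succeed, that failures are independent, and that latencies simply add up; the dependency list, availability figures, and latencies are hypothetical placeholders, not measurements from Hue, Nest, or AWS.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Dependency:
    """One hop on an API call's delivery path (all values hypothetical)."""
    name: str
    availability: float  # fraction of time the dependency works, 0.0 to 1.0
    latency_ms: float    # latency this hop adds to the call


def chain_availability(deps):
    """Serial chain: every hop must work, so (assuming independent
    failures) the end-to-end availability is the product of the parts."""
    return prod(d.availability for d in deps)


def chain_latency_ms(deps):
    """Simplifying assumption: hops run one after another, so latency adds."""
    return sum(d.latency_ms for d in deps)


# Hypothetical path for one remote "turn on the light" request.
path = [
    Dependency("DNS", 0.9995, 20),
    Dependency("Internet transit", 0.999, 40),
    Dependency("Auth service", 0.999, 60),
    Dependency("API endpoint", 0.999, 80),
    Dependency("Cloud region", 0.9995, 10),
]

single_call = chain_availability(path)
# A hypothetical interaction that needs five such API calls to succeed.
interaction = single_call ** 5

print(f"one call:    {single_call:.4%} available, ~{chain_latency_ms(path):.0f} ms")
print(f"interaction: {interaction:.4%} available")
```

With these made-up numbers, one call comes out at roughly 99.6% availability and 210 ms, and an interaction that needs five such calls drops to about 98%, even though no single dependency is below 99.9%.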
APIs Need Holistic Monitoring
Monitoring API performance isn’t new, but addressing the underlying and external dependencies is more important than ever. Yes, you need to monitor API availability, but it’s not enough to do just uptime monitoring and traditional server monitoring within your cloud instances and data centers. Performance measurement needs to be holistic across multiple layers.
As mentioned above, you don’t own many of those layers anymore. But just because you don’t own and control all the external dependencies your API endpoints rely on doesn’t mean you shouldn’t monitor them. Monitoring should track multiple metrics, such as API endpoint availability and response time. It should cover the app/service, network path, DNS, and Internet layers. And it should be performed from the vantage point of every end user, whether that user is a person or a server in one of several cloud provider regions. The goal is to cover every thread of inter-service communication that your app and ecosystem rely on to function properly.
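As a rough illustration of measuring more than one layer, the Python sketch below times DNS resolution, the TCP handshake, and time to first byte for a single HTTP request, and records availability from the status code. It is a minimal sketch from one vantage point, assuming a plain-HTTP endpoint; it does not cover TLS, network path tracing, or BGP, and the function name `probe` and its result fields are our own illustrative choices, not any product’s API.

```python
import socket
import time
from urllib.parse import urlsplit


def probe(url: str, timeout: float = 5.0) -> dict:
    """Actively probe one HTTP endpoint and time each layer separately.

    Sketch only: plain HTTP (no TLS), first resolved address only,
    and no network path or BGP visibility.
    """
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    result = {"url": url, "available": False}

    try:
        # DNS layer: name resolution time and the address we will use.
        t0 = time.perf_counter()
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["dns_ms"] = (time.perf_counter() - t0) * 1000
        addr = infos[0][4]
        result["address"] = addr[0]

        # Network/transport layer: TCP handshake time to that address.
        t1 = time.perf_counter()
        with socket.create_connection(addr[:2], timeout=timeout) as sock:
            result["connect_ms"] = (time.perf_counter() - t1) * 1000

            # Application layer: send a GET and time the first response line.
            request = (f"GET {target} HTTP/1.1\r\nHost: {host}\r\n"
                       "Connection: close\r\n\r\n")
            t2 = time.perf_counter()
            sock.sendall(request.encode("ascii"))
            with sock.makefile("rb") as reader:
                status_line = reader.readline()
            result["ttfb_ms"] = (time.perf_counter() - t2) * 1000

        # Status line looks like b"HTTP/1.1 200 OK"; treat 2xx/3xx as up.
        status = int(status_line.split()[1])
        result["status"] = status
        result["available"] = 200 <= status < 400
    except (OSError, ValueError, IndexError) as exc:
        # Record the error instead of raising; the timing keys already
        # present show how far the probe got (no connect_ms means the
        # TCP handshake never completed, for example).
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result
```

Running the same probe on a schedule from several locations (offices, cloud regions, end-user networks) and comparing the per-layer timings is one way to start telling “the API is down” apart from “DNS is slow from one region.” Real tooling would also add path and routing data, which this sketch does not attempt.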
The problem has been that traditional ways of monitoring dependencies like DNS and networking have either been non-starters or so crude as to be nearly useless. For example, you can’t use passive data collection techniques to monitor external networks since you don’t control the devices in other organizations’ networks. You can’t perform APM code injection into third-party API services (like authentication). Moreover, simple monitoring tools like ping and traceroute provide inadequate data for powering a modern monitoring solution.
Get Network Intelligence
Fortunately, there is a way to get smarter API monitoring: a combination of real-time active and passive performance monitoring that gives a correlated view spanning the app/service layer, detailed end-to-end path visualizations, DNS traces, and global BGP routing.
If your Digital Ops, Tech Ops or DevOps team needs to gain that kind of smart, holistic Network Intelligence, you can learn more in our Guide to Network Intelligence. For a detailed walkthrough of ThousandEyes, request a demo. If you’re ready to get your hands on holistic visibility, start your free trial now.