Today we’ll focus on DNS alerting, continuing the alerting theme from our previous post on Proactive BGP Alerting. To get us on the same page, let’s quickly review the types of DNS data you can monitor. There are two primary ways to collect DNS data in ThousandEyes:
- DNS tests use Cloud Agents (located in major IXPs and POPs) and Enterprise Agents (located in your network). Tests include DNS Server, DNS Trace and DNSSEC.
- DNS+ tests use thousands of caching servers across dozens of countries and hundreds of networks to provide network-by-network visibility. Tests include DNS+ Domain and DNS+ Server Latency.
With these tests you can measure DNS server performance (authoritative or caching), trace DNS queries through the DNS hierarchy, validate DNSSEC signatures and check the accuracy of responses against expected mappings.
There are four major types of DNS Alert Conditions that you can use to create Alert Rules:
- Error: true when there is an error returned in the query
- Resolution Time: time to resolve the query in milliseconds
- Mapping: compares the record to a list of IP addresses or domain names
- Availability: specific to DNS+ Domain tests, the percentage of vantage points able to reach the name server
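To make the four condition types concrete, here is a minimal Python sketch of how each could be evaluated against a single test result. The `DnsResult` class, its fields and the threshold parameters are hypothetical illustrations, not ThousandEyes data structures or API. In particular, the sketch assumes that a Mapping condition fires when any returned record falls outside the expected list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DnsResult:
    """Hypothetical result of one DNS query from one agent."""
    error: Optional[str]        # error text returned by the query, or None
    resolution_ms: float        # time to resolve the query in milliseconds
    records: set[str] = field(default_factory=set)  # returned IPs or domain names


def error_condition(r: DnsResult) -> bool:
    # Error: true when the query returned an error
    return r.error is not None


def resolution_time_condition(r: DnsResult, max_ms: float) -> bool:
    # Resolution Time: true when resolution took longer than max_ms
    return r.resolution_ms > max_ms


def mapping_condition(r: DnsResult, expected: set[str]) -> bool:
    # Mapping (assumed semantics): true when any returned record is not expected
    return not r.records <= expected


def availability_condition(reachable: int, total: int, min_pct: float) -> bool:
    # Availability: true when the share of vantage points reaching the
    # name server drops below min_pct; no data at all never triggers here
    return total > 0 and 100.0 * reachable / total < min_pct
```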
DNS Alerts for Authoritative Server Performance
One of the most common use cases is monitoring the performance of an authoritative DNS server. For instance, you may want to understand how your DNS host, UltraDNS, is performing during an outage. The authoritative server may be part of your own infrastructure, an external DNS registrar or a third party’s DNS server that you rely on (e.g. a SaaS provider). You can monitor and alert on both the availability of the server and the accuracy of its records. For DNS Server tests, you’ll monitor from Cloud Agents around the world or from Enterprise Agents located within your network.
Availability can be measured by the presence of an error and by resolution time. For these, set up DNS Server > Error and DNS Server > Resolution Time alert rules. DNS Server tests can also be paired with Network and BGP tests, which let you additionally monitor and alert on packet loss, latency and routing changes.
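As a rough illustration of these two availability signals, the sketch below times a single lookup, records any error, and flags the Error and Resolution Time conditions. The `probe` and `availability_alerts` helpers and the threshold are hypothetical. The optional `system_resolver` uses the host’s own resolver via `socket.gethostbyname_ex`, so unlike a DNS Server test it does not query a specific authoritative server.

```python
import socket
import time
from typing import Callable, Optional


def probe(resolve: Callable[[str], list[str]],
          name: str) -> tuple[Optional[str], float, list[str]]:
    """Time one lookup and return (error, elapsed_ms, answers)."""
    start = time.perf_counter()
    try:
        answers = resolve(name)
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        answers, error = [], str(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return error, elapsed_ms, answers


def system_resolver(name: str) -> list[str]:
    # Uses the host's configured resolver, NOT a chosen authoritative server
    return socket.gethostbyname_ex(name)[2]


def availability_alerts(error: Optional[str], elapsed_ms: float,
                        max_ms: float) -> list[str]:
    # Mirror the Error and Resolution Time conditions for one measurement
    alerts = []
    if error is not None:
        alerts.append("error")
    if elapsed_ms > max_ms:
        alerts.append("resolution_time")
    return alerts
```

For a live check, `availability_alerts(*probe(system_resolver, "www.example.com")[:2], max_ms=500.0)` would combine the two steps, with the 500 ms threshold chosen purely for illustration.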
Accuracy can be measured by comparing the query response against an expected set of mappings. For this, set up a DNS Server > Mapping alert rule.
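A mapping check is essentially a set comparison, but equivalent values can be written differently (uppercase names, trailing dots, long-form IPv6). This minimal sketch, with hypothetical helper names, canonicalizes values with Python’s `ipaddress` module before comparing. It illustrates the idea only and is not how ThousandEyes performs the comparison.

```python
import ipaddress


def normalize(value: str) -> str:
    """Canonicalize an IP address or domain name so equivalent forms match."""
    try:
        # IP addresses: ipaddress gives one canonical text form (e.g. compressed IPv6)
        return str(ipaddress.ip_address(value))
    except ValueError:
        # Domain names: case-insensitive, trailing dot ignored
        return value.rstrip(".").lower()


def mapping_mismatches(answers: list[str], expected: list[str]) -> set[str]:
    """Return answers not found in the expected mappings (empty set = accurate)."""
    allowed = {normalize(e) for e in expected}
    return {normalize(a) for a in answers} - allowed
```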
DNS Alerts for Caching Server Performance
In addition to monitoring authoritative servers, you may also want to monitor the caching servers in your own network for key internal or external records. To monitor caching servers, set up a DNS Server test but configure it to be recursive, so that Enterprise Agents in your network use your corporate or ISP DNS cache rather than querying the authoritative server directly. Note that you’ll need to target your local caching server rather than use the auto-lookup feature in the test configuration.
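At the protocol level, a recursive query differs from a non-recursive one by the RD (recursion desired) bit in the DNS header defined by RFC 1035. The sketch below builds a raw query message with that bit set or cleared. `build_query` and its defaults are illustrative assumptions and say nothing about how ThousandEyes implements its recursive option.

```python
import struct

QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "AAAA": 28}
RD_FLAG = 0x0100  # "recursion desired" bit in the header flags (RFC 1035, 4.1.1)


def build_query(name: str, qtype: str = "A", recursive: bool = True,
                query_id: int = 0x1234) -> bytes:
    """Build a raw DNS query message in RFC 1035 wire format."""
    flags = RD_FLAG if recursive else 0  # standard QUERY opcode; RD only if recursive
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0 (network byte order)
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label in {name!r}")
        qname += bytes([len(raw)]) + raw  # length-prefixed label
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)  # QCLASS 1 = IN
```

To try it against a caching server, you could send the bytes to that server’s address on UDP port 53 with `socket.sendto` and read the reply with `socket.recvfrom`; an authoritative-only server would typically ignore the RD bit.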
Once you’ve set up the test, caching servers can use the same Alert Rules as authoritative servers: DNS Server > Error and DNS Server > Resolution Time Alert Rules monitor availability.
DNS Alerts for Cache Poisoning
Cache poisoning occurs when an attacker inserts a forged record into a DNS cache, typically an NS record that points future DNS queries to the attacker’s name server. Beyond intentional attacks, you may also want to monitor key records that you change frequently, since caches can keep serving stale copies of them.
DNS Server > Mapping alert rules, run from Enterprise Agents within your network, will help you understand the record accuracy of caching servers in your network or at your ISP. Update the alert conditions to match your expected responses, and create an Alert Rule for each record that you test. This will help you troubleshoot DNS resolution issues that may be caused by a stale or poisoned cache.
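Because each record gets its own rule, it can help to think of the rules as a table keyed by record name and type. The sketch below is a hypothetical model (`RecordRule`, `classify` and the `previous` field are assumptions, not ThousandEyes features) that also separates answers matching a value you recently retired, which suggests a stale cache, from answers you never published, which deserve a poisoning investigation. That split is a heuristic, not a definitive diagnosis.

```python
from dataclasses import dataclass, field


@dataclass
class RecordRule:
    """Hypothetical mapping rule for one record (one rule per record tested)."""
    name: str
    rtype: str
    expected: set[str]                               # values you currently publish
    previous: set[str] = field(default_factory=set)  # values you recently retired


def classify(rule: RecordRule, answers: set[str]) -> str:
    if not answers:
        return "no_answer"
    unexpected = answers - rule.expected
    if not unexpected:
        return "ok"
    if unexpected <= rule.previous:
        return "stale"       # cache still serves a value you changed recently
    return "unexpected"      # value you never published: check for poisoning


def evaluate(rules: list[RecordRule],
             observed: dict[tuple[str, str], set[str]]) -> dict[tuple[str, str], str]:
    # Apply every rule to the answers a caching server returned for its record
    return {(r.name, r.rtype): classify(r, observed.get((r.name, r.rtype), set()))
            for r in rules}
```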
You can also monitor caches in remote networks using a DNS+ Domain > Mapping Alert Rule, which gives you insight into the caches of thousands of DNS servers around the Internet. See the DNS Hijacking section below for how to configure this Alert Rule.
DNS Alerts for DNS Hijacking
In November, I wrote about a Craigslist DNS hijack that caused service interruptions and required caches to be flushed across the Internet. DNS hijacking can happen when an attacker breaks into your own DNS infrastructure, or it can target records that you hold with external providers, as in the spear-phishing attack on Craigslist’s DNS registrar account. In either case, you want to be able to both detect the hijack (when the records are forged) and understand which caches must be flushed (so that you can contact key ISPs).
To detect the hijack, use a DNS Trace > Mapping alert that triggers 1) if your NS records are compromised and point to an incorrect name server, and 2) if key A or AAAA records are compromised and point to the wrong IP addresses (or to the wrong domains, in the case of CNAME records).
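The two trigger conditions can be sketched as set comparisons between what a trace observed and what you published. The function and its inputs are hypothetical, and the comparison is a simple case-insensitive string match; a real DNS Trace test gathers the NS delegation and answers for you.

```python
def _norm(values: set[str]) -> set[str]:
    # Compare names case-insensitively and ignore trailing dots
    return {v.rstrip(".").lower() for v in values}


def hijack_checks(observed_ns: set[str], expected_ns: set[str],
                  observed: dict[str, set[str]],
                  expected: dict[str, set[str]]) -> list[str]:
    """Return triggered conditions for one traced domain (hypothetical inputs)."""
    alerts = []
    # 1) NS records point to a name server you do not operate
    rogue_ns = _norm(observed_ns) - _norm(expected_ns)
    if rogue_ns:
        alerts.append(f"NS compromised: {sorted(rogue_ns)}")
    # 2) Key A/AAAA answers (or CNAME targets) point somewhere unexpected
    for rtype in ("A", "AAAA", "CNAME"):
        bad = _norm(observed.get(rtype, set())) - _norm(expected.get(rtype, set()))
        if bad:
            alerts.append(f"{rtype} compromised: {sorted(bad)}")
    return alerts
```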
To gauge the proportion of caches that need to be flushed, use a DNS+ Domain > Mapping alert, which collects data from thousands of caching name servers around the world and triggers when a configurable proportion of them return an incorrect record. You can limit the alert to specific countries or require a percentage threshold of caching servers.
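The proportion-based trigger can be sketched as follows: filter the per-server samples by country, count the servers returning a record outside the expected set, and compare that percentage with a threshold. `CacheResult`, `flush_report` and the choice to fire at or above the threshold are assumptions for illustration; the list of wrong servers approximates the caches you would ask ISPs to flush.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheResult:
    """Hypothetical per-server sample: where the cache is and what it returned."""
    server: str
    country: str
    answers: frozenset[str]


def flush_report(results: list[CacheResult], expected: set[str],
                 threshold_pct: float,
                 countries: Optional[set[str]] = None) -> tuple[bool, float, list[str]]:
    """Return (alert?, % of caches with a wrong record, servers to flush)."""
    # Optionally restrict the alert to specific countries
    scoped = [r for r in results if countries is None or r.country in countries]
    if not scoped:
        return False, 0.0, []
    # A server is "wrong" if it returned any record outside the expected set;
    # an empty answer is not counted as wrong here (errors are a separate condition)
    wrong = [r.server for r in scoped if not r.answers <= expected]
    pct = 100.0 * len(wrong) / len(scoped)
    return pct >= threshold_pct, pct, wrong
```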
Getting Your Tests and Alerts Running
DNS monitoring and alerting is crucial to maintaining the high levels of availability expected of major web services and applications. We covered four DNS monitoring use cases today:
- Authoritative server performance
- Caching server performance
- DNS cache poisoning
- DNS hijacks
Interested in learning more about network alerting? Watch the recorded webinars on Best Practices for Monitoring DNS and ThousandEyes Alerting Essentials.