Getting Started: Alerts
Welcome back to our Getting Started series with the ThousandEyes platform. Today, we're going to focus on Alerts. The ThousandEyes platform offers extensive customization options for configuring alert rules and assigning them to tests. By default, each test already has preconfigured alert rules that are enabled based on the test layers involved. Additionally, you can create customized alerts using various alert metrics based on the layer and type of testing, allowing for granular notifications.
The default notification method for alerts is email. Within an alert, you can include text to be sent along with the notification. For example, if you set up an alert for application performance, you can add specific action items for engineers to execute within an alert email. This text feature enables customers to establish their own incident management process.
Furthermore, there are integration options available for third-party platforms such as PagerDuty, Slack, and ServiceNow. For instance, if your company uses ServiceNow for incident management, you can integrate it with ThousandEyes to automate the alert process. Now let's look at alert baselining. Alert baselining lets you create alert rules that accurately account for natural variations in test data. Using standard deviation, percentage change, or absolute-value thresholds, you can configure alert rules that notify on baseline violations.
In practical terms, this means you can create an alert rule that, for example, notifies you when the response time exceeds the baseline value, calculated over a 24-hour period, by more than 30%. This lets you create a single rule that applies to multiple locations with different normal performance levels, because each location is evaluated against its own dynamic baseline. Please be aware that dynamic baselines are currently only available for Cloud and Enterprise Agent tests.
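To make the arithmetic concrete, here is a minimal, illustrative sketch in Python (not ThousandEyes code) of how a "response time more than 30% above a 24-hour baseline" check could be evaluated. Treating the baseline as a simple mean of the trailing samples is an assumption for illustration; the platform's own baseline calculation may differ.

```python
from statistics import mean

def violates_dynamic_baseline(history_ms, current_ms, pct_over_baseline=30):
    """Return True if the current response time exceeds the baseline
    (mean of the trailing 24-hour history) by more than pct_over_baseline percent.

    history_ms: response-time samples (ms) from the last 24 hours.
    current_ms: the latest response-time sample (ms).
    """
    baseline = mean(history_ms)                    # simple mean as the baseline (illustrative assumption)
    threshold = baseline * (1 + pct_over_baseline / 100)
    return current_ms > threshold

# Example: a 24-hour baseline around 200 ms; a 280 ms sample is 40% above it.
samples = [190, 205, 210, 195, 200]                # stand-in for 24 hours of data
print(violates_dynamic_baseline(samples, 280))     # True (280 > 200 * 1.3 = 260)
```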
ThousandEyes also offers alert suppression windows, which allow you to temporarily disable alert notifications for tests during specific periods, such as planned maintenance windows or expected outages. You can set up a one-time suppression window or schedule it to recur. During these windows, data continues to be collected, but alerts are not triggered.
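If you manage maintenance windows programmatically, a suppression window can also be created through the ThousandEyes API. The sketch below uses Python's requests library; the endpoint path and every payload field shown (name, startDate, endDate, testIds) are assumptions to verify against the API documentation for your account.

```python
import requests

API_TOKEN = "<your-oauth-bearer-token>"            # placeholder credential

# Assumed endpoint and payload schema -- confirm against the current API docs.
payload = {
    "name": "Planned maintenance - core router upgrade",
    "startDate": "2024-06-01 22:00:00",            # assumed date format
    "endDate": "2024-06-02 02:00:00",
    "testIds": [281474976710706],                  # tests to suppress (example ID)
}

resp = requests.post(
    "https://api.thousandeyes.com/v7/alert-suppression-windows",  # assumed path
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```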
Within the alert list, there are two tabs: Active Alerts and Alert History. Active Alerts shows all currently active alerts, while Alert History shows alerts from the last 90 days and lets you search within a specified time period. Now, let's create an example alert. On the alert rules page, there are tabs for different categories, such as Cloud and Enterprise Agents, Endpoint Agents, and BGP Routing, and all the alert rules you can use are listed in these sections.
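For teams that prefer the API to the UI, the same active and historical views can be pulled programmatically. The sketch below assumes a v7 /alerts endpoint and a window query parameter for the history lookup; check the exact parameter names against the current API documentation.

```python
import requests

API_TOKEN = "<your-oauth-bearer-token>"            # placeholder credential
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}
BASE = "https://api.thousandeyes.com/v7"           # assumed base URL

# Active alerts (assumed to be the default when no time range is given).
active = requests.get(f"{BASE}/alerts", headers=HEADERS, timeout=30)
active.raise_for_status()

# Alert history for a specific period -- 'window' is an assumed parameter name.
history = requests.get(f"{BASE}/alerts", headers=HEADERS, params={"window": "7d"}, timeout=30)
history.raise_for_status()

print(active.json())
print(history.json())
```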
Let's focus on the Cloud and Enterprise Agent section. Here, there are preconfigured alerts called default rules. They're designed specifically for tests involving Cloud and Enterprise Agents. For instance, look at the Default HTTP alert. This alert is automatically triggered when a test encounters an error. However, this alert might not be enough to really understand performance issues. To address this, let's make a new alert that can spot performance problems.
First, click the Add New Alert button. This new alert will be of the Network, Agent to Server type. When you choose this type, you'll see a list of test types the alert can work with. Since we're interested in performance, we'll alert on two metrics, latency and packet loss, for the HTTP Server test that we set up earlier. We'll give this alert a name: SAP Concur, Loss and Latency 30%. Good naming is important so that administrators can understand what's going on just by reading the name, which saves them time when responding to issues.
Now pick the test that this alert will be tied to, and then choose the agents involved. I'll set the severity to Major so the person receiving the alert knows to take action quickly. As the name suggests, we're alerting on 30% packet loss and a 30% increase in latency. It's easy to define 30% packet loss, but what does a 30% increase in latency mean? That's where the alert baseline comes in.
I want to receive an alert when packet loss exceeds 30% or when latency is 30% higher than usual, based on the past 24 hours. So the conditions are: packet loss is at least 30%, or latency is at least 30% above the dynamic baseline.
By setting the condition to Any, I'll receive a notification if either of these conditions is met. Sometimes we see brief, transient issues. To avoid unnecessary alerts in those cases, we can require that the issue lasts for at least 10 minutes within a 20-minute time frame.
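The same rule can also be created through the alert rules API instead of the UI. The sketch below is hedged: the endpoint path, the expression syntax (especially the dynamic-baseline clause), and field names such as roundsViolatingRequired and roundsViolatingOutOf are assumptions modeled on typical alert-rule payloads, so confirm them against the API documentation before relying on them.

```python
import requests

API_TOKEN = "<your-oauth-bearer-token>"            # placeholder credential

# Field names, values, and the expression syntax below are assumptions --
# verify them against the alert rules API documentation for your account.
rule = {
    "ruleName": "SAP Concur, Loss and Latency 30%",
    "alertType": "agent-to-server",                # Network: Agent to Server (assumed value)
    "severity": "MAJOR",
    # 'Any' condition: either clause on its own can trigger the alert.
    # The dynamic-baseline syntax here is purely illustrative.
    "expression": "((loss >= 30%) || (latency >= baseline + 30%))",
    "roundsViolatingRequired": 5,                  # e.g. 5 violating rounds...
    "roundsViolatingOutOf": 10,                    # ...out of 10 (~10 of 20 minutes at a 2-minute interval)
    "testIds": [281474976710706],                  # the HTTP Server test created earlier (example ID)
}

resp = requests.post(
    "https://api.thousandeyes.com/v7/alerts/rules",   # assumed endpoint path
    json=rule,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```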
Now let's discuss how you can receive this alert. Navigate to the Notifications tab. By default, email notification is selected, and with the default email notification we can include customized text to be sent with the alert. For instance, for a performance alert, we can include specific instructions for engineers to follow within the alert email, such as pointing out critical devices and the monitoring tools they should check upon receiving the alert. Using this text feature, we can establish our own incident management process.
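In API terms, that custom text would ride along in the rule's notification settings. Continuing the rule dictionary from the previous sketch, the block below is again an assumption about field names (notifications, email, recipients, message), so confirm it against the alert rules schema.

```python
# Continuing the 'rule' dictionary from the previous sketch.
# Field names are assumptions -- confirm them against the alert rules schema.
rule["notifications"] = {
    "email": {
        "recipients": ["noc@example.com"],         # hypothetical distribution list
        "message": (
            "SAP Concur loss/latency threshold breached. Check the WAN edge "
            "routers and the SAP Concur dashboard, then open an incident if "
            "the condition persists."
        ),
    }
}
```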
Apart from email alerts, the ThousandEyes platform also supports webhooks, which involve providing a URL and authentication details to connect the alert with your webhook service. You also have options for third-party integrations, including Slack, PagerDuty, and ServiceNow. These integrations expand the ways in which you can receive and manage alerts.
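If you go the webhook route, the receiving side is simply an HTTP endpoint that accepts whatever payload your webhook configuration sends. Below is a minimal, hypothetical receiver written with Flask; the field names it reads (ruleName, state) are assumptions, since the actual body depends on the template you configure in ThousandEyes.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/thousandeyes-webhook", methods=["POST"])
def handle_alert():
    payload = request.get_json(force=True, silent=True) or {}
    # Field names below are assumptions; adjust them to match your webhook template.
    rule_name = payload.get("ruleName", "unknown rule")
    state = payload.get("state", "unknown state")
    print(f"Alert received: {rule_name} is now {state}")
    # From here you could open a ticket, page on-call staff, or post to a chat channel.
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)  # in practice, expose this over HTTPS behind a reverse proxy
```

That wraps up our video on alerts. Thanks for watching.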