Automating Azure Instrumentation and Monitoring – Part 5: Log Alerts
3.9 (77.14%) 7 vote[s]

In the previous part of this series, we looked at the basic structure of Azure Monitor alerts, and then specifically at metric alerts. In this part we will consider other types of alert that Azure Monitor can emit. We will first discuss application log alerts – sometimes simply called log alerts – which let us be notified about important data emitted into our application logs. Next we will discuss activity log alerts, which notify us when events happen within Azure itself. These include service health alerts, which Azure emits when there are issues with a service.

This post is part of a series:

    • Part 1 provides an introduction to the series by describing why we should instrument our systems, outlines some of the major tools that Azure provides such as Azure Monitor, and argues why we should be adopting an ‘infrastructure as code’ mindset for our instrumentation and monitoring components.

    • Part 2 describes Azure Application Insights, including its proactive detection and alert features. It also outlines a pattern for deploying instrumentation components based on the requirements we might typically have for different environments, from short-lived development and test environments through to production.

    • Part 3 discusses how to publish custom metrics, both through Application Insights and to Azure Monitor. Custom metrics let us enrich the data that is available to our instrumentation components.

    • Part 4 covers the basics of alerts and metric alerts. Azure Monitor’s powerful alerting system is a big topic, and in this part we’ll discuss how it works overall, as well as how to get alerts for built-in and custom metrics.

    • Part 5 (this post) covers log alerts and resource health alerts, two other major types of alerts that Azure Monitor provides. Log alerts let us alert on information coming into Application Insights logs and Log Analytics workspaces, while resource health alerts us when Azure itself is having an issue that may result in downtime or degraded performance.

    • Part 6 (coming soon) describes dashboards. The Azure Portal has a great dashboard UI, and our instrumentation data can be made available as charts. Dashboards are also possible to automate, and I’ll show a few tips and tricks I’ve learned when doing this.

    • Part 7 (coming soon) covers availability tests, which let us proactively monitor our web applications for potential outages. We’ll discuss deploying and automating both single-step (ping) and multi-step availability tests.

    • Part 8 (coming soon) describes autoscale. While this isn’t exactly instrumentation in and of itself, autoscale is built on much of the same data used to drive alerts and dashboards, and autoscale rules can be automated as well.

    • Finally, part 9 (coming soon) covers exporting data to other systems. Azure Monitor metrics and log data can be automatically exported, as can Application Insights data, and the export rules can be exported and used from automation scripts.

Application Log Alerts

Azure supports application-level logging into two main destinations: Application Insights (which we discussed in part 2 of this series) and Azure Monitor’s own log management system. Both of these services receive log entries, store and index them within a few minutes, and allow for interactive querying through a powerful syntax called KQL. Additionally, we can create scheduled log query alert rules that run on the log data.

Note: Microsoft recently announced that they have renamed the service previously known as Log Analytics to Azure Monitor logs. This is a sensible change, in my opinion, since it reflects the fact that logs are just another piece of data within Azure Monitor.

Scheduled log query alert rules are relatively simple: at a frequency that we specify, they run a defined query and then look at the result. If the result matches criteria that we have specified then an alert is fired.

Like metric alert rules, scheduled log alert rules specify the conditions under which an alert should fire, but they don’t specify the process by which a human or system should be notified. Action groups, described in detail in part 4 of this series, fill that role.

Important: Like metric alerts, log alerts cost money – and there is no free quota provided for log alerts currently. Be aware of this when you create test alert rules!

There are several key pieces of a scheduled log query alert rule:

  • Data source is a reference to the location that stores the logs, such as an Application Insights instance. This is similar to the scopes property in metric alerts. Note that in ARM templates for log alerts, we specify the data source twice – once as a tag, and once as a property of the alert rule resource. Also, note that we can perform cross-workspace queries (for example, joining data from Application Insights with data in Azure Monitor logs); when we do so, we need to specify the full list of data sources we’re querying in the authorizedResources property.
  • Query is a KQL query that should be executed on a regular basis.
  • Query type indicates how the results of the query should be interpreted. For example, if we want to count the number of results and compare it to a threshold, we would use the ResultCount query type.
  • Trigger specifies the critical threshold for the query results. This includes both a threshold value and a comparison operator.
  • Schedule specifies how frequently the log query should run, and the time window that the log query should consider. Note that more frequent executions result in a higher cost.
  • Severity is the importance of the alert rule, which (as described in part 4 of this series) may be helpful information to whomever is responding to an alert from this rule so that they can understand its importance.
  • Actions are the action groups that should be invoked when an alert is fired.
  • Metadata includes a name and description of the alert rule.

Full documentation is available within the ARM API documentation site. There are some idiosyncrasies with these ARM templates, including the use of a mandatory tag and the fact that the enabledproperty is a string rather than a boolean value, so I suggest copying a known working example and modifying it incrementally.

Example

I recently worked on an application that had intermittent errors being logged into Application Insights. We quickly realised that the errors were logged by Windows Communication Foundation within our application. These errors indicated a problem that the development team needed to address, so in order to monitor the situation we configured an alert rule as follows:

  • Data source was the Application Insights instance for the application.
  • Query was the following KQL string: exceptions | where type matches regex 'System.ServiceModel.*'. This looked for all data within the exceptions index that contained a type field with the term System.ServiceModel inside it, using a regular expression to perform the match. (KQL queries can be significantly more complex than this, if you need them to be!)
  • Query type was ResultCount, since we were interested in monitoring the number of log entries matching the query.
  • Trigger was set to greater than and 0 for the operator and threshold, respectively.
  • Schedule was set to evaluate the query every five minutes, and to look back at the last five minutes, which meant that we have round-the-clock monitoring on this rule.
  • Severity was 3, since we considered this to be a warning-level event but not an immediate emergency.
  • Action was set to an action group that sent an email to the development team.

The following ARM template creates this alert rule using an Application Insights instance:

Note that this ARM template also creates an action group with an email action, but of course you can have whatever action groups you want; you can also refer to shared action groups in other resource groups.

Of course, if you have data within Azure Monitor logs (previously Log Analytics workspaces) then the same process applies there, but with a different data source.

Metric Log Alerts

There is also a special scenario available: when certain log data gets ingested into Azure Monitor logs workspaces, it is made available for metric alerting. These alerts are for data including performance counters from virtual machines and certain other types of well-known log data. In these cases, logs are used to transmit the data but it is fundamentally a metric, so this feature of Azure Monitor exposes it as such. More information on this feature, including an example ARM template, is available here.

Activity Log Alerts

Azure’s activity log is populated by Azure automatically. It includes a number of different types of data, including resource-level operations (e.g. resource creation, modification, and deletion), service health data (e.g. when a maintenance event is planned for a virtual machine), and a variety of other types of log data that can be specific to individual resource types. More detail about the data captured by the activity log is available on the Azure documentation pages.

Activity log alert rules can be created through ARM templates using the resource type Microsoft.Insights/activityLogAlerts. The resource properties are similar to those on metric alert rules, but with a few differences. Here are the major properties:

  • Scope is the resource that we want to monitor the activity log entries from, and alert on. Note that we can provide a resource group or even a subscription reference here, and all of the resources within those scopes will be covered.
  • Condition is the set of specific rules that should be evaluated. These are a set of boolean rules to evaluate log entries’ categories, resource types, and other key properties. Importantly, you must always provide a category filter.
  • Actions are references to the action group (or groups) that should be invoked when an alert is fired.

Unlike application log queries, we don’t specify a KQL query or a time window; we instead have a simpler set of boolean criteria to use to filter events and get alerts.

By using activity log alerts, we can set up rules like alert me whenever a resource group is deleted within this Azure subscription and alert me whenever a failed ARM template deployment happens within this resource group. Here is an example ARM templates that covers both of these scenarios:

Note that activity log alert resources should be created in the global location.

More information on activity log alerts is available here; more detail on the ARM template syntax and other ways of automating their creation is available here; and even more detail on the ARM resource properties is available here.

Service Health Alerts

Azure provides service health events to advise of expected as well as unexpected issues with Azure services. For example, when virtual machines have a maintenance window scheduled, Azure publishes a service health event to notify you of this fact. Similarly, if Azure had a problem with a particular service (e.g. Azure Storage), it would typically publish a service health event to advise of the incident details, often both during the incident and after the incident has been resolved.

Service health events are published into the activity log. A great deal of information is available about these events, and as a result, activity log alert rules can be used to monitor for service health events as well. Simply use the ServiceHealth category, and then the properties available on service health events, to filter them as appropriate. An example ARM template is available within the Microsoft documentation for service health alerts.

Resource Health Alerts

Azure also helps to filter the relevant service health events into another category of activity log event, using the ResourceHealth category. While service health events provide information about planned maintenance and incidents that may affect entire Azure services, resource health events are specific to your particular resource. They essentially filter and collapse service health events into a single health status for a given resource. Once again, Microsoft provide an example ARM template within their documentation.

Summary

In this post we have discussed the different types of log alerts that Azure Monitor provides. Scheduled log query alert rules let us define queries that we should run on the structured, semi-structured, or unstructured logs that our applications emit, and then have these queries run automatically and alert us when their results show particular signals that we need to pay attention to. Activity log alert rules let us monitor data that is emitted by Azure itself, including by Azure Resource Manager and by Azure’s service health monitoring systems.

We have now discussed the key components of Azure Monitor’s alerting system. The alerting system works across four main resource types: action groups, and then the different types of alert rules (metric, scheduled log query, and activity log). By using all of these components together, we can create robust monitoring solutions that make use of data emitted by Azure automatically, and by custom log data and metrics that we report into Azure Monitor ourselves.

In the next post of this series we will discuss another important aspect of interacting with data that has been sent into Azure Monitor: viewing and manipulating data using dashboards.

Category:
Application Development and Integration, Azure Infrastructure, Azure Platform
Tags:
, , , , , ,

Join the conversation! 1 Comment

  1. is it possible to integrate Azure monitor with a third party ticketing tool like remedy so as to raise tickets for alerts (I know Azure supports and have inbuilt feature for service now)?

    Reply

Leave a Reply