Automating Azure Instrumentation and Monitoring – Part 5: Log Alerts

In the previous part of this series, we looked at the basic structure of Azure Monitor alerts, and then specifically at metric alerts. In this part we will consider other types of alert that Azure Monitor can emit. We will first discuss application log alerts – sometimes simply called log alerts – which let us be notified about important data emitted into our application logs. Next we will discuss activity log alerts, which notify us when events happen within Azure itself. These include service health alerts, which Azure emits when there are issues with a service.

This post is part of a series:

    • Part 1 provides an introduction to the series by describing why we should instrument our systems, outlines some of the major tools that Azure provides such as Azure Monitor, and argues why we should be adopting an ‘infrastructure as code’ mindset for our instrumentation and monitoring components.

    • Part 2 describes Azure Application Insights, including its proactive detection and alert features. It also outlines a pattern for deploying instrumentation components based on the requirements we might typically have for different environments, from short-lived development and test environments through to production.

    • Part 3 discusses how to publish custom metrics, both through Application Insights and to Azure Monitor. Custom metrics let us enrich the data that is available to our instrumentation components.

    • Part 4 covers the basics of alerts and metric alerts. Azure Monitor’s powerful alerting system is a big topic, and in this part we’ll discuss how it works overall, as well as how to get alerts for built-in and custom metrics.

    • Part 5 (this post) covers log alerts and resource health alerts, two other major types of alerts that Azure Monitor provides. Log alerts let us alert on information coming into Application Insights logs and Log Analytics workspaces, while resource health alerts notify us when Azure itself is having an issue that may result in downtime or degraded performance.

    • Part 6 (coming soon) describes dashboards. The Azure Portal has a great dashboard UI, and our instrumentation data can be made available as charts. Dashboards are also possible to automate, and I’ll show a few tips and tricks I’ve learned when doing this.

    • Part 7 (coming soon) covers availability tests, which let us proactively monitor our web applications for potential outages. We’ll discuss deploying and automating both single-step (ping) and multi-step availability tests.

    • Part 8 (coming soon) describes autoscale. While this isn’t exactly instrumentation in and of itself, autoscale is built on much of the same data used to drive alerts and dashboards, and autoscale rules can be automated as well.

    • Finally, part 9 (coming soon) covers exporting data to other systems. Azure Monitor metrics and log data can be automatically exported, as can Application Insights data, and the export rules can be created and managed from automation scripts.

Application Log Alerts

Azure supports application-level logging into two main destinations: Application Insights (which we discussed in part 2 of this series) and Azure Monitor’s own log management system. Both of these services receive log entries, store and index them within a few minutes, and allow for interactive querying through a powerful syntax called KQL. Additionally, we can create scheduled log query alert rules that run on the log data.

Note: Microsoft recently announced that they have renamed the service previously known as Log Analytics to Azure Monitor logs. This is a sensible change, in my opinion, since it reflects the fact that logs are just another piece of data within Azure Monitor.

Scheduled log query alert rules are relatively simple: at a frequency that we specify, they run a defined query and then look at the result. If the result matches criteria that we have specified then an alert is fired.

Like metric alert rules, scheduled log alert rules specify the conditions under which an alert should fire, but they don’t specify the process by which a human or system should be notified. Action groups, described in detail in part 4 of this series, fill that role.

Important: Like metric alerts, log alerts cost money – and there is no free quota provided for log alerts currently. Be aware of this when you create test alert rules!

There are several key pieces of a scheduled log query alert rule:

  • Data source is a reference to the location that stores the logs, such as an Application Insights instance. This is similar to the scopes property in metric alerts. Note that in ARM templates for log alerts, we specify the data source twice – once as a tag, and once as a property of the alert rule resource. Also, note that we can perform cross-workspace queries (for example, joining data from Application Insights with data in Azure Monitor logs); when we do so, we need to specify the full list of data sources we’re querying in the authorizedResources property.
  • Query is a KQL query that should be executed on a regular basis.
  • Query type indicates how the results of the query should be interpreted. For example, if we want to count the number of results and compare it to a threshold, we would use the ResultCount query type.
  • Trigger specifies the critical threshold for the query results. This includes both a threshold value and a comparison operator.
  • Schedule specifies how frequently the log query should run, and the time window that the log query should consider. Note that more frequent executions result in a higher cost.
  • Severity is the importance of the alert rule, which (as described in part 4 of this series) may be helpful information to whoever is responding to an alert from this rule so that they can understand how urgent it is.
  • Actions are the action groups that should be invoked when an alert is fired.
  • Metadata includes a name and description of the alert rule.

Full documentation is available within the ARM API documentation site. There are some idiosyncrasies with these ARM templates, including the use of a mandatory tag and the fact that the enabled property is a string rather than a boolean value, so I suggest copying a known working example and modifying it incrementally.

Example

I recently worked on an application that had intermittent errors being logged into Application Insights. We quickly realised that the errors were logged by Windows Communication Foundation within our application. These errors indicated a problem that the development team needed to address, so in order to monitor the situation we configured an alert rule as follows:

  • Data source was the Application Insights instance for the application.
  • Query was the following KQL string: exceptions | where type matches regex 'System.ServiceModel.*'. This looked for all data within the exceptions index that contained a type field with the term System.ServiceModel inside it, using a regular expression to perform the match. (KQL queries can be significantly more complex than this, if you need them to be!)
  • Query type was ResultCount, since we were interested in monitoring the number of log entries matching the query.
  • Trigger was set to greater than and 0 for the operator and threshold, respectively.
  • Schedule was set to evaluate the query every five minutes, and to look back at the last five minutes, which meant that we had round-the-clock monitoring on this rule.
  • Severity was 3, since we considered this to be a warning-level event but not an immediate emergency.
  • Action was set to an action group that sent an email to the development team.

The following ARM template creates this alert rule using an Application Insights instance:
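The version below is a minimal sketch rather than a production-ready template: it creates the scheduled query rule plus a simple email action group, and the parameter names, resource names, and API versions are illustrative. You can see the idiosyncrasies mentioned above, including the hidden-link tag and the string-valued enabled property.

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "appInsightsName": { "type": "string" },
    "notificationEmailAddress": { "type": "string" }
  },
  "variables": {
    "actionGroupName": "DevTeamEmail",
    "alertRuleName": "WcfServiceModelExceptions"
  },
  "resources": [
    {
      "type": "Microsoft.Insights/actionGroups",
      "apiVersion": "2018-03-01",
      "name": "[variables('actionGroupName')]",
      "location": "Global",
      "properties": {
        "groupShortName": "DevTeam",
        "enabled": true,
        "emailReceivers": [
          {
            "name": "DevelopmentTeam",
            "emailAddress": "[parameters('notificationEmailAddress')]"
          }
        ]
      }
    },
    {
      "type": "Microsoft.Insights/scheduledQueryRules",
      "apiVersion": "2018-04-16",
      "name": "[variables('alertRuleName')]",
      "location": "[resourceGroup().location]",
      "dependsOn": [
        "[resourceId('Microsoft.Insights/actionGroups', variables('actionGroupName'))]"
      ],
      "tags": {
        "[concat('hidden-link:', resourceId('Microsoft.Insights/components', parameters('appInsightsName')))]": "Resource"
      },
      "properties": {
        "description": "Fires when WCF exceptions are logged to Application Insights.",
        "enabled": "true",
        "source": {
          "query": "exceptions | where type matches regex 'System.ServiceModel.*'",
          "dataSourceId": "[resourceId('Microsoft.Insights/components', parameters('appInsightsName'))]",
          "queryType": "ResultCount"
        },
        "schedule": {
          "frequencyInMinutes": 5,
          "timeWindowInMinutes": 5
        },
        "action": {
          "odata.type": "Microsoft.WindowsAzure.Management.Monitoring.Alerts.Models.Microsoft.AppInsights.Nexus.DataContracts.Resources.ScheduledQueryRules.AlertingAction",
          "severity": "3",
          "trigger": {
            "thresholdOperator": "GreaterThan",
            "threshold": 0
          },
          "aznsAction": {
            "actionGroup": [
              "[resourceId('Microsoft.Insights/actionGroups', variables('actionGroupName'))]"
            ]
          }
        }
      }
    }
  ]
}
```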

Note that this ARM template also creates an action group with an email action, but of course you can have whatever action groups you want; you can also refer to shared action groups in other resource groups.

Of course, if you have data within Azure Monitor logs (previously Log Analytics workspaces) then the same process applies there, but with a different data source.

Metric Log Alerts

There is also a special scenario available: when certain log data gets ingested into Azure Monitor logs workspaces, it is made available for metric alerting. These alerts are for data including performance counters from virtual machines and certain other types of well-known log data. In these cases, logs are used to transmit the data but it is fundamentally a metric, so this feature of Azure Monitor exposes it as such. More information on this feature, including an example ARM template, is available here.

Activity Log Alerts

Azure’s activity log is populated by Azure automatically. It includes a number of different types of data, including resource-level operations (e.g. resource creation, modification, and deletion), service health data (e.g. when a maintenance event is planned for a virtual machine), and a variety of other types of log data that can be specific to individual resource types. More detail about the data captured by the activity log is available on the Azure documentation pages.

Activity log alert rules can be created through ARM templates using the resource type Microsoft.Insights/activityLogAlerts. The resource properties are similar to those on metric alert rules, but with a few differences. Here are the major properties:

  • Scope is the resource that we want to monitor the activity log entries from, and alert on. Note that we can provide a resource group or even a subscription reference here, and all of the resources within those scopes will be covered.
  • Condition is the set of specific rules that should be evaluated. These are a set of boolean rules to evaluate log entries’ categories, resource types, and other key properties. Importantly, you must always provide a category filter.
  • Actions are references to the action group (or groups) that should be invoked when an alert is fired.

Unlike application log queries, we don’t specify a KQL query or a time window; we instead have a simpler set of boolean criteria to use to filter events and get alerts.

By using activity log alerts, we can set up rules like alert me whenever a resource group is deleted within this Azure subscription and alert me whenever a failed ARM template deployment happens within this resource group. Here is an example ARM template that covers both of these scenarios:
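The fragment below is a sketch of the two alert rule resources rather than a complete template. It assumes an actionGroupId parameter that points at an existing action group, and the rule names are illustrative. The first rule scopes to the whole subscription and matches resource group deletions; the second scopes to the current resource group and matches failed deployment operations.

```json
"resources": [
  {
    "type": "Microsoft.Insights/activityLogAlerts",
    "apiVersion": "2017-04-01",
    "name": "ResourceGroupDeleted",
    "location": "Global",
    "properties": {
      "enabled": true,
      "scopes": [ "[subscription().id]" ],
      "condition": {
        "allOf": [
          { "field": "category", "equals": "Administrative" },
          { "field": "operationName", "equals": "Microsoft.Resources/subscriptions/resourceGroups/delete" }
        ]
      },
      "actions": {
        "actionGroups": [
          { "actionGroupId": "[parameters('actionGroupId')]" }
        ]
      }
    }
  },
  {
    "type": "Microsoft.Insights/activityLogAlerts",
    "apiVersion": "2017-04-01",
    "name": "FailedDeployment",
    "location": "Global",
    "properties": {
      "enabled": true,
      "scopes": [ "[resourceGroup().id]" ],
      "condition": {
        "allOf": [
          { "field": "category", "equals": "Administrative" },
          { "field": "operationName", "equals": "Microsoft.Resources/deployments/write" },
          { "field": "status", "equals": "Failed" }
        ]
      },
      "actions": {
        "actionGroups": [
          { "actionGroupId": "[parameters('actionGroupId')]" }
        ]
      }
    }
  }
]
```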

Note that activity log alert resources should be created in the global location.

More information on activity log alerts is available here; more detail on the ARM template syntax and other ways of automating their creation is available here; and even more detail on the ARM resource properties is available here.

Service Health Alerts

Azure provides service health events to advise of expected as well as unexpected issues with Azure services. For example, when virtual machines have a maintenance window scheduled, Azure publishes a service health event to notify you of this fact. Similarly, if Azure had a problem with a particular service (e.g. Azure Storage), it would typically publish a service health event to advise of the incident details, often both during the incident and after the incident has been resolved.

Service health events are published into the activity log. A great deal of information is available about these events, and as a result, activity log alert rules can be used to monitor for service health events as well. Simply use the ServiceHealth category, and then the properties available on service health events, to filter them as appropriate. An example ARM template is available within the Microsoft documentation for service health alerts.
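As a sketch, the condition of such a rule might look like the fragment below; the incidentType value shown (Incident, for service issues) is just one example, and values such as Maintenance or Informational can be filtered in the same way.

```json
"condition": {
  "allOf": [
    { "field": "category", "equals": "ServiceHealth" },
    { "field": "properties.incidentType", "equals": "Incident" }
  ]
}
```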

Resource Health Alerts

Azure also helps to filter the relevant service health events into another category of activity log event, using the ResourceHealth category. While service health events provide information about planned maintenance and incidents that may affect entire Azure services, resource health events are specific to your particular resource. They essentially filter and collapse service health events into a single health status for a given resource. Once again, Microsoft provide an example ARM template within their documentation.

Summary

In this post we have discussed the different types of log alerts that Azure Monitor provides. Scheduled log query alert rules let us define queries that we should run on the structured, semi-structured, or unstructured logs that our applications emit, and then have these queries run automatically and alert us when their results show particular signals that we need to pay attention to. Activity log alert rules let us monitor data that is emitted by Azure itself, including by Azure Resource Manager and by Azure’s service health monitoring systems.

We have now discussed the key components of Azure Monitor’s alerting system. The alerting system works across four main resource types: action groups, and then the different types of alert rules (metric, scheduled log query, and activity log). By using all of these components together, we can create robust monitoring solutions that make use of data emitted by Azure automatically, and by custom log data and metrics that we report into Azure Monitor ourselves.

In the next post of this series we will discuss another important aspect of interacting with data that has been sent into Azure Monitor: viewing and manipulating data using dashboards.

Automating Azure Instrumentation and Monitoring – Part 4: Metric Alerts

One of the most important features of Azure Monitor is its ability to send alerts when something interesting happens – in other words, when our telemetry meets some criteria we have told Azure Monitor that we’re interested in. We might have alerts that indicate when our application is down, or when it’s getting an unusually high amount of traffic, or when the response time or other performance metrics aren’t within the normal range. We can also have alerts based on the contents of log messages, and on the health status of Azure resources as reported by Azure itself. In this post, we’ll look at how alerts work within Azure Monitor and will see how these can be automated using ARM templates. This post will focus on the general workings of the alerts system, including action groups, and on metric alerts; part 5 (coming soon) will look at log alerts and resource health alerts.

This post is part of a series:

    • Part 1 provides an introduction to the series by describing why we should instrument our systems, outlines some of the major tools that Azure provides such as Azure Monitor, and argues why we should be adopting an ‘infrastructure as code’ mindset for our instrumentation and monitoring components.

    • Part 2 describes Azure Application Insights, including its proactive detection and alert features. It also outlines a pattern for deploying instrumentation components based on the requirements we might typically have for different environments, from short-lived development and test environments through to production.

    • Part 3 discusses how to publish custom metrics, both through Application Insights and to Azure Monitor. Custom metrics let us enrich the data that is available to our instrumentation components.

    • Part 4 (this post) covers the basics of alerts and metric alerts. Azure Monitor’s powerful alerting system is a big topic, and in this part we’ll discuss how it works overall, as well as how to get alerts for built-in and custom metrics.

    • Part 5 (coming soon) covers log alerts and resource health alerts, two other major types of alerts that Azure Monitor provides. Log alerts let us alert on information coming into Application Insights logs and Log Analytics workspaces, while resource health alerts notify us when Azure itself is having an issue that may result in downtime or degraded performance.

    • Part 6 (coming soon) describes dashboards. The Azure Portal has a great dashboard UI, and our instrumentation data can be made available as charts. Dashboards are also possible to automate, and I’ll show a few tips and tricks I’ve learned when doing this.

    • Part 7 (coming soon) covers availability tests, which let us proactively monitor our web applications for potential outages. We’ll discuss deploying and automating both single-step (ping) and multi-step availability tests.

    • Part 8 (coming soon) describes autoscale. While this isn’t exactly instrumentation in and of itself, autoscale is built on much of the same data used to drive alerts and dashboards, and autoscale rules can be automated as well.

    • Finally, part 9 (coming soon) covers exporting data to other systems. Azure Monitor metrics and log data can be automatically exported, as can Application Insights data, and the export rules can be created and managed from automation scripts.

What Are Alerts?

Alerts are described in detail on the Azure Monitor documentation, and I won’t re-hash the entire page here. Here is a quick summary, though.

An alert rule defines the situations under which an alert should fire. For example, an alert rule might be something like when the average CPU utilisation goes above 80% over the last hour, or when the number of requests that get responses with an HTTP 5xx error code goes above 3 in the last 15 minutes. An alert is a single instance in which the alert rule fired. We tell Azure Monitor what alert rules we want to create, and Azure Monitor creates alerts and sends them out.

Alert rules have three logical components:

    • Target resource: the Azure resource that should be monitored for this alert. For example, this might be an app service, a Cosmos DB account, or an Application Insights instance.
    • Rule: the rule that should be applied when determining whether to fire an alert for the resource. For example, this might be a rule like when average CPU usage is greater than 50% within the last 5 minutes, or when a log message is written with a level of Warning. Rules include a number of sub-properties, and often include a time window or schedule that should be used to evaluate the alert rule.
    • Action: the actions that should be performed when the alert has fired. For example, this might be email admin@example.com or invoke a webhook at https://example.com/alert. Azure Monitor provides a number of action types that can be invoked, which we’ll discuss below.

There are also other pieces of metadata that we can set when we create alert rules, including the alert rule name, description, and severity. Severity is a useful piece of metadata that will be propagated to any alerts that fire from this alert rule, and allows for whoever is responding to understand how important the alert is likely to be, and to prioritise their list of alerts so that they deal with the most important alerts first.

Classic Alerts

Azure Monitor currently has two types of alerts. Classic alerts are the original alert type supported by Azure Monitor since its inception, and can be contrasted with the newer alerts – which, confusingly, don’t seem to have a name, but which I’ll refer to as newer alerts for the sake of this post.

There are many differences between classic and newer alerts. One such difference is that in classic alerts, actions and rules are mixed into a single ‘alert’ resource, while in newer alerts, actions and rules are separate resources (as described below in more detail). A second difference is that as Azure migrates from classic to newer alerts, some Azure resource types only support classic alerts, although these are all being migrated across to newer alerts.

Microsoft recently announced that classic alerts will be retired in June 2019, so I won’t spend a lot of time discussing them here, although if you need to create a classic alert with an ARM template before June 2019, you can use this documentation page as a reference.

All of the rest of this discussion will focus on newer alerts.

Alert Action Groups

A key component of Azure Monitor’s alert system is action groups, which define how an alert should be handled. Importantly, action groups are independent of the alert rule that triggered them. An alert rule defines when and why an alert should be fired, while an action group defines how the alert should be sent out to interested parties. For example, an action group can send an email to a specified email address, send an SMS notification, invoke a webhook, trigger a Logic App, or perform a number of other actions. A single action group can perform one or several of these actions.

Action groups are Azure Resource Manager resources in their own right, and alert rules then refer to them. This means we can have shared action groups that work across multiple alerts, potentially spread across multiple applications or multiple teams. We can also create specific action groups for defined purposes. For example, in an enterprise application you might have a set of action groups like this:

| Action Group Name | Resource Group | Actions | Notes |
|---|---|---|---|
| CreateEnterpriseIssue | Shared-OpsTeam | Invoke a webhook to create issue in enterprise issue tracking system. | This might be used for high priority issues that need immediate, 24×7 attention. It will notify your organisation’s central operations team. |
| SendSmsToTeamLead | MyApplication | Send an SMS to the development team lead. | This might be used for high priority issues that also need 24×7 attention. It will notify the dev team lead. |
| EmailDevelopmentTeam | MyApplication | Send an email to the development team’s shared email alias. | This might be used to ensure the development team is aware of all production issues, including lower-priority issues that only need attention during business hours. |

Of course, these are just examples; you can set up any action groups that make sense for your application, team, or company.

Automating Action Group Creation

Action groups can be created and updated using ARM templates, using the Microsoft.Insights/actionGroups resource type. The schema is fairly straightforward, but one point to consider is the groupShortName property. The short name is used in several places throughout Azure Monitor, but importantly it is used to identify the action group on email and SMS message alerts that Azure Monitor sends. If you have multiple teams, multiple applications, or even just multiple alert groups, it’s important to choose a meaningful short name that will make sense to someone reading the alert. I find it helpful to put myself in the mind of the person (likely me!) who will be woken at 3am to a terse SMS informing them that something has happened; they will be half asleep while trying to make sense of the alert that they have received. Choosing an appropriate action group short name may help save them several minutes of troubleshooting time, reducing the time to diagnosis (and the time before they can return to bed). Unfortunately these short names must be 12 characters or fewer, so it’s not always easy to find a good name to use.

With this in mind, here is an example ARM template that creates the three action groups listed above:
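The template below is a sketch of what these might look like. The webhook URL, phone number, and email address are placeholders, and the short names are examples chosen to fit within the 12-character limit.

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Insights/actionGroups",
      "apiVersion": "2018-03-01",
      "name": "CreateEnterpriseIssue",
      "location": "Global",
      "properties": {
        "groupShortName": "CreateIssue",
        "enabled": true,
        "webhookReceivers": [
          {
            "name": "EnterpriseIssueTracker",
            "serviceUri": "https://issues.example.com/api/webhook"
          }
        ]
      }
    },
    {
      "type": "Microsoft.Insights/actionGroups",
      "apiVersion": "2018-03-01",
      "name": "SendSmsToTeamLead",
      "location": "Global",
      "properties": {
        "groupShortName": "SmsTeamLead",
        "enabled": true,
        "smsReceivers": [
          {
            "name": "TeamLead",
            "countryCode": "61",
            "phoneNumber": "400000000"
          }
        ]
      }
    },
    {
      "type": "Microsoft.Insights/actionGroups",
      "apiVersion": "2018-03-01",
      "name": "EmailDevelopmentTeam",
      "location": "Global",
      "properties": {
        "groupShortName": "EmailDevTeam",
        "enabled": true,
        "emailReceivers": [
          {
            "name": "DevelopmentTeam",
            "emailAddress": "devteam@example.com"
          }
        ]
      }
    }
  ]
}
```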

Note that this will create all three action groups in the same resource group, rather than using separate resource groups for the shared and application-specific action groups.

Once the action groups have been created, any SMS and email recipients will receive a confirmation message to let them know they are now in the action group. They can also unsubscribe from the action group if they choose. If you use a group email alias, it’s important to remember that if one recipient unsubscribes then the whole email address action will be disabled for that alert, and nobody on the email distribution list will get those alerts anymore.

Metric Alerts

Now that we know how to create action groups that are ready to receive alerts and route them to the relevant people and places, let’s look at how we create an alert based on the metrics that Azure Monitor has recorded for our system.

Important: Metric alerts are not free of charge, although you do get a small free quota. Make sure you remove any test alert rules once you’re done, and take a look at the pricing information for more detail.

A metric alert rule has a number of important properties:

    • Scope is the resource that has the metrics that we want to monitor and alert on.
    • Evaluation frequency is how often Azure Monitor should check the resource to see if it meets the criteria. This is specified as an ISO 8601 period – for example, PT5M means check this alert every 5 minutes.
    • Window size is how far back in time Azure Monitor should look when it checks the criteria. This is also specified as an ISO 8601 period – for example, PT1H means when running this alert, look at the metric history for the last 1 hour. This can be between 5 minutes and 24 hours.
    • Criteria are the specific rules that should be evaluated. There is a sophisticated set of functionality available when specifying criteria, but commonly this will be something like (for an App Service) look at the number of requests that resulted in a 5xx status code response, and alert me if the count is greater than 3 or (for a Cosmos DB database) look at the number of requests where the StatusCode dimension was set to the value 429 (representing a throttled request), and alert me if the count is greater than 1.
    • Actions are references to the action group (or groups) that should be invoked when an alert is fired.

Each of these properties can be set within an ARM template using the resource type Microsoft.Insights/metricAlerts. Let’s discuss a few of these in more detail.

Scope

As we know from earlier in this series, there are three main ways that metrics get into Azure Monitor:

    • Built-in metrics, which are published by Azure itself.
    • Custom resource metrics, which are published by our applications and are attached to Azure resources.
    • Application Insights allows for custom metrics that are also published by our applications, but are maintained within Application Insights rather than tied to a specific Azure resource.

All three of these metric types can have alerts triggered from them. In the case of built-in and custom resource metrics, we will use the Azure resource itself as the scope of the metric alert. For Application Insights, we use the Application Insights resource (i.e. the resource of type Microsoft.Insights/components) as the scope.

Note that Microsoft has recently announced a preview capability of monitoring multiple resources in a single metric alert rule. This currently only works with virtual machines, and as it’s such a narrow use case, I won’t discuss it here. However, keep in mind that the scopes property is specified as an array because of this feature.

Criteria

A criterion is a specification of the conditions under which the alert should fire. Criteria have the following sub-properties:

    • Name: a criterion can have a friendly name specified to help understand what caused an alert to fire.
    • Metric name and namespace: the name of the metric that was published, and if it’s a custom metric, the namespace. For more information on metric namespaces see part 3 of this series. A list of built-in metrics published by Azure services is available here.
    • Dimensions: if the metric has dimensions associated with it, we can filter the metrics to only consider certain dimension values. Dimension values can be included or excluded.
    • Time aggregation: the way in which the metric should be aggregated – e.g. counted, summed, or have the maximum/minimum values considered.
    • Operator: the comparison operator (e.g. greater than, less than) that should be used when comparing the aggregated metric value to the threshold.
    • Threshold: the critical value at which the aggregated metric should trigger the alert to fire.

These properties can be quite abstract, so let’s consider a couple of examples.

First, let’s consider an example for Cosmos DB. We might have a business rule that says whenever we see more than one throttled request, fire an alert. In this example:

    • Metric name would be TotalRequests, since that is the name of the metric published by Cosmos DB. There is no namespace since this is a built-in metric. Note that, by default, TotalRequests is the count of all requests and not just throttled requests, so…
    • Dimension would be set to filter the StatusCode dimension to only include the value 429, since 429 represents a throttled request.
    • Operator would be GreaterThan, since we are interested in knowing when we see more than a single throttled request.
    • Threshold would be 1, since we want to know whether we received more than one throttled request.
    • Time aggregation would be Maximum. The TotalRequests metric is a count-based metric (i.e. each metric raw value represents the total number of requests for a given period of time), and so we want to look at the maximum value of the metric within the time window that we are considering.

Second, let’s consider an example for App Services. We might have a business rule that says whenever our application returns more than three responses with a 5xx response code, fire an alert. In this example:

    • Metric name would be Http5xx, since that is the name of the metric published by App Services. Once again, there is no namespace.
    • Dimension would be omitted. App Services publishes the Http5xx metric as a separate metric rather than having a TotalRequests metric with dimensions for status codes like Cosmos DB. (Yes, this is inconsistent!)
    • Operator would again be GreaterThan.
    • Threshold would be 3.
    • Time aggregation would again be Maximum.

Note that a single metric alert can have one or more criteria. The odata.type property of the criteria property can be set to different values depending on whether we have a single criterion (in which case use Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria) or multiple (Microsoft.Azure.Monitor.MultipleResourceMultipleMetricCriteria). At the time of writing, if we use multiple criteria then all of the criteria must be met for the alert rule to fire.
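To make this concrete, here is a sketch of a metric alert rule resource for the Cosmos DB example above. The rule name and parameters are illustrative, and it assumes an action group deployed elsewhere.

```json
{
  "type": "Microsoft.Insights/metricAlerts",
  "apiVersion": "2018-03-01",
  "name": "CosmosDbThrottledRequests",
  "location": "global",
  "properties": {
    "description": "Fires when Cosmos DB throttles more than one request.",
    "severity": 3,
    "enabled": true,
    "scopes": [
      "[resourceId('Microsoft.DocumentDB/databaseAccounts', parameters('cosmosDbAccountName'))]"
    ],
    "evaluationFrequency": "PT5M",
    "windowSize": "PT5M",
    "criteria": {
      "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
      "allOf": [
        {
          "name": "ThrottledRequests",
          "metricName": "TotalRequests",
          "dimensions": [
            {
              "name": "StatusCode",
              "operator": "Include",
              "values": [ "429" ]
            }
          ],
          "operator": "GreaterThan",
          "threshold": 1,
          "timeAggregation": "Maximum"
        }
      ]
    },
    "actions": [
      {
        "actionGroupId": "[resourceId('Microsoft.Insights/actionGroups', parameters('actionGroupName'))]"
      }
    ]
  }
}
```

The App Services example looks very similar, but with Http5xx as the metric name, a threshold of 3, and no dimensions array.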

Static and Dynamic Thresholds

Azure Monitor recently added a new preview feature called dynamic thresholds. When we use dynamic thresholds, rather than specifying the metric thresholds ourselves, we let Azure Monitor watch the metric and learn its normal values, and then alert us if it notices a change. The feature is currently in preview, so I won’t discuss it in a lot of detail here, but there are example ARM templates available if you want to explore this.

Example ARM Templates

Let’s look at a couple of ARM templates to create the metric alert rules we discussed above. Each template also creates an action group with an email action, but of course you can have whatever action groups you want; you can also refer to shared action groups in other resource groups.

First, here is the ARM template for the Cosmos DB alert rule (lines 54-99), which uses a dimension to filter the metrics (lines 74-81) like we discussed above:

Second, here is the ARM template for the App Services alert rule (lines 77 to 112):

Note: when I tried to execute the second ARM template, I sometimes found it would fail the first time around, but re-executing it worked. This seems to just be one of those weird things with ARM templates, unfortunately.

Summary

Azure’s built-in metrics provide a huge amount of visibility into the operation of our system components, and of course we can enrich these with our own custom metrics (see part 3 of this series). Once the data is available to Azure Monitor, Azure Monitor can alert us based on whatever criteria we want to establish. The definition of these metric alert rules is highly automatable using ARM templates, as is the definition of action groups to specify what should happen when an alert is fired.

In the next part of this series we will look at alerts based on log data.

Automating Azure Instrumentation and Monitoring – Part 3: Custom Metrics

One of the core data types that Azure Monitor works with is metrics – numerical pieces of data that represent the state of an Azure resource or of an application component at a specific point in time. Azure publishes built-in metrics for almost all Azure services, and these metrics are available for querying interactively as well as for use within alerts and other systems. In addition to the Azure-published metrics, we can also publish our own custom metrics. In this post we’ll discuss how to do this using both Azure Monitor’s recently announced support for custom metrics, and with Application Insights’ custom metrics features. We’ll start by looking at what metrics are and how they work.

This post is part of a series:

  • Part 1 provides an introduction to the series by describing why we should instrument our systems, outlines some of the major tools that Azure provides such as Azure Monitor, and argues why we should be adopting an ‘infrastructure as code’ mindset for our instrumentation and monitoring components.

  • Part 2 describes Azure Application Insights, including its proactive detection and alert features. It also outlines a pattern for deploying instrumentation components based on the requirements we might typically have for different environments, from short-lived development and test environments through to production.

  • Part 3 (this post) discusses how to publish custom metrics, both through Application Insights and to Azure Monitor. Custom metrics let us enrich the data that is available to our instrumentation components.

  • Part 4 covers the basics of alerts and metric alerts. Azure Monitor’s powerful alerting system is a big topic, and in this part we’ll discuss how it works overall, as well as how to get alerts for built-in and custom metrics.

  • Part 5 (coming soon) covers log alerts and resource health alerts, two other major types of alerts that Azure Monitor provides. Log alerts let us alert on information coming into Application Insights logs and Log Analytics workspaces, while resource health alerts notify us when Azure itself is having an issue that may result in downtime or degraded performance.

  • Part 6 (coming soon) describes dashboards. The Azure Portal has a great dashboard UI, and our instrumentation data can be made available as charts. Dashboards are also possible to automate, and I’ll show a few tips and tricks I’ve learned when doing this.

  • Part 7 (coming soon) covers availability tests, which let us proactively monitor our web applications for potential outages. We’ll discuss deploying and automating both single-step (ping) and multi-step availability tests.

  • Part 8 (coming soon) describes autoscale. While this isn’t exactly instrumentation in and of itself, autoscale is built on much of the same data used to drive alerts and dashboards, and autoscale rules can be automated as well.

  • Finally, part 9 (coming soon) covers exporting data to other systems. Azure Monitor metrics and log data can be automatically exported, as can Application Insights data, and the export rules can be created and managed from automation scripts.

What Are Metrics?

Metrics are pieces of numerical data. Each metric has both a value and a unit. Here are some example metrics:

| Example | Value | Unit |
|---|---|---|
| 12 requests per second | 12 | requests per second |
| 54 gigabytes | 54 | gigabytes |
| 7 queue messages | 7 | queue messages |

Metrics can be captured either in their raw or aggregated form. An aggregated metric is a way of simplifying the metric across a given period of time. For example, consider a system that processes messages from a queue. We could count the number of messages processed by the system in two ways: we could adjust our count every time a message is processed, or we could check the number of messages on the queue every minute, and batch these into five-minute blocks. The latter is one example of an aggregated metric.

Because metrics are numerical in nature, they can be visualised in different ways. For example, a line chart might show the value of a metric over time.

Azure Monitor also supports adding dimensions to metrics. Dimensions are extra pieces of data that help to add context to a metric. For example, the Azure Monitor metric for the number of messages in a Service Bus namespace has the entity (queue or topic) name as a dimension. Queries and visualisations against this metric can then filter down to specific topics, can visualise each topic separately, or can roll up all topics/queues and show the total number of messages across the whole Service Bus namespace.

Azure Monitor Metrics

Azure Monitor currently has two metric systems.

  • Classic metrics were traditionally published by most Azure services. When we use the Azure Portal, the Metrics (Classic) page displays these metrics.
  • Near real-time metrics are the newer type of metrics, and Azure is moving to use this across all services. As their name suggests, these metrics get updated more frequently than classic metrics – where classic metrics might not appear for several minutes, near real-time metrics typically are available for querying within 1-2 minutes of being published, and sometimes much quicker than that. Additionally, near real-time metrics are the only metric type that supports dimensions; classic metrics do not. Custom metrics need to be published as near real-time metrics.

Over time, all Azure services will move to the near real-time metrics system. In the meantime, you can check whether a given Azure service is publishing to the classic or newer metric system by checking this page. In this post we’ll only be dealing with the newer (near real-time) metric system.

Custom Metrics

Almost all Azure services publish their own metrics in some form, although the usefulness and quality varies depending on the specific service. Core Azure services tend to have excellent support for metrics. Publishing of built-in metrics happens automatically and without any interaction on our part. The metrics are available for interactive querying through the portal and API, and for alerts and all of the other purposes we discussed in part 1 of this series.

There are some situations where the built-in metrics aren’t enough for our purposes. This commonly happens within our own applications. For example, if our application has components that process messages from a queue then it can be helpful to know how many messages are being processed per minute, how long each message takes to process, and how many messages are currently on the queues. These metrics can help us to understand the health of our system, to provision new workers to help to process messages more quickly, or to understand bottlenecks that our developers might need to investigate.

There are two ways that we can publish custom metrics into Azure Monitor.

  • Azure Monitor custom metrics, currently a preview service, provides an API for us to send metrics into Azure Monitor. We submit our metrics to an Azure resource, and the metrics are saved alongside the built-in metrics for that resource.
  • Application Insights also provides custom metrics. Our applications can create and publish metrics into Application Insights, and they are accessible by using Application Insights’ UI, and through the other places that we work with metrics. Although the core offering of publishing custom metrics into Application Insights is generally available, some specific features are in preview.

How do we choose which approach to use? Broadly speaking I’d generally suggest using Azure Monitor’s custom metrics API for publishing resource or infrastructure-level metrics – i.e. enriching the data that Azure itself publishes about a resource – and I’d suggest using Application Insights for publishing application-level metrics – i.e. metrics about our own application code.

Here’s a concrete example, again related to queue processing. If we have an application that processes queue messages, we’ll typically want instrumentation to understand how these queues and processors are behaving. If we’re using Service Bus queues or topics then we get a lot of instrumentation about our queues, including the number of messages that are currently on the queue. But if we’re using Azure Storage queues, we’re out of luck. Azure Storage queues don’t have the same metrics, and we don’t get the queue lengths from within Azure Monitor. This is an ideal use case for Azure Monitor’s custom metrics.

We may also want to understand how long it’s taking us to process each message – from the time it was submitted to the queue to the time it completed processing. This is often an important metric to ensure that our data is up-to-date and that users are having the best experience possible. Ultimately this comes down to how long our application is taking to perform its logic, and so this is an application-level concern and not an infrastructure-level concern. We’ll use Application Insights for this custom metric.

Let’s look at how we can write code to publish each of these metrics.

Publishing Custom Resource Metrics

In order to publish a custom resource metric we need to do the following:

  • Decide whether we will add dimensions.
  • Decide whether we will aggregate our metric’s value.
  • Authenticate and obtain an access token.
  • Send the metric data to the Azure Monitor metrics API.

Let’s look at each of these in turn, in the context of an example Azure Function app that we’ll use to send our custom metrics.

Adding Dimensions

As described above, dimensions let us add extra data to our metrics so that we can group and compare them. We can submit metrics to Azure Monitor with or without dimensions. If we want to include dimensions, we need to include two extra properties – dimNames specifies the names of the dimensions we want to add to the metric, and dimValues specifies the values of those dimensions. The order of the dimension names and values must match so that Azure Monitor can relate the value to its dimension name.

Aggregating Metrics

Metrics are typically queried in an aggregated form – for example, counting or averaging the values of metrics to get a picture of how things are going overall. When submitting custom metrics we can also choose to send our metric values in an aggregated form if we want. The main reasons we’d do this are:

  • To save cost. Azure Monitor custom metrics aren’t cheap when you use them at scale, and so pre-aggregating within our application means we don’t need to incur quite as high a cost since we aren’t sending as much raw data to Azure Monitor to ingest.
  • To reduce a very high volume of metrics. If we have a large number of metrics to report on, it will likely be much faster for us to send the aggregated metric to Azure Monitor rather than sending each individual metric.

However, it’s up to us – we can choose to send individual values if we want.

If we send aggregated metrics then we need to construct a JSON object to represent the metric as follows:
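The request body for the custom metrics API is roughly shaped as follows; the min, max, sum, and count fields carry the pre-aggregated values, and dimNames and dimValues are only needed if the metric has dimensions.

```json
{
  "time": "<ISO 8601 timestamp that the aggregated values relate to>",
  "data": {
    "baseData": {
      "metric": "<metric name>",
      "namespace": "<metric namespace>",
      "dimNames": [ "<dimension name>" ],
      "series": [
        {
          "dimValues": [ "<dimension value>" ],
          "min": 0,
          "max": 0,
          "sum": 0,
          "count": 0
        }
      ]
    }
  }
}
```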

For example, let’s imagine we have recorded the following queue lengths (all times in UTC):

| Time | Length |
|---|---|
| 11:00am | 1087 |
| 11:01am | 1124 |
| 11:02am | 826 |
| 11:03am | 888 |
| 11:04am | 1201 |
| 11:05am | 1091 |

We might send the following pre-aggregated metrics in a single payload:
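A sketch of that payload, using a QueueLength metric in the queueprocessing namespace with the queue name as a dimension (the metric name, dimension value, and date are illustrative), aggregates the six readings from the table above into a minimum of 826, a maximum of 1201, a sum of 6217, and a count of 6:

```json
{
  "time": "2019-01-14T11:00:00Z",
  "data": {
    "baseData": {
      "metric": "QueueLength",
      "namespace": "queueprocessing",
      "dimNames": [ "QueueName" ],
      "series": [
        {
          "dimValues": [ "orders" ],
          "min": 826,
          "max": 1201,
          "sum": 6217,
          "count": 6
        }
      ]
    }
  }
}
```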

Azure Monitor would then be able to display the aggregated metrics for us when we query.

If we chose not to send the metrics in an aggregated form, we’d send the metrics across individual messages; here’s an example of the fourth message:
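In that case each payload carries a single observation, so min, max, and sum are all the observed value and count is 1. A sketch of the fourth message (11:03am, queue length 888), using the same illustrative names as above, might be:

```json
{
  "time": "2019-01-14T11:03:00Z",
  "data": {
    "baseData": {
      "metric": "QueueLength",
      "namespace": "queueprocessing",
      "dimNames": [ "QueueName" ],
      "series": [
        {
          "dimValues": [ "orders" ],
          "min": 888,
          "max": 888,
          "sum": 888,
          "count": 1
        }
      ]
    }
  }
}
```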

Security for Communicating with Azure Monitor

We need to obtain an access token when we want to communicate with Azure Monitor. When we use Azure Functions, we can make use of managed identities to simplify this process a lot. I won’t cover all the details of managed identities here, but the example ARM template for this post includes the creation and use of a managed identity. Once the function is created, it can use the following code to obtain a token that is valid for communication with Azure Monitor:
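A minimal sketch of this, using the Microsoft.Azure.Services.AppAuthentication library that backs managed identity token acquisition, looks something like the following:

```csharp
using Microsoft.Azure.Services.AppAuthentication;

// Uses the function app's managed identity to obtain a token for the
// Azure Monitor custom metrics endpoint.
var tokenProvider = new AzureServiceTokenProvider();
string accessToken = await tokenProvider.GetAccessTokenAsync("https://monitoring.azure.com/");
```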

The second part of this process is authorising the function’s identity to write metrics to resources. This is done by using the standard Azure role-based access control system. The function’s identity needs to be granted the Monitoring Metrics Publisher role, which has been defined with the well-known role definition ID 3913510d-42f4-4e42-8a64-420c390055eb.
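A sketch of that role assignment in an ARM template might look like the fragment below. The functionAppName parameter and API versions are illustrative, and in a real template you would typically scope the assignment to the specific resource that the function publishes metrics against.

```json
{
  "type": "Microsoft.Authorization/roleAssignments",
  "apiVersion": "2018-09-01-preview",
  "name": "[guid(resourceGroup().id, 'monitoring-metrics-publisher')]",
  "properties": {
    "roleDefinitionId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '3913510d-42f4-4e42-8a64-420c390055eb')]",
    "principalId": "[reference(resourceId('Microsoft.Web/sites', parameters('functionAppName')), '2018-02-01', 'Full').identity.principalId]"
  }
}
```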

Sending Custom Metrics to Azure Monitor

Now we have our metric object and our access token ready, we can submit the metric object to Azure Monitor. The actual submission is fairly easy – we just perform a POST to a URL. However, the URL we submit to will be different depending on the resource’s location and resource ID, so we dynamically construct the URL as follows:
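As a sketch (the variable names here are illustrative):

```csharp
// 'region' is the Azure region of the target resource (e.g. "westus2"), and
// 'resourceId' is its full ARM resource ID, beginning with "/subscriptions/...".
var metricsIngestionUri = $"https://{region}.monitoring.azure.com{resourceId}/metrics";
```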

We might deploy into the West US 2 region, so an example URL might look like this: https://westus2.monitoring.azure.com/subscriptions/377f4fe1-340d-4c5d-86ae-ad795b5ac17d/resourceGroups/MyCustomMetrics/providers/Microsoft.Storage/storageAccounts/mystorageaccount/metrics

Currently Azure Monitor only supports a subset of Azure regions for custom metrics, but this list is likely to grow as the feature moves out of preview.

Here is the full C# Azure Function we use to send our custom metrics:
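What follows is a condensed sketch of such a function rather than a drop-in implementation. It assumes a timer trigger, the WindowsAzure.Storage SDK, and app settings named QueueStorageConnectionString, Region, and StorageAccountResourceId, and it sends one unaggregated value per queue.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.Services.AppAuthentication;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;
using Newtonsoft.Json;

public static class SendQueueLengthMetrics
{
    private static readonly HttpClient HttpClient = new HttpClient();

    [FunctionName("SendQueueLengthMetrics")]
    public static async Task Run([TimerTrigger("0 */5 * * * *")] TimerInfo timer, ILogger log)
    {
        // These settings are assumed to be configured on the function app.
        var storageConnectionString = Environment.GetEnvironmentVariable("QueueStorageConnectionString");
        var region = Environment.GetEnvironmentVariable("Region"); // e.g. "westus2"
        var storageAccountResourceId = Environment.GetEnvironmentVariable("StorageAccountResourceId");

        // Get a token for the Azure Monitor custom metrics API using the managed identity.
        var tokenProvider = new AzureServiceTokenProvider();
        var accessToken = await tokenProvider.GetAccessTokenAsync("https://monitoring.azure.com/");

        // Read the approximate length of each queue in the storage account.
        // For brevity, only the first segment of queues is listed.
        var queueClient = CloudStorageAccount.Parse(storageConnectionString).CreateCloudQueueClient();
        var queues = await queueClient.ListQueuesSegmentedAsync(null);
        foreach (CloudQueue queue in queues.Results)
        {
            await queue.FetchAttributesAsync();
            var queueLength = queue.ApproximateMessageCount ?? 0;

            // Build a single (unaggregated) metric value with the queue name as a dimension.
            var metric = new
            {
                time = DateTime.UtcNow.ToString("o"),
                data = new
                {
                    baseData = new
                    {
                        metric = "QueueLength",
                        @namespace = "queueprocessing",
                        dimNames = new[] { "QueueName" },
                        series = new[]
                        {
                            new { dimValues = new[] { queue.Name }, min = queueLength, max = queueLength, sum = queueLength, count = 1 }
                        }
                    }
                }
            };

            // POST the metric to the regional Azure Monitor custom metrics endpoint.
            var requestUri = $"https://{region}.monitoring.azure.com{storageAccountResourceId}/metrics";
            var request = new HttpRequestMessage(HttpMethod.Post, requestUri)
            {
                Content = new StringContent(JsonConvert.SerializeObject(metric), Encoding.UTF8, "application/json")
            };
            request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);

            var response = await HttpClient.SendAsync(request);
            log.LogInformation($"Sent length {queueLength} for queue {queue.Name}; response status {response.StatusCode}.");
        }
    }
}
```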

Testing our Function

I’ve provided an ARM template that you can deploy to test this:

Make sure to deploy this into a region that supports custom metrics, like West US 2.

Once you’ve deployed it, you can create some queues in the storage account (use the storage account that begins with q and not the one that begins with fn). Add some messages to the queues, and then run the function or wait for it to run automatically every five minutes.

Then you can check the metrics for the storage queue, making sure to change the metric namespace to queueprocessing:

You should see something like the following:

As of the time of writing (January 2019), there is a bug where Azure Storage custom metrics don’t display dimensions. This will hopefully be fixed soon.

Publishing Custom Application Metrics

Application Insights also allows for the publishing of custom metrics using its own SDK and APIs. These metrics can be queried through Azure Monitor in the same way as resource-level metrics. The process by which metrics are published into Application Insights is quite different to how Azure Monitor custom metrics are published, though.

The Microsoft documentation on Application Insights custom metrics is quite comprehensive, so rather than restate it here I will simply link to the important parts. I will focus on the C# SDK in this post.

To publish a custom metric to Application Insights you need an instance of the TelemetryClient class. In an Azure Functions app you can set the APPINSIGHTS_INSTRUMENTATIONKEY application setting – for example, within an ARM template – and then create an instance of TelemetryClient. The TelemetryClient will find the setting and will automatically configure itself to send telemetry to the correct place.

Once you have an instance of TelemetryClient, you can use the GetMetric().TrackValue() method to log a new metric value, which is then pre-aggregated and sent to Application Insights after a short delay. There are a number of overloads of this method that can be used to submit custom dimensions, too.

Note that as some features are in preview, they don’t work consistently yet – for example, at time of writing custom namespaces aren’t honoured correctly, but this should hopefully be resolved soon.

If you want to send raw metrics rather than pre-aggregated metrics, the legacy TrackMetric() method can be used, but Microsoft discourage its use and are deprecating it.

Here is some example Azure Function code that writes a random value to the My Test Metric metric:
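A sketch of such a function might look like the following. It assumes a timer trigger and relies on the APPINSIGHTS_INSTRUMENTATIONKEY app setting being present, as described above.

```csharp
using System;
using Microsoft.ApplicationInsights;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class SendTestMetric
{
    // The parameterless TelemetryClient picks up the APPINSIGHTS_INSTRUMENTATIONKEY setting.
    private static readonly TelemetryClient telemetryClient = new TelemetryClient();
    private static readonly Random random = new Random();

    [FunctionName("SendTestMetric")]
    public static void Run([TimerTrigger("0 * * * * *")] TimerInfo timer, ILogger log)
    {
        var value = random.NextDouble() * 100;

        // GetMetric() pre-aggregates values locally and sends them to Application Insights
        // in the background after a short delay.
        telemetryClient.GetMetric("My Test Metric").TrackValue(value);

        log.LogInformation($"Tracked value {value} for My Test Metric.");
    }
}
```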

And a full ARM template that deploys this is:

Summary

Custom metrics allow us to enrich our telemetry data with numerical values that can be aggregated and analysed, both manually through portal dashboards and APIs, and automatically using a variety of Azure Monitor features. We can publish custom metrics against any Azure resource by using the new custom metrics APIs, and we can also write application-level metrics to Application Insights.

In the next part of this series we will start to look at alerts, and will specifically look at metric alerts – one way to have Azure Monitor process the data for both built-in and custom metrics and alert us when things go awry.

Low-Cost Rate Limiting for Azure Functions APIs with API Management’s Consumption Tier

Azure Functions can be used as a lightweight platform for building APIs. They support a number of helpful features for API developers including custom routes and a variety of output bindings that can implement complex business rules. They also have a consumption-based pricing model, which provides a low-cost, pay-per-use pricing model while you have low levels of traffic, but can scale or burst for higher levels of demand.

The Azure Functions platform also provides Azure Functions Proxies, which gives another set of features to further extend APIs built on top of Azure Functions. These features include more complex routing rules and the ability to do a small amount of request rewriting. These features have led some people to compare Azure Functions Proxies to a very lightweight API management system. However, there are a number of features of an API management platform that Azure Functions Proxies doesn’t support. One common feature of an API management layer is the ability to perform rate limiting on incoming requests.

Azure API Management is a hosted API management service that provides a large number of features. Until recently, API Management’s pricing model was often prohibitive for small APIs, since using it for production workloads required provisioning a service instance with a minimum cost of about AUD$200 per month. But Microsoft recently announced a new consumption tier for API Management. Based on a similar pricing model to Azure Functions, the consumption tier for API Management bills per request, which makes it a far more appealing choice for serverless APIs. APIs can now use features like rate limiting – and many others – without needing to invest in a large monthly expense.

In this post I’ll describe how Azure Functions and the new API Management pricing tier can be used together to build a simple serverless API with rate limiting built in, and at a very low cost per transaction.

Note: this new tier is in preview, and so isn’t yet ready for production workloads – but it will hopefully be generally available and supported soon. In the meantime, it’s only available for previewing in a subset of Azure regions. For my testing I’ve been using Australia East.

Example Scenario

In this example, we’ll build a simple serverless API that would benefit from rate limiting. In our example function we simulate performing some business logic to calculate shipping rates for orders. Our hypothetical algorithm is very sophisticated, and so we may later want to monetise our API to make it available for high-volume users. In the meantime we want to allow our customers to try it out a little bit for free, but we want to put limits around their use.

There may be other situations where we need rate limiting too – for example, if we have a back-end system we call into that can only cope with a certain volume of requests, or that bills us when we use it.

First, let’s write a very simple function to simulate some custom business logic.

Function Code

For simplicity I’m going to write a C# script version of an Azure Function. You could easily change this to a precompiled function, or use any of the other languages that Azure Functions supports.

Our simulated function logic is as follows:

  • Receive an HTTP request with a body containing some shipping details.
  • Calculate the shipping cost.
  • Return the shipping cost.

In our simulation we’ll just make up a random value, but of course we may have much more sophisticated logic in future. We could also call into other back-end functions or APIs too.

Here’s our function code:

If we paste this code into the Azure Functions portal, we’ll be able to try it out, and sure enough we can get a result:

API Management Policy

Now that we’ve got our core API function working, the next step is to put an API Management gateway in front of it so we can apply our rate limiting logic. API Management works in terms of policies that are applied to incoming requests. When we work with the consumption tier of API Management we can make use of the policy engine, although there are some limitations. Even with these limitations, policies are very powerful and let us express and enforce a lot of complex rules. A full discussion of API Management’s policy system is beyond the scope of this post, but I recommend reviewing the policy documentation.

Here is a policy that we can use to perform our rate limiting:
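A sketch of this, using the rate-limit-by-key policy with the caller’s IP address as the counter key, is shown below:

```xml
<policies>
    <inbound>
        <base />
        <rate-limit-by-key calls="3"
                           renewal-period="15"
                           counter-key="@(context.Request.IpAddress)" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```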

This policy uses the caller’s IP address as the rate limit key. This means that if the same IP address makes three API calls within a 15-second period, it will get rate limited and told to try again later. Of course, we can adjust the lockout time, the number of calls allowed, and even the way that we group requests together when determining the rate limit.

Because we may have additional APIs in the future that would be subject to this rate limit, we’ll create an API Management product and apply the policy to that. This means that any APIs we add to that product will have this policy applied.

Securing the Connection

Of course, there’s not much point in putting an API Management layer in front of our function API if someone can simply go around it and call the function directly. There are a variety of ways of securing the connection between an API Management instance and a back-end Azure Functions app, including using function keys, function host keys, and Azure AD tokens. In other tiers of API Management you can also use the IP address of the API Management gateway, but in the consumption tier we don’t get any IP addresses to perform whitelisting on.

For this example we’ll use the function key for simplicity. (For a real production application I’d recommend using a different security model, though.) This means that we will effectively perform a key exchange:

  • Requests will arrive into the API Management service without any keys.
  • The API Management service will perform its rate limiting logic.
  • If this succeeds, the API Management service will call into the function and pass in the function key, which only it knows.

In this way, we’re treating the API Management service as a trusted subsystem – we’re configuring it with the credentials (i.e. the function key) necessary to call the back-end API. Azure API Management provides a configuration system to load secrets like this, but for simplicity we’ll just inject the key straight into a policy. Here’s the policy we’ve used:

We’ll inject the function key into the policy at the time we deploy the policy.

As this logic is specific to our API, we’ll apply this policy to the API and not to our product.

Deploying Through an ARM Template

We’ll use an ARM template to deploy this whole example. The template performs the following actions, approximately in this order:

  • Deploys the Azure Functions app.
  • Adds the shipping calculator function into the app using the deployment technique I discussed in a previous post.
  • Deploys an API Management instance using the consumption tier.
  • Creates an API in our API Management instance.
  • Configures an API operation to call into the shipping calculator function.
  • Adds a policy to the API operation to add the Azure Functions host key to the outbound request to the function.
  • Creates an API Management product for our shipping calculator.
  • Adds a rate limit policy to the product.

Here’s the ARM template:

There’s a lot going on here, and I recommend reading the API Management documentation for further detail on each of these. One important note is that whenever you interact with an API Management instance on the consumption tier using ARM templates, you must use API version 2018-06-01-preview or newer.

Calling our API

Now that we’ve deployed our API we can call it through our API Management gateway’s public hostname. In my case I used Postman to make some API calls. The first few calls succeeded:

But then after I hit the rate limit, as expected I got an error response back:

When I tried again 13 seconds later, the request succeeded. So we can see our API Management instance is configured correctly and is performing rate limiting as we expected.

Summary

With the new consumption tier of Azure API Management, it’s now possible to have a low-cost set of API management features to deploy alongside your Azure Functions APIs. Of course, your APIs could be built on any technology, but if you are already in the serverless ecosystem then this is a great way to protect your APIs and back-ends. Plus, if your API grows to a point where you need more of the features of Azure API Management that aren’t provided in the consumption tier, or if you want to switch to a fixed-cost pricing model, you can always upgrade your API Management instance to one of the higher tiers. You can do this by simply modifying the sku property on the API Management resource within the ARM template.

Automating Azure Instrumentation and Monitoring – Part 2: Application Insights

Application Insights is a component of Azure Monitor for application-level instrumentation. It collects telemetry from your application infrastructure like web servers, App Services, and Azure Functions apps, and from your application code. In this post we’ll discuss how Application Insights can be automated in several key ways: first, by setting up an Application Insights instance in an ARM template; second, by connecting it to various types of Azure application components through automation scripts including Azure Functions, App Services, and API Management; and third, by configuring its smart detection features to emit automatic alerts in a configurable way. As this is the first time in this series that we’ll deploy instrumentation code, we’ll also discuss an approach that can be used to manage the deployment of different types and levels of monitoring into different environments.

This post is part of a series:

  • Part 1 provides an introduction to the series by describing why we should instrument our systems, outlines some of the major tools that Azure provides such as Azure Monitor, and argues why we should be adopting an ‘infrastructure as code’ mindset for our instrumentation and monitoring components.

  • Part 2 (this post) describes Azure Application Insights, including its proactive detection and alert features. It also outlines a pattern for deploying instrumentation components based on the requirements we might typically have for different environments, from short-lived development and test environments through to production.

  • Part 3 discusses how to publish custom metrics, both through Application Insights and to Azure Monitor. Custom metrics let us enrich the data that is available to our instrumentation components.

  • Part 4 covers the basics of alerts and metric alerts. Azure Monitor’s powerful alerting system is a big topic, and in this part we’ll discuss how it works overall, as well as how to get alerts for built-in and custom metrics.

  • Part 5 (coming soon) covers log alerts and resource health alerts, two other major types of alerts that Azure Monitor provides. Log alerts let us alert on information coming into Application Insights logs and Log Analytics workspaces, while resource health alerts notify us when Azure itself is having an issue that may result in downtime or degraded performance.

  • Part 6 (coming soon) describes dashboards. The Azure Portal has a great dashboard UI, and our instrumentation data can be made available as charts. Dashboards are also possible to automate, and I’ll show a few tips and tricks I’ve learned when doing this.

  • Part 7 (coming soon) covers availability tests, which let us proactively monitor our web applications for potential outages. We’ll discuss deploying and automating both single-step (ping) and multi-step availability tests.

  • Part 8 (coming soon) describes autoscale. While this isn’t exactly instrumentation in and of itself, autoscale is built on much of the same data used to drive alerts and dashboards, and autoscale rules can be automated as well.

  • Finally, part 9 (coming soon) covers exporting data to other systems. Azure Monitor metrics and log data can be automatically exported, as can Application Insights data, and the export rules can be exported and used from automation scripts.

Setting up Application Insights

When using the Azure Portal or Visual Studio to work with various types of resources, Application Insights will often be deployed automatically. This is useful when we’re exploring services or testing things out, but when it comes time to building a production-grade application, it’s better to have some control over the way that each of our components is deployed. Application Insights can be deployed using ARM templates, which is what we’ll do in this post.

Application Insights is a simple resource to create from an ARM template. An instance with the core functionality can be created with a small ARM template:
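A sketch of such a template might look like this – the parameter names are illustrative:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "appInsightsName": {
      "type": "string"
    },
    "appInsightsLocation": {
      "type": "string"
    }
  },
  "resources": [
    {
      "type": "Microsoft.Insights/components",
      "apiVersion": "2015-05-01",
      "name": "[parameters('appInsightsName')]",
      "location": "[parameters('appInsightsLocation')]",
      "kind": "web",
      "properties": {
        "Application_Type": "web"
      }
    }
  ]
}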

There are a few important things to note about this template:

  • Application Insights is only available in a subset of Azure regions. This means you may need to deploy it in a region other than the region your application infrastructure is located in. The template above includes a parameter to specify this explicitly.
  • The name of your Application Insights instance doesn’t have to be globally unique. Unlike resources like App Services and Cosmos DB accounts, there are no DNS names attached to an Application Insights instance, so you can use the same name across multiple instances if they’re in different resource groups.
  • Application Insights isn’t free. If you have a lot of data to ingest, you may incur large costs. You can use quotas to manage this if you’re worried about it.
  • There are more options available that we won’t cover here. This documentation page provides further detail.

After an Application Insights instance is deployed, it has an instrumentation key that can be used to send data to the correct Application Insights instance from your application. The instrumentation key can be accessed both through the portal and programmatically, including within ARM templates. We’ll use this when publishing telemetry.
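For example, within an ARM template the key can be read back using the reference() function – here’s a sketch of an outputs entry that does this:

"outputs": {
  "instrumentationKey": {
    "type": "string",
    "value": "[reference(resourceId('Microsoft.Insights/components', parameters('appInsightsName')), '2015-05-01').InstrumentationKey]"
  }
}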

Publishing Telemetry

There are a number of ways to publish telemetry into Application Insights. While I won’t cover them all, I’ll give a quick overview of some of the most common ways to get data from your application into Application Insights, and how to automate each.

Azure Functions

Azure Functions has built-in integration with Application Insights. If you create an Azure Functions app through the portal it asks whether you want to set up this integration. Of course, since we’re using automation scripts, we have to do a little work ourselves. The magic lies in an app setting, APPINSIGHTS_INSTRUMENTATIONKEY, which we can attach to any function app. Here is an ARM template that deploys an Azure Functions app (and associated app service plan and storage account), an Application Insights instance, and the configuration to link the two:
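The key part of that template – the configuration that actually links the function app to Application Insights – is an app setting along these lines (resource names are illustrative):

"siteConfig": {
  "appSettings": [
    {
      "name": "APPINSIGHTS_INSTRUMENTATIONKEY",
      "value": "[reference(resourceId('Microsoft.Insights/components', parameters('appInsightsName')), '2015-05-01').InstrumentationKey]"
    }
  ]
}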

App Services

If we’re deploying a web app into an app service using ASP.NET, we can use the ASP.NET integration directly. In fact, this works across many different hosting environments, and is described in the next section.

If you’ve got an app service that is not using ASP.NET, though, you can still get telemetry from your web app into Application Insights by using an app service extension. Extensions augment the built-in behaviour of an app service by installing some extra pieces of logic into the web server. Application Insights has one such extension, and we can configure it using an ARM template:
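Here’s a sketch of the relevant parts – an APPINSIGHTS_INSTRUMENTATIONKEY app setting plus a child siteextensions resource that installs the extension. The names and API versions are illustrative, and the extension ID shown is the one I believe is current:

{
  "type": "Microsoft.Web/sites",
  "apiVersion": "2016-08-01",
  "name": "[parameters('appServiceName')]",
  "location": "[resourceGroup().location]",
  "properties": {
    "serverFarmId": "[resourceId('Microsoft.Web/serverfarms', parameters('appServicePlanName'))]",
    "siteConfig": {
      "appSettings": [
        {
          "name": "APPINSIGHTS_INSTRUMENTATIONKEY",
          "value": "[reference(resourceId('Microsoft.Insights/components', parameters('appInsightsName')), '2015-05-01').InstrumentationKey]"
        }
      ]
    }
  },
  "resources": [
    {
      "type": "siteextensions",
      "apiVersion": "2016-08-01",
      "name": "Microsoft.ApplicationInsights.AzureWebSites",
      "dependsOn": [
        "[resourceId('Microsoft.Web/sites', parameters('appServiceName'))]"
      ]
    }
  ]
}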

As you can see, similarly to the Azure Functions example, the APPINSIGHTS_INSTRUMENTATIONKEY is used here to link the app service with the Application Insights instance.

One word of warning – I’ve found that the site extension ARM resource isn’t always deployed correctly the first time the template is deployed. If you get an error the first time you deploy, try it again and see if the problem goes away. I’ve tried, but have never fully been able to understand why this happens or how to stop it.

ASP.NET Applications

If you have an ASP.NET application running in an App Service or elsewhere, you can install a NuGet package into your project to collect Application Insights telemetry. This process is documented here. If you do this, you don’t need to install the App Services extension from the previous section. Make sure to set the instrumentation key in your configuration settings and then flow it through to Application Insights from your application code.

API Management

If you have an Azure API Management instance, you might be aware that this can publish telemetry into Application Insights too. This allows for monitoring of requests all the way through the request handling pipeline. When it comes to automation, Azure API Management has very good support for ARM templates, and its Application Insights integration is no exception.

At a high level there are two things we need to do: first, we create a logger resource to establish the API Management-wide connection with Application Insights; and second, we create a diagnostic resource to instruct our APIs to send telemetry to the Application Insights instance we have configured. We can create a diagnostic resource for a specific API or to cover all APIs.

The diagnostic resource includes a sampling rate, which is the percentage of requests that should have their telemetry sent to Application Insights. There is a lot of detail to be aware of with this feature, such as the performance impact and the ways in which sampling can reduce that impact. We won’t get into that here, but I encourage you to read more detail from Microsoft’s documentation before using this feature.

Here’s an example ARM template that deploys an API Management instance, an Application Insights instance, and configuration to send telemetry from every request into Application Insights:
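A sketch of the two linking resources follows – a logger and a service-wide diagnostic. The names are illustrative, and the exact shape of the diagnostic properties varies a little between API Management API versions, so treat this as a starting point rather than a definitive reference:

[
  {
    "type": "Microsoft.ApiManagement/service/loggers",
    "apiVersion": "2018-06-01-preview",
    "name": "[concat(parameters('apiManagementServiceName'), '/', parameters('appInsightsName'))]",
    "properties": {
      "loggerType": "applicationInsights",
      "credentials": {
        "instrumentationKey": "[reference(resourceId('Microsoft.Insights/components', parameters('appInsightsName')), '2015-05-01').InstrumentationKey]"
      }
    }
  },
  {
    "type": "Microsoft.ApiManagement/service/diagnostics",
    "apiVersion": "2018-06-01-preview",
    "name": "[concat(parameters('apiManagementServiceName'), '/applicationinsights')]",
    "dependsOn": [
      "[resourceId('Microsoft.ApiManagement/service/loggers', parameters('apiManagementServiceName'), parameters('appInsightsName'))]"
    ],
    "properties": {
      "loggerId": "[resourceId('Microsoft.ApiManagement/service/loggers', parameters('apiManagementServiceName'), parameters('appInsightsName'))]",
      "alwaysLog": "allErrors",
      "sampling": {
        "samplingType": "fixed",
        "percentage": 100
      }
    }
  }
]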

Smart Detection

Application Insights provides a useful feature called smart detection. Application Insights watches your telemetry as it comes in, and if it notices unusual changes, it can send an alert to notify you. For example, it can detect the following types of issues:

  • An application suddenly starts returning a higher rate of 5xx-class (error) status responses than it was returning previously.
  • The time it takes for an application to communicate with a database has increased significantly above the previous average.

Of course, this feature is not foolproof – for example, in my experience it won’t detect slow changes in error rates over time that may still indicate an issue. Nevertheless, it is a very useful feature to have available to us, and it has helped me identify problems on numerous occasions.

Smart detection is enabled by default. Unless you configure it otherwise, smart detection alerts are sent to all owners of the Azure subscription in which the Application Insights instance is located. In many situations this is not desirable: when your Azure subscription contains many different applications, each with different owners; or when the operations or development team are not granted the subscription owner role (as they should not be!); or when the subscriptions are managed by a central subscription management team who cannot possibly deal with the alerts they receive from all applications. We can configure each smart detection alert using the proactiveDetectionConfigs ARM resource type.

Here is an example ARM template showing how the smart detection alerts can be redirected to an email address you specify:
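As a sketch, an individual rule configuration looks roughly like the following. The rule name shown (degradationinserverresponsetime) is one of the built-in rule identifiers, and the property names follow the preview API schema as I understand it, so check the current reference before relying on them:

{
  "type": "Microsoft.Insights/components/proactiveDetectionConfigs",
  "apiVersion": "2018-05-01-preview",
  "name": "[concat(parameters('appInsightsName'), '/degradationinserverresponsetime')]",
  "dependsOn": [
    "[resourceId('Microsoft.Insights/components', parameters('appInsightsName'))]"
  ],
  "properties": {
    "enabled": true,
    "sendEmailsToSubscriptionOwners": false,
    "customEmails": [
      "[parameters('alertEmailAddress')]"
    ]
  }
}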

In development environments, you may not want to have these alerts enabled at all. Development environments can be used sporadically, and can have a much higher error rate than normal, so the signals that Application Insights uses to proactively monitor for problems aren’t as useful. I find that it’s best to configure smart detection myself so that I can switch it on or off for different environments, and for those environments that do need it, I’ll override the alert configuration to send to my own alert email address and not to the subscription owners. This requires us to have different instrumentation configuration for different environments.

Instrumentation Environments

In most real-world applications, we end up deploying the application in at least three environments: development environments, which are used by software developers as they actively work on a feature or change; non-production environments, which are used by testers, QA engineers, product managers, and others who need to access a copy of the application before it goes live; and production environments, which are used by customers and may be monitored by a central operations team. Within these categories, there can be multiple actual environments too – for example, there can be different non-production environments for different types of testing (e.g. functional testing, security testing, and performance testing), and some of these may be long-lived while others are short-lived.

Each of these different environments also has different needs for instrumentation:

  • Production environments typically need the highest level of alerting and monitoring since an issue may affect our customers’ experiences. We’ll typically have many alerts and dashboards set up for production systems. But we also may not want to collect large volumes of telemetry from production systems, especially if doing so may cause a negative impact on our application’s performance.
  • Non-production environments may need some level of alerting, but there are certain types of alerts that may not make sense compared to production environments. For example, we may run our non-production systems on a lower tier of infrastructure compared to our production systems, and so an alert based on the application’s response time may need different thresholds to account for the lower expected performance. But in contrast to production environments, we may consider it to be important to collect a lot of telemetry in case our testers do find any issues and we need to diagnose them interactively, so we may allow for higher levels of telemetry sampling than we would in a production environment.
  • Development environments may only need minimal instrumentation. Typically in development environments I’ll deploy all of the telemetry collection that I would deploy for non-production environments, but turn all alerts and dashboards off. In the event of any issues, I’ll be interactively working with the telemetry myself anyway.

Of course, your specific needs may be different, but in general I think it’s good to categorise our instrumentation across types of environments. For example, here is how I might typically deploy Application Insights components across environments:

Instrumentation Type | Development | Non-Production | Production
Application Insights smart detection | Off | On, sending alerts to developers | On, sending alerts to production monitoring group
Application Insights Azure Functions integration | On | On | On
Application Insights App Services integration | On | On | On
Application Insights API Management integration | On, at 100% sampling | On, at 100% sampling | On, at 30% sampling

Once we’ve determined those basic rules, we can then implement them. In the case of ARM templates, I tend to use ARM template parameters to handle this. As we go through this series we’ll see examples of how we can use parameters to achieve this conditional logic. I’ll also present versions of this table with my suggestions for the components that you might consider deploying for each environment.
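As a sketch of what I mean, a template might define an environment type parameter and derive per-environment values from it – the names here are illustrative:

{
  "parameters": {
    "environmentType": {
      "type": "string",
      "allowedValues": [ "Development", "NonProduction", "Production" ],
      "defaultValue": "Development"
    }
  },
  "variables": {
    "smartDetectionEnabled": "[not(equals(parameters('environmentType'), 'Development'))]",
    "apimSamplingPercentage": "[if(equals(parameters('environmentType'), 'Production'), 30, 100)]"
  }
}

Individual resources can then use a condition element, or feed values like these into their properties (such as the sampling percentage), so that a single template can serve all three environment types.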

Configuring Smart Detection through ARM Templates

Now that we have a basic idea of how we’ll configure instrumentation in each environment, we can reconsider how we might configure Application Insights. Typically I suggest deploying a single Application Insights instance for each environment the system will be deployed into. If we’re building up a complex ARM template with all of the system’s components, we can embed the conditional logic required to handle different environments in there.

Here’s a large ARM template that includes everything we’ve created in this post, and has the three environment type modes:

Summary

Application Insights is a very useful tool for monitoring our application components. It collects telemetry from a range of different sources, which can all be automated. It provides automatic analysis of some of our data and has smart detection features, which again we can configure through our automation scripts. Furthermore, we can publish data into it ourselves as well. In fact, in the next post in this series, we’ll discuss how we can publish custom metrics into Application Insights.

Automating Azure Instrumentation and Monitoring – Part 1: Introduction

Instrumentation and monitoring is a critical part of managing any application or system. By proactively monitoring the health of the system as a whole, as well as each of its components, we can mitigate potential issues before they affect customers. And if issues do occur, good instrumentation alerts us to that fact so that we can respond quickly.

Azure provides a set of powerful monitoring and instrumentation tools to instrument almost all Azure services as well as our own applications. By taking advantage of these tools we can improve the quality of our systems. However, there isn’t a lot of documentation on how to script and automate the instrumentation components that we build. Alerts, dashboards, and other instrumentation components are important parts of our systems and deserve as much attention as our application code or other parts of our infrastructure. In this series, we’ll cover many of the common types of instrumentation used in Azure-hosted systems and will outline how many of these can be automated, usually with a combination of ARM templates and scripting. The series consists of nine parts:

  • Part 1 (this post) provides an introduction to the series by describing why we should instrument our systems, outlines some of the major tools that Azure provides such as Azure Monitor, and argues why we should be adopting an ‘infrastructure as code’ mindset for our instrumentation and monitoring components.

  • Part 2 describes Azure Application Insights, including its proactive detection and alert features. It also outlines a pattern for deploying instrumentation components based on the requirements we might typically have for different environments, from short-lived development and test environments through to production.

  • Part 3 discusses how to publish custom metrics, both through Application Insights and to Azure Monitor. Custom metrics let us enrich the data that is available to our instrumentation components.

  • Part 4 covers the basics of alerts and metric alerts. Azure Monitor’s powerful alerting system is a big topic, and in this part we’ll discuss how it works overall, as well as how to get alerts for built-in and custom metrics.

  • Part 5 (coming soon) covers log alerts and resource health alerts, two other major types of alerts that Azure Monitor provides. Log alerts let us alert on information coming into Application Insights logs and Log Analytics workspaces, while resource health alerts notify us when Azure itself is having an issue that may result in downtime or degraded performance.

  • Part 6 (coming soon) describes dashboards. The Azure Portal has a great dashboard UI, and our instrumentation data can be made available as charts. Dashboards are also possible to automate, and I’ll show a few tips and tricks I’ve learned when doing this.

  • Part 7 (coming soon) covers availability tests, which let us proactively monitor our web applications for potential outages. We’ll discuss deploying and automating both single-step (ping) and multi-step availability tests.

  • Part 8 (coming soon) describes autoscale. While this isn’t exactly instrumentation in and of itself, autoscale is built on much of the same data used to drive alerts and dashboards, and autoscale rules can be automated as well.

  • Finally, part 9 (coming soon) covers exporting data to other systems. Azure Monitor metrics and log data can be automatically exported, as can Application Insights data, and the export rules can be exported and used from automation scripts.

While the posts will cover the basics of each of these topics, the focus will be on deploying and automating each of these components. I’ll provide links to more details on the inner workings where needed to supplement the basic overview I’ll provide. Also, I’ll assume some basic familiarity with ARM templates and PowerShell.

Let’s start by reviewing the landscape of instrumentation on Azure.

Azure’s Instrumentation Platform

As Azure has evolved, it’s built up an increasingly comprehensive suite of tools for monitoring the individual components of a system as well as complete systems as a whole. The key piece of the Azure monitoring puzzle is named, appropriately enough, Azure Monitor. Azure Monitor is a built-in service that works with almost all Azure services. Many of its features are free. It automatically captures telemetry, consolidates it, and makes the data available for interactive querying as well as for a variety of other purposes that we’ll discuss throughout the series.

This isn’t quite the whole story, though. While Azure Monitor works well most of the time, and it appears to be the strategic direction that Azure is heading in, there are a number of exceptions, caveats, and complexities – and these become more evident when you try to automate it. I’ll cover some of these in more detail below.

Metrics

Metrics are numeric values that represent a distinct piece of information about a component at a point in time. The exact list of metrics depends on what makes sense for a given service. For example, a virtual machine publishes metrics for the CPU and memory used; a SQL database has metrics for the number of connections and the database throughput units used; a Cosmos DB account publishes metrics for the number of requests issued to the database engine; and an App Service has metrics for the number of requests flowing through. There can be dozens of different metrics published for any given Azure service, and they are all documented for reference. We’ll discuss metrics in more detail throughout the series, as there are some important things to be aware of when dealing with metrics.

As well as Azure Monitor’s metrics support, some Azure services have their own metrics systems. For example, SQL Azure has a large amount of telemetry that can be accessed through dynamic management views. Some of the key metrics are also published into Azure Monitor, but if you want to use metrics that are only available in dynamic management views then you won’t be able to use the analysis and processing features of Azure Monitor. We’ll discuss a potential workaround for this in part 3 of this series.

A similar example is Azure Storage queues. Azure Storage has an API that can be used to retrieve the approximate number of messages sitting in a queue, but this metric isn’t published into Azure Monitor and so isn’t available for alerting or dashboarding. Again, we’ll discuss a potential workaround for this in part 3 of this series.

Nevertheless, in my experience, almost all of the metrics I work with on a regular basis are published through Azure Monitor, and so in this series we’ll predominantly focus on these.

Logs

Logs are structured pieces of data, usually with a category, a level, and a textual message, and often with a lot of additional contextual data as well. Broadly speaking, there are several general types of logs that Azure deals with:

  • Resource activity logs are essentially the logs for management operations performed on Azure resources through the Azure Resource Management (ARM) API, and a few other types of management-related logs. They can be interactively queried using the Azure Portal blades for any resource, as well as resource groups and subscriptions. You can typically view these by looking at the Activity log tab from any Azure resource blade in the portal. Activity logs contain all write operations that pass through the ARM API. If you use the ARM API directly, or indirectly through the Azure Portal, CLI, PowerShell, or anything else, you’ll see logs appear in here. More detail on activity logs is available here.
  • Azure AD activity logs track Active Directory sign-ins and management actions. These can be viewed from within the Azure AD portal blade. We won’t be covering Azure AD much in this series, but you can read more detail about Azure AD logs here.
  • Diagnostic logs are published by individual Azure services. They provide information about the actions and work that the service itself is doing. By default these are not usually available for interactive querying. Diagnostic logs often work quite differently between different services. For example, Azure Storage can publish its own internal logs into a $logs blob container; App Services provides web server and application logs and can save these to a number of different places as well as view them in real time; and Azure SQL logs provide a lot of optional diagnostic information and again have to be explicitly enabled.
  • Application logs are written by application developers. These can be sent to a number of different places, but a common destination is Application Insights. If logs are published into Application Insights they can be queried interactively, and used as part of alerts and dashboards. We’ll discuss these in more detail in later parts of this series.

Azure Log Analytics is a central log consolidation, aggregation, and querying service. Some of the above logs are published automatically into Log Analytics, while others have to be configured to do so. Log Analytics isn’t a free service, and needs to be provisioned separately if you want to configure logs to be sent into it. We’ll discuss it in more detail throughout this series.

Ingestion of Telemetry

Azure services automatically publish metrics into Azure Monitor, and these built-in metrics are ingested free of charge. Custom metrics can also be ingested by Azure Monitor, which we’ll discuss in more detail in part 3 of this series.

As described in the previous section, different types of logs are ingested in different ways. Azure Monitor automatically ingests resource activity logs, and does so free of charge. The other types of logs are not ingested by Azure Monitor unless you explicitly opt into that, either by configuring Application Insights to receive custom logs, or by provisioning a Log Analytics workspace and then configuring your various components to send their logs to that.

Processing and Working With Telemetry

Once data has been ingested into Azure Monitor, it becomes available for a variety of different purposes. Many of these will be discussed in later parts of this series. For example, metrics can be used for dashboards (see part 6, coming soon) and for autoscale rules (see part 8, coming soon); logs that have been routed to Azure Monitor can be used as part of alerts (see part 5, coming soon); and all of the data can be exported (see part 9, coming soon).

Application Insights

Application Insights has been part of the Azure platform for around two years. Microsoft recently announced that it is considered to be part of the umbrella Azure Monitor service. However, Application Insights is deployed as a separate service, and is billable based on the amount of data it ingests. We’ll cover Application Insights in more detail in part 2 of this series.

Summary of Instrumentation Components

There’s a lot to take in here! The instrumentation story across Azure isn’t always easy to understand, and although the complexity is reducing as Microsoft consolidates more and more of these services into Azure Monitor, there is still a lot to unpack. Here’s a very brief summary:

Azure Monitor is the primary instrumentation service we generally interact with. Azure Monitor captures metrics from every Azure service, and it also captures some types of logs as well. More detailed diagnostic and activity logging can be enabled on a per-service or per-application basis, and depending on how you configure it, it may be routed to Azure Monitor or somewhere else like an Azure Storage account.

Custom data can be published into Azure Monitor through custom metrics (which we’ll cover in part 3 of the series), through publishing custom logs into Log Analytics, and through Application Insights. Application Insights is a component that is deployed separately, and provides even more metrics and logging capabilities. It’s built off the same infrastructure as the rest of Azure Monitor and is mostly queryable from the same places.

Once telemetry is published into Azure Monitor it’s available for a range of different purposes including interactive querying, alerting, dashboarding, and exporting. We’ll cover all of these in more detail throughout the series.

Instrumentation as Infrastructure

The idea of automating all of our infrastructure – scripting the setup of virtual machines or App Services, creating databases, applying schema updates, deploying our applications, and so forth – has become fairly uncontroversial. The benefits are so compelling, and the tools are getting so good, that generally most teams don’t take much convincing that expressing their infrastructure as code is worthwhile. But in my experience working with a variety of customers, I’ve found that this often isn’t the case with instrumentation.

Instrumentation components like dashboards, alerts, and availability tests are still frequently seen as being of a different category to the rest of an application. While it may seem perfectly reasonable to script out the creation of some compute resources, and for these scripts to be put into a version control system and built alongside the app itself, instrumentation is frequently handled manually and without the same level of automation rigour as the application code and scripts. As I’ll describe below, I’m not opposed to using the Azure Portal and other similar tools to explore the metrics and logs associated with an application. But I believe that the instrumentation artifacts that come out of this exploration – saved queries, dashboard widgets, alert rules, etc – are just as important as the rest of our application components, and should be treated with the same level of diligence.

As with any other type of infrastructure, there are some clear benefits to expressing instrumentation components as code compared to using the Azure Portal including:

  • Reducing risk of accidental mistakes: I find that expressing my instrumentation logic explicitly in code, scripts, or ARM templates makes me far less likely to make a typo, or to do something silly like confuse different units of measurement when I’m setting an alert threshold.
  • Peer review: For teams that use a peer review process in their version control system, treating infrastructure as code means that someone else on the team is expected to review the changes I’m making. If I do end up making a dumb mistake then it’s almost always caught by a coworker during a review, and even if there are no mistakes, having someone else on the team review the change means that someone else understands what’s going on.
  • Version control: Keeping all of our instrumentation logic and alert rules in a version control system is helpful when we want to understand how instrumentation has evolved over time, and for auditability.
  • Keeping related changes together: I’m a big fan of keeping related changes together. For example, if I create a pull request to add a new application component then I can add the application code, the deployment logic, and the instrumentation for that new component all together. This makes it easier to understand the end-to-end scope of the feature being added. If we include instrumentation in our ‘definition of done’ for a feature then we can easily see that this requirement is met during the code review stage.
  • Managing multiple environments: When instrumentation rules and components aren’t automated, it’s easy for them to get out of sync between environments. In most applications there is at least one dev/test environment as well as production. While it might seem unnecessary to have alerts and monitoring in a dev environment, I will argue in part 2 of this series that it’s important to do so, even if you have slightly different rules and thresholds. Deploying instrumentation as code means that these environments can be kept in sync. Similarly, you may deploy your production environment to multiple regions for georedundancy or for performance reasons. If your instrumentation components are kept alongside the rest of your infrastructure, you’ll get the same alerts and monitoring for all of your regions.
  • Avoid partial automation: In my experience, partially automating an application can sometimes result in more complexity than not automating it at all. For example, if you use ARM templates and (as I typically suggest) use the ‘complete’ deployment mode, then any components you may have created manually through the Azure Portal can be removed. Many of the instrumentation components we’ll discuss are ARM resources and so can be subject to this behaviour. Therefore, a lack of consistency across how we deploy all of our infrastructure and instrumentation can result in lost work, missed alerts, hard-to-find bugs, and generally odd instrumentation behaviour.

Using the Azure Portal

Having an instrumentation-first mindset doesn’t mean that we can’t or shouldn’t ever use the Azure Portal. In fact, I tend to use it quite a lot – but for specific purposes.

First, I tend to use it a lot for interactively querying metrics and logs in response to an issue, or just to understand how my systems are behaving. I’ll use Metrics Explorer to create and view charts of potentially interesting metrics, and I’ll write log queries and execute them from Application Insights or Log Analytics.

Second, when I’m responding to alerts, I’ll make use of the portal’s tooling to view details, track the status of the alert, and investigate what might be happening. We’ll discuss alerts more later in this series.

Third, I use the portal for monitoring my dashboards. We’ll talk about dashboards in part 6 (coming soon). Once they’re created, I’ll often check on them to make sure that all of my metrics look to be in a normal range and that everything appears healthy.

Fourth, when I’m developing new alerts, dashboard widgets, or other components, I’ll create test resources using the portal. I’ll use my existing automation scripts to deploy a temporary copy of my environment, then deploy a new alert or autoscale rule using the portal, and then export them to an ARM template or manually construct a template based on what gets deployed by the portal. This way I can see how things should work, and get to use the portal’s built-in validation and assistance with creating the components, but still get everything into code form eventually. Many of the ARM templates I’ll provide throughout this series were created in this way.

Finally, during an emergency – when a system is down, or something goes wrong in the middle of the night – I’ll sometimes drop the automation-first requirement and create alerts on the fly, even on production, but knowing that I’ll need to make sure I add it into the automation scripts as soon as possible to ensure everything stays in sync.

Summary

This post has outlined the basics of Azure’s instrumentation platform. The two main types of data we tend to work with are metrics and logs. Metrics are numerical values that represent the state of a system at a particular point in time. Logs come in several variants, some of which are published automatically and some of which need to be enabled and then published to a suitable location before they can be queried. Both metrics and logs can be processed by Azure Monitor, and over the course of this series we’ll look at how we can script and automate the ingestion, processing, and handling of a variety of types of instrumentation data.

Automation of Azure Monitor and other instrumentation components is something that I’ve found to be quite poorly documented, so in writing this series I’ve aimed to provide both explanations of how these parts can be built, and a set of sample ARM templates and scripts that you can adapt to your own environment.

In the next part we’ll discuss Application Insights, and some of the automation we can achieve with that. We’ll also look at a pattern I typically use for deploying different levels of instrumentation into different environments.

Creating Azure Storage SAS Tokens with ARM Templates

Shared access signatures, sometimes also called SAS tokens, allow for delegating access to a designated part of an Azure resource with a defined set of permissions. They can be used to allow various types of access to your Azure services while keeping your access keys secret.

In a recent update to Azure Resource Manager, Microsoft has added the ability to create SAS tokens from ARM templates. While this is a general-purpose feature that will hopefully work across a multitude of Azure services, for now it only seems to work with Azure Storage (at least among the services I’ve checked). In this post I’ll explain why this is useful, and give some example ARM templates that illustrate creating both account and service SAS tokens.

Use Cases

There are a few situations where it’s helpful to be able to create SAS tokens for an Azure Storage account from an ARM template. One example is when using the Run From Package feature – an ARM template can deploy an App Service, deploy a storage account, and create a SAS token for a package blob – even if it doesn’t exist at deployment time.

Another example might be for an Azure Functions app. A very common scenario is for a function to receive a file as input, transform it in some way, and save it to blob storage. Rather than using the root access keys for the storage account, we could create a SAS token and add it to the Azure Functions app’s config settings, like in this example ARM template.

How It Works

ARM templates now support a new set of functions for generating SAS tokens. For Azure Storage, there are two types of SAS tokens – account and service – and the listAccountSas and listServiceSas functions correspond to these, respectively.

Behind the scenes, these functions invoke the corresponding functions on the Azure Storage ARM provider API. Somewhat confusingly, even though the functions’ names start with list, they are actually creating SAS tokens and not listing or working with previously created tokens. (In fact, SAS tokens are not resources that Azure can track, so there’s nothing to list.)

Example

In this post I’ll show ARM templates that use a variety of types of SAS for different types of situations. For simplicity I’ve used the outputs section to emit the SASs, but these could easily be passed through to other parts of the template such as App Service app settings (as in this example). Also, in this post I’ve only dealt with blob storage SAS tokens, but the same process can easily be used for queues and tables, and can even restrict token holders to specific table partitions or ranges of rows.

Creating a Service SAS

By using a service SAS, we can grant permissions to a specific blob in blob storage, or to all blobs within a container. Similarly we can grant permissions to an individual queue, or to a subset of entities within a table. More documentation on constructing service SASs is available here, and the details of the listServiceSas function are here.

Service SASs require us to provide a canonicalizedResource, which is just a way of describing the scope of the token. We can use the path to an individual blob by using the form /blob/{accountName}/{containerName}/{path}, such as /blob/mystorageaccountname/images/cat.jpeg or /blob/mystorageaccountname/images/cats/fluffy.jpeg. Or, we can use the path to a container by using the form /blob/{accountName}/{containerName}, such as /blob/mystorageaccountname/images. The examples below show different types of canonicalizedResource property values.

Read Any Blob Within a Container

Our first SAS token will let us read all of the blobs within a given container. For this, we’ll need to provide four properties.

First, we’ll provide a canonicalizedResource. This will be the path to the container we want to allow the holder of the token to read from. We’ll construct this field dynamically based on the ARM template input.

Second, we need to provide a signedResource. This is the type of resource that the token is scoped to. In this case, we’re creating a token that works across a whole blob container and so we’ll use the value c.

Third, we provide the signedPermission. This is the permission, or set of permissions, that the token will allow. There are different permissions for reading, creating, and deleting blobs and other storage entities. Importantly, listing is considered to be separate to reading, so bear this in mind too. Because we just want to allow reading blobs, we’ll use the value r.

Finally, we have to provide an expiry date for the token. I’ve set this to January 1, 2050.

When we execute the ARM template with the listServiceSas function, we need to provide these values as an object. Here’s what our object looks like:


"serviceSasFunctionValues": {
  "canonicalizedResource": "[concat('/blob/', parameters('storageAccountName'), '/', parameters('containerName'))]",
  "signedResource": "c",
  "signedPermission": "r",
  "signedExpiry": "2050-01-01T00:00:00Z"
}

And here’s the ARM template – check out line 62, where the listServiceSas function is actually invoked.
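For reference, the invocation itself looks roughly like this, assuming the property object above is held in a variable named serviceSasFunctionValues:

"outputs": {
  "readContainerSasToken": {
    "type": "string",
    "value": "[listServiceSas(resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName')), '2018-02-01', variables('serviceSasFunctionValues')).serviceSasToken]"
  }
}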

Read A Single Blob

In many cases we will want to issue SAS tokens to only read a single blob, and not all blobs within a container. In this case we make a couple of changes from the first example.

First, our canonicalizedResource now will have the path to the individual blob, not to a container. Again, we’ll pull these from the ARM template parameters.

Second, the signedResource is now a blob rather than a container, so we use the value b instead of c.

Here’s our properties object for this SAS token:


"serviceSasFunctionValues": {
  "canonicalizedResource": "[concat('/blob/', parameters('storageAccountName'), '/', parameters('containerName'), parameters('blobName'))]",
  "signedResource": "b",
  "signedPermission": "r",
  "signedExpiry": "2050-01-01T00:00:00Z"
}

And here’s the full ARM template:

Write A New Blob

SAS tokens aren’t just for reading, of course. We can also create a token that will allow creating or writing to blobs. One common use case is to allow the holder of a SAS to create a new blob, but not to overwrite anything that already exists. To create a SAS for this scenario, we work at the container level again – so the canonicalizedResource property is set to the path to the container, and the signedResource is set to c. This time, we set signedPermission to c to allow for blobs to be created. (If we wanted to also allow overwriting blobs, we could do this by setting signedPermission to cw.)

Here’s our properties object:


"serviceSasFunctionValues": {
  "canonicalizedResource": "[concat('/blob/', parameters('storageAccountName'), '/', parameters('containerName'))]",
  "signedResource": "c",
  "signedPermission": "c",
  "signedExpiry": "2050-01-01T00:00:00Z"
}

And here’s an ARM template:

Creating an Account SAS

An account SAS works at the level of a storage account, rather than at the item-level like a service SAS. For the majority of situations you will probably want a service SAS, but account SASs can be used for situations where you want to allow access to all blobs within a storage account, or if you want to allow for the management of blob containers, tables, and queues. More detail on what an account SAS can do is available here.

More documentation on constructing account SASs is available here, and the details of the listAccountSas function are here.

Read All Blobs in Account

We can use an account SAS to let us read all of the blobs within a storage account, regardless of the container they’re in. For this token we need to use signedServices = b to indicate that we’re granting permissions within blob storage, and we’ll use signedPermission = r to let us read blobs.

The signedResourceTypes parameter is available on account SAS tokens but not on service SAS tokens, and it lets us specify the set of APIs that can be used. We’ll use o here since we want to read all blobs: blobs are considered to be objects, and reading them involves object-level API calls. There are also two other values for this parameter – s indicates service-level APIs (such as creating new blob containers), and c indicates container-level APIs (such as deleting a blob container that already exists). You can see the APIs available within each category in the Azure Storage documentation.

So our SAS token will be generated with the following properties:


"accountSasFunctionValues": {
  "signedServices": "b",
  "signedPermission": "r",
  "signedResourceTypes": "o",
  "signedExpiry": "2050-01-01T00:00:00Z"
}
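The corresponding invocation uses listAccountSas and reads the accountSasToken property from the result – roughly like this, assuming the object above is held in a variable named accountSasFunctionValues:

"outputs": {
  "readAllBlobsSasToken": {
    "type": "string",
    "value": "[listAccountSas(resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName')), '2018-02-01', variables('accountSasFunctionValues')).accountSasToken]"
  }
}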

Here’s the full ARM template that creates this SAS:

List Blob Containers in Account

Finally, let’s create a SAS token that will allow us to list the blob containers within our account. For this token we need to use signedServices = b again, but this time we’ll use signedPermission = l (since l indicates permission to list).

Somewhat non-intuitively, the API that lists containers is considered a service-level API, so we need to use signedResourceTypes = s.

This means the parameters we’re using to generate a SAS token are as follows:

"accountSasFunctionValues": {
  "signedServices": "b",
  "signedPermission": "l",
  "signedResourceTypes": "s",
  "signedExpiry": "2050-01-01T00:00:00Z"
}

Here’s the full ARM template that creates this SAS:

You can test this by executing a GET request against this URL: https://{yourStorageAccountName}.blob.core.windows.net/?comp=list&{sasToken}.

Other Services

Currently it doesn’t appear that other services that use SAS tokens support this feature. For example, Service Bus namespaces, Event Hubs namespaces, and Cosmos DB database accounts don’t support any list* operations on their resource provider APIs that would allow for creating SAS tokens. Hopefully these will come soon, since this feature is very powerful and will allow ARM templates to be more self-contained.

Also, I have noticed that some services don’t like the listAccountSas function being embedded in their resource definitions. For example, Azure Scheduler (admittedly soon to be retired) seems to have a bug where it doesn’t like SAS tokens generated in this way for scheduler jobs. However, I have used this feature in other situations without any such issues.

Deploying Azure Functions with ARM Templates

There are many different ways in which an Azure Function can be deployed. In a future blog post I plan to go through the whole list. There is one deployment method that isn’t commonly known though, and it’s of particular interest to those of us who use ARM templates to deploy our Azure infrastructure. Before I describe it, I’ll quickly recap ARM templates.

ARM Templates

Azure Resource Manager (ARM) templates are JSON files that describe the state of a resource group. They typically declare the full set of resources that need to be provisioned or updated. ARM templates are idempotent, so a common pattern is to run the template deployment regularly—often as part of a continuous deployment process—which will ensure that the resource group stays in sync with the description within the template.

In general, the role of ARM templates is typically to deploy the infrastructure required for an application, while the deployment of the actual application logic happens separately. However, Azure Functions’ ARM integration has a feature whereby an ARM template can be used to deploy the files required to make the function run.

How to Deploy Functions in an ARM Template

In order to deploy a function through an ARM template, we need to declare a resource of type Microsoft.Web/sites/functions, like this:
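Here’s a sketch of such a resource – the function name, bindings, and file name are illustrative:

{
  "type": "Microsoft.Web/sites/functions",
  "apiVersion": "2018-02-01",
  "name": "[concat(parameters('functionAppName'), '/MyHttpFunction')]",
  "dependsOn": [
    "[resourceId('Microsoft.Web/sites', parameters('functionAppName'))]"
  ],
  "properties": {
    "config": {
      "disabled": false,
      "bindings": [
        {
          "name": "req",
          "type": "httpTrigger",
          "direction": "in",
          "authLevel": "function"
        },
        {
          "name": "$return",
          "type": "http",
          "direction": "out"
        }
      ]
    },
    "files": {
      "run.csx": "[parameters('functionFileContents')]"
    }
  }
}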

There are two important parts to this.

First, the config property is essentially the contents of the function.json file. It includes the list of bindings for the function, and in the example above it also includes the disabled property.

Second, the files property is an object that contains key-value pairs representing each file to deploy. The key represents the filename, and the value represents the full contents of the file. This only really works for text files, so this deployment method is probably not the right choice for precompiled functions and other binary files. Also, the file needs to be inlined within the template, which may quickly get unwieldy for larger function files—and even for smaller files, the file needs to be escaped as a JSON string. This can be done using an online tool like this, or you could use a script to do the escaping and pass the file contents as a parameter into the template deployment.

Importantly, in my testing I found that using this method to deploy over an existing function will remove any files that are not declared in the files list, so be careful when testing this approach if you’ve modified the function or added any files through the portal or elsewhere.

Examples

There are many different ways you can insert your function file into the template, but one of the ways I tend to use is a PowerShell script. Inside the script, we can read the contents of the file into a string, and create a HashTable for the ARM template deployment parameters:

Then we can use the New-AzureRmResourceGroupDeployment cmdlet to execute the deployment, passing in $templateParameters to the -TemplateParameterObject argument.

You can see the full example here.

Of course, if you have a function that doesn’t change often then you could instead manually convert the file into a JSON-encoded string using a tool like this one, and paste the function right into the ARM template. To see a full example of how this can be used, check out this example ARM template from a previous blog article I wrote.

When to Use It

Deploying a function through an ARM template can make sense when you have a very simple function that is comprised of one, or just a few, files to be deployed. In particular, if you already deploy the function app itself through the ARM template then this might be a natural extension of what you’re doing.

This type of deployment can also make sense if you’re wanting to quickly deploy and test a function and don’t need some of the more complex deployment-related features like control over handling locked files. It’s also a useful technique to have available for situations where a full deployment script might be too heavyweight.

However, for precompiled functions, functions that have binary files, and for complex deployments, it’s probably better to use another deployment mechanism. Nevertheless, I think it’s useful to know that this is a tool in your Azure Functions toolbox.

Deploying Blob Containers with ARM Templates

ARM templates are a great way to programmatically deploy your Azure resources. They act as declarative descriptions of the desired state of an Azure resource group, and while they can be frustrating to work with, overall the ability to use templates to deploy your Azure resources provides a lot of value.

One common frustration with ARM templates is that certain resource types simply can’t be deployed with them. Until recently, one such resource type was a blob container. ARM templates could deploy Azure Storage accounts, but not blob containers, queues, or tables within them.

That has now changed, and it’s possible to deploy a blob container through an ARM template. Here’s an example template that deploys a container called logs within a storage account:
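A sketch of the container resource itself looks like this, assuming the storage account is declared elsewhere in the same template:

{
  "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
  "apiVersion": "2018-02-01",
  "name": "[concat(parameters('storageAccountName'), '/default/logs')]",
  "dependsOn": [
    "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]"
  ],
  "properties": {
    "publicAccess": "None"
  }
}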

Queues and tables still can’t be deployed this way, though – hopefully that’s coming soon.

Key Vault Secrets and ARM Templates

What is Azure Key Vault

Azure Key Vault helps safeguard cryptographic keys and secrets used by cloud applications and services. By using Key Vault, you can encrypt keys and secrets (such as authentication keys, storage account keys, data encryption keys, .PFX files, and passwords) using keys protected by hardware security modules (HSMs).
Key Vault streamlines the key management process and enables you to maintain control of keys that access and encrypt your data. Developers can create keys for development and testing in minutes, and then seamlessly migrate them to production keys. Security administrators can grant (and revoke) permission to keys, as needed.
Anybody with an Azure subscription can create and use key vaults. Although Key Vault benefits developers and security administrators, it could be implemented and managed by an organization’s administrator who manages other Azure services for an organization. For example, this administrator would sign in with an Azure subscription, create a vault for the organization in which to store keys, and then be responsible for operational tasks, such as:

  • Create or import a key or secret
  • Revoke or delete a key or secret
  • Authorize users or applications to access the key vault, so they can then manage or use its keys and secrets
  • Configure key usage (for example, sign or encrypt)
  • Monitor key usage

This administrator would then provide developers with URIs to call from their applications, and provide their security administrator with key usage logging information.
( Ref: https://docs.microsoft.com/en-us/azure/key-vault/key-vault-whatis ).
 

Current Scenario of the Key Vault

In the current scenario, we are utilizing Key Vault during the provisioning of ARM resources. Instead of containing usernames and passwords, the template only contains references to these values stored in Azure Key Vault.
These secrets are extracted while the resources are being deployed, using the template and the parameter file together.

Utilizing the Key Vault

The following tasks are involved in utilizing the Key Vault:

  • Creating the Key Vault
  • Adding keys and secrets to the vault
  • Securing the Key Vault
  • Referencing keys

Creating a Key Vault

Step 1: Log in to the Azure Portal, click on All Services, and select Key Vault
Step 2: Click on Add, enter the following details, and click on Create:

  • Key Vault Name
  • Subscription
  • Resource Group
  • Pricing Tier

Step 3: Select the Key Vault name
Step 4: Select Secrets
Step 5: Click on Generate/Import
Step 6: Select Manual in upload options
Step 7: Enter the following information.

  • Name of the secret (e.g. MyPassword)
  • Value of the secret (e.g. P@ssword1)
  • Activation date (if required)
  • Expiration date (if required)

Step 8: Select Yes for the Enabled option
Step 9: Click on Create

Securing the Key Vault

Step 1: Select the newly created Key Vault
Step 2: Select Access Policies
Step 3: Select Click to show advanced access policies
Step 4: Select the checkboxes for the advanced access policies you need (for ARM template deployments, enable access to Azure Resource Manager for template deployment)
Step 5: Click on Add New
Step 6: Select Secret Management in the Configure from template option
Step 7: Select the principal (the name of the resource which needs access to the secret)
Step 8: Select the permissions required from the Secret permissions list
Step 9: Select OK

Referencing the Secrets

Currently we are referencing the secrets stored in the Key Vault using ARM templates.
A parameter of type securestring is declared in the parameters section of the ARM template file (armtemplate.json).
We then add the parameter to the template’s parameter file (armtemplate.parameters.json) with the following details, as sketched after the list below:

  • The ID of the key vault (the resource ID of the Key Vault, found under its Properties section)
  • The name of the secret to extract (MyPassword)
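Here’s a sketch of the parameter file entry – the adminPassword parameter name and the placeholder IDs are illustrative, while the secret name matches the MyPassword example above. In the template itself, adminPassword is declared with type securestring; in the parameter file it becomes a Key Vault reference:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "adminPassword": {
      "reference": {
        "keyVault": {
          "id": "/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.KeyVault/vaults/{vault-name}"
        },
        "secretName": "MyPassword"
      }
    }
  }
}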


Summary

Based on the above example, we achieved the following:

  • Secure information (usernames and passwords) is not stored anywhere in the template or the parameter files.
  • The secure values are retrieved from Key Vault while the template is being deployed.
  • The values are only accessible to those who have access to the Key Vault.

 
