Plugging the Gaps in Azure Policy – Part Two

Introduction

Welcome to the second and final part of my blogs on how to plug some gaps in Azure Policy. If you missed part one, this second part isn’t going to be a lot of use without the context from that, so maybe head on back and read part one before you continue.

In part one, I gave an overview of Azure Policy, a basic idea of how it works, what the gap in the product is in terms of resource evaluation, and a high-level view of how we plug that gap. In this second part, I’m going to show you how I built that idea out and provide you some scripts and a policy so you can spin up the same idea in your own Azure environments.

Just a quick note, that this is not a “next, next, finish” tutorial – if you do need something like that, there are plenty to be found online for each component I describe. My assumption is that you have a general familiarity with Azure as a whole, and the detail provided here should be enough to muddle your way through.

I’ll use the same image I used in part one to show you which bits we’re building, and where that bit fits in to the grand scheme of things.

We’re going to create a singular Azure Automation account, and we’re going to have two PowerShell scripts under it. One of those scripts will be triggered by a webhook, which will receive a POST from Event Grid, and the other will be fired by a good old-fashioned scheduler. I’m not going to include all the evaluations performed in my production version of this (hey, gotta hold back some IP right?) but I will include enough for you to build on for your own environment.

The Automation Account

When creating the automation account, you don’t need to put a lot of thought into it. By default when you create an automation account, it is going to automatically create as Azure Run As account on your behalf. If you’re doing this in your own lab, or an environment you have full control over, you’ll be able to do this step without issue , but typically in an Azure environment you may have access to build resources within a subscription, but perhaps not be able to create Azure AD objects – if that level of control applies to your environment, you will likely need to get someone to manually create an Azure AD Service Principal on your behalf. For this example, we’ll just let Azure Automation create the Run As account, which, by default, will have contributor access on the subscription you are creating the account under (which is plenty for what we are doing). You will also notice a “Classic” Run As account is also created – we’re not going to be using that, so you can scrap it. Good consultants like you will of course figure out the least permissions required for the production account and implement that accordingly rather than relying on these defaults.

The Event-Based Runbook

The Event-Based Runbook grabs parameters from POSTed JSON which we get from Event Hub. The JSON we get contains enough information about an individual resource which has been created or modified that we are able to perform an evaluation on that resource alone. In the next section, I will give you a sample of what that JSON looks like.

When we create this event-based Runbook, obviously we need somewhere to receive the POSTed JSON, so we need to create a Webhook. If you’ve never done this before, it’s a fairly straight forward exercise, but you need to be aware of the following things

  • When creating the Webhook, you are displayed the tokenized URL at the point of creation. Take note of it, you won’t be seeing it again and you’ll have to re-create the webhook if you didn’t save your notepad.
  • This URL is open out to the big bad internet. Although the damage you can cause in this instance is limited, you need to be aware that anyone with the right URL can hit that Webhook and start poking.
  • The security of the Webhook is contained solely in that tokenised URL (you can do some trickery around this, but it’s out of scope for this conversation) so in case the previous two points weren’t illustrative enough, the point is that you should be careful with Webhook security.

Below is the script we will use for the event-driven Runbook.

So, what are the interesting bits in there we need to know about? Well firstly, the webhook data. You can see we ingest the data initially into the $WebhookData variable, then store it in a more useful format in the $InputJSON variable, and then break it up into a bunch of other more useful variables $resourceUri, $status and $subject. The purpose in each of those variables is described below

 

VariablePurpose
$resourceUriThe resource URI of the resource we want to evaluate
$statusThe status of the Azure operation we received from Event Grid. If the operation failed to make a change for example, we don’t need to re-evaluate it.
$subjectThe subject contains the resource type, this helps us to narrow down the scope of our evaluation

 

As you can see, aside from dealing with inputs at the top, the script essentially has two parts to it: the tagging function, and the evaluation. As you can see from the evaluation (line 78-88) we scope down the input to make sure we only ever bother evaluating a resource if it’s one we care about. The evaluation itself, as you can see is really just saying “hey, does this resource have more than one NIC? If so, tag the resource using the tagging function. If it doesn’t? remove the tag using the tagging function”. Easy.

The Schedule-Based Runbook

The evaluations (and the function) we have in the schedule-based Runbook is essentially the same as what we have in the event-based one. Why do we even have the schedule-based Runbook then? Well, imagine for a second that Azure Automation has fallen over for a few minutes, or someone publishes dud code, or one of many other things happens which means the automation account is temporarily unavailable – this means the fleeting event which may occur one time only as a resource is being created is essentially lost to the ether, Having the schedule-based books means we can come back every 24 hours (or whatever your organisation decides) and pick up things which may have been missed.

The schedule-based runbook obviously does not have the ability to target individual resources, so instead it must perform an evaluation on all resources. The larger your Azure environment, the longer the processing time, and potentially the higher the cost. Be wary of this and make sensible decisions.

The schedule-based runbook PowerShell is pasted below.

Event Grid

Event Grid is the bit which is going to take logs from our Azure Subscription and allow us to POST it to our Azure Automation Webhook in order to perform our evaluation. Create your Event Grid Subscription with the “Event Grid Schema”, the “Subscription” topic type (using your target subscription) and listening only for “success” event types. The final field we care about on the Event Subscription create form, is for the Webhook – this is the one we created earlier in our Azure Automation Runbook, and now is the time to paste that value in.

Below is an example of the JSON we end up getting POSTed to our Webhook.

Azure Policy

And finally, we arrive at Azure Policy itself. So once again to remind you, all we are doing at this point is performing a compliance evaluation on a resource based solely on the tag applied to it, and accordingly, the policy itself is very simple. Because this is a policy based only on the tag, it means the only effect we can really use is “Audit” – we cannot deny creation of resources based on these evaluations.

The JSON for this policy is pasted below.

And that’s it, folks – I hope these last two blog posts have given you enough ideas or artifacts to start building out this idea in your own environments, or building out something much bigger and better using Azure Functions in place of our Azure Automation examples!

If you want to have a chat about how Azure Policy might be useful for your organisation, by all means, please do reach out, as a business we’ve done a bunch of this stuff now, and I’m sure we can help you to plug whatever gaps you might have.

 

Plugging the Gaps in Azure Policy – Part One

Introduction

Welcome to the first part of a two part blog on Azure Policy. Multi-part blogs are not my usual style, but the nature of blogging whilst also being a full time Consultant is that you slip some words in when you find time, and I was starting to feel if I wrote this in a single part, it would just never see the light of day. Part one of this blog deals with the high-level overview of what the problem is, and how we solved it at a high level, part two will include the icky sticky granular detail, including some scripts which you can shamelessly plagiarise.

Azure Policy is a feature complete solution which performs granular analysis on all your Azure resources, allowing your IT department to take swift and decisive action on resources which attempt to skirt infrastructure policies you define. Right, the sales guys now have their quotable line, let’s get stuck in to how you’re going to deliver on that.

Azure Policy Overview

First, a quick overview of what Azure Policy actually is. Azure Policy is a service which allows you to create rules (policies) which allow you to take an action on an attempt to create or modify an Azure resource. For example, I might have a policy which says “only allow VM SKU’s of Standard_D2s_v3” with the effect of denying the creation of said VM if it’s anything other than that SKU. Now, if a user attempts to create a VM other than the sizing I specify, they get denied – same story if they attempt to modify an existing VM to use that SKU. Deny is just one example of an “effect” we can take via Azure Policy, but we can also use Audit, Append, AuditIfNotExists, DeployIfNotExists, and Disabled.

Taking the actions described above obviously requires that you evaluate the resource to take the action. We do this using some chunks of JSON with fairly basic operators to determine what action we take. The properties you plug into a policy you create via Azure Policy, are not actually direct properties of the resource you are attempting to evaluate, rather we have “Aliases”, which map to those properties. So, for example, the alias for the image SKU we used as an example is “Microsoft.Compute/virtualMachines/imageSku”, which maps to the path “properties.storageProfile.imageReference.sku” on the actual resource. This leads me to….

The Gap

If your organisation has decided Azure Policy is the way forward (because of the snazzy dashboard you get for resource compliance, or because you’re going down the path of using baked in Azure stuff, or whatever), you’re going to find fairly quickly that there is currently not a one to one mapping between the aliases on offer, and the properties on a resource. Using a virtual machine as an example, we can use Azure Policy to take an effect on a resource depending on its SKU (lovely!) but up until very recently, we didn’t have the ability to say if you do spin up a VM with that SKU, that it should only ever have a single NIC attached. The existing officially supported path to getting such aliases added to Policy is via the Azure Policy GitHub (oh, by the way if you’re working with policy and not frequenting that GitHub, you’re doing it wrong). The example I used about the multiple NIC’s, you can see was a requested as an alias by my colleague Ken on October 22nd 2018, and marked as “deployed” into the product on February 14th 2019. Perhaps this is not bad for the turnaround from request to implementation into the product speaking in general terms, but not quick enough when you’re working on a project which relies on that alias for a delivery deadline which arrives months before February 14th 2019. A quick review of both the open and closed issues on the Azure Policy GitHub gives you a feel for the sporadic nature of issues being addressed, and in some cases due to complexity or security, the inability to address the issues at all. That’s OK, we can fix this.

Plugging the Gap

Something we can use across all Azure resources in Policy, are fields. One of the fields we can use, is the tag on a resource. So, what we can do here is report compliance status to the Azure Policy dashboard not based on the actual compliance status of the resource, but based on whether or not is has a certain tag applied to it – that is to say, a resource can be deemed compliant or non-compliant based on whether or not it has a tag of a certain value – then, we can use something out of band to evaluate the resources compliance and apply the compliance tag. Pretty cunning huh?

So I’m going to show you how we built out this idea for a customer. In this first part, you’re going to get the high-level view of how it hangs together, and in the second part I will share with you the actual scripts, policies, and other delicious little nuggets so you can build out a demo yourself should it be something you want to have a play with. Bear in mind the following things when using all this danger I am placing in your hands:

  • This was not built to scale, more as a POC, however;
    • This idea would be fine for handling a mid-sized Azure environment
  • This concept is now being built out using Azure Functions (as it should be)
  • Roll-your-own error handling and logging, the examples I will provide will contain none
  • Don’t rely on 100% event-based compliance evaluation (I’ll explain why in part 2)
  • I’m giving you just enough IP to be dangerous, be a good Consultant

Here’s a breakdown of how the solution hangs together. The example below will more or less translate to the future Functions based version, we’ll just scribble out a couple bits, add a couple bits in.

So, from the diagram above, here’s the high-level view what’s going on:

  1. Event Grid forwards events to a webhook hanging off a PowerShell Runbook.
  2. The PowerShell Runbook executes a script which evaluates the resource forwarded in the webhook data, and applies, removes, or modifies a tag accordingly. Separately, a similar PowerShell runbook fires on a schedule. The schedule-based script contains the same evaluations as the event-driven one, but rather than evaluate an individual resource, it will evaluate all of them.
  3. Azure Policy evaluates resources for compliance, and reports on it. In our case, compliance is simply the presence of a tag of a particular value.

Now, that might already be enough for many of you guys to build out something like this on your own, which is great! If you are that person, you’re probably going to come up with a bunch of extra bits I wouldn’t have thought about, because you’re working from a (more-or-less) blank idea. For others, you’re going to want some gnarly config and scripts so you can plug that stuff into your own environment, tweak it up, and customise it to fit your own lab – for you guys, see you soon for part two!

Kloud has been building out a bunch of stuff recently in Azure Policy, using both complex native policies, and ideas such as the one I’ve detailed here. If your organisation is looking at Azure Policy and you think it might be a good fit for your business, by all means, reach out for a chat. We love talking about this stuff.

Processing Azure Event Grid events across Azure subscriptions

Consider a scenario where you need to listen to Azure resource events happening in one Azure subscription from another Azure subscription. A use case for such a scenario can be when you are developing a solution where you listen to events happening in your customers’ Azure subscriptions, and then you need to handle those events from an Azure Function or Logic App running in your subscription.
A solution for such a scenario could be:
1. Create an Azure Function in your subscription that will handle Azure resource events received from Azure Event Grid.
2. Handle event validation in the above function, which is required to perform a handshake with Event Grid.
3. Create an Azure Event Grid subscription in the customers’ Azure subscriptions.
Before, I go into details let’s have a brief overview of Azure Event Grid.
Azure Event Grid is a routing service based on a publish/subscribe model, which is used for developing event-based applications. Event sources publish events, and event handlers can subscribe to these events via Event Grid subscriptions.

event-grids

Figure 1. Azure event grid publishers and handlers


Azure Event Grid subscriptions can be used to subscribe to system topics as well as custom topics. Various Azure services automatically send events to Event Grid. The system-level event sources that currently send events to Event Grid are Azure subscriptions, resource groups, Event Hubs, IOT Hubs, Azure Media Services, Service Bus, and blob storage
You can listen to these events by creating an event handler. Azure Event Grid supports several Azure Services and custom webhooks for event handlers. There are number of Azure services that can be used as event handlers, including Azure Functions, Logic Apps, Event Hubs, Azure Automation, Hybrid Connections, and storage queues.
In this post I’ll focus on using Azure Functions as an event handler to which an Event Grid subscription will send events to whenever an event occurs at the whole Azure subscription level. You can also create an Event Grid subscription at a resource group level to be notified only for the resources belonging to a particular resource group. The figure 1 posted above, shows various event sources that can publish events, and various supported event handlers. As per our solution Azure subscriptions and Azure Functions are marked.

Create an Azure Function in your subscription and handle the validation event from Event Grid

If our Event Grid subscription and function were in the same subscription, then we could have simply created an Event Grid-triggered Azure Function. Using that you can simply specify the Event Grid subscription details with this function specified as an endpoint in the Event Grid subscription. However, in our case this cannot be done as we need to have the Event Grid subscription in the customer subscription, and the Azure Function in our subscription. Therefore, we will simply create a HTTP-triggered function or a webhook function
Because we’re not selecting an Event Grid triggered function, we need us to do an extra validation step. At the time of creating a new Azure Event Grid subscription, Event Grid requires the endpoint to prove the ownership of the webhook, so that Event Grid can deliver the events to that endpoint. For built-in event handlers such as Logic Apps, Azure Automation, and Event Grid triggered functions, this process of validation is not necessary. However, in our scenario where we are using a HTTP-triggered function we need to handle the validation handshake
When an Event Grid subscription is created, it sends a subscription validation event in a POST request to the endpoint. All we need to do is to handle this event, read the request body, read the validationCode property in the data object in the request, and send it back in the response. Once Event Grid receives the same validation code back it knows that endpoint is validated, and it will start delivering events to our function. Following is an example of a POST request that Event Grid sends to the endpoint for validation.

Our function can check if the eventType is Microsoft.EventGrid.SubscriptionValidationEvent , which indicates it is meant for validation, and send back the value in data.validationCode. In all other scenarios, eventType will be based on the resource on which the event occurred, and the function can process those events accordingly. Also, the resource validation event contains a header aeg-event-type with value SubscriptionValidation. You should also validate this header.
Following is the sample code for a Node.js function to handle the validation event and send back the validation code and hence completing the validation handshake.

Processing Resource Events

To process the resource events, you can filter them on the resourceProvider or operationName properties. For example, the operationName property for a VM create event is set to Microsoft.Compute/virtualMachines/write. The event payload follows a fixed schema as described here. An event for a virtual machine creation looks like below:

Authentication

While creating the Event Grid subscription, detailed in next section, it should be created with the endpoint URL pointing to function URL including the function key.. Also, event validation done for the handshake acts as another means of authentication. To add an extra layer of authentication, you can generate your own access token, and append it to your function URL when specifying the endpoint for the Event Grid subscription. Your function can now also validate this access token before further processing.

Create an Azure Event Grid Subscription in customer’s subscription

A subscription owner/administrator should be able to run an Azure CLI or PowerShell command for creating the Event Grid subscription in customer subscription.
Important: This step must be done after the above step of creating the Azure Function is done. Otherwise, when you try to create an Event Grid subscription, and it raises the subscription validation event, Event Grid will not get a valid response back, and the creation of the Event Grid subscription will fail.
You can add filters to your Event Grid subscription to filter the events by subject. Currently, events can only be filtered with text comparison of the subject property value starting with or ending with some text. The subject filter doesn’t support a wildcard or regex search.

Azure CLI or PowerShell

An example Azure CLI command to create an Event Grid Subscription, which receives all the events occurring at subscription level is as below:

Here https://myhttptriggerfunction.azurewebsites.net/api/f1?code= is the URL of the function app.

Azure REST API

Instead of asking customer to run a CLI or PowerShell script to create the Event Grid subscription, you can automate this process by writing another Azure Function that calls Azure REST API. The API call can be invoked using service principal with rights on the customer’s subscription.
To create an Event Grid subscription for the customer’s Azure Subscription, you submit the following PUT request:
PUT https://management.azure.com/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /providers/Microsoft.EventGrid/eventSubscriptions/ eg-subscription-test?api-version=2018-01-01
Request Body:
{
"properties": {
"destination": {
"endpointType": "WebHook",
"properties": {
"endpointUrl": " https://myhttptriggerfunction.azurewebsites.net/api/f1?code="
}
},
"filter": {
"isSubjectCaseSensitive": false
}
}
}

 

Follow Us!

Kloud Solutions Blog - Follow Us!