Automatic Key Rotation for Azure Services

Securely managing keys for services that we use is an important, and sometimes difficult, part of building and running a cloud-based application. In general I prefer not to handle keys at all, and instead rely on approaches like managed service identities with role-based access control, which allow applications to authenticate and authorise themselves without any keys being explicitly exchanged. However, there are a number of situations where we do need to use and manage keys, such as when we use services that don’t support role-based access control. One best practice that we should adopt when handling keys is to rotate (change) them regularly.

Key rotation is important to cover situations where your keys may have been compromised. Common attack vectors include keys having been committed to a public GitHub repository, a key accidentally written to a log file, or a disgruntled ex-employee retaining a key that had previously been issued. Changing the keys means that the scope of the damage is limited; if keys aren’t changed regularly, these types of vulnerability can be severe.

In many applications, keys are used in complex ways and require manual intervention to rotate. But in other applications, it’s possible to completely automate the rotation of keys. In this post I’ll explain one such approach, which rotates keys every time the application and its infrastructure components are redeployed. Assuming the application is deployed regularly, for example using a continuous deployment process, we will end up rotating keys very frequently.

Approach

The key rotation process I describe here relies on the fact that the services we’ll be dealing with – Azure Storage, Cosmos DB, and Service Bus – have both a primary and a secondary key. Both keys are valid for any requests, and they can be changed independently of each other. During each release we will pick one of these keys to use, and we’ll make sure that we only use that one. We’ll deploy our application components, which will include referencing that key and making sure our application uses it. Then we’ll rotate the other key.

The flow of the script is as follows:

  1. Decide whether to use the primary key or the secondary key for this deployment. There are several approaches to do this, which I describe below.
  2. Deploy the ARM template. In our example, the ARM template is the main thing that reads the keys. The template copies the keys into an Azure Function application’s configuration settings, as well as into a Key Vault. You could, of course, output the keys and have your deployment script put them elsewhere if you want to.
  3. Run the other deployment logic. For our simple application we don’t need to do anything more than run the ARM template deployment, but for many deployments  you might copy your application files to a server, swap the deployment slots, or perform a variety of other actions that you need to run as part of your release.
  4. Test the application is working. The Azure Function in our example will perform some checks to ensure the keys are working correctly. You might also run other ‘smoke tests’ after completing your deployment logic.
  5. Record the key we used. We need to keep track of the keys we’ve used in this deployment so that the next deployment can use the other one.
  6. Rotate the other key. Now we can rotate the key that we are not using. The way that we rotate keys is a little different for each service.
  7. Test the application again. Finally, we run one more check to ensure that our application works. This is mostly a last check to ensure that we haven’t accidentally referenced any other keys, which would break our application now that they’ve been rotated.

We don’t rotate any keys until after we’ve already switched the application to using the other set of keys, so we should never end up in a situation where we’ve referenced the wrong keys from the Azure Functions application. However, if we wanted to have a true zero-downtime deployment then we could use something like deployment slots to allow for warming up our application before we switch it into production.

A Word of Warning

If you’re going to apply the approach described in this post, or the code below, to your own applications, it’s important to be aware of a significant limitation. The particular approach described here only works if your deployments are completely self-contained, with the keys only used inside the deployment process itself. If you provide keys for your components to any other systems or third parties, rotating keys in this manner will likely cause their systems to break.

Importantly, any shared access signatures and tokens you issue will likely be broken by this process too. For example, if you provide third parties with a SAS token to access a storage account or blob, then rotating the account keys will cause the SAS token to be invalidated. There are some ways to avoid this, including generating SAS tokens from your deployment process and sending them out from there, or by using stored access policies; these approaches are beyond the scope of this post.

The next sections provide some detail on the important steps in the list above.

Step 1: Choosing a Key

The first step we need to perform is to decide whether we should use the primary or secondary keys for this deployment. Ideally each deployment would switch between them – so deployment 1 would use the primary keys, deployment 2 the secondary, deployment 3 the primary, deployment 4 the secondary, etc. This requires that we store some state about the deployments somewhere. Don’t forget, though, that the very first time we deploy the application we won’t have this state set. We need to allow for this scenario too.

The option that I’ve chosen to use in the sample is to use a resource group tag. Azure lets us use tags to attach custom metadata to most resource types, as well as to resource groups. I’ve used a custom tag named CurrentKeys to indicate whether the resources in that group currently use the primary or secondary keys.

There are other places you could store this state too – some sort of external configuration system, or within your release management tool. You could even have your deployment scripts look at the keys currently used by the application code, compare them to the keys on the actual target resources, and then infer which key set is being used that way.

A simpler alternative to maintaining state is to randomly choose to use the primary or secondary keys on every deployment. This may sometimes mean that you end up reusing the same keys repeatedly for several deployments in a row, but in many cases this might not be a problem, and may be worth the simplicity of not maintaining state.
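
To make the tag-based approach concrete, here’s a minimal PowerShell sketch of reading and updating the CurrentKeys tag (the variable names are assumptions, not the exact code from the sample):

# Read the CurrentKeys tag from the resource group; it won't exist on the very first deployment.
$resourceGroup = Get-AzureRmResourceGroup -Name $resourceGroupName
$tags = if ($resourceGroup.Tags) { $resourceGroup.Tags } else { @{} }

# Flip to the other key set each time; default to the primary keys when there's no state yet.
$keysToUse = if ($tags['CurrentKeys'] -eq 'Primary') { 'Secondary' } else { 'Primary' }

# Later (step 5), record the choice so that the next deployment switches again.
$tags['CurrentKeys'] = $keysToUse
Set-AzureRmResourceGroup -Name $resourceGroupName -Tag $tags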

Step 2: Deploy the ARM Template

Our ARM template includes the resource definitions for all of the components we want to create – a storage account, a Cosmos DB account, a Service Bus namespace, and an Azure Function app to use for testing. You can see the full ARM template here.

Note that we are deploying the Azure Function application code using the ARM template deployment method.

Additionally, we copy the keys for our services into the Azure Function app’s settings, and into a Key Vault, so that we can access them from our application.

Step 4: Testing the Keys

Once we’ve finished deploying the ARM template and completing any other deployment steps, we should test to make sure that the keys we’re trying to use are valid. Many deployments include some sort of smoke test – a quick test of core functionality of the application. In this case, I wrote an Azure Function that will check that it can connect to the Azure resources in question.

Testing Azure Storage Keys

To test connectivity to Azure Storage, we run a query against the storage API to check if a blob container exists. We don’t actually care if the container exists or not; we just check to see if we can successfully make the request:

Testing Cosmos DB Keys

To test connectivity to Cosmos DB, we use the Cosmos DB SDK to try to retrieve some metadata about the database account. Once again we’re not interested in the results, just in the success of the API call:

Testing Service Bus Keys

And finally, to test connectivity to Service Bus, we try to get a list of queues within the Service Bus namespace. As long as we get something back, we consider the test to have passed:

You can view the full Azure Function here.

Step 6: Rotating the Keys

One of the last steps we perform is to actually rotate the keys for the services. The way in which we request key rotations is different depending on the services we’re talking to.

Rotating Azure Storage Keys

Azure Storage provides an API that can be used to regenerate an account key. From PowerShell we can use the New-AzureRmStorageAccountKey cmdlet to access this API:
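
For example, a hedged sketch (variable names are assumptions; $keyNameToRotate would be 'key1' or 'key2', whichever set the application is not using):

# Regenerate the storage account key that the application is NOT currently using.
New-AzureRmStorageAccountKey `
    -ResourceGroupName $resourceGroupName `
    -Name $storageAccountName `
    -KeyName $keyNameToRotate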

Rotating Cosmos DB Keys

For Cosmos DB, there is a similar API to regenerate an account key. There are no first-party PowerShell cmdlets for Cosmos DB, so we can instead use a generic Azure Resource Manager cmdlet to invoke the API:
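
A hedged sketch using the generic Invoke-AzureRmResourceAction cmdlet (the action name and API version shown are based on the Cosmos DB ARM API at the time of writing; $keyKindToRotate would be 'primary' or 'secondary'):

# Invoke the Cosmos DB regenerateKey action through the generic ARM cmdlet.
Invoke-AzureRmResourceAction `
    -Action 'regenerateKey' `
    -ResourceType 'Microsoft.DocumentDb/databaseAccounts' `
    -ApiVersion '2015-04-08' `
    -ResourceGroupName $resourceGroupName `
    -ResourceName $cosmosDbAccountName `
    -Parameters @{ keyKind = $keyKindToRotate } `
    -Force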

Rotating Service Bus Keys

Service Bus provides an API to regenerate the keys for a specified authorization rule. For this example we’re using the default RootManageSharedAccessKey authorization rule, which is created automatically when the Service Bus namespace is provisioned. The PowerShell cmdlet New-AzureRmServiceBusKey can be used to access this API:
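
A hedged sketch (parameter names are as I recall them from the AzureRM.ServiceBus module and may differ between versions; $keyTypeToRotate would be 'PrimaryKey' or 'SecondaryKey'):

# Regenerate the unused key on the default namespace-level authorization rule.
New-AzureRmServiceBusKey `
    -ResourceGroupName $resourceGroupName `
    -Namespace $serviceBusNamespaceName `
    -Name 'RootManageSharedAccessKey' `
    -RegenerateKey $keyTypeToRotate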

You can see the full script here.

Conclusion

Key management and rotation is often a painful process, but if your application deployments are completely self-contained then the process described here is one way to ensure that you continuously keep your keys changing and up-to-date.

You can download the full set of scripts and code for this example from GitHub.

Azure Application Gateway WAF tuning

The Azure Application Gateway has a Web Application Firewall (WAF) capability that can be enabled on the gateway. The WAF will use the OWASP ModSecurity Core Rule Set 3.0 by default and there is an option to use CRS 2.2.9.

CRS 3.0 offers reduced occurrences of false positives over 2.2.9 by default. However, there may still be times when you need to tune your WAF rule sets to avoid false positives in your site.

Blocked access to the site

The Azure WAF filters all incoming requests to the servers in the backend of the Application Gateway. It uses the ModSecurity Core Rule Sets described above to protect your sites against threats such as code injection, other common web attacks, bots, and misconfigurations.

When enough rules are triggered to exceed the WAF’s threshold, access to the page is denied and a 403 error is returned. In the below screenshot, we can see that the WAF has blocked access to the site; viewing the page in the Chrome developer tools under Network -> Headers, we can see that the Status Code is 403 ModSecurity Action.

[Screenshot: the 403 ModSecurity Action status code shown in the browser developer tools]

Enable WAF Diagnostics

To view more information on the rules that are being triggered on the WAF, you will need to turn on diagnostic logs; you do this by adding a diagnostic setting. There are different options for configuring the diagnostic settings, but in this example we will direct them to an Azure Storage account.

[Screenshot: adding a diagnostic setting for the Application Gateway]
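
If you prefer to script this rather than use the portal, a hedged PowerShell sketch (resource names are placeholders):

# Send Application Gateway diagnostic logs to a storage account (AzureRM cmdlets).
$appGwId   = (Get-AzureRmApplicationGateway -Name 'myAppGateway' -ResourceGroupName 'myResourceGroup').Id
$storageId = (Get-AzureRmStorageAccount -ResourceGroupName 'myResourceGroup' -Name 'mydiaglogstorage').Id

Set-AzureRmDiagnosticSetting -ResourceId $appGwId -StorageAccountId $storageId -Enabled $true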

Viewing WAF Diagnostic Logs

Now that diagnostic logging is enabled and directed to a storage account, we can browse to the storage account and view the log files. An easy way to do this is to download Azure Storage Explorer. You can then use it to browse the storage account, where you will see 3 containers that are used for Application Gateway logging.

  • insights-logs-applicationgatewayaccesslog
  • insights-logs-applicationgatewayfirewalllog
  • insights-logs-applicationgatewayperformancelog

The container that we are interested in for the WAF logs is the insights-logs-applicationgatewayfirewalllog container.

Navigate through the container until you find the PT1H.json file. This is the hourly log of firewall actions on the WAF. Double click on the file and it will open in the application set to view json files.

[Screenshot: browsing the firewall log container in Azure Storage Explorer]

Each entry in the WAF log includes information about the request and why it was triggered, such as the rule ID and message details. In the below sample log there are 2 highlighted entries.

The message details for the first highlighted entry indicate the following: "Access denied with code 403 (phase 2). Operator GE matched 5 at TX:anomaly_score."

So we can see that when the anomaly threshold of 5 was reached, the WAF triggered the 403 ModSecurity Action that we initially saw in the browser when trying to access the site. It is also important to note that this particular rule cannot be disabled, as it reflects an accumulation of other rules being triggered.

The second rule indicates that a file with extension .axd is being blocked by a policy.

[Screenshot: sample WAF firewall log entries]

Tuning WAF policy rules

Each of the WAF log entries that are captured should be carefully reviewed to determine whether they represent valid threats. If, after reviewing the logs, you determine that an entry is a false positive or captures something that is not considered a risk, you have the option to tune the rules that will be enforced.

From the Web Application Firewall section within the Application Gateway you have the following options:

  • Enable or Disable the WAF
  • Configure Detection or Prevention modes for the WAF
  • Select rule set to use
  • Customize rule configuration

In the example above, if we were to decide that the .axd file extension is valid and allowed for the site, we could search for rule ID 920440 and deselect it.

Once the number of rules being triggered brings the anomaly score back below the threshold, the 403 ModSecurity Action will no longer prevent access to the site.
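
The same change can also be scripted; a hedged sketch using the AzureRM.Network cmdlets (the gateway and resource group names are placeholders, and the rule group name is the CRS 3.0 group I’d expect rule 920440 to live in; verify it against your own WAF logs):

# Disable a single CRS rule in the Application Gateway WAF configuration.
$appGw = Get-AzureRmApplicationGateway -Name 'myAppGateway' -ResourceGroupName 'myResourceGroup'

$disabledGroup = New-AzureRmApplicationGatewayFirewallDisabledRuleGroupConfig `
    -RuleGroupName 'REQUEST-920-PROTOCOL-ENFORCEMENT' -Rules 920440

Set-AzureRmApplicationGatewayWebApplicationFirewallConfiguration -ApplicationGateway $appGw `
    -Enabled $true -FirewallMode 'Prevention' -RuleSetType 'OWASP' -RuleSetVersion '3.0' `
    -DisabledRuleGroups $disabledGroup

# Push the updated configuration back to the gateway.
Set-AzureRmApplicationGateway -ApplicationGateway $appGw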

For new implementations, or during testing, you could run the WAF in Detection mode only, and review and fine-tune the rules before enabling Prevention mode for production use.

[Screenshot: the WAF rule configuration options]

Deploying Azure Functions with ARM Templates

There are many different ways in which an Azure Function can be deployed. In a future blog post I plan to go through the whole list. There is one deployment method that isn’t commonly known though, and it’s of particular interest to those of us who use ARM templates to deploy our Azure infrastructure. Before I describe it, I’ll quickly recap ARM templates.

ARM Templates

Azure Resource Manager (ARM) templates are JSON files that describe the state of a resource group. They typically declare the full set of resources that need to be provisioned or updated. ARM templates are idempotent, so a common pattern is to run the template deployment regularly—often as part of a continuous deployment process—which will ensure that the resource group stays in sync with the description within the template.

In general, the role of ARM templates is typically to deploy the infrastructure required for an application, while the deployment of the actual application logic happens separately. However, Azure Functions’ ARM integration has a feature whereby an ARM template can be used to deploy the files required to make the function run.

How to Deploy Functions in an ARM Template

In order to deploy a function through an ARM template, we need to declare a resource of type Microsoft.Web/sites/functions, like this:
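
As a sketch of what such a resource can look like (the names, apiVersion, and bindings here are illustrative assumptions rather than the exact template from this post):

{
  "type": "Microsoft.Web/sites/functions",
  "apiVersion": "2015-08-01",
  "name": "[concat(parameters('functionAppName'), '/MyFunction')]",
  "dependsOn": [
    "[resourceId('Microsoft.Web/sites', parameters('functionAppName'))]"
  ],
  "properties": {
    "config": {
      "bindings": [
        { "name": "req", "type": "httpTrigger", "direction": "in", "authLevel": "function" },
        { "name": "$return", "type": "http", "direction": "out" }
      ],
      "disabled": false
    },
    "files": {
      "run.csx": "[parameters('functionFileContents')]"
    }
  }
}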

There are two important parts to this.

First, the config property is essentially the contents of the function.json file. It includes the list of bindings for the function, and in the example above it also includes the disabled property.

Second, the files property is an object that contains key-value pairs representing each file to deploy. The key represents the filename, and the value represents the full contents of the file. This only really works for text files, so this deployment method is probably not the right choice for precompiled functions and other binary files. Also, the file needs to be inlined within the template, which may quickly get unwieldy for larger function files—and even for smaller files, the file needs to be escaped as a JSON string. This can be done using an online tool like this, or you could use a script to do the escaping and pass the file contents as a parameter into the template deployment.

Importantly, in my testing I found that using this method to deploy over an existing function will remove any files that are not declared in the files list, so be careful when testing this approach if you’ve modified the function or added any files through the portal or elsewhere.

Examples

There are many different ways you can insert your function file into the template, but one of the ways I tend to use is a PowerShell script. Inside the script, we can read the contents of the file into a string, and create a HashTable for the ARM template deployment parameters:
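
A minimal sketch of that part of the script (file and parameter names are assumptions):

# Read the function code into a string and build the parameters hashtable for the deployment.
$functionFileContents = Get-Content -Path '.\run.csx' -Raw

$templateParameters = @{
    functionAppName      = 'my-function-app'
    functionFileContents = $functionFileContents
}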

Then we can use the New-AzureRmResourceGroupDeployment cmdlet to execute the deployment, passing in $templateParameters to the -TemplateParameterObject argument.
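
For instance (the resource group and template file names are placeholders):

New-AzureRmResourceGroupDeployment `
    -ResourceGroupName 'my-resource-group' `
    -TemplateFile '.\template.json' `
    -TemplateParameterObject $templateParameters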

You can see the full example here.

Of course, if you have a function that doesn’t change often then you could instead manually convert the file into a JSON-encoded string using a tool like this one, and paste the function right into the ARM template. To see a full example of how this can be used, check out this example ARM template from a previous blog article I wrote.

When to Use It

Deploying a function through an ARM template can make sense when you have a very simple function that is comprised of one, or just a few, files to be deployed. In particular, if you already deploy the function app itself through the ARM template then this might be a natural extension of what you’re doing.

This type of deployment can also make sense if you’re wanting to quickly deploy and test a function and don’t need some of the more complex deployment-related features like control over handling locked files. It’s also a useful technique to have available for situations where a full deployment script might be too heavyweight.

However, for precompiled functions, functions that have binary files, and for complex deployments, it’s probably better to use another deployment mechanism. Nevertheless, I think it’s useful to know that this is a tool in your Azure Functions toolbox.

Provisioning complex Modern Sites with Azure Functions and Flow – Part 2 – Create and Apply Template

In the previous blog here, we got an overview of the high level Architecture of a Complex Modern team site provisioning process. In this blog, we will look at the step 1 of the process – Create and Apply template process, in detail.

Before that, below are a few links to earlier blogs that cover the prerequisites for this post, as a refresher.

  1. Set up a Graph App to call Graph Service using App ID and Secret – link
  2. Sequencing HTTP Trigger Azure Functions for simultaneous calls – link
  3. Adding and Updating owners using Microsoft Graph Async calls – link

Overview

The Create and Apply Template process aims at the following:

1. Create a blank modern team site using Groups Template (Group#0 Site template)

2. Apply the provisioning template on the created site.

Step 1 : Create a blank Modern team site

For creating a modern team site using CSOM we will use the TeamSiteCollectionCreationInformation class of OfficeDevPnP.  Before we create the site, we will make sure the site doesn’t already exist.

Note: There is an issue with the Site Assets library not getting initialized when the site is created using the below code. Hence, explicitly ensuring the Site Assets library (EnsureSiteAssetsLibrary) is necessary.

Step 2:  Apply the Provisioning Template

Note: The Apply Template process is a long-running process and takes 60-90 minutes to complete for a complex provisioning template with many site columns, content types and libraries. To prevent the Azure Function from timing out, host it on an App Service plan instead of a Consumption plan, so that the function is not affected by the 10-minute timeout.

For the Apply Provisioning Template process, use the below steps.

1. Reading the Template

It is important to note that the XMLPnPSchemaFormatter version (in the code below) must match the PnP version used to generate the PnP template. If the version is older, then set the XMLPnPSchemaFormatter to read from the older version. In order to find the version of the PnP Template, open the xml and look at the start of the file

[Screenshot: the schema version at the start of the provisioning template XML]

2. Apply the Template

For applying the template, we will use the ProvisioningTemplateApplyingInformation class of the OfficeDevPnP module. ProvisioningTemplateApplyingInformation also has a property called HandlersToProcess, which can be used to invoke only particular handlers during the provisioning process. Below is the code for the same.

After the apply template process is complete, since the flow will have timed out, we will invoke another flow to do the post process by updating a list item in the SharePoint list.

Conclusion

In this blog, we saw how we could create a modern team site and apply a template to it. In the next blog, we will finalize the process by making site-specific changes after the template has been applied.

Processing Azure Event Grid events across Azure subscriptions

Consider a scenario where you need to listen to Azure resource events happening in one Azure subscription from another Azure subscription. A use case for such a scenario can be when you are developing a solution where you listen to events happening in your customers’ Azure subscriptions, and then you need to handle those events from an Azure Function or Logic App running in your subscription.

A solution for such a scenario could be:
1. Create an Azure Function in your subscription that will handle Azure resource events received from Azure Event Grid.
2. Handle event validation in the above function, which is required to perform a handshake with Event Grid.
3. Create an Azure Event Grid subscription in the customers’ Azure subscriptions.

Before I go into the details, let’s have a brief overview of Azure Event Grid.

Azure Event Grid is a routing service based on a publish/subscribe model, which is used for developing event-based applications. Event sources publish events, and event handlers can subscribe to these events via Event Grid subscriptions.


Figure 1. Azure event grid publishers and handlers

Azure Event Grid subscriptions can be used to subscribe to system topics as well as custom topics. Various Azure services automatically send events to Event Grid. The system-level event sources that currently send events to Event Grid are Azure subscriptions, resource groups, Event Hubs, IoT Hubs, Azure Media Services, Service Bus, and Blob storage.

You can listen to these events by creating an event handler. Azure Event Grid supports several Azure services and custom webhooks as event handlers. There are a number of Azure services that can be used as event handlers, including Azure Functions, Logic Apps, Event Hubs, Azure Automation, Hybrid Connections, and storage queues.

In this post I’ll focus on using Azure Functions as an event handler to which an Event Grid subscription will send events whenever an event occurs anywhere in an Azure subscription. You can also create an Event Grid subscription at the resource group level to be notified only for the resources belonging to a particular resource group. Figure 1 above shows the various event sources that can publish events and the supported event handlers; the ones used in our solution, Azure subscriptions and Azure Functions, are marked.

Create an Azure Function in your subscription and handle the validation event from Event Grid

If our Event Grid subscription and function were in the same subscription, we could simply create an Event Grid-triggered Azure Function and specify it as the endpoint when creating the Event Grid subscription. However, in our case this cannot be done, as the Event Grid subscription lives in the customer’s subscription while the Azure Function lives in ours. Therefore, we will create an HTTP-triggered function (or a webhook function) instead.

Because we’re not using an Event Grid-triggered function, we need to do an extra validation step. When a new Azure Event Grid subscription is created, Event Grid requires the endpoint to prove ownership of the webhook, so that it can deliver events to that endpoint. For built-in event handlers such as Logic Apps, Azure Automation, and Event Grid-triggered functions, this validation process is not necessary. However, in our scenario, where we are using an HTTP-triggered function, we need to handle the validation handshake ourselves.

When an Event Grid subscription is created, it sends a subscription validation event in a POST request to the endpoint. All we need to do is to handle this event, read the request body, read the validationCode property in the data object in the request, and send it back in the response. Once Event Grid receives the same validation code back it knows that endpoint is validated, and it will start delivering events to our function. Following is an example of a POST request that Event Grid sends to the endpoint for validation.

Our function can check whether the eventType is Microsoft.EventGrid.SubscriptionValidationEvent, which indicates it is meant for validation, and send back the value in data.validationCode. In all other scenarios, eventType will be based on the resource on which the event occurred, and the function can process those events accordingly. The validation request also contains a header, aeg-event-type, with the value SubscriptionValidation; you should validate this header as well.

Following is sample code for a Node.js function that handles the validation event and sends back the validation code, thus completing the validation handshake.

Processing Resource Events

To process the resource events, you can filter them on the resourceProvider or operationName properties. For example, the operationName property for a VM create event is set to Microsoft.Compute/virtualMachines/write. The event payload follows a fixed schema as described here. An event for a virtual machine creation looks like below:

Authentication

While creating the Event Grid subscription (detailed in the next section), it should be created with the endpoint URL pointing to the function URL, including the function key. The event validation done for the handshake also acts as a means of authentication. To add an extra layer of authentication, you can generate your own access token and append it to your function URL when specifying the endpoint for the Event Grid subscription. Your function can then also validate this access token before further processing.

Create an Azure Event Grid Subscription in customer’s subscription

A subscription owner/administrator should be able to run an Azure CLI or PowerShell command to create the Event Grid subscription in the customer’s subscription.

Important: This step must be done after the Azure Function above has been created. Otherwise, when you try to create an Event Grid subscription and it raises the subscription validation event, Event Grid will not get a valid response back, and the creation of the Event Grid subscription will fail.

You can add filters to your Event Grid subscription to filter the events by subject. Currently, events can only be filtered with text comparison of the subject property value starting with or ending with some text. The subject filter doesn’t support a wildcard or regex search.

Azure CLI or PowerShell

An example Azure CLI command to create an Event Grid Subscription, which receives all the events occurring at subscription level is as below:

Here https://myhttptriggerfunction.azurewebsites.net/api/f1?code= is the URL of the function app.
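
The equivalent can also be done from PowerShell with the AzureRM.EventGrid module; a hedged sketch (run in the context of the customer’s subscription; as I understand the cmdlet’s behaviour at the time of writing, specifying no resource or resource group creates the subscription at the Azure subscription scope):

# Create a subscription-scope Event Grid subscription that delivers events to the function endpoint.
New-AzureRmEventGridSubscription `
    -EventSubscriptionName 'eg-subscription-test' `
    -Endpoint 'https://myhttptriggerfunction.azurewebsites.net/api/f1?code=<function key>'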

Azure REST API

Instead of asking the customer to run a CLI or PowerShell script to create the Event Grid subscription, you can automate this process by writing another Azure Function that calls the Azure REST API. The API call can be invoked using a service principal with rights on the customer’s subscription.

To create an Event Grid subscription for the customer’s Azure Subscription, you submit the following PUT request:

PUT https://management.azure.com/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/providers/Microsoft.EventGrid/eventSubscriptions/eg-subscription-test?api-version=2018-01-01

Request Body:

{
  "properties": {
    "destination": {
      "endpointType": "WebHook",
      "properties": {
        "endpointUrl": "https://myhttptriggerfunction.azurewebsites.net/api/f1?code="
      }
    },
    "filter": {
      "isSubjectCaseSensitive": false
    }
  }
}

 

Hub-Spoke communication using vNet Peering and User Defined Routes

Introduction

Recently, I was working on a solution for a customer where they wanted to implement a Hub-Spoke virtual network topology that enabled the HUB to communicate with its Spoke networks via vNet Peering. They also required the SPOKE networks to be able to communicate with each other but peering between them was NOT allowed.

[Diagram: hub and spoke virtual network topology]

As we know, vNet peering is non-transitive – which means that even though SPOKE 1 is peered with the HUB network and the HUB is peered with SPOKE 2, this does not enable automatic communication between SPOKE 1 and SPOKE 2 unless they are explicitly peered with each other, which in our requirement we were not allowed to do.

So, let’s explore a couple of options on how we can enable communication between the Spoke networks without peering.

Solutions

There are several ways to implement Spoke to Spoke communication, but in this blog I’d like to provide details of the 2 feasible options that worked for us.

Option 1 is to place a Network Virtual Appliance (NVA), basically a virtual machine with a configured firewall/router, within the HUB and configure it to forward traffic to and from the SPOKE networks.

If you search the Azure Marketplace with the keywords “Network Virtual Appliance”, you will be presented with several licensed products that you could install and configure in the HUB network to establish this communication. Configuration of these virtual appliances varies, and installation instructions can easily be found on their product websites.

Option 2 is to have a Virtual Network Gateway attached to the HUB network and make use of User Defined Routes to enable communication between the SPOKES.

The above information was sourced from this very helpful blog post.

The rest of this blog is a detailed step by step guide and the testing performed for implementing the approach mentioned in Option 2.

Implementation

1.) Create 3 Virtual Networks with non-overlapping IP addresses

  • Log on to the Azure Portal and create the Hub Virtual Network as follows

[Screenshot: Hub virtual network settings]

  • Create the 2 additional virtual networks as the SPOKES with the following settings:

[Screenshot: Spoke1 virtual network settings]

[Screenshot: Spoke2 virtual network settings]

2.) Now that we have the 3 Virtual Networks provisioned, let’s start Peering them as follows:

a.) HubNetwork <> Spoke1Network

b.) HubNetwork <> Spoke2Network

  • Navigate to the Hub Virtual Network and create a new peering with the following settings:

[Screenshot: peering settings from the Hub network to Spoke1Network]

Select the “Allow gateway transit” option.

  • Repeat the above step to create a peering with Spoke2Network as well.

3.) To establish a successful connection, we will have to create a peering to the HUB Virtual Network from each of the SPOKE Networks too

  • Navigate to Spoke1Network and create a new Peering

[Screenshot: peering settings from Spoke1Network to the Hub network]

Notice that when we select the “Use remote gateways” option, we get an error, as we haven’t yet attached a Virtual Network Gateway to the HUB network. Once a gateway has been attached, we will come back and re-configure this.

For now, Do Not select this option and click Create.

  • Repeat the above step for Spoke2 Virtual Network
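
If you’d rather script the peerings than click through the portal, here’s a hedged PowerShell sketch (the resource group name is a placeholder; the switches mirror the options discussed above):

# Hub-to-spoke peering with gateway transit enabled.
$hub    = Get-AzureRmVirtualNetwork -Name 'HubNetwork'    -ResourceGroupName 'rg-hubspoke'
$spoke1 = Get-AzureRmVirtualNetwork -Name 'Spoke1Network' -ResourceGroupName 'rg-hubspoke'

Add-AzureRmVirtualNetworkPeering -Name 'HubToSpoke1' -VirtualNetwork $hub `
    -RemoteVirtualNetworkId $spoke1.Id -AllowGatewayTransit

# Spoke-to-hub peering; -UseRemoteGateways is only set later, once the gateway exists.
Add-AzureRmVirtualNetworkPeering -Name 'Spoke1ToHub' -VirtualNetwork $spoke1 `
    -RemoteVirtualNetworkId $hub.Id -AllowForwardedTraffic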

4.) Let’s now provision a Virtual Network Gateway

  • Before provisioning a gateway, a Gateway Subnet is required within the Hub Virtual Network. To create this, click on the “Subnets” option in the blade of the Hub Virtual Network and then click on “Gateway subnet”.

[Screenshot: adding a Gateway subnet to the Hub virtual network]

For the purpose of this demo, we will create a Gateway Subnet with the smallest possible network address space, a /29 CIDR, which provides us with 8 addresses, of which the first and last IP are reserved for protocol conformance and x.x.x.1 – x.x.x.3 for Azure services. For production environments, a Gateway Subnet with at least a /27 address space is advised.

Let’s assume for now that when we provision the Virtual Network Gateway, the internal IP address it gets assigned will be the 4th address onwards, which in our case would be 10.4.1.4.

  • Provision the Virtual Network Gateway

Create a new Virtual Network Gateway with the following settings:

[Screenshot: Virtual Network Gateway settings]

Ensure that you select the Hub Virtual Network in the Virtual network field which is where we want the Gateway to be attached. Click Create.

  • The Gateway provisioning process may take a while to complete and you will need to wait for the Updating status to disappear. It can take anywhere between 30-45 mins.

[Screenshot: the gateway provisioning status]

5.) Once the Gateway has been provisioned, let’s now go back to the Peering section of each of the SPOKE networks and configure the “Use remote gateways” option.

[Screenshot: enabling the “Use remote gateways” option on the spoke peering]

  • Repeat the above step for Spoke2ToHub peering

6.) We will now create the Route Tables and define user routes needed for the SPOKE to SPOKE communication

  • Create 2 new Route tables in the portal with the following settings:

[Screenshot: Spoke1RouteTable settings]

[Screenshot: Spoke2RouteTable settings]

  • Define the User Routes as follows:

[Screenshot: adding a user defined route]

In the Address Prefix field, insert the CIDR Subnet address of the Spoke2 Virtual Network which in our case is 10.6.0.0/16

Select Next hop type as Virtual appliance and the Next hop address as the internal address of the Virtual Network Gateway. In our case, we are going to have this set as 10.4.1.4 as mentioned earlier.

  • Repeat this step to create a new Route in the Spoke2RouteTable as well by inserting the Subnet CIDR address of Spoke1 Virtual Network

7.) Let’s now associate these Route tables with our Virtual Networks

  • Navigate to the Spoke1Network and in the “Subnets” section of the blade, select the default subnet

[Screenshot: the subnets of Spoke1Network]

In the Route table field select, Spoke1RouteTable and click Save

[Screenshot: associating Spoke1RouteTable with the default subnet]

  • Repeat the above step to associate Spoke2RouteTable with the Spoke2 Virtual Network

We have now completed the required steps to ensure that both SPOKE Virtual Networks are able to communicate with each other via the HUB
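
For reference, the route table and subnet association can also be scripted; a hedged PowerShell sketch using the addresses from this walkthrough (the resource group, location, and spoke subnet prefix are assumptions):

# Route Spoke2's address space via the gateway's internal IP (10.4.1.4, as assumed earlier).
$route = New-AzureRmRouteConfig -Name 'ToSpoke2' -AddressPrefix '10.6.0.0/16' `
    -NextHopType VirtualAppliance -NextHopIpAddress '10.4.1.4'

$routeTable = New-AzureRmRouteTable -Name 'Spoke1RouteTable' -ResourceGroupName 'rg-hubspoke' `
    -Location 'australiaeast' -Route $route

# Associate the route table with Spoke1's default subnet.
$spoke1 = Get-AzureRmVirtualNetwork -Name 'Spoke1Network' -ResourceGroupName 'rg-hubspoke'
Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $spoke1 -Name 'default' `
    -AddressPrefix '10.5.0.0/24' -RouteTable $routeTable
Set-AzureRmVirtualNetwork -VirtualNetwork $spoke1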

Testing

  • In order to test our configurations, let’s provision a virtual machine in each of the Spoke networks and conduct a simple ping test

1.) Provision a basic Virtual Machine in each of the Spoke networks

2.) Run the following PowerShell command in each VM to allow ICMP ping through the Windows firewall, as this traffic is blocked by default:

New-NetFirewallRule -DisplayName "Allow ICMPv4-In" -Protocol ICMPv4

3.) In my testing, the VMs had the following internal IPs:

The VM running in Spoke 1 network: 10.5.0.4

The VM running in Spoke 2 network: 10.6.0.4

[Screenshot: the ping test between the two spoke VMs]

Pinging 10.6.0.4 from 10.5.0.4 returns a successful response!

Provisioning complex Modern Sites with Azure Functions and Microsoft Flow – Part 1 – Architecture

In one of my previous blogs, here, I discussed creating Office 365 Groups using Azure Functions and Flow. The same process can also be used to provision Modern Team Sites in SharePoint Online, because Modern Team Sites are Office 365 Groups too. However, if you are creating a complex Modern Team Site with lots of libraries, content types, term store associated columns etc., it will be challenging to do it with a single Azure Function.

Thus, in this blog (part 1), we will look at the architecture of a solution to provision a complex Modern Team Site using multiple Azure Functions and Flows. This is an approach that went through four months of validation and testing. There might be other options, but this one worked for our complex team site, which takes around 45-90 minutes to provision.

Solution Design

To start with, let’s look at the solution design. The solution consists of two major components:

1. Template Creation – Create a SharePoint Modern Team site to be used as a template and generate a Provisioning template from it

2. Provisioning Process – Create a SharePoint Inventory List to run the Flow and Azure Functions. There will be three Azure Functions that will run three separate parts of the provisioning lifecycle. More details about the Azure Functions will be in the upcoming blogs.

Get the Provisioning Template

The first step in the process is to  create a clean site that will be used as a reference template site for the Provisioning template. In this site, create all the lists, libraries, site columns, content type and set other necessary site settings.

In order to make sure that the generated template doesn’t have any elements which are not needed for provisioning, use the following PnP PowerShell cmdlet. The cmdlet below excludes any content type hub association, the ALM API handlers, and site security from the generated template, since they are not needed for our provisioning requirements.

Get-PnPProvisioningTemplate -Out "" -ExcludeHandlers ApplicationLifecycleManagement, SiteSecurity -ExcludeContentTypesFromSyndication

The output of the above cmdlet is ProvisioningTemplate.xml file which could be applied to new sites for setting up the same SharePoint elements. To know more about the provisioning template file, schema and allowed tags, check the link here.
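
As an aside, the same template can also be applied to a site interactively with PnP PowerShell, which is handy for testing it before wiring it into the Azure Function (the site URL is a placeholder):

# Connect to a target site and apply the generated template.
Connect-PnPOnline -Url 'https://tenant.sharepoint.com/sites/targetsite' -Credentials (Get-Credential)
Apply-PnPProvisioningTemplate -Path '.\ProvisioningTemplate.xml'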

[Diagram: generating the provisioning template from a reference site]

Team Site Provisioning Process

The second step in the process is to create and apply the template to a Modern SharePoint team site using Flow and Azure Functions. The detailed steps are as follows:

1. Create an Inventory list to capture all the requirements for Site Creation

2. Create two flows

a) Create and Apply Template flow, and

b) Post Provisioning Flow

3. Create three Azure Functions –

a) Create a blank Modern Team Site

b) Apply Provisioning Template on the above site. This is a long running process and can take about 45-90 min for applying a complex template with about 20 libraries, 20-30 site columns and 10-15 content types

Note: Azure Functions on Consumption plan have a timeout of 10 min. Host the Azure function on an App Service Plan for the above to work without issues

c) Post Provisioning to apply changes that are not supported by Provisioning Template such as Creating default folders etc.

Below is the process flow for the provisioning process. It has steps from 1 – 11, going from creating the site through to applying the template. A brief list of the steps is as follows:

  1. Call the Create Site flow to start the Provisioning Process
  2. Call the Create Site Azure Function
  3. Create the Modern Team Site in Azure Function and set any dependencies required for the Apply template such as Navigation items, pages etc, and then return to flow
  4. Call the Apply Template Azure Function.
  5. Get the previously generated ProvisioningTemplate.xml file from a shared location
  6. Apply the Template onto the newly created Modern site. Note: The flow call times out because it cannot wait for such a long running process
  7. Update the status column in the Site Directory for the post provisioning flow to start
  8. Call the Post provisioning flow to run the Post provisioning azure function
  9. The Post provisioning azure function will complete the remaining SharePoint changes which were not completed by the apply template such as, set field default values, create folders in any libraries, associate default values to taxonomy fields etc.

[Diagram: the team site provisioning process flow]

Conclusion:

Hence, in the above blog, we saw at a high architectural level how to create a provisioning process to handle complex modern team site creation. In the upcoming blogs, we will deep dive into the Azure Functions that create the site, apply the template, and run the post-processing.

Happy Coding!!!

Deploying Blob Containers with ARM Templates

ARM templates are a great way to programmatically deploy your Azure resources. They act as declarative descriptions of the desired state of an Azure resource group, and while they can be frustrating to work with, overall the ability to use templates to deploy your Azure resources provides a lot of value.

One common frustration with ARM templates is that certain resource types simply can’t be deployed with them. Until recently, one such resource type was a blob container. ARM templates could deploy Azure Storage accounts, but not blob containers, queues, or tables within them.

That has now changed, and it’s possible to deploy a blob container through an ARM template. Here’s an example template that deploys a container called logs within a storage account:
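
As a minimal sketch of what such a template can look like (the apiVersion and parameter name are assumptions, and the storage account itself is assumed to already exist):

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountName": { "type": "string" }
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
      "apiVersion": "2018-02-01",
      "name": "[concat(parameters('storageAccountName'), '/default/logs')]",
      "properties": {
        "publicAccess": "None"
      }
    }
  ]
}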

Queues and tables still can’t be deployed this way, though – hopefully that’s coming soon.

Azure ExpressRoute Public and Microsoft peering changes, notes from the field

I’ve been trying to piece all this together and get a single, concise blog post that covers all bases around the changes that have happened, and are going to be happening, for Microsoft ExpressRoute peering. That’s been a bit of a challenge because, and I hope I don’t harp on this too much, communication from the product group team could be a bit better. With that said, though, it’s no secret for those that use ExpressRoute that Microsoft is looking to simplify its configuration. Good news, I guess?

The main change that I’m going to delve into here comes by way of merging Microsoft Peering and Public peering into a single Microsoft Peer. Microsoft announced this at the Ignite 2017 conference:

“To simplify ExpressRoute management and configuration we merged public and Microsoft peering”.

Fast forward from September 2017, there’s not been much communication around this shift in ExpressRoute config. I’ve been scouring the interwebs for publicly available confirmation; and all I could find is a blog post that highlighted that:

“As of April 1, 2018, you cannot configure Public peering on new ExpressRoute circuits.”

Searching the Twitterverse for the hashtag #PublicPeering, we get the following confirmation only a few days later on April 5th:

So, we have confirmation that this change in ExpressRoute Public peering is happening, followed by a confirmation that as of April 1st, 2018 (no, this wasn’t a joke), any new ExpressRoute circuits provisioned on or after that April Fools’ date cannot have Public peering. Well, given the breadth of Microsoft, communication is in a grey area. Apart from that Japanese TechNet blog post, there are really only suggestions and recommendations nudging customers to Microsoft peering. Here are two examples:

  1. Microsoft peering is the preferred way to access all services hosted on Azure. (Source)
  2. All Azure PaaS services are also accessible through Microsoft peering. We recommend you to create Microsoft peering and connect to Azure PaaS services over Microsoft peering. (Source)

I know I’m banging on about this for too long, but, for me this is a grey area and better communication is required!

 

Migration

If you’re currently using Public peering and need to move to Microsoft peering, there’s some pretty good guidance from Microsoft on how to Migrate – available here.

NOTE: Microsoft peering of ExpressRoute circuits that were configured prior to August 1, 2017 will have all service prefixes advertised through Microsoft peering, even if route filters are not defined. Microsoft peering of ExpressRoute circuits that are configured on or after August 1, 2017 will not have any prefixes advertised until a route filter is attached to the circuit. (Source)

For many customers, and recently a customer I’ve been working with, they’ve had ExpressRoute for several years now. This change has culminated in some interesting circumstances. In this customer’s case, the migration was actually part of an upgrade to a faster network carriage and a faster ExpressRoute circuit. This meant we could stand up the new environment in parallel to the legacy one, and when configuring peering on the new service we just configured it as Microsoft peering only, with no more Public peering.

This is all well and good but, coming from a legacy ExpressRoute circuit that was configured in ASM/Classic, there’s now also the consideration of Route Filters. In the legacy or Classic ExpressRoute deployment model, BGP communities were not used. Routes were advertised as soon as the peer came online, ARP was done, and the eBGP session was established between Azure and the customer.

In the ARM ExpressRoute deployment model, Azure Route Filters are a requirement for Microsoft peering (only). Note that this is an Azure-side config, not a customer-side one, which can confuse people when talking about BGP route filters. Similar concept, similar name, much confuse.

With ExpressRoute Microsoft peering, out of the box no routes are advertised from Azure to the customer until a Route Filter is associated with the peering. Inside that Route Filter, BGP community tags for the relevant services also need to be defined.

Again, just need to highlight that Route Filters are only required for Microsoft Peering, not for Private peering.
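
For completeness, here’s a hedged sketch of creating a Route Filter with a BGP community rule from PowerShell (resource names are placeholders, and you should check the published list of Azure service BGP community values rather than trusting the one shown here):

# Allow the 'Other Office 365 Services' community (the value shown is an assumption; verify it).
$rule = New-AzureRmRouteFilterRuleConfig -Name 'Allow-O365-Other' -Access Allow `
    -RouteFilterRuleType Community -CommunityList '12076:5100'

New-AzureRmRouteFilter -Name 'MyRouteFilter' -ResourceGroupName 'myResourceGroup' `
    -Location 'australiaeast' -Rule $rule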

Here’s a few more relevant references to ExpressRoute, Route Filters and BGP Communities:

 

Changes to Azure AD

Recently Microsoft gave everyone that used ExpressRoute Public peering about 45 days’ notice that from August 2018, Azure AD authentication and authorisation traffic would no longer be routable via Public peering. This functionality is still available if you use Office 365 over ExpressRoute: simply create a Route Filter and assign the BGP community “Other Office 365 Services”.

To get access to that BGP community, it’s much like any Office 365 service being accessed via ExpressRoute: you will need your Microsoft TAM to approve the request, as the Microsoft stance on the ExpressRoute-for-Office-365 seesaw has swung again towards “you should really use the internet for Office 365, unless maybe Skype for Business/Teams latency is a problem”. Again, this is just my experience.

 

Summary

  • ExpressRoute public peering has been on the radar to be deprecated for some time now
  • If you create new ExpressRoute circuits in parallel to your legacy ones, don’t expect to have the new ones work the same as legacy
    • I’ve even had the Azure product group “restore service” on a ASM/Classic ExpressRoute circuit that had Public peering, which did not restore service at all
    • We essentially spun up Microsoft peering and added the relevant Azure Route Filter
  • ARM ExpressRoute
    • Microsoft Peering has merged with Public peering so Microsoft peering does everything it did before + Public peering
    • Microsoft Peering requires RouteFilters to be applied to advertise routes from Azure to the customer
      • BGP Community tags are used inside of RouteFilters
    • As of August 1st 2018, ExpressRoute Public peering will no longer advertise Azure AD routes
      • This can be accessed via Microsoft Peering, using a Route Filter and the BGP Community tag of “Other Office 365 services”
    • No changes to Private peering at all – woohoo! (as of the date of writing this blog)
  • ASM/Classic ExpressRoute
    • You can’t provision a Classic ExpressRoute circuit anymore
    • If you have one, you’ve likely been bumped up to ARM, given the ASM portal is deprecated
    • Legacy ExpressRoute circuits that have been in-place since prior to August 1 2017, enjoy it while it lasts!
      • Any changes that you might need may be difficult to arrange- you’ll likely need to change the service to comply to current standards

Enjoy!

Avoiding Cosmos DB Bill Shock with Azure Functions

Cosmos DB is a fantastic database service for many different types of applications. But it can also be quite expensive, especially if you have a number of instances of your database to maintain. For example, in some enterprise development teams you may need to have dev, test, UAT, staging, and production instances of your application and its components. Assuming you’re following best practices and keeping these isolated from each other, that means you’re running at least five Cosmos DB collections. It’s easy for someone to accidentally leave one of these Cosmos DB instances provisioned at a higher throughput than you expect, and before long you’re racking up large bills, especially if the higher throughput is left overnight or over a weekend.

In this post I’ll describe an approach I’ve been using recently to ensure the Cosmos DB collections in my subscriptions aren’t causing costs to escalate. I’ve created an Azure Function that will run on a regular basis. It uses a managed service identity to identify the Cosmos DB accounts throughout my whole Azure subscription, and then it looks at each collection in each account to check that they are set at the expected throughput. If it finds anything over-provisioned, it sends an email so that I can investigate what’s happening. You can run the same function to help you identify over-provisioned collections too.

Step 1: Create Function App

First, we need to set up an Azure Functions app. You can do this in many different ways; for simplicity, we’ll use the Azure Portal for everything here.

Click Create a Resource on the left pane of the portal, and then choose Serverless Function App. Enter the information it prompts for – a globally unique function app name, a subscription, a region, and a resource group – and click Create.

[Screenshot: creating the function app]

Step 2: Enable a Managed Service Identity

Once we have our function app ready, we need to give it a managed service identity. This will allow us to connect to our Azure subscription and list the Cosmos DB accounts within it, but without us having to maintain any keys or secrets. For more information on managed service identities, check out my previous post.

Open up the Function Apps blade in the portal, open your app, and click Platform Features, then Managed service identity:

[Screenshot: the Managed service identity option under Platform features]

Switch the feature to On and click Save.

Step 3: Create Authorisation Rules

Now we have an identity for our function, we need to grant it access to the parts of our Azure subscription we want it to examine for us. In my case I’ll grant it the rights over my whole subscription, but you could just give it rights on a single resource group, or even just a single Cosmos DB account. Equally you can give it access across multiple subscriptions and it will look through them all.

Open up the Subscriptions blade and choose the subscription you want it to look over. Click Access Control (IAM):

[Screenshot: the Access control (IAM) blade for the subscription]

Click the Add button to create a new role assignment.

The minimum role we need to grant the function app is called Cosmos DB Account Reader Role. This allows the function to discover the Cosmos DB accounts, and to retrieve the read-only keys for those accounts, as described here. The function app can’t use this role to make any changes to the accounts.
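
If you prefer to script the role assignment rather than use the portal, here’s a hedged sketch (the object ID is the function app’s managed identity, and the IDs shown are placeholders):

# Grant the function app's managed identity read access to Cosmos DB accounts in the subscription.
New-AzureRmRoleAssignment `
    -ObjectId '<function app managed identity object ID>' `
    -RoleDefinitionName 'Cosmos DB Account Reader Role' `
    -Scope '/subscriptions/<subscription ID>'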

Finally, enter the name of your function app, click it, and click Save:

[Screenshot: adding the role assignment for the function app]

This will create the role assignment. Your function app is now authorised to enumerate and access Cosmos DB accounts throughout the subscription.

Step 4: Add the Function

Next, we can actually create our function. Go back into the function app and click the + button next to Functions. We’ll choose to create a custom function:

[Screenshot: creating a custom function]

Then choose a timer trigger:

[Screenshot: choosing a timer trigger]

Choose C# for the language, and enter the name CosmosChecker. (Feel free to use a name with more panache if you want.) Leave the timer settings alone for now:

[Screenshot: naming the function and leaving the timer settings]

Your function will open up with some placeholder code. We’ll ignore this for now. Click the View files button on the right side of the page, and then click the Add button. Create a file named project.json, and then open it and paste in the following, then click Save:

This will add the necessary package references that we need to find and access our Cosmos DB collections, and then to send alert emails using SendGrid.

Now click on the run.csx file and paste in the following file:

I won’t go through the entire script here, but I have added comments to try to make its purpose a little clearer.

Finally, click on the function.json file and replace the contents with the following:

This will configure the function app with the necessary timer, as well as an output binding to send an email. We’ll discuss most of these settings later, but one important setting to note is the schedule setting. The value I’ve got above means the function will run every hour. You can change it to other values using CRON expressions, such as:

  • Run every day at 9.30am UTC: 0 30 9 * * *
  • Run every four hours: 0 0 */4 * * *
  • Run once a week: 0 0 0 * * 0

You can decide how frequently you want this to run and replace the schedule with the appropriate value from above.

Step 5: Get a SendGrid Account

We’re using SendGrid to send email alerts. SendGrid has built-in integration with Azure Functions so it’s a good choice, although you’re obviously welcome to switch out for anything else if you’d prefer. You might want an SMS message to be sent via Twilio, or a message to be sent to Slack via the Slack webhook API, for example.

If you don’t already have a SendGrid account you can sign up for a free account on their website. Once you’ve got your account, you’ll need to create an API key and have it ready for the next step.

Step 6: Configure Function App Settings

Click on your function app name and then click on Application settings:

[Screenshot: opening the function app’s Application settings]

Scroll down to the Application settings section. We’ll need to enter three settings here:

  1. Setting name: SendGridKey. This should have a value of your SendGrid API key from step 5.
  2. Setting name: AlertToAddress. This should be the email address that you want alerts to be sent to.
  3. Setting name: AlertFromAddress. This should be the email address that you want alerts to be sent from. This can be the same as the ‘to’ address if you want.

Your Application settings section should look something like this:

[Screenshot: the Application settings section with the three settings added]

Step 7: Run the Function

Now we can run the function! Click on the function name again (CosmosChecker), and then click the Run button. You can expand out the Logs pane at the bottom of the screen if you want to watch it run:

[Screenshot: running the function and watching the logs]

Depending on how many Cosmos DB accounts and collections you have, it may take a minute or two to complete.

If you’ve got any collections provisioned over 2000 RU/s, you should receive an email telling you this fact:

[Screenshot: the alert email]

Configuring Alert Policies

By default, the function is configured to alert whenever it sees a Cosmos DB collection provisioned over 2000 RU/s. However, your situation may be quite different to mine. For example, you may want to be alerted whenever you have any collections provisioned over 1000 RU/s. Or, you may have production applications that should be provisioned up to 100,000 RU/s, but you only want development and test collections provisioned at 2000 RU/s.

You can configure alert policies in two ways.

First, if you have a specific collection that should have a specific policy applied to it – like the production collection I mentioned that should be allowed to go to 100,000 RU/s – then you can create another application setting. Give it the name MaximumThroughput:{account_name}:{database_name}:{collection_name}, and set the value to the limit you want for that collection.

For example, a collection named customers in a database named customerdb in an account named myaccount-prod would have a setting named MaximumThroughput:myaccount-prod:customerdb:customers. The value would be 100000, assuming you wanted the function to check this collection against a limit of 100,000 RU/s.

Second, by default the function has a default quota of 2000 RU/s. You can adjust this to whatever value you want by altering the value on line 17 of the function code file (run.csx).

ARM Template

If you want to deploy this function for yourself, you can also use an ARM template I have prepared. This performs all the steps listed above except step 3, which you still need to do manually.

 

Of course, you are also welcome to adjust the actual logic involved in checking the accounts and collections to suit your own needs. The full code is available on GitHub and you are welcome to take and modify it as much as you like! I hope this helps to avoid some nasty bill shocks.