Loading and Querying Data in Azure Table Storage using PowerShell

As part of both a side project and a work project I recently had a couple of larger datasets that I needed to put into a database and be able to search. I had previously used Azure Blob Storage but hadn’t done much with Azure Table Storage. Naturally I needed to use PowerShell for this, and I quickly found out that the AzureRM PowerShell Module could do the basics but wasn’t going to scale to the size of the datasets I had. Some trial and error later I got it to do what I needed, and then using the Azure Table Service REST API I was able to easily query the dataset.

Note: My initial load of one of the datasets (~35k rows) was done a row at a time and took close to 5 hours. Once I got Batch Operations for dataset insertion into Table Storage working, I got that down to ~4 minutes. This post details how I did it.

Prerequisites

You will need:

  • the Azure Storage Explorer, which you can get from here
  • an Azure Storage Account
    • you can create one through many different methods such as the Azure Portal UI, Azure Portal CLI, Azure CLI, PowerShell, etc.
  • PowerShell 5.1 or later and the AzureRM PowerShell Module

Creating an Azure Table Storage Table

Using the Azure Storage Explorer, authenticate to Azure and navigate to your Storage Account. Expand the Storage Account, select Tables, then right-click and select Create Table. Give the Table a name and hit Enter. Whilst you can also create the Table via the PowerShell AzureRM Module or the Azure CLI, it is super quick and easy using the Azure Storage Explorer, which will also be used later to verify the loaded dataset.
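For reference, a minimal sketch of creating the same Table with the AzureRM PowerShell Module; the Resource Group and Storage Account names here are placeholders:

# Assumes you are already logged in with Login-AzureRmAccount.
# 'MyResourceGroup' and 'mystorageaccount' are placeholder names.
$sa = Get-AzureRmStorageAccount -ResourceGroupName 'MyResourceGroup' -Name 'mystorageaccount'
New-AzureStorageTable -Name 'NICVendors' -Context $sa.Context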

Using the Azure Storage Explorer I created a Table named NICVendors.

[Image: Create Azure Table Service Table]

Loading Data into Azure Table Service

The example data I will use here is the dataset from a post last year on MAC Address Vendor lookups. Rather than exporting it to XML, I will load it into Azure Table Storage. The following script obtains the Vendors list from here and saves it to your local disk. This provides ~26k entries and is a good test for loading into Azure Table Service.

Update Line 3 for where you want to output the file to.
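The embedded script isn’t reproduced here, but a minimal sketch, assuming the vendor list comes from the IEEE OUI registry (the URL and output path below are assumptions), looks like this:

# Minimal sketch, assuming the IEEE OUI registry is the source of the vendor list.
$ouiURL     = 'http://standards-oui.ieee.org/oui.txt'
$outputFile = 'C:\temp\oui.txt'    # update for where you want the file output to
Invoke-WebRequest -Uri $ouiURL -OutFile $outputFile
# Read the file back so the dataset is in memory for the next step.
$vendorFile = Get-Content -Path $outputFile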

With the dataset in memory we can parse it and insert each row into the table. The quickest method is to batch the inserts. The maximum number of rows allowed in a batch is 100, and each row in a batch must share the same Partition Key. A minimal sketch of this approach follows the list below.

Update:

  • Line 2 for your Azure Subscription
  • Line 3 for the Resource Group your Storage Account is located in
  • Line 4 for the name of your Storage Account
  • Line 5 for the name of the Table you created
  • Line 6 for the name of the Partition for the dataset

With my Storage Account being in Central US and myself in Sydney, Australia, inserting the ~26k entries took 4 minutes 27 seconds.

[Image: Azure Table Storage Rows Inserted]

Querying Azure Table Service using the REST API and PowerShell

To then query the Table entries and find results for a Vendor, the following script can be used; a minimal sketch follows the list below. Change:

  • Line 2 for your Storage Account name
  • Line 3 for your Storage Account Key (which you obtain from the Azure Portal for your Storage Account)
  • Line 4 for your Table name
  • Line 20 for the Vendor to query for
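Again, the embedded script isn’t reproduced here; below is a minimal sketch of querying the Table Service REST API with SharedKeyLite authentication. The account name, key, Table name and Vendor are placeholders, and the line numbering of the original script may differ.

$storageAccount = 'yourstorageaccount'        # your Storage Account name
$accessKey      = 'yourStorageAccountKey=='   # your Storage Account Key
$tableName      = 'NICVendors'                # your Table name
$vendor         = 'Dell Inc.'                 # the Vendor to query for

# The Table service accepts SharedKeyLite, which signs only the date and the canonicalized resource.
$date = [DateTime]::UtcNow.ToString('R')
$stringToSign = "$date`n/$storageAccount/$tableName()"
$hmac = New-Object System.Security.Cryptography.HMACSHA256
$hmac.Key = [Convert]::FromBase64String($accessKey)
$signature = [Convert]::ToBase64String($hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($stringToSign)))

$headers = @{
    'x-ms-date'     = $date
    'x-ms-version'  = '2017-04-17'
    'Authorization' = "SharedKeyLite $($storageAccount):$signature"
    'Accept'        = 'application/json;odata=nometadata'
}

# Query entities whose Vendor property matches, and return them.
$filter = [uri]::EscapeDataString("Vendor eq '$vendor'")
$uri = "https://$storageAccount.table.core.windows.net/$tableName()?`$filter=$filter"
(Invoke-RestMethod -Uri $uri -Headers $headers -Method Get).value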

Executing the script to query for Dell Inc. returns 113 entries. The graphic below shows one.

[Image: Search Azure Table Storage Response]

Summary

Using the AzureRM PowerShell Module and the TableBatchOperation class from Microsoft.WindowsAzure.Storage.dll, we are able to insert the records in batches of 100 rows.

Using the Azure Table Service REST API we are able to quickly search the Table for the records we are looking for.

A Voice Assistant for Microsoft Identity Manager

This is the third and final post in my series around using your voice to query/search Microsoft Identity Manager, or as I’m now calling it, the Voice Assistant for Microsoft Identity Manager.
The two previous posts in this series detail some of my steps and processes in developing and fleshing out this concept. The first post detailed the majority of the base functionality, whilst the second post detailed the auditing and reporting aspects using Table Storage and Power BI.
My final architecture is depicted below.
[Image: Identity Manager integration with Cognitive Services and IoT Hub]
I’ve put together more of an overview in presentation format using GitPitch, which you can check out here.
The why and how of the Voice Assistant for Microsoft Identity Manager
If you’re interested in building the solution, check out the GitHub Repo here, which includes the ReSpeaker Python Script, Azure Function, etc.
Let me know how you go @darrenjrobinson

Using your Voice to Search Microsoft Identity Manager – Part 2

Introduction

Last month I wrote this post detailing how to use your voice to search/query Microsoft Identity Manager. That post demonstrated a working solution (GitHub repository coming next month) but was still incomplete for production use within an Enterprise. I hinted then that there were additional enhancements I was looking to make. One is an Auditing/Reporting aspect, and that is what I cover in this post.

Overview

The one element of the solution that has visibility of each search scenario is the IoT Device (as a potential future enhancement, this could also be a Bot). For each request I wanted to log/audit:

  • Device the query was initiated from (it is possible to have many IoT devices, physical or bot, leveraging this function)
  • The query
  • The response
  • Date and Time of the event
  • User the query targeted

To achieve this, my solution is to:

  • hold the query, target user and date/time on my IoT Device during the query event
  • at the completion of the query, send the response along with the earlier information to the IoT Hub using the IoT Hub REST API
  • consume the event from the IoT Hub with an Azure Event Hub
  • process the message containing the information with Stream Analytics, putting it into Azure Table Storage and Power BI.

Azure Table Storage provides the logging/auditing trail of what requests have been made and the responses. Power BI provides the reporting aspect. Together these services provide visibility into what requests have been made, against whom, and when. The graphic below shows this in the bottom portion of the image.
[Image: Auditing Reporting Searching MIM with Speech]

Sending IoT Device Events to IoT Hub

I covered this piece in PowerShell in a previous post here. I converted it from PowerShell to Python to run on my device. For initial end-to-end testing whilst developing the solution, though, building the body of the message and sending it in PowerShell looks like this:

# Assumes $deviceID, $Headers and $iotHubRestURI are already set up, as detailed in the earlier post.
[string]$datetime = Get-Date
$datetime = $datetime.Replace("/","-")   # strip slashes so the value is safe to use as a messageId
$body = @{
 deviceId = $deviceID
 messageId = $datetime
 messageString = "$($deviceID)-to-Cloud-$($datetime)"
 MIMQuery = "Does the user Jerry Seinfeld have an Active Directory Account"
 MIMResponse = "Yes. Their LoginID is jerry.seinfeld"
 User = "Jerry Seinfeld"
}
# Convert to JSON and POST to the IoT Hub REST endpoint.
$body = $body | ConvertTo-Json
Invoke-RestMethod -Uri $iotHubRestURI -Headers $Headers -Method Post -Body $body

Event Hub and IoT Hub Configuration

First I created an Event Hub. Then on my IoT Hub I added an Event Subscription and pointed it to my Event Hub.
[Image: IoT Hub Event Hub]

Stream Analytics

I then created a Stream Analytics Job and configured two Inputs, one from my IoT Hub and one from my Event Hub.
[Image: Stream Analytics Inputs]
I then created two Outputs: one for Table Storage, for which I used an existing Storage Account from my solution, and the other for Power BI, using an existing Workspace but creating a new Dataset. For the Table Storage output I specified deviceId as the Partition key and messageId as the Row key.
[Image: Stream Analytics Outputs]
Finally, as I’m keeping all the data I’m sending simple, my query basically copies from the Inputs to the Outputs: one statement gets the events to Table Storage and the other gets them to Power BI. The query therefore looks like this.
[Image: Stream Analytics Query]
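The query itself isn’t reproduced here, but a simple pass-through of this shape, where the input and output aliases and the input-to-output mapping are assumptions, would do the job:

SELECT * INTO [TableStorageOutput] FROM [IoTHubInput]
SELECT * INTO [PowerBIOutput] FROM [EventHubInput]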

Events in Table Storage

After sending through some events I could see rows being added to Table Storage. When I added an additional column to the data, the schema-less Table Storage obliged and dynamically added another column to the table.
[Image: Table Storage]
A full record looks like this.
[Image: Full Record]

Events in Power BI

Just like in Table Storage, in Power BI I could see the dataset and the table with the event data, and I could create a report with some nice visuals just as you would with any other dataset. When I added an additional field to the event being sent from the IoT Device, it magically showed up in the Power BI Dataset Table.
[Image: PowerBI]

Summary

Using the Azure IoT Hub REST API I can easily send information from my IoT Device and then have it processed through Stream Analytics into Table Storage and Power BI. Instant auditing and reporting functionality.
Let me know what you think on Twitter @darrenjrobinson

Azure Table Storage little gem – InsertOrMerge

This post describes the usage of the InsertOrMerge operation in Azure Table Storage.

Each entity in Table Storage is uniquely identified by its PartitionKey/RowKey combination. InsertOrMerge inserts the entity if it doesn’t exist and, if it does exist, merges the properties of the updated entity with those of the existing one. For more details, see the Azure Storage blog.

Not all properties of the existing table schema need to be specified for this operation. InsertOrMerge only merges the properties that are specified and leaves the rest of the stored entity untouched. This makes the feature very convenient, for instance, when aggregating dashboard data. There might be a background process that simultaneously assembles various pieces of dashboard data from multiple sources. This creates an obvious race condition, which InsertOrMerge solves in a straightforward way.

To demonstrate this, let’s define the following entity that we would like to store in Table Storage.

public class Dashboard : TableEntity
{
    public string AppleType { get; set; }
    public int ApplesCount { get; set; }
    public string OrangeType { get; set; }
    public int OrangesCount { get; set; }
}

Now, let’s define two other entities that contain only some of the existing properties of the Dashboard entity:

public class DashboardApple : TableEntity
{
    public string AppleType { get; set; }
    public int ApplesCount { get; set; }
}

public class DashboardOrange: TableEntity
{
    public string OrangeType { get; set; }
    public int OrangesCount { get; set; }
}

The following code saves the two entities defined above using the InsertOrMerge operation and then retrieves the full entity.

public async Task<Dashboard> Save(
        DashboardApple dashboardApple,
        DashboardOrange dashboardOrange)
{
    var storageAccount = ...
    var tableClient = storageAccount.CreateCloudTableClient();
    var table = tableClient.GetTableReference("Dashboard");
    table.CreateIfNotExists();

    // Execute both InsertOrMerge operations concurrently; each merges only
    // the properties present on its entity.
    var t1 = table.ExecuteAsync(TableOperation.InsertOrMerge(dashboardApple));
    var t2 = table.ExecuteAsync(TableOperation.InsertOrMerge(dashboardOrange));

    await Task.WhenAll(t1, t2);

    // Both entities share the same PartitionKey/RowKey, so retrieving that key
    // returns the merged Dashboard entity.
    var tableResult = await table.ExecuteAsync(
            TableOperation.Retrieve<Dashboard>(
                dashboardApple.PartitionKey, dashboardApple.RowKey));

    return (Dashboard)tableResult.Result;
}

To test the above code, two entities with an identical PartitionKey/RowKey must be created:

var dashboardApple = new DashboardApple
{
    PartitionKey = "myPartition",
    RowKey = "myRow",
    AppleType = "GoldenDelicious",
    ApplesCount = 5
};

var dashboardOrange = new DashboardOrange
{
    PartitionKey = "myPartition",
    RowKey = "myRow",
    OrangeType = "Navel",
    OrangesCount = 10
};

var dashboard = await Save(dashboardApple, dashboardOrange);

This results in a single entity in the table with the following properties:

{
    PartitionKey = "myPartition",
    RowKey = "myRow",
    AppleType = "GoldenDelicious",
    ApplesCount = 5,
    OrangeType = "Navel",
    OrangesCount = 10
};

The data has been aggregated.
