Microsoft’s integration stack has slowly matured over the past few years, and we’re on the verge of finally breaking away from BizTalk Server. Or are we? In this article I’m going to explore Azure Data Factory (ADF). Rather than showing the usual out-of-the-box demo, I’m going to demonstrate a real-world scenario that I recently encountered at one of Kloud’s customers.
ADF is an easy-to-use, cost-effective solution for simple integration scenarios that are best described as ETL in the ‘old world’. ADF can run at large scale, and has a series of connectors to load data from a data source, apply a simple mapping, and load the transformed data into a target destination.
ADF is limited in terms of standard connectors, and (currently) has no functionality to send data to HTTP/RESTful endpoints. Data can be sourced from HTTP endpoints, but in this case we’re going to read data from SQL Server and write it to an HTTP endpoint.
Unfortunately, ADF tooling isn’t available in VS2017 yet, but you can download the Microsoft Azure DataFactory Tools for Visual Studio 2015 here. Next, we’ll use the extremely useful third-party library ‘Azure.DataFactory.LocalEnvironment’, which can be found on GitHub. This library allows you to debug ADF projects locally, and eases deployment by generating ARM templates. The easiest way to get started is to open the sample solution and modify it accordingly.
You’ll also need to set up an Azure Batch account and a storage account according to the Microsoft documentation. Azure Batch hosts the execution engine, which runs your custom activities on one or more VMs in a pool of nodes. The storage account is used to deploy your custom activity, and also for ADF logging purposes. We’ll also create an Azure SQL AdventureWorksLT database to read some data from.
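To give an idea of how the Batch account is wired into the data factory, below is a minimal sketch of an Azure Batch linked service in ADF (v1) JSON. All names, the pool, and the endpoint are placeholders, and the referenced storage linked service is assumed to exist in your project:

```json
{
  "name": "AzureBatchLinkedService",
  "properties": {
    "type": "AzureBatch",
    "typeProperties": {
      "accountName": "<batch account name>",
      "accessKey": "<batch account key>",
      "poolName": "<pool name>",
      "batchUri": "https://<region>.batch.azure.com",
      "linkedServiceName": "<storage linked service>"
    }
  }
}
```

The custom activity in the pipeline will reference this linked service by name, so ADF knows on which Batch pool to schedule the activity.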
Using the VS templates we’ll create the following artefacts:
- AzureSqlLinkedService (AzureSqlLinkedService1.json)
This is the linked service that connects the source with the pipeline, and contains the connection string to connect to our AdventureWorksLT database.
- WebLinkedService (WebLinkedService1.json)
This is the linked service for the target endpoint. ADF doesn’t support this type as an output service, so we only use it as a reference from our HTTP table so that it passes schema validation.
- AzureSqlTableLocation (AzureSqlTableLocation1.json)
This contains the table definition of the Azure SQL source table.
- HttpTableLocation (HttpTableLocation1.json)
The tooling doesn’t contain a specific template for Http tables, but we can manually tweak any table template to represent our target (JSON) structure.
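As a sketch of that manual tweak, the HTTP table below is a standard ADF (v1) dataset whose structure describes the target JSON fields. The column names are illustrative (borrowed from AdventureWorksLT), and the `type` value and availability settings are assumptions for this example:

```json
{
  "name": "HttpTableLocation1",
  "properties": {
    "type": "Http",
    "linkedServiceName": "WebLinkedService1",
    "structure": [
      { "name": "CustomerID", "type": "Int32" },
      { "name": "FirstName", "type": "String" },
      { "name": "LastName", "type": "String" }
    ],
    "availability": { "frequency": "Hour", "interval": 1 }
  }
}
```

Because the table is only consumed by our custom activity, ADF never interprets the structure itself; it just needs a valid dataset definition to schedule against.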
Furthermore, we’ll adjust DataDownloaderSamplePipeline.json to use the input and output tables defined above. We’ll also set our schedule and add a custom property that defines a column mapping between input columns and output fields.
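A trimmed sketch of what that pipeline definition might look like is shown below. The assembly, package and mapping values are placeholders; the `extendedProperties` bag is how ADF (v1) passes custom settings, such as our column mapping, into a DotNetActivity at runtime:

```json
{
  "name": "DataDownloaderSamplePipeline",
  "properties": {
    "activities": [
      {
        "name": "DataDownloaderActivity",
        "type": "DotNetActivity",
        "inputs": [ { "name": "AzureSqlTableLocation1" } ],
        "outputs": [ { "name": "HttpTableLocation1" } ],
        "linkedServiceName": "AzureBatchLinkedService",
        "typeProperties": {
          "assemblyName": "<activity assembly>.dll",
          "entryPoint": "<namespace>.DataDownloaderActivity",
          "packageLinkedService": "<storage linked service>",
          "packageFile": "<container>/<package>.zip",
          "extendedProperties": {
            "columnMapping": "CustomerID:customerId,FirstName:firstName,LastName:lastName"
          }
        },
        "scheduler": { "frequency": "Hour", "interval": 1 }
      }
    ],
    "start": "2017-01-01T00:00:00Z",
    "end": "2017-01-02T00:00:00Z"
  }
}
```

The custom activity reads `columnMapping` from the extended properties and uses it to translate source columns into the field names of the outgoing JSON payload.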
The bulk of the work is performed in the DataDownloaderActivity class, where custom .NET code wires together the input and output data sources and performs the actual copying of data. The class uses a SqlDataReader to read records, and copies them in chunks, as JSON, to our target HTTP service. For demonstration purposes I’m using the RequestBin service to verify that the output data makes its way to the target destination.
We can deploy our solution via PowerShell, or via the Visual Studio 2015 tooling if preferred:
After deployment, the data factory appears in the portal, and we can use the monitoring feature to watch our copy tasks spin up according to the defined schedule:
In the RequestBin that I created, I can see the output batches appearing one at a time:
As you might notice, it’s not all that straightforward to compose and deploy a custom activity, and relying on Azure Batch can incur significant cost unless you adopt the right auto-scaling strategy. Although the solution requires us to write code and implement our connectivity logic ourselves, we can leverage some nice platform features: a reliable execution host, retry logic, scaling, logging and monitoring, all accessible through the Azure portal.
The complete source code can be found here. The gists below show the various ADF artefacts and the custom .NET activity.
The custom activity C# code: