Enterprise Application platform with Microservices – A Service Fabric perspective

An enterprise application platform can be defined as a suite of products and services that enables the development and management of enterprise applications. This platform should be responsible for abstracting complexities related to application development, such as the diversity of hosting environments, network connectivity and deployment workflows. Traditionally, applications are monolithic by nature. A monolithic application is composed of many components grouped into multiple tiers and bundled together into a single deployable unit. Each tier can be developed using a specific technology and can be scaled independently. A monolithic application usually persists data in one common data store.

MicroServices - 1

Although a monolithic architecture logically simplifies the application, it introduces many challenges as the number of applications in your enterprise increases. The following are a few issues with a monolithic design:

  • Scalability – The unit of scale is scoped to a tier. It is not possible to scale a component bundled within an application tier without scaling the whole tier. This wastes resources and increases operational expense.
  • Reuse and maintenance – The components within an application tier cannot be consumed outside the tier unless they are exposed as contracts. This forces development teams to replicate code, which becomes very difficult to maintain.
  • Updates – As the whole application is one unit of deployment, updating a single component requires redeploying the whole application, which may cause downtime and thereby affect the availability of the application.
  • Low deployment density – The compute, storage and network requirements of an application, as a bundled deployable unit, may not match the infrastructure capabilities of the hosting machine (VM). This may lead to wastage or shortage of resources.
  • Decentralized management – Due to the redundancy of components across applications, support, monitoring and troubleshooting become expensive overheads.
  • Data store bottlenecks – If multiple components access the same data store, it becomes a single point of failure. This forces the data store to be highly available.
  • Cloud unsuitable – The hardware dependency of this architecture to ensure availability doesn’t work well with cloud hosting platforms where designing for failure is a core principle.

A solution to this problem is an application platform based on a Microservices architecture. Microservice architecture is a software architecture pattern in which applications are composed of small, independent services that can be aggregated over communication channels to achieve an end-to-end business use case. The services are decoupled from one another in terms of the execution space in which they operate. Each of these services can be scaled, tested and maintained separately.

MicroServices - 2

Microsoft Azure Service Fabric

Service Fabric is a distributed application platform that makes it easy to package, deploy and manage scalable and reliable Microservices. The following are a few advantages of Service Fabric that make it an ideal platform for building a Microservice-based application platform:

  • Highly scalable – Every service can be scaled without affecting other services. Service Fabric supports scaling based on VM scale sets, which means that services can be auto-scaled based on CPU consumption, memory usage, etc.
  • Updates – Services can be updated separately and different versions of a service can be co-deployed to support backward compatibility. Service Fabric also supports automatic rollback during updates to ensure consistency of an application deployment.
  • State redundancy – For stateful Microservices, the state can be stored alongside the compute for a service. If multiple instances of a service are running, the state is replicated to every instance. Service Fabric takes care of replicating state changes through the stores.
  • Centralized management – The service can be centrally managed, monitored and diagnosed outside application boundaries.
  • High density deployment – Service Fabric supports high density deployment on a virtual machine cluster while ensuring even utilization of resources and distribution of workload.
  • Automatic fault tolerance – The cluster manager of Service Fabric ensures failover and resource balancing in case of a hardware failure. This ensures that your services are cloud ready.
  • Heterogeneous hosting platforms – Service Fabric supports hosting your Microservices across Azure, AWS, on-premises or any other datacenter. The cluster manager is capable of managing service deployments with instances spanning multiple datacenters at a time. Apart from Windows, Service Fabric also supports Linux as a host operating system for your Microservices.
  • Technology agnostic – Services can be written in any programming language and deployed as executables or hosted within containers. Service Fabric also supports a native Java SDK for Java developers.

Programming models

Service Fabric supports the following four programming models for developing your service:

  • Guest Container – Services packaged into Windows or Linux containers managed by Docker.
  • Guest executables – Services can be packaged as guest executables which are arbitrary executables, written in any language.
  • Reliable Services – An application development framework with fully supported application lifecycle management. Reliable Services can be used to develop stateful as well as stateless services and support transactions.
  • Reliable Actors – A virtual-actor-based development framework with built-in state management and communication management capabilities. The actor programming model is single-threaded and is ideal for hyper-scale-out scenarios (thousands of instances).
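The single-threaded (turn-based) guarantee of the actor model can be illustrated in plain Python. This is a minimal sketch only, not the Reliable Actors API. The `CounterActor` class is hypothetical, and a per-actor `asyncio.Lock` is an assumption used to emulate "one call at a time per actor".

```python
import asyncio


class CounterActor:
    """Hypothetical actor: each instance processes one call at a time (turn-based)."""

    def __init__(self, actor_id: str) -> None:
        self.actor_id = actor_id
        self._state = 0              # actor-local state, never shared with other actors
        self._turn = asyncio.Lock()  # one lock per actor serializes its method calls

    async def increment(self) -> int:
        async with self._turn:       # only one call executes inside this actor at a time
            current = self._state
            await asyncio.sleep(0)   # yield to the loop; other callers must still wait
            self._state = current + 1
            return self._state


async def main() -> int:
    actor = CounterActor("user-42")
    # 100 concurrent callers; no update is lost because turns are serialized
    await asyncio.gather(*(actor.increment() for _ in range(100)))
    return actor._state


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the actor never runs two calls concurrently, its code can read and write state without explicit locking. Scale comes from having many independent actors, not from parallelism inside one actor.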

More about Service Fabric programming models can be found here

Service Type

A service type in Service Fabric consists of three components: a code package, a config package and a data package.

MicroServices - 3

The code package defines the entry point to the service. This can be an executable or a dynamic-link library. The config package specifies the configuration for your service, and the data package holds static resources such as images. Each package can be independently versioned, and Service Fabric supports upgrading each of these packages separately. Allocating a service in a cluster and reallocating it on failure are the responsibilities of the Service Fabric cluster manager. For stateful services, Service Fabric also takes care of replicating the state across multiple instances of a service.
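Independent package versioning can be modelled with a small data structure. The sketch below is hypothetical: `ServiceType`, `Package`, `upgrade_package` and the "OrderService" name are illustrative only and do not reflect the Service Fabric manifest schema or SDK. It shows that bumping one package leaves the other two untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Package:
    name: str     # "Code", "Config" or "Data"
    version: str


@dataclass(frozen=True)
class ServiceType:
    name: str
    code: Package    # entry point (executable or DLL)
    config: Package  # service configuration
    data: Package    # static resources such as images


def upgrade_package(svc: ServiceType, kind: str, new_version: str) -> ServiceType:
    """Return a new ServiceType in which only one package (code/config/data) is bumped."""
    if kind not in ("code", "config", "data"):
        raise ValueError(f"unknown package kind: {kind}")
    old: Package = getattr(svc, kind)
    return replace(svc, **{kind: replace(old, version=new_version)})
```

For example, pushing a configuration change is modelled as `upgrade_package(svc, "config", "1.1.0")`. The code and data package versions stay as they were, which mirrors a config-only upgrade.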

Application Type

A Service Fabric application type is composed of one or more service types.

MicroServices - 4

An application type is a declarative template for creating an application. Service Fabric uses application types for packaging, deploying and versioning Microservices.

State stores – Reliable collections

Reliable Collections are highly available, scalable and high-performance state stores that can be used to store state alongside the compute for Microservices. The Service Fabric framework takes care of replicating state and persisting it to secondary storage. A notable difference between Reliable Collections and other high-availability state stores (such as caches, tables and queues) is that the state is kept locally in the service hosting instance while also being made highly available through replication. Reliable Collections also support transactions and are asynchronous by nature while offering strong consistency.
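The following toy model shows the two ideas in the paragraph above: state that lives locally with the service, and transactional writes copied to replicas. It is a sketch under simplifying assumptions, not the Reliable Collections API. `ReplicatedDict` and `Transaction` are hypothetical names, replication is synchronous and in-process, and there is no persistence to disk.

```python
from typing import Dict, List


class ReplicatedDict:
    """Toy stand-in for a reliable dictionary: state lives with the primary and
    every committed transaction is copied to each secondary replica."""

    def __init__(self, secondaries: int = 2) -> None:
        self.primary: Dict[str, int] = {}
        self.replicas: List[Dict[str, int]] = [{} for _ in range(secondaries)]

    def transaction(self) -> "Transaction":
        return Transaction(self)


class Transaction:
    def __init__(self, store: ReplicatedDict) -> None:
        self.store = store
        self.pending: Dict[str, int] = {}

    def __enter__(self) -> "Transaction":
        return self

    def set(self, key: str, value: int) -> None:
        self.pending[key] = value  # staged; invisible to readers until commit

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            # Commit: apply the staged writes to the primary and every replica.
            self.store.primary.update(self.pending)
            for replica in self.store.replicas:
                replica.update(self.pending)
        # On an exception the staged writes are discarded (rollback) and it propagates.
        return False
```

Reads are served from the local `primary` dictionary with no network hop. A failed transaction leaves both the primary and the replicas unchanged.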

More about reliable collection can be found here


Service Fabric offers extensive health monitoring capabilities, with built-in health status for clusters and services and custom application health reporting. Services are continuously monitored for real-time alerting on problems in production. Rich performance metrics for actors and services reduce the overhead of performance monitoring. Service Fabric analytics can provide repair suggestions, thereby supporting preventive healing of services. Custom ETW logs can also be captured for guest executables to ensure centralized logging for all your services. Apart from supporting Microsoft tools such as Windows Azure Diagnostics, Operational Insights, Windows Event Viewer and the Visual Studio diagnostic events viewer, Service Fabric also integrates easily with third-party tools such as Kibana and Elasticsearch as monitoring dashboards.
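Custom health reports from several sources are ultimately rolled up into a single health state per entity. The sketch below assumes a simplified rule in which an entity is only as healthy as its worst report. The real health store also applies configurable health policies, which are ignored here. `HealthReport`, `aggregated_state` and the example property names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class HealthState(IntEnum):
    OK = 0
    WARNING = 1
    ERROR = 2


@dataclass(frozen=True)
class HealthReport:
    source: str        # who reported, e.g. a hypothetical custom watchdog
    prop: str          # the property that was checked, e.g. "QueueLength"
    state: HealthState


def aggregated_state(reports: Iterable[HealthReport]) -> HealthState:
    """Simplified roll-up: the worst reported state wins; no reports means OK."""
    return max((r.state for r in reports), default=HealthState.OK)
```

Under this rule, a single custom WARNING report is enough to surface the service as unhealthy on a dashboard, even when all built-in checks are OK.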

Conclusion and considerations

Microsoft Service Fabric is a strong candidate platform for hosting enterprise-grade Microservices. The following are a few considerations to be aware of when using Service Fabric as your Microservice hosting platform:

  • Choosing a programming model is critical for optimizing the management of Microservices hosted on Service Fabric. Reliable Services and Reliable Actors are more tightly integrated with the Service Fabric cluster manager than guest containers and guest executables.
  • Support for containers on Service Fabric is still evolving. While Windows containers and Hyper-V containers are on the release roadmap, Service Fabric only supports Linux containers as of today.
  • As of today, the only two native SDKs supported by Service Fabric are for .NET and Java.


Cursor-Scoped Event Aggregation Pattern – for high performance aggregation queries

Often, stateful applications executing enterprise scenarios are interested not just in the current state of the system, but also in the events that caused the transitions resulting in that state. This requirement has greatly increased the popularity of the Event Sourcing Pattern.

The Event Sourcing Pattern ensures that all changes (events) that caused a transition in application state are stored as a sequence of events in an append-only store. These events can then be queried to track history, or even replayed to reconstruct the system state after an unexpected application crash.
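The append-only store and the replay step can be sketched as follows. The bank-account domain, the `deposited`/`withdrawn` event kinds and the `EventStore` and `replay` names are hypothetical and are chosen only to make the mechanics concrete.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    entity_id: str
    kind: str      # "deposited" or "withdrawn" in this example
    amount: int


class EventStore:
    """Append-only store: events are never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events(self) -> Tuple[Event, ...]:
        return tuple(self._events)  # read-only snapshot of the history


def replay(events: Iterable[Event]) -> Dict[str, int]:
    """Rebuild current state (balance per entity) from the full event history."""
    state: Dict[str, int] = {}
    for e in events:
        delta = e.amount if e.kind == "deposited" else -e.amount
        state[e.entity_id] = state.get(e.entity_id, 0) + delta
    return state
```

After a crash, calling `replay(store.events())` restores the balances. The same history also answers "how did we get here?" questions.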

The Event Sourcing design pattern is well suited to decoupling commands from queries to improve the efficiency of the system. An implicit dimension recorded in this pattern along with the events of interest is the time at which each event occurred. This post describes a design pattern that enhances the Event Sourcing Pattern, making it capable of presenting views based on a time range.


Apart from the requirement of replaying events to recreate state, some systems also have requirements around aggregating information drawn from events which occurred during a specific timeframe.

Running a query against the event source table to retrieve this information can be an expensive operation. The event source table tends to grow quickly if it is designated to capture events related to multiple entity types, in which case querying it to retrieve this information performs poorly.

Event Sourcing Pattern


A solution to this problem is to introduce a Cursor-Scoped Event Aggregation Pattern on top of the Event Sourcing Pattern. The cursor here is associated with a configurable time range that defines the scope of events the query is interested in. The pattern replicates the relevant tuples (those within the cursor's time range that match the filtering criteria defined in the query) into an in-memory data dictionary. An aggregation function then collates the properties of interest within this dictionary to produce the query results.
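A minimal sketch of the pattern follows. Every name in it (`CursorScopedAggregator`, `on_append`, `query_sum`, the `entity_type` and `amount` fields) is hypothetical. It assumes that events are appended in timestamp order and that the aggregation function is a simple sum. On each append, the event is copied into the in-memory dictionary if it matches the query filter. On each query, events older than the cursor's window are evicted before aggregating.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    event_id: int
    timestamp: float   # seconds since epoch
    entity_type: str
    amount: float


class CursorScopedAggregator:
    """Keeps only events that match a query filter and fall inside a sliding time
    window (the cursor), so aggregation never scans the full event store."""

    def __init__(self, window_seconds: float, predicate: Callable[[Event], bool]) -> None:
        self.window = window_seconds
        self.predicate = predicate
        # Insertion order equals time order because events arrive in timestamp order.
        self._in_scope: "OrderedDict[int, Event]" = OrderedDict()

    def on_append(self, event: Event) -> None:
        # Called for every event appended to the event store.
        if self.predicate(event):
            self._in_scope[event.event_id] = event

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the cursor's time range.
        cutoff = now - self.window
        while self._in_scope:
            oldest = next(iter(self._in_scope.values()))
            if oldest.timestamp >= cutoff:
                break
            self._in_scope.popitem(last=False)

    def query_sum(self, now: float) -> float:
        self._evict(now)
        return sum(e.amount for e in self._in_scope.values())
```

For example, with a 60-second cursor that only tracks "order" events, a query at t=55 sums both orders appended at t=0 and t=50. A query at t=100 sums only the second one. The query cost depends on the number of in-scope events, not on the size of the event store.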

Cursor-Scoped Event Aggregation Pattern

The following flowchart captures the activities which are performed on the cursor data store when a new event is appended in the store or when a new query is fired against it.



Related Cloud Design Patterns

  • Event Sourcing Pattern – This pattern assumes that the system already implements the Event Sourcing Pattern.
  • Index Table Pattern – Index table pattern is an alternative to improve performance around querying event sources.


  • Eventual consistency – Event Sourcing Pattern is usually implemented in scenarios which can support eventual consistency. As we are querying the event source directly rather than the entities, there is a chance of inconsistency between the state of the entities and the result of the query.
  • Immutability – The event store is immutable; the only way to reverse a transaction is to introduce a compensation event. The aggregator should therefore be capable of handling compensation events.
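One way an aggregator might honour compensation events is sketched below. It assumes a hypothetical `compensates` field that carries the id of the event being reversed, and it assumes the compensation arrives after the original event. Both are illustrative choices, not part of the pattern as described. A common alternative is for the compensation event to carry a negated amount that is simply summed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Event:
    event_id: int
    amount: float
    compensates: Optional[int] = None  # id of the event this one reverses, if any


def aggregate_with_compensation(events: Iterable[Event]) -> float:
    """Sum amounts while honouring compensation events. The store is append-only,
    so a reversal arrives as a new event that cancels an earlier one."""
    contributions: Dict[int, float] = {}
    for e in events:
        if e.compensates is not None:
            # Cancel the original contribution instead of mutating history.
            contributions.pop(e.compensates, None)
        else:
            contributions[e.event_id] = e.amount
    return sum(contributions.values())
```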

Usage Scenarios

  • High performance systems – Systems that require real-time responses to aggregation queries.

Inside Azure – Deployment workflow with Fabric Controller and Red Dog Front End

Abstracting the complexities of developing, deploying and maintaining software applications has diminished the importance of understanding the underlying architecture. While this may work well for today's aggressive delivery cycles, it also impacts the ability of engineers to build an efficient, optimal solution that aligns with the internal architecture of the hosting platform. Architects and engineers should be cognizant of the architecture of the hosting environment to design better systems. The same holds true for Microsoft Azure as a hosting provider.

This post is an attempt to shed light on the workflow for deploying workloads on Microsoft Azure, the systems involved in this process and their high-level architecture.


To start with, let's look at the high-level architecture of an Azure datacenter. The following diagram illustrates the physical layout of an Azure Quantum 10 V2 datacenter.


Figure 1.0

Parking the physical layers for a later post, we will focus on the last layer, termed 'Clusters', to understand the logical architecture of Microsoft Azure datacenters.

Logical Architecture

Clusters are logical groups of server racks. A typical cluster includes 20 server racks hosting approximately 1,000 servers. Clusters are also known internally within Microsoft as 'Stamps'. Every cluster is managed by a Fabric Controller. The Fabric Controller, often considered the brain of the entire Azure ecosystem, is deployed on dedicated resources within every cluster.


Figure 1.1

Fabric Controller is a highly available, stateful, replicated application. Five machines in every Azure datacenter cluster are dedicated to the Fabric Controller deployment. One of the five servers acts as the primary and replicates its state to the four secondary servers at regular intervals to ensure high availability. Fabric Controller is a self-healing system: when the primary server goes down, one of the remaining active servers takes charge as the primary and spins up another instance of Fabric Controller.
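The primary/secondary arrangement described above can be modelled in a few lines. This is a hypothetical sketch, not Fabric Controller's actual implementation. `ReplicaSet`, `replicate`, `fail` and the node names are illustrative. It assumes the first secondary is promoted and the failed machine is replaced by a spare. Because replication happens at intervals, changes made after the last `replicate()` call would be lost on failover in this model.

```python
from typing import Dict, List


class ReplicaSet:
    """Hypothetical model of the five Fabric Controller machines: one primary,
    four secondaries, self-healing after a failure."""

    def __init__(self, nodes: List[str]) -> None:
        if len(nodes) != 5:
            raise ValueError("expected five dedicated machines")
        self.primary = nodes[0]
        self.secondaries = nodes[1:]
        self.state: Dict[str, int] = {}
        self.replicated: Dict[str, Dict[str, int]] = {n: {} for n in self.secondaries}

    def replicate(self) -> None:
        # The primary periodically pushes a copy of its state to every secondary.
        for node in self.secondaries:
            self.replicated[node] = dict(self.state)

    def fail(self, node: str, spare: str) -> None:
        # Promote a secondary if the primary was lost; otherwise drop the secondary.
        if node == self.primary:
            self.primary = self.secondaries.pop(0)
            self.state = self.replicated.pop(self.primary)
        else:
            self.secondaries.remove(node)
            self.replicated.pop(node)
        # Self-healing: bring up a replacement replica on a spare machine.
        self.secondaries.append(spare)
        self.replicated[spare] = dict(self.state)
```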

Fabric Controller is responsible for managing the network, the hardware and the provisioning of servers in an Azure datacenter. If we visualize the whole datacenter as a single machine, the datacenter itself maps to the hardware, Fabric Controller maps to the kernel of the operating system, and the services hosted in the datacenter map to the processes running on the machine. The following image illustrates this perspective:


Figure 1.2

Fabric Controller controls the allocation of resources (such as compute and storage), both for your deployed custom applications and for built-in Azure services. This provides a layer of abstraction to the consumer, thereby ensuring better security, scalability and reliability.

Fabric Controller takes inputs from the following two systems:

  • Physical Datacenter deployment manager – The physical layout of the machines on the racks, their IP addresses, router addresses, certificates for accessing the routers, etc., in XML format.
  • RDFE (discussed later in the post) – The deployment package and configuration.

The following flowchart captures the boot-up tasks performed by Fabric Controller when it instantiates a cluster:


Figure 1.3


The following diagram illustrates a high-level view of the deployment process:


Figure 2.0

Before we look at the deployment workflow, it's important to become familiar with another software component in the Azure ecosystem that is primarily responsible for triggering a deployment. RDFE (Red Dog Front End), named after the pre-release code name for Microsoft Azure (Red Dog), is a highly available, Azure-deployed application that feeds deployment instructions to Fabric Controller. It is responsible for collecting the deployment artefacts, making copies of them, choosing the target cluster for deployment and triggering the deployment process by sending instructions to Fabric Controller. The following flowchart details the workflow handled by RDFE:


Figure 2.1

A fact to keep in mind is that the Azure portal is stateless, and so are the management APIs. During a deployment, the uploaded artefacts are passed on to RDFE, which stores them in a staging store (the RDFE store). RDFE chooses five clusters for every deployment. If a cluster fails to complete the deployment operation, RDFE picks another cluster from the remaining four and restarts the deployment operation.
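The fallback behaviour described above amounts to trying each candidate cluster in turn. The sketch below is hypothetical. `deploy_with_fallback` and `try_deploy` are illustrative names, and trying candidates in list order is an assumption, since the post does not say how RDFE orders the remaining four.

```python
from typing import Callable, List


def deploy_with_fallback(candidates: List[str], try_deploy: Callable[[str], bool]) -> str:
    """Attempt the deployment on each candidate cluster in order and return the
    cluster that succeeded. try_deploy stands in for handing the workflow to that
    cluster's Fabric Controller and reporting whether it completed."""
    failures: List[str] = []
    for cluster in candidates:
        if try_deploy(cluster):
            return cluster
        failures.append(cluster)  # restart the operation on the next candidate
    raise RuntimeError(f"deployment failed on all candidate clusters: {failures}")
```

With five candidates where the first two fail, the deployment lands on the third cluster. Only if all five fail does the operation surface an error.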

Once a cluster is picked, the deployment workflow is handed over to Fabric Controller, which performs the following tasks to provision the required hardware resources as per the application/service configuration:


Figure 2.2

Things to remember

The following are a few key learnings from my understanding of the Azure architecture:

  • Keep your workload small – As discussed in the deployment section (Figures 2.1 and 2.2), the uploaded workload gets copied multiple times before it is deployed on a virtual machine. For better efficiency, it is recommended to separate the code from its dependent artefacts and package only the code as the workload. The dependent artefacts can be uploaded separately into Azure Storage or any other persistent store.
  • Avoid the load balancer whenever you can – Figure 2.2 illustrates the steps to update the Azure load balancer post deployment. By default, all traffic to a node (server) is routed through the load balancer. Every node (server) is associated with a virtual IP address when it is deployed, and using this address to communicate directly with the server removes one hop from the network route. However, this is not advisable under all circumstances; it is best suited to servers hosting a singleton workload.
  • Deploy the largest resource first – The choice of cluster is made by RDFE in the early stages of deployment, as illustrated in Figure 2.1. Not all clusters have all machine configurations available. So if you want servers co-allocated on a cluster, deploy the largest resource first. This forces RDFE to pick a more capable cluster for the workload. All subsequent workloads will be deployed on the same cluster if they are placed in the same affinity group.
  • Create sysprepped master image VHDs with drivers persisted – Azure optimizes the ISO download operation (on a node) by referring to a pre-fetched cache, as shown in Figure 2.2. When using custom images for virtual machines, it is advisable to persist drivers so that Azure can use pre-fetched driver caching to optimize your deployment.

WebHook your WebJob – With Visual Studio Team Services integration example

WebHooks, also known as 'HTTP(S) callbacks', are becoming very popular for reporting asynchronous events to trigger business workflows. The latest release of Microsoft Azure WebJobs can now be triggered using WebHooks. In this post, I will cover how to configure a WebJob to use a WebHook as a trigger, using a sample scenario of integrating a WebJob with Visual Studio Team Services (VSTS) to explain the workflow.

Support for WebHooks is packaged as an extension to WebJobs and is currently in a pre-release state. You must install the following NuGet package to start using WebHooks with WebJobs:

  • Package name – Microsoft.Azure.WebJobs.Extensions.WebHooks
  • Version – 1.0.0-beta4
  • Author – Microsoft

To install this using the NuGet Package Manager, make sure that the 'Include prerelease' option is checked, as shown below.

Package Manager with prereleases

As I said previously, the WebHook extension is currently in prerelease, as Microsoft is adding capabilities to it, one of which is a richer security model. Presently, WebHooks only support basic authentication using the Azure App Service's publishing credentials.
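Basic authentication means that the calling application sends the publishing credentials in a standard HTTP `Authorization` header (RFC 7617). The sketch below shows how that header value is built and how a receiver could compare it in constant time. The username and password are placeholders, and this is not the WebJobs extension's own code.

```python
import base64
import hmac


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def is_authorized(header_value: str, username: str, password: str) -> bool:
    """Receiver-side check; compare_digest avoids leaking timing information."""
    expected = basic_auth_header(username, password)
    return hmac.compare_digest(header_value.encode("utf-8"), expected.encode("utf-8"))
```

Base64 is an encoding, not encryption. The publishing credentials are therefore only protected by the HTTPS channel, which is one more reason to share them only with trusted callers.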

Using the WebHook trigger in WebJob

The first step after installing the extension is to update the WebJob’s startup code to load the extension

Now we can write a function which can be triggered by a WebHook

In the above code, I have a private method (SendEmail) which is called from the ReportBug method to send an email (using SendGrid) when the WebHook fires the function. The code uses an API Key to connect to the SendGrid service. You can provision a free SendGrid service through Microsoft Azure Portal.

SendGrid configuration

The WebJob can now be published in a Microsoft Azure App Service container.

Configuring Visual Studio Online

To register this WebHook in Visual Studio Team Services, use the following steps:

  1. Log in to the Visual Studio Online web portal.
  2. Navigate to settings.

VSTS settings

  3. Click on the 'Service Hooks' tab and add a new service hook.

VSTS Service Hooks

  4. Pick the WebHook service and click Next.

VSTS Service Hook Selection

  5. Pick the 'Work item created' trigger. We can leave the filters open for this example. Click Next.

VSTS Service Hook Trigger

  6. Construct your WebHook URL.

The WebHook URL for the WebJob hosted on Azure within an App Service should be in the following format:


  • Site – The App Service's SCM site (e.g. yoursite.scm.azurewebsites.net)
  • Job – The name of your WebJob
  • Path – The path to the WebHook function, in the format ClassName/MethodName

In our case this would look like


Extract the credentials from your App Service's publish profile.

  7. Fill in the URL and the Basic Authentication username and password, then click 'Test'.

VSTS Service Hook Test

  8. If all configurations are right and the WebJob is running, you will see a pop-up similar to the one below.

VSTS Service Hook Test

  9. Click Finish.

Testing the integration

Creating any new work item in Visual Studio Team Services should now trigger an email with a JSON payload describing the work item in its body.

Sample email message

You should also be able to see the trace messages in the WebJob Dashboard.

Key learnings

  • WebHook triggers are best suited for integrating heterogeneous workflows
  • This feature is still in preview, which should be taken into account before using it in any production solution.
  • Only Basic Authentication is supported for WebHooks today, but this is expected to change in future releases. As we are sharing the publishing credentials with the calling application, it’s important to ensure that the calling application is trusted.

Highly available WordPress deployment on Azure

WordPress is the leading content management system today, with more than 50% of the market share. WordPress on Microsoft Azure is becoming a very popular offering, with the ability to host WordPress as an Azure WebApp. While Microsoft has made spinning up a WordPress site very easy with built-in gallery images, running business-critical applications on a cloud platform also introduces challenges in terms of availability and scalability.

WordPress Architecture

A typical WordPress deployment consists of the following two tiers:

  • Web Frontend – PHP web site
  • Backend Data store
    • Relational data store – Hierarchical entity store
    • Object data store – Used to store uploaded images and other artefacts

In order to guarantee high availability and scalability, we need to ensure that each of these tiers is decoupled and can be managed separately.

Web frontend

Microsoft Azure supports deploying WordPress as an Azure WebApp. Azure WebApp is part of the Azure App Service offering, a fully managed service that can host mission-critical web applications.

Azure WebApps natively support scaling, which can be achieved by increasing the number of instances of the hosting WebApp. The native Azure load balancer takes care of distributing traffic amongst these instances in a round-robin manner. WebApps also support schedule-driven and automatic scaling.
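Round-robin distribution and scaling out can be pictured with a tiny balancer. This is a hypothetical model of the behaviour described above, not how the Azure load balancer is implemented. The class name and the `instance-N` names are illustrative.

```python
from typing import List


class RoundRobinBalancer:
    """Distributes requests across WebApp instances in strict rotation."""

    def __init__(self, instance_count: int) -> None:
        self._next = 0
        self.scale_to(instance_count)

    def scale_to(self, instance_count: int) -> None:
        # Scaling out or in simply changes the pool of instances behind the balancer.
        if instance_count < 1:
            raise ValueError("need at least one instance")
        self.instances: List[str] = [f"instance-{i}" for i in range(instance_count)]
        self._next %= instance_count

    def route(self) -> str:
        target = self.instances[self._next]
        self._next = (self._next + 1) % len(self.instances)
        return target
```

With two instances, requests alternate between them. After scaling to three, the new instance joins the rotation without any change on the client side.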


Azure Portal can be used to configure scaling rules for the WebApp.

Azure WebApps offer an uptime of 99.95% for the Basic, Standard and Premium tiers, even with a single-instance deployment. Azure Load Balancer takes care of managing failover amongst instances within a region. To achieve higher availability, the application front end can be deployed across different geographical regions, and Azure Traffic Manager can be employed to handle load balancing, network performance optimization and failover.
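The benefit of a multi-region front end can be estimated with simple arithmetic. The sketch below assumes that regional outages are independent and that the app is down only when every region is down at once. It also ignores Traffic Manager's own SLA and uses a 30-day month. All of these are simplifying assumptions.

```python
def parallel_availability(per_region: float, regions: int) -> float:
    """Availability when the app is down only if every region is down at once
    (assumes independent failures; ignores Traffic Manager's own SLA)."""
    return 1 - (1 - per_region) ** regions


def monthly_downtime_minutes(availability: float, days: int = 30) -> float:
    """Expected downtime per month for a given availability."""
    return (1 - availability) * days * 24 * 60
```

Under these assumptions, a single region at 99.95% allows about 21.6 minutes of downtime per month. Two regions give 1 - 0.0005², or about 99.99998%, which is roughly 0.01 minutes per month.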


Backend Store

The WordPress backend consists of two data stores: a relational data store used to store WordPress entities along with their hierarchies, and an object store used to persist artefacts.

The Azure Gallery hosts a prebaked image for Scalable WordPress, which lets you configure a scalable MySQL backend as the relational store and uses Azure Storage as the object store for your WordPress deployment. Scalable WordPress uses Azure Blob storage to host uploaded artefacts. This works well for most scenarios.


It is important to understand that by using a MySQL backend for your WordPress site, you are engaging with a third-party database-as-a-service provider (in this case, ClearDB). This means that the availability SLA associated with your WordPress backend is not provided by Microsoft. Always check the SLAs associated with your chosen pricing tier with the third-party provider.

An alternative is to use Project Nami, which offers a WordPress image configured to run against Azure SQL Database as the backend. You can deploy the WordPress site with a Project Nami image either from the Azure gallery or from the project's website. Nami supports WordPress version 4.4.2 with a fully configurable Azure SQL Database backend. Once deployed, the WordPress instance can be configured to use Azure Storage as the object store by employing the Windows Azure Storage for WordPress plugin.


Using SQL Azure and Azure Storage as the backend for your WordPress site has the following key advantages:

  • Eligibility for Microsoft SLA – A minimum uptime of 99.9% is guaranteed on Azure Storage and an uptime of 99.99% is guaranteed on Basic, Standard, or Premium tiers of Microsoft Azure SQL Database.
  • Easily manageable – Azure SQL databases can be provisioned and managed using Azure Resource Manager templates or through the portal.
  • On-demand scaling – Azure SQL database supports on-demand scaling to meet changing business demands and traffic.
  • More secure – Azure SQL databases offer better support for auditing and compliance, apart from offering highly secure connections.
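Because the site needs the front end, the database and storage to be available at the same time, the end-to-end availability is roughly the product of the individual SLAs. The sketch below assumes independent failures and uses the figures quoted in this post: 99.95% for the WebApp, 99.99% for Azure SQL Database and 99.9% for Azure Storage.

```python
import math
from typing import Iterable


def composite_availability(slas: Iterable[float]) -> float:
    """Availability of a chain of tiers that must all be up at the same time
    (assumes independent failures)."""
    return math.prod(slas)
```

For these three tiers, the result is 0.9995 × 0.9999 × 0.999 ≈ 0.9984, or about 99.84%. This is lower than any single tier's SLA, which is why each tier has to be designed for failure.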


To conclude, you can deploy a highly available WordPress site in Microsoft Azure by ensuring that the front end and backend tiers are fault tolerant and designed for failure. The deployment strategy can be influenced by the following factors:

  • Choice of technology for the backend – Microsoft (SQL Azure)/Non-Microsoft (MySQL)
  • SLA requirements
  • Logging and Auditing requirements
  • Scaling, security and manageability requirements


Project Nami – http://projectnami.org/

Scalable WordPress – https://azure.microsoft.com/en-us/marketplace/partners/wordpress/scalablewordpress/


Azure WebJob logs demystified

Asynchronous jobs are usually hard to troubleshoot due to the very nature of their execution. This post talks about how we can monitor and troubleshoot Azure WebJobs, both during development and when deployed on an Azure Web App. The key is to understand the layout of the logs the WebJob runtime creates during its execution.

WebJob storage accounts

To enable logging, the WebJob runtime needs two Azure Storage account connection strings to be configured:

  • AzureWebJobsDashboard
  • AzureWebJobsStorage


The AzureWebJobsDashboard storage account is primarily used by the Azure WebJobs SDK to store logs for the WebJobs Dashboard. This connection string is optional and is required only if you plan to use the dashboard for monitoring WebJobs.

The WebJob runtime creates two containers under this storage account, named 'azure-webjobs-dashboard' and 'azure-jobs-host-archive'. The azure-webjobs-dashboard container is used by the WebJob dashboard to store host and execution endpoint (function) details, while azure-jobs-host-archive is used as an archive for execution logs.


AzureWebJobsStorage should point to a storage account that will be primarily used for logging. The WebJob runtime creates two containers in this storage account, named 'azure-jobs-host-output' and 'azure-webjobs-host'. If you point AzureWebJobsDashboard and AzureWebJobsStorage at two different storage accounts, you will notice that these two containers are duplicated in both storage accounts.
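Both settings are standard Azure Storage connection strings of the form `Key=Value;Key=Value`. The helper below is a hypothetical sketch. It parses them and reports whether the two settings target different accounts, which is the case in which the containers described above appear in both. The account names and keys used in the usage note are placeholders.

```python
from typing import Dict


def parse_connection_string(conn: str) -> Dict[str, str]:
    """Split 'Key1=Value1;Key2=Value2' into a dict. Values such as AccountKey may
    end in '=' (base64 padding), so each pair is split on the first '=' only."""
    pairs = (p for p in conn.split(";") if p)
    return {key: value for key, value in (p.split("=", 1) for p in pairs)}


def containers_duplicated(dashboard_conn: str, storage_conn: str) -> bool:
    """True when AzureWebJobsDashboard and AzureWebJobsStorage point at different
    storage accounts."""
    dashboard = parse_connection_string(dashboard_conn)["AccountName"]
    storage = parse_connection_string(storage_conn)["AccountName"]
    return dashboard != storage
```

For example, a dashboard string with `AccountName=logs1` and a storage string with `AccountName=jobs1` return `True`. If both settings share a single account, the containers exist only once.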

The azure-webjobs-host container in turn hosts three directories:

  • Heartbeats – Contains 0-byte blobs for every heartbeat check performed on the service.
  • Ids – Contains a directory with a single blob holding a unique identifier for this service.
  • Output-logs – Hosts the output of the explicit logs for each run, explicit logs being logs that WebJob developers introduce within the execution code.

The azure-jobs-host-output container is the key to troubleshooting WebJobs. This container hosts logs created by the WebJob runtime during the initialization and termination of every execution.

Understanding azure-jobs-host-output (Storage Container)

To understand output logs, let’s consider the following code which defines a task to be executed by the WebJob.

When the WebJob is executed (either in a dev environment from Visual Studio or hosted within a WebApp), an output log is created in the azure-jobs-host-output container in the following format.

WebJob start log

Note: The maroon text below consists of comments explaining the JSON elements and is not part of the actual log file.

Once the execution is complete, a blob file in the following format is created by the WebJob runtime.

Using WebJob Dashboard

The WebJob dashboard provides a detailed, user-friendly interface to manage your WebJobs hosted on Microsoft Azure. The dashboard is enabled once the Azure WebApp hosting the WebJob is deployed and is accessible at the following URL:

https://<WebApp Name>.scm.azurewebsites.net/azurejobs/#/jobs

The dashboard can also be reached via Azure Portal.


On the home screen, the dashboard lists the WebJobs hosted within the selected WebApp and their current execution status.


Clicking a specific WebJob navigates to a page displaying the list of executions for that WebJob along with their status.


Each execution can be further expanded to see the explicit logging.


In most cases, the logs displayed on the portal supply sufficient information to troubleshoot the WebJob. The WebJob logs in the Azure Storage accounts can be explored for deeper investigations.


Azure WebJobs SDK – https://github.com/Azure/azure-webjobs-sdk

MSDN Documentation – https://azure.microsoft.com/en-us/documentation/articles/app-service-webjobs-readme/

Cloud Cushioning using Azure Queues

The distributed world

The cloud revolution has revived the importance of distributed computing in today's enterprise market. Distributing compute and storage workloads across multiple decoupled resources helps corporations optimise their capital and operational expenditure.

While moving to the cloud has clear benefits, it is important to understand the ground rules of the cloud platform. Running business-critical services on commodity hardware with a service SLA of three nines (99.9%) rather than five nines (99.999%) calls for some precautions. A key mitigation is to follow the cloud platform provider's recommendations for application hosting.
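The arithmetic below shows what the gap between three nines and five nines means in practice by converting an availability percentage into permitted downtime. It is a simple sketch that assumes a 365-day year and a 30-day month; real SLAs define their own measurement windows. On these assumptions, 99.9% allows about 8.76 hours of downtime a year, while 99.999% allows about 5.26 minutes.

```python
# Convert an availability SLA into the downtime it permits (simple arithmetic sketch).
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes; assumes a 365-day year
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes; assumes a 30-day month


def allowed_downtime_minutes(sla_percent: float, period_minutes: int) -> float:
    """Maximum downtime, in minutes, that sla_percent permits over period_minutes."""
    return period_minutes * (100.0 - sla_percent) / 100.0


for sla in (99.9, 99.999):
    yearly = allowed_downtime_minutes(sla, MINUTES_PER_YEAR)
    monthly = allowed_downtime_minutes(sla, MINUTES_PER_MONTH)
    print(f"{sla}%: {yearly:.2f} min/year, {monthly:.2f} min/month")
```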

Microsoft's published article on cloud development recommendations is the perfect cheat sheet.

Among the many design patterns and recommendations for building cloud applications, choosing the right approach to asynchronous communication between software services plays a key role in determining the reliability, scalability and efficiency of your application.

Why employ a Queue?

Queuing is an effective solution for enabling asynchronous communication between software services. The following are a few benefits of employing a queuing model:

  1. Minimal dependency on service availability – Because queues act as a buffer between software components, the unavailability of one service does not affect another; the services can function in a disconnected fashion.
  2. High reliability – Queues use transactions to manage the messages stored in them. In case of a failure, the transaction can be rolled back to recover the message.
  3. Load balancing – Queues can be used to balance work between software services. Microsoft recommends the Queue-Based Load Leveling pattern as an ideal implementation of this.
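The load-leveling idea can be shown without any Azure dependency. The sketch below uses an in-memory deque as a hypothetical stand-in for a real queue. A burst of messages arrives at once and the consumer processes at most a fixed number per tick. The queue holds the surplus as backlog, so the consumer is never overwhelmed.

```python
from collections import deque
from typing import List, Tuple


def simulate_load_leveling(arrivals_per_tick: List[int],
                           capacity_per_tick: int) -> Tuple[List[int], List[int]]:
    """Queue-based load leveling: the queue absorbs bursts, the consumer drains steadily.

    arrivals_per_tick: messages produced in each tick (a bursty load).
    capacity_per_tick: messages the consumer can process per tick.
    Returns (processed per tick, backlog left in the queue after each tick).
    """
    if capacity_per_tick <= 0:
        raise ValueError("consumer capacity must be positive")
    queue: deque = deque()
    processed: List[int] = []
    backlog: List[int] = []
    tick, next_id = 0, 0
    # Keep ticking until every arrival has been produced and the queue is empty.
    while tick < len(arrivals_per_tick) or queue:
        arriving = arrivals_per_tick[tick] if tick < len(arrivals_per_tick) else 0
        for _ in range(arriving):
            queue.append(next_id)  # the producer never waits for the consumer
            next_id += 1
        done = 0
        while queue and done < capacity_per_tick:
            queue.popleft()        # the consumer works at its own steady pace
            done += 1
        processed.append(done)
        backlog.append(len(queue))
        tick += 1
    return processed, backlog


print(simulate_load_leveling([10, 0, 0, 2], capacity_per_tick=4))
# ([4, 4, 2, 2], [6, 2, 0, 0])
```

Here a burst of 10 messages followed by 2 more never pushes the consumer above 4 messages per tick. The backlog grows to 6 and then drains.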

Microsoft Azure provides two queuing solutions which can be used to enable asynchronous communication between software services:

  • Azure Storage Queues – an early Azure feature, part of the Azure Storage service, offering REST-based, reliable, persistent messaging.
  • Azure Service Bus Queues – introduced as part of Azure Service Bus to support additional features such as publish/subscribe and topics.

Picking the right queuing technology plays a significant role in the efficiency of a distributed cloud application. In the rest of this post I will cover a few important factors you should consider when choosing one.

What is the size of messages being transferred?

The maximum message size supported by Azure Storage Queues is 64 KB, while Azure Service Bus Queues support messages up to 256 KB. This becomes an important factor especially when the message format is verbose (such as XML). An ideal pattern for transferring larger chunks of data is to use Azure Storage Blobs as a transient store: the data is stored as a blob, and a link to the blob is sent to the consuming service through the queue.
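The blob-as-transient-store approach is often called the Claim Check pattern. Below is a minimal sketch of it, assuming a UTF-8 text payload. A hypothetical in-memory dictionary stands in for Azure Blob Storage, and no Azure SDK calls are shown. If a queue message would exceed the limit, the payload is uploaded as a blob and only a small reference travels through the queue. The size limits are the ones quoted in this post.

```python
import json
import uuid
from typing import Dict

STORAGE_QUEUE_LIMIT = 64 * 1024   # 64 KB, as quoted above
SERVICE_BUS_LIMIT = 256 * 1024    # 256 KB


class InMemoryBlobStore:
    """Hypothetical stand-in for Azure Blob Storage acting as a transient store."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def upload(self, data: bytes) -> str:
        name = f"payloads/{uuid.uuid4()}"
        self._blobs[name] = data
        return name                       # the "link" that travels through the queue

    def download(self, name: str) -> bytes:
        return self._blobs[name]


def make_message(payload: bytes, blobs: InMemoryBlobStore,
                 limit: int = STORAGE_QUEUE_LIMIT) -> str:
    """Send small text payloads inline; larger ones go to blob storage by reference."""
    inline = json.dumps({"kind": "inline", "body": payload.decode("utf-8")})
    if len(inline.encode("utf-8")) <= limit:
        return inline
    return json.dumps({"kind": "blob", "ref": blobs.upload(payload)})


def read_message(message: str, blobs: InMemoryBlobStore) -> bytes:
    """Consumer side: resolve the reference if the payload was offloaded."""
    envelope = json.loads(message)
    if envelope["kind"] == "inline":
        return envelope["body"].encode("utf-8")
    return blobs.download(envelope["ref"])
```

A real implementation would also delete the blob once the consumer has processed it, so that the transient store does not keep growing.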

Does your ‘message consuming service’ go offline?

This mostly applies to batch processing systems that are designed to be dormant or offline periodically. In such a scenario, the maximum size of the queue becomes an important factor when choosing a queuing technology. Azure Storage Queues can grow to a maximum size of 200 TB, while Azure Service Bus Queues can hold a maximum of 80 GB of data.

Another factor that affects the choice of technology is the message expiration duration. A batch processing system may consume messages only once every few days or weeks. The maximum message expiry period for Azure Storage Queues is 7 days, after which the messages cannot be recovered; for Azure Service Bus Queues, the message expiry duration is unlimited.
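A short simulation shows how the expiry limit affects a consumer that runs infrequently. The sketch assumes one message is enqueued per day for 14 days and that the batch consumer only wakes up on day 14. Under the 7-day maximum expiry quoted above for Azure Storage Queues, only the most recent messages are still available. An unlimited expiry, modelled here as `ttl=None`, keeps all of them.

```python
from datetime import datetime, timedelta
from typing import List, Optional

STORAGE_QUEUE_MAX_TTL = timedelta(days=7)  # maximum expiry quoted in this post


def surviving_messages(enqueued_at: List[datetime], consumer_runs_at: datetime,
                       ttl: Optional[timedelta]) -> List[datetime]:
    """Enqueue times of messages that have not expired when the consumer runs.

    ttl=None models an unlimited expiry duration.
    """
    if ttl is None:
        return list(enqueued_at)
    return [t for t in enqueued_at if consumer_runs_at - t < ttl]


# One message per day for 14 days; the batch consumer wakes up on day 14.
start = datetime(2016, 1, 1)
enqueued = [start + timedelta(days=d) for d in range(14)]
run_at = start + timedelta(days=14)

print(len(surviving_messages(enqueued, run_at, STORAGE_QUEUE_MAX_TTL)))  # 6
print(len(surviving_messages(enqueued, run_at, None)))                   # 14
```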

Does the order of messages matter?

Although queues are generally expected to follow FIFO (first in, first out) ordering, this is not guaranteed for Azure Storage Queues. Azure Service Bus Queues, however, guarantee FIFO ordering of messages at all times.

Does your messaging infrastructure require auditing?

Server-side logs for queue operations are only supported on Azure Storage Queues. If Azure Service Bus Queues are used, a custom implementation is required to capture queuing events.

What is the preferred programming model for your applications?

Messages can be consumed from a queue in two ways: push (publish/subscribe) or pull (polling). Azure Service Bus Queues support both push and pull models, while Azure Storage Queues support only a pull model.
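The two consumption models can be sketched with hypothetical in-memory types; these are not Azure SDK classes. In the pull model, the consumer repeatedly asks the queue for work. As an assumed but common refinement, it backs off exponentially while the queue is empty, so idle polling costs fewer requests. In the push model, handlers are registered once and invoked as each message arrives.

```python
import time
from collections import deque
from typing import Callable, List, Optional


class InMemoryQueue:
    """Hypothetical stand-in for a queue client; not an Azure SDK type."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def send(self, body: str) -> None:
        self._items.append(body)

    def receive(self) -> Optional[str]:
        # Returns None immediately when the queue is empty (non-blocking pull).
        return self._items.popleft() if self._items else None


def poll(queue: InMemoryQueue, handle: Callable[[str], None], max_idle_polls: int = 3,
         base_delay: float = 0.01, max_delay: float = 0.08,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Pull model: ask for work, backing off exponentially while the queue is empty.

    A real poller would loop forever; this one stops after max_idle_polls
    consecutive empty polls so the sketch terminates.
    """
    delay, idle = base_delay, 0
    while idle < max_idle_polls:
        message = queue.receive()
        if message is None:
            idle += 1
            sleep(delay)
            delay = min(delay * 2, max_delay)  # fewer empty polls while idle
            continue
        idle, delay = 0, base_delay            # work arrived: poll eagerly again
        handle(message)


class PushDispatcher:
    """Push model: handlers register once and are invoked as each message arrives."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._handlers.append(handler)

    def publish(self, body: str) -> None:
        for handler in self._handlers:
            handler(body)
```

The `sleep` parameter is injectable, so the back-off schedule can be checked without actually waiting.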

Does your application require features like dead letter handling, grouping, scheduling, forwarding or support for transactions?

Azure Service Bus Queues support advanced features such as dead letter queues, dead letter events, message grouping, message forwarding, duplicate detection, at-most-once delivery and transactions. Azure Storage Queues do not support these features.
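The factors above can be condensed into a rough checklist. The sketch below encodes only the limits and features quoted in this post: message size, queue size, expiry, ordering, server-side logging, push delivery and the advanced Service Bus features. The `Requirements` fields are hypothetical names chosen for illustration. Treat the output as a starting point for the decision rather than a verdict.

```python
from dataclasses import dataclass
from typing import List

KB, GB, TB = 1024, 1024 ** 3, 1024 ** 4


@dataclass
class Requirements:
    """Hypothetical description of a workload's queuing needs."""
    max_message_bytes: int
    max_backlog_bytes: int
    retention_days: float
    needs_fifo: bool = False
    needs_server_side_logs: bool = False
    needs_push: bool = False
    # Dead-lettering, grouping, scheduling, forwarding, duplicate detection, transactions.
    needs_advanced_features: bool = False


def suitable_queues(req: Requirements) -> List[str]:
    """Queue services whose quoted limits and features satisfy req."""
    storage_ok = (
        req.max_message_bytes <= 64 * KB
        and req.max_backlog_bytes <= 200 * TB
        and req.retention_days <= 7
        and not (req.needs_fifo or req.needs_push or req.needs_advanced_features)
    )
    # Built-in server-side logs are only available on Storage Queues;
    # Service Bus would need a custom implementation, so it is ruled out here.
    service_bus_ok = (
        req.max_message_bytes <= 256 * KB
        and req.max_backlog_bytes <= 80 * GB
        and not req.needs_server_side_logs
    )
    options = []
    if storage_ok:
        options.append("Azure Storage Queues")
    if service_bus_ok:
        options.append("Azure Service Bus Queues")
    return options
```

An empty result means that neither service fits as stated. In that case, the workload itself has to change, for example by offloading large payloads to blobs or by running the consumer more often.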

Queue design patterns

Here are a few useful design patterns that can be used to leverage the potential of Azure Queues in a distributed application hosted in the cloud.

Detailed comparison

The following table compares the features of Azure Service Bus Queues and Azure Storage Queues in detail.

| Feature | Azure Service Bus Queues | Azure Storage Queues |
| --- | --- | --- |
| API support | Yes | Yes |
| PowerShell cmdlet support | Yes | Yes |
| Local (Australia) availability | Yes | Yes |
| Encryption | No | No |
| Authentication | Symmetric key | Symmetric key |
| Access control | RBAC via ACS | Delegated access via SAS tokens |
| Auditing | No | Yes |
| Identity provider federation | Yes | No |
| Max number of queues per account | 10,000 (per service namespace; can be increased) | Unlimited |
| Max queue size | 1 GB to 80 GB | 200 TB |
| Max message size | 256 KB | 64 KB |
| Max message expiration duration | Unlimited | 7 days |
| Max concurrent connections | Unlimited | Unlimited |
| Max number of records returned per call | 5,000 | |
| Poison messages | | |
| Dead letter handling | Yes | No |
| Dead letter events | Yes | No |
| Consumption patterns | | |
| One-way messaging | Yes | Yes |
| Request-response | Yes | Yes |
| Broadcast messaging | Yes | No |
| Publish-subscribe | Yes | No |
| Batch processing | | |
| Message grouping | Yes | No |
| Message scheduling | Yes | No |
| Transaction support | Yes | No |
| Assured FIFO | Yes | No |
| Delivery guarantee | At-Least-Once | |
| Receive behaviour | Blocking with/without timeout | |
| Receive mode | Peek & Lease; Receive & Delete | Peek & Lease |
| Lease/lock duration | 60 seconds (default) | 30 seconds (default) |
| Auto forwarding | Yes | No |
| Duplicate detection | Yes | No |
| Peek message | Yes | Yes |
| Server-side logs | No | Yes |
| Storage metrics | Yes | Yes |
| State management | Yes | No |
| Purge queue | No | Yes |
| Management protocol | REST over HTTPS | REST over HTTP/HTTPS |
| Runtime protocol | REST over HTTPS | REST over HTTP/HTTPS |
| .NET managed API | Yes | Yes |
| Native C++ API | No | Yes |
| Java API | Yes | Yes |
| Node.js API | Yes | Yes |
| Queue naming rules | Yes | Yes |
| Maximum throughput | Up to 2,000 messages per second (based on benchmark with 1 KB messages) | Up to 2,000 messages per second (based on benchmark with 1 KB messages) |
| Average latency | 20-25 ms | 10 ms |
| Throttling behaviour | Reject with exception/HTTP 503 | Reject with HTTP 503 |
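Several rows in the table describe how a message moves through a queue: Peek & Lease receive mode, lease duration, dead letter handling and duplicate detection. The in-memory sketch below models that flow under stated assumptions. It is not an Azure SDK API, and the parameter names are hypothetical. The 30-second lease mirrors the Storage Queues default listed above, and the maximum delivery count of 10 is an assumed default.

A received message is leased rather than removed. If the consumer does not complete it before the lease expires, it becomes visible again. After too many failed deliveries, it is moved to a dead-letter list instead of being retried forever. Dead-lettering and duplicate detection are Service Bus features in the table.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Message:
    message_id: str
    body: str
    delivery_count: int = 0
    locked_until: float = 0.0  # lease expiry, in the caller's clock (seconds)


class LeasedQueue:
    """In-memory sketch of Peek & Lease delivery with dead-lettering and duplicate detection.

    Not an Azure SDK API; names and defaults are assumptions for illustration.
    """

    def __init__(self, lease_seconds: float = 30.0, max_delivery_count: int = 10) -> None:
        self.lease_seconds = lease_seconds
        self.max_delivery_count = max_delivery_count
        self.active: List[Message] = []       # messages awaiting (re)delivery, in arrival order
        self.dead_letter: List[Message] = []  # poison messages set aside for inspection
        self._seen_ids: Set[str] = set()      # duplicate detection (unbounded window here)

    def send(self, message_id: str, body: str) -> bool:
        if message_id in self._seen_ids:
            return False                      # duplicate: dropped, not enqueued twice
        self._seen_ids.add(message_id)
        self.active.append(Message(message_id, body))
        return True

    def receive(self, now: float) -> Optional[Message]:
        """Lease the first visible message; it stays in the queue until completed."""
        for message in list(self.active):
            if message.locked_until > now:
                continue                      # still leased by a consumer
            if message.delivery_count >= self.max_delivery_count:
                self.active.remove(message)   # too many failed deliveries
                self.dead_letter.append(message)
                continue
            message.delivery_count += 1
            message.locked_until = now + self.lease_seconds
            return message
        return None

    def complete(self, message: Message) -> None:
        """Delete the message after successful processing."""
        self.active.remove(message)
```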

Useful links

Azure Queues and Service Bus Queues – compared and contrasted.

Overview of Service Bus Messaging Patterns.

Background business – Azure Worker Role vs Web Job

Importance of asynchronous workloads

Background processes play a key role in enabling distributed asynchronous computing. In the past, background tasks were used to handle secondary workloads such as logging, monitoring, scheduling and notifications; today's systems also use background processing to improve the user experience by decoupling user interfaces from heavier tasks. Microsoft Azure provides two ways to host background tasks:

  • Worker Roles (PaaS)
  • Web Jobs (App Service Web Apps and API Apps)

Unlike Web Jobs, Worker Roles run on dedicated virtual machines and can be pictured as a standalone executable unit of work. Choosing between these two technologies to host your background processes is an important design decision.

Key differences

The key differences between Worker Roles and Web Jobs are shown below.

| Aspect | Worker Role | Web Job |
| --- | --- | --- |
| Hosting | Self-hosted – hosted on a dedicated virtual machine. | Web App hosted – hosted within a Web App container. |
| Coupling | Decoupled from front ends and middle tiers. | Coupled with the Web App, which may also host a web front end or middle tier. |
| Scalability | Independently scalable. | Scales along with the Web App hosting it. |
| Remote access | Supports remoting into the host VM. | Does not support remoting. |
| Deployment | Complicated deployment. | Simple deployment. |
| Configurability | High. | Low. |
| Triggers | All triggers have to be introduced programmatically. | Supports on-demand, scheduled and continuous triggers. |
| Management | Logging and diagnostics need to be coded in. | Natively supports detailed logging and diagnostics. |
| Debugging | Difficult to attach the Visual Studio debugger. | Easily attachable to the Visual Studio debugger. |
| Pricing | Comparatively more expensive. | Comparatively cheaper. |
| Exception handling | Unexpected shutdowns have to be handled programmatically. | Supports graceful shutdown. |
| Tenancy | Natively single-tenant. | Supports shared (multi-tenant) deployment. |

Making your selection

Web Jobs are best suited for running lightweight tasks that require minimal environment customisation. Web Jobs can run Windows executables (cmd, bat, exe), PowerShell, shell scripts, PHP, Python, JavaScript or Java files as background tasks.

Natively, Web Jobs support three types of triggers:

  • On-Demand – Triggered externally. The Web Job SDK supports listening to Azure Storage tables, queues, blobs and Azure Service Bus queues for triggering a Job. An on-demand Job can also be manually triggered from the portal.
  • Scheduled – Triggered based on a configured schedule. The scheduler used to trigger the job is decoupled from the Web Job itself.
  • Continuous – Runs continuously, starting when the Web Job starts. An endless loop must be written explicitly to keep the job running indefinitely.
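The endless loop a continuous Web Job needs might look like the sketch below. `do_work` is a hypothetical callback that returns whether it processed anything. The loop also checks the `WEBJOBS_SHUTDOWN_FILE` environment variable. To my understanding, the Web Jobs host uses this variable to tell a continuous job to stop, which is the graceful shutdown mentioned in the comparison table. The stop check is injectable so that the loop can be exercised locally.

```python
import os
import time
from typing import Callable


def shutdown_requested() -> bool:
    """True once the file named by WEBJOBS_SHUTDOWN_FILE exists (graceful shutdown signal)."""
    path = os.environ.get("WEBJOBS_SHUTDOWN_FILE")
    return bool(path) and os.path.exists(path)


def run_continuously(do_work: Callable[[], bool],
                     should_stop: Callable[[], bool] = shutdown_requested,
                     idle_seconds: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """The explicit endless loop a continuous Web Job needs; returns the iteration count."""
    iterations = 0
    while not should_stop():
        did_work = do_work()       # hypothetical unit of work; True if it processed something
        iterations += 1
        if not did_work:
            sleep(idle_seconds)    # avoid a busy loop when there is nothing to do
    return iterations
```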

Deploying a Web Job requires minimal plumbing, and Web Jobs are easy to manage.

Web Jobs can be a perfect fit for the following scenarios:

  • Polling RSS feeds
  • Sending SMS notifications
  • Asynchronous logging
  • Archiving
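The core of an RSS-polling Web Job, the first scenario above, is working out which items are new since the last run. The sketch below parses an RSS 2.0 document with Python's standard library. It uses each item's `<guid>` as its identity, falling back to `<link>`. Fetching the feed and persisting `seen_ids` between scheduled runs (for example in Azure Storage) are assumed and left out of the sketch.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple


def new_feed_items(rss_xml: str, seen_ids: Set[str]) -> List[Tuple[str, str]]:
    """Return (id, title) for RSS items not seen on earlier polls, recording them in seen_ids."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Prefer <guid> as the item's identity; fall back to <link>.
        item_id = item.findtext("guid") or item.findtext("link")
        if item_id and item_id not in seen_ids:
            seen_ids.add(item_id)
            fresh.append((item_id, item.findtext("title", default="")))
    return fresh
```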

Worker Roles, on the other hand, are meant for heavier tasks that require a higher level of customisation. They are best suited for executing jobs that depend on the Windows registry, environment variables or similar Windows infrastructure services.

A key advantage of Worker Roles is the ability to execute startup tasks, which can prepare the execution environment to suit specific requirements. Worker Roles are hosted on a standalone virtual machine, which offers more control over the operating environment. Because of this, a Worker Role may be a good choice where there is a requirement to remote into the host virtual machine. However, changing the application image deployed on a Worker Role via Remote Desktop is not advisable, as the change will not persist after the role is recycled. Deploying a Worker Role also takes more time than deploying a Web Job.

The following are scenarios where a Worker Role may be a better fit than a Web Job:

  • Locked-down environments that require strong security boundaries to be set up before executing the task. Banks and other financial institutions usually have such requirements.
  • Legacy workloads using COM components that require registration during startup.
  • A completely decoupled task layer where the background tasks need to be scaled independently.

I hope you find this reference useful – feel free to share your own experiences in the comments section below.