Cosmos DB is a NoSQL database provided as part of Microsoft’s Azure platform. Designed for very high performance and scalability, Cosmos DB is rapidly becoming one of the default data storage options I recommend for new green-field applications and microservices. It is a fairly opinionated database, with some guidelines that you need to follow to take full advantage of its scalability and performance, but it also provides a number of features to enable sophisticated and powerful applications to be built on top of its engine.

One such feature is its server-side programmability model. Cosmos DB allows for stored procedures, triggers, and user-defined functions to run within its database engine. Interestingly, these are written using JavaScript and uploaded to the Cosmos DB collection in which they will run. Server-side programming gives a lot of extra power to a Cosmos DB-based application, including the ability to run transactions across multiple documents within the collection. In fact, server-side programming is the only way to get transaction semantics within Cosmos DB.

In this series of blog posts, we will explore server-side programming in Cosmos DB, and we will use TypeScript to write the server-side code. I will focus on how to build real-world applications, including adding unit tests to ensure the code behaves as expected, and incorporating the build and deployment of Cosmos DB server-side code into your CI/CD process. The series is split into six parts:

  • Part 1 (this post) gives an overview of the server-side programmability model, the reasons why you might want to consider server-side code in Cosmos DB, and some key things to watch out for.
  • Part 2 deals with user-defined functions, the simplest type of server-side programming, which allow for adding simple computation to queries.
  • Part 3 talks about stored procedures. These provide a lot of powerful features for creating, modifying, deleting, and querying across documents – including in a transactional way.
  • Part 4 introduces triggers. Triggers come in two types – pre-triggers and post-triggers – and allow for behaviour like validating and modifying documents as they are inserted or updated, and creating secondary effects as a result of changes to documents in a collection.
  • Part 5 discusses unit testing your server-side scripts. Unit testing is a key part of building a production-grade application, and even though some of your code runs inside Cosmos DB, your business logic can still be tested.
  • Finally, part 6 explains how server-side scripts can be built and deployed into a Cosmos DB collection within an automated build and release pipeline, using Microsoft Visual Studio Team Services (VSTS).

In this series I presume some basic knowledge of Cosmos DB. If you’re completely new to Cosmos DB then I recommend reading Microsoft’s overview, and following along with one of the quick starts. A passing familiarity with TypeScript will also be helpful, but even if you don’t know how to use TypeScript, I’ll try to cover the key points you need to know to get started.

Using TypeScript

TypeScript is a language that compiles (or, technically, transpiles) into JavaScript. It provides a number of nice features and improvements over JavaScript, the main one being type safety. This means that the TypeScript compiler can check that your code is accessing the correct types and members as you write it. Writing code in TypeScript allows for a better level of certainty that your code is actually going to work, as well as providing some very nice development-time features such as IntelliSense. It also helps to have strong typing when unit testing, and particularly when mocking out external interfaces and classes within tests.
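
As a small illustration of what type checking buys us, consider the sketch below. The Customer interface and the describeCustomer function are hypothetical names made up for this example; they are not part of any Cosmos DB API.

```typescript
// Hypothetical shape, used only to illustrate compile-time checking.
interface Customer {
    id: string;
    name: string;
}

function describeCustomer(customer: Customer): string {
    return `${customer.id}: ${customer.name}`;
}

const summary = describeCustomer({ id: "c-1", name: "Contoso" });

// Each of the following calls would be rejected by the TypeScript compiler
// before the code ever runs, so they are left commented out:
// describeCustomer({ id: "c-2" });                 // property 'name' is missing
// describeCustomer({ id: 42, name: "Fabrikam" });  // 'id' must be a string
```

In plain JavaScript both of the commented-out calls would only misbehave at runtime, which is harder to diagnose when the code is running inside a database engine.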

Because TypeScript compiles into JavaScript code, any JavaScript runtime will be able to run code that has been written in TypeScript. This includes Cosmos DB’s JavaScript engine, Chakra. Even though Cosmos DB doesn’t know about TypeScript or support it directly, we can still take advantage of many of the features that TypeScript provides, and then compile our script into JavaScript before handing it over to Cosmos DB for execution.

TypeScript also lets us separate out our code into multiple .ts files, keeping it tidy and well-organised. Cosmos DB requires that our code be in a single .js file, though – but thankfully, TypeScript can be configured to combine our code when it compiles it.

When working with external libraries and APIs within TypeScript, we need to use type definitions. These specify the details of the types we will use. While the Cosmos DB team doesn’t provide first-party type definitions for their server-side API, there are publicly accessible, open-source type definitions available from the DefinitelyTyped repository. We will use these later in this series.

Note that Cosmos DB supports the ECMAScript 2015 version of JavaScript. TypeScript can be configured to emit JavaScript in several different versions, including ECMAScript 2015 code, so this is not a problem for us. We’ll see how to do this in part 2 of this series.

Impact of Server-Side Programming on Request Units

When using Cosmos DB, we don’t provision CPU cores or disk speed or memory. Instead, Cosmos DB uses request units as its currency. A Cosmos DB collection is provisioned with a certain number of request units per second (RU/s), which can be scaled as necessary to cope with your application’s demands. The RU/s provisioned dictates the monetary cost of running the collection. For example, a simple collection with a light query load might be provisioned with 1000 RU/s, which (as of January 2018) costs approximately US$60 per month. For more information on Cosmos DB’s request unit model, see Microsoft’s documentation on request units, and for the latest pricing information, see the Azure Cosmos DB pricing page.

Server-side code running within Cosmos DB can easily consume a lot of request units, potentially exhausting your allowance for that second and forcing your application to retry operations against Cosmos DB. Furthermore, the cost of running queries server-side may sometimes be higher than the cost of running the equivalent query using the standard client-side APIs, due to the resources it takes to start up a JavaScript stored procedure or function. Cosmos DB does have some optimisations to reduce the cost of running JavaScript code – for example, internally Cosmos DB compiles the JavaScript code to bytecode and then caches this bytecode so that it doesn’t need to recompile it on every invocation. However, running arbitrary code will usually be more expensive than just using the client-side query APIs, and this means that server-side code may not be appropriate if you don’t actually need the benefits it provides.

Additionally, the request unit usage for a given piece of server-side code is not fully predictable or consistent – in my own testing, I’ve seen the exact same piece of code, working on the same data set, take anywhere from 3.2 RUs through to 4.8 RUs to execute. This is in contrast to the rest of Cosmos DB, where request unit usage is very predictable.
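
To put that variation in context, here is a rough back-of-the-envelope sketch. It assumes every invocation costs the same number of RUs and that nothing else is consuming throughput, which will not hold in a real system; maxInvocationsPerSecond is a hypothetical helper written for this example.

```typescript
// Rough capacity estimate: how many invocations per second fit within the
// provisioned throughput if every invocation costs the same number of RUs.
function maxInvocationsPerSecond(provisionedRus: number, rusPerInvocation: number): number {
    if (provisionedRus <= 0 || rusPerInvocation <= 0) {
        throw new RangeError("Both values must be positive");
    }
    return Math.floor(provisionedRus / rusPerInvocation);
}

const provisioned = 1000; // RU/s, the example figure used earlier
const bestCase = maxInvocationsPerSecond(provisioned, 3.2);  // 312
const worstCase = maxInvocationsPerSecond(provisioned, 4.8); // 208
```

Under these assumptions, the same script could fit roughly 312 times per second at its cheapest but only about 208 times per second at its most expensive, so it is worth planning around the higher figure.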

Nevertheless, for some scenarios, such as bulk inserts of multiple documents, or generating sample data, it may take fewer request units to run code server-side than client-side. It is important to benchmark your code and compare the possible approaches to fully understand the best option for your requirements.

Consistency: Transactions and Indexes

Cosmos DB’s client-side programming model does not provide for transactional consistency across multiple documents. For example, you may have two documents to insert or update, and require that either both operations succeed or – if there is a problem with writing one of them – that both of them should fail. The Cosmos DB server-side programmability model allows for this behaviour to be implemented, because all server-side code runs within an implicit transaction. This means that we get ACID transaction semantics automatically whenever we execute a stored procedure or trigger. Cosmos DB runs stored procedures and triggers on the primary replica that is used to host the data, which allows it to give this level of transactional isolation while still allowing for high performance operations within the transaction.
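
The all-or-nothing behaviour can be sketched as follows. This is a simulation under stated assumptions, not the real engine: CollectionLike is a hand-written, heavily simplified subset of the server-side collection API (getSelfLink and createDocument mirror the real method names), while FakeCollection, createOrderWithAudit, and the document contents are hypothetical. The fake invokes callbacks synchronously and stages writes in memory so that the code can run outside Cosmos DB.

```typescript
// Minimal, hand-written subset of the server-side collection API. The real
// type definitions are richer; only what this sketch needs is declared here.
type WriteCallback = (error: { message: string } | undefined, resource?: object) => void;

interface CollectionLike {
    getSelfLink(): string;
    createDocument(collectionLink: string, body: object, callback?: WriteCallback): boolean;
}

// Stored procedure body: write both documents, or neither. Inside Cosmos DB
// the collection would come from getContext().getCollection(); it is passed
// in here so that the sketch can run outside the database.
function createOrderWithAudit(collection: CollectionLike, order: object, audit: object): void {
    const link = collection.getSelfLink();
    const failOnError: WriteCallback = (error) => {
        // Throwing aborts the script, which rolls back the implicit transaction.
        if (error) { throw new Error("Write failed: " + error.message); }
    };
    for (const doc of [order, audit]) {
        const accepted = collection.createDocument(link, doc, failOnError);
        if (!accepted) { throw new Error("Write was not accepted"); }
    }
}

// In-memory stand-in that mimics all-or-nothing behaviour: writes are staged
// and only become visible if the script finishes without throwing.
class FakeCollection implements CollectionLike {
    public committed: object[] = [];
    private staged: object[] = [];

    constructor(private readonly rejectIf: (doc: object) => boolean) {}

    getSelfLink(): string { return "dbs/fake/colls/orders"; }

    createDocument(_link: string, body: object, callback?: WriteCallback): boolean {
        if (this.rejectIf(body)) {
            if (callback) { callback({ message: "rejected by fake" }); }
        } else {
            this.staged.push(body);
            if (callback) { callback(undefined, body); }
        }
        return true;
    }

    run(script: (c: CollectionLike) => void): boolean {
        this.staged = [];
        try {
            script(this);
            this.committed.push(...this.staged); // commit staged writes
            return true;
        } catch (err) {
            return false; // rollback: staged writes are discarded
        } finally {
            this.staged = [];
        }
    }
}

// The fake rejects any document flagged as invalid.
const fake = new FakeCollection((doc) => (doc as { invalid?: boolean }).invalid === true);
const ok = fake.run((c) => createOrderWithAudit(c, { id: "order-1" }, { id: "audit-1" }));
const failed = fake.run((c) =>
    createOrderWithAudit(c, { id: "order-2" }, { id: "audit-2", invalid: true }));
// fake.committed now holds only order-1 and audit-1; order-2 was rolled back.
```

Passing the collection in as a parameter, rather than calling getContext() directly, is a design choice made here purely so the logic can be exercised against a fake.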

Note, however, that transactions are not serialised. This means that other transactions may be happening simultaneously on other documents within the collection in parallel with your transaction. This is important because it means that functionality like real-time aggregation of data may not always be looking at a consistent view of the world, and you can get race conditions. This is simply due to the way Cosmos DB works, and is not something that we can easily program around.

A further nuance to be aware of is that, in Cosmos DB, indexes can be updated asynchronously. This means that if you query the collection within a trigger, you may not see the document currently being inserted or updated. Again, this makes it challenging to do certain types of queries (such as aggregations), but is a byproduct of Cosmos DB’s emphasis on enabling high performance and throughput. (Update: the original version of this post stated that indexes are always updated asynchronously, but this is not necessarily true. Indexing in Cosmos DB has its own complexities and is outside of the scope of this series; see Microsoft’s documentation for more information on Cosmos DB’s indexing policies.)

API Models

Cosmos DB provides several API models to access data in your databases: SQL to use a SQL-based syntax; MongoDB for using the MongoDB client libraries and tools; Table to use the Azure Storage table API; and Gremlin to use the Gremlin graph protocol. All of these ultimately store data in the same way though, and all of them allow for Cosmos DB’s server-side programmability model.

In this series I will focus purely on the SQL API, but most of the same concepts can be applied to the other API models.

Restrictions

Cosmos DB has placed some restrictions on the types of operations that can be run from within the server. This is mostly to optimise the performance and security of the service.

Time restriction: each server-side operation must complete within a fixed amount of time. The exact limit is not documented, but the server-side API provides some features to indicate when your script is approaching it. We will discuss this more in later parts of the series. It is important to take this restriction into account when designing your stored procedures and triggers, and to avoid writing server-side code that makes high volumes of queries. Instead, if you batch the work across multiple stored procedure calls, you are more likely to have all of your code execute successfully.
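
One way to batch work across calls is sketched below. In the real server-side API, write operations such as createDocument return a boolean that is false when the request was not accepted, and that is the signal used here. Everything else is an assumption made for illustration: BoundedCollection is a simplified interface without callbacks, bulkInsert and insertAll are hypothetical names, and the fake allows only three writes per invocation.

```typescript
interface BoundedCollection {
    getSelfLink(): string;
    // Returns false when the request was not accepted, for example because
    // the script is close to its execution limit.
    createDocument(collectionLink: string, body: object): boolean;
}

// Stored procedure body: insert as many documents as are accepted and report
// how many were written, so the caller knows where to resume.
function bulkInsert(collection: BoundedCollection, docs: object[]): number {
    const link = collection.getSelfLink();
    let inserted = 0;
    for (const doc of docs) {
        if (!collection.createDocument(link, doc)) { break; }
        inserted++;
    }
    return inserted;
}

// Client-side driver: keep invoking the procedure with the remaining
// documents, and return the number of invocations that were needed.
function insertAll(invoke: (batch: object[]) => number, docs: object[]): number {
    let done = 0;
    let calls = 0;
    while (done < docs.length) {
        const written = invoke(docs.slice(done));
        if (written === 0) { throw new Error("No progress made; giving up"); }
        done += written;
        calls++;
    }
    return calls;
}

// Fake invocation: each call accepts at most three writes (hypothetical limit).
const stored: object[] = [];
function invokeOnFake(batch: object[]): number {
    let budget = 3;
    const collection: BoundedCollection = {
        getSelfLink: () => "dbs/fake/colls/orders",
        createDocument: (_link, body) => {
            if (budget === 0) { return false; }
            budget--;
            stored.push(body);
            return true;
        },
    };
    return bulkInsert(collection, batch);
}

const docs: object[] = [];
for (let i = 0; i < 8; i++) { docs.push({ id: "order-" + i }); }
const calls = insertAll(invokeOnFake, docs); // 3 invocations: 3 + 3 + 2 documents
```

Note that each invocation is its own transaction, so this pattern trades atomicity across the whole set for the ability to finish within the time limit.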

Limited set of functionality: although it executes arbitrary JavaScript, Cosmos DB’s server-side programming model is designed for basic computation and for interacting with the Cosmos DB collection itself. We cannot call external web APIs, communicate with other collections, or import any complex JavaScript libraries.

Limited fluent query capability: Cosmos DB’s server-side API has a fluent JavaScript-based query syntax to perform various types of queries, filters, projections, etc on the underlying collection. However, this API does not support all of the rich functionality that the SQL grammar provides, nor does it allow for the same types of queries as the other API models, such as Gremlin’s graph query capability.

One example feature that is missing from the server-side query API is aggregation. The Cosmos DB SQL dialect allows for aggregate functions such as SUM, MIN, MAX, and COUNT in queries. These cannot be performed using the fluent server-side query API. However, SQL queries can be executed from within server-side code, so this is not a serious limitation and really just affects the way the code is written, not the functionality that is exposed.
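
As a minimal sketch of running a SQL aggregate from server-side code, the example below passes a query string to queryDocuments, which is the name of the real server-side method. QueryableCollection is a simplified, hand-written interface, and countDocuments and the fake collection are hypothetical; the fake simply pretends that the collection holds five documents.

```typescript
type QueryCallback = (error: { message: string } | undefined, results: unknown[]) => void;

// Simplified subset of the server-side collection API needed for this sketch.
interface QueryableCollection {
    getSelfLink(): string;
    queryDocuments(
        collectionLink: string,
        query: string,
        options: object,
        callback: QueryCallback): boolean;
}

// Counts documents using the SQL grammar rather than the fluent query API.
function countDocuments(collection: QueryableCollection, onCount: (count: number) => void): void {
    const accepted = collection.queryDocuments(
        collection.getSelfLink(),
        "SELECT VALUE COUNT(1) FROM c",
        {},
        (error, results) => {
            if (error) { throw new Error("Query failed: " + error.message); }
            // SELECT VALUE returns the bare number as the only result.
            onCount(results[0] as number);
        });
    if (!accepted) { throw new Error("Query was not accepted"); }
}

// Fake collection so the sketch can run outside Cosmos DB.
let lastQuery = "";
const fakeCollection: QueryableCollection = {
    getSelfLink: () => "dbs/fake/colls/orders",
    queryDocuments: (_link, query, _options, callback) => {
        lastQuery = query;
        callback(undefined, [5]); // pretend the collection holds five documents
        return true;
    },
};

let documentCount = -1;
countDocuments(fakeCollection, (count) => { documentCount = count; });
```

A real script would also need to handle the case where the query is not accepted, or does not complete in a single call, more gracefully than by throwing.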

Single JavaScript file: a single stored procedure, trigger, or user-defined function is represented by a JavaScript function, which in turn may call other functions. However, all of the code must be placed in a single file. JavaScript modules and other similar features are not supported. We will see how to split our functionality into multiple TypeScript files, while still emitting a single JavaScript file, later in this series.

Following Along

In this series, you will be able to follow along and create each type of server-side programming entity in Cosmos DB: a user-defined function, a stored procedure, and a trigger. We will build up a set of server-side code, and then in parts 5 and 6 of this series we will look at how to get these ready for a production deployment by testing and automatically building and deploying them to Cosmos DB.

You can follow along whether you use Windows, macOS, or Linux to develop. There are just a few prerequisites:

  • A good text editor: I use Visual Studio Code, which comes with the TypeScript programming extension, but you can use anything you like.
  • Node Package Manager (NPM): this is installed along with Node.js, if you don’t already have it.
  • An Azure subscription. Alternatively, you can use the Cosmos DB emulator to run this locally and at no charge, but you will need to adapt the instructions slightly.

Sample Scenario

A common use for a database is to store information about orders that customers make for products. Orders typically contain some basic overall information, such as an order ID, date, customer ID, and a set of order items – references to products and the quantities ordered. In this series we will work with a simple hypothetical order database implemented in Cosmos DB.

We will use the SQL API, and we will use a non-partitioned collection. Note that partitioned collections behave the same way as non-partitioned collections when it comes to server-side programmability, but they also have a few nuances in their behaviour that we won’t go through here.
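
To make the scenario concrete, an order document might be described in TypeScript along the following lines. The property names, the sample values, and the totalQuantity helper are assumptions chosen for illustration only.

```typescript
// Hypothetical document shape for the sample order database.
interface OrderItem {
    productId: string;
    quantity: number;
}

interface Order {
    id: string;
    orderDate: string; // ISO 8601 date, stored as a string in the JSON document
    customerId: string;
    items: OrderItem[];
}

const sampleOrder: Order = {
    id: "order-1001",
    orderDate: "2018-01-15",
    customerId: "customer-42",
    items: [
        { productId: "product-1", quantity: 2 },
        { productId: "product-7", quantity: 1 },
    ],
};

// Example of simple business logic over an order: total units ordered.
function totalQuantity(order: Order): number {
    return order.items.reduce((sum, item) => sum + item.quantity, 0);
}

const unitsOrdered = totalQuantity(sampleOrder); // 3
```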

Summary

Server-side programming in Cosmos DB is extremely powerful. It gives us the ability to write functions, stored procedures, and triggers that execute within the database engine, and allow for features that are simply not possible through the client-side programming model. However, there are limitations and things to be aware of, including the potentially high cost of running some types of operations from the server. The features also do not provide the same degree of flexibility and power as their counterparts in SQL Server and other relational databases. Nevertheless, the server-side programming model in Cosmos DB is enormously useful for certain types of situations.

By using TypeScript to add type safety, and by adding unit tests and good continuous integration and continuous deployment practices, we can build advanced behaviour into our production-grade applications – all while taking advantage of the high performance and scale capabilities of Cosmos DB.

In the next part of this blog series, we will start writing some server-side code – first by building a user-defined function.
