Cosmos DB is a NoSQL database provided as part of Microsoft’s Azure platform. Designed for very high performance and scalability, Cosmos DB is rapidly becoming one of the default data storage options I recommend for new green-field applications and microservices. It is a fairly opinionated database, with some guidelines that you need to follow to take full advantage of its scalability and performance, but it also provides a number of features to enable sophisticated and powerful applications to be built on top of its engine.
In this series of blog posts, we will explore server-side programming in Cosmos DB, and we will use TypeScript to write the server-side code. I will focus on how to build real-world applications, including adding unit tests to ensure the code behaves as expected, and incorporating the build and deployment of Cosmos DB server-side code into your CI/CD process. The series is split into six parts:
- Part 1 (this post) gives an overview of the server side programmability model, the reasons why you might want to consider server-side code in Cosmos DB, and some key things to watch out for.
- Part 2 deals with user-defined functions, the simplest type of server-side programming, which allow for adding simple computation to queries.
- Part 3 talks about stored procedures. These provide a lot of powerful features for creating, modifying, deleting, and querying across documents – including in a transactional way.
- Part 4 introduces triggers. Triggers come in two types – pre-triggers and post-triggers – and allow for behaviour like validating and modifying documents as they are inserted or updated, and creating secondary effects as a result of changes to documents in a collection.
- Part 5 discusses unit testing your server-side scripts. Unit testing is a key part of building a production-grade application, and even though some of your code runs inside Cosmos DB, your business logic can still be tested.
- Finally, part 6 explains how server-side scripts can be built and deployed into a Cosmos DB collection within an automated build and release pipeline, using Microsoft Visual Studio Team Services (VSTS).
In this series I presume some basic knowledge of Cosmos DB. If you’re completely new to Cosmos DB then I recommend reading Microsoft’s overview, and following along with one of the quick starts. A passing familiarity with TypeScript will also be helpful, but even if you don’t know how to use TypeScript, I’ll try to cover the key points you need to know to get started.
TypeScript also lets us separate out our code into multiple
.ts files, keeping it tidy and well-organised. Cosmos DB requires that our code be in a single
.js file, though – but thankfully, TypeScript can be configured to combine our code when it compiles it.
When working with external libraries and APIs within TypeScript, we need to use type definitions. These specify the details of the types we will use. While the Cosmos DB team doesn’t provide first-party type definitions for their server-side API, there are publicly accessible, open-source type definitions available from the DefinitelyTyped repository. We will use these later in this series.
Impact of Server-Side Programming on Request Units
When using Cosmos DB, we don’t provision CPU cores or disk speed or memory. Instead, Cosmos DB uses request units as its currency. A Cosmos DB collection is provisioned with a certain number of request units per second (RU/s), which can be scaled as necessary to cope with your application’s demands. The RU/s provisioned dictates the monetary cost of running the collection. For example, a simple collection with a light query load might be provisioned with 1000 RU/s, which (as of January 2018) costs approximately USD$60 per month. For more information on Cosmos DB’s request unit model see here, and for the latest pricing information, see here.
Additionally, the request unit usage for a given piece of server-side code is not fully predictable or consistent – in my own testing, I’ve seen the exact same piece of code, working on the same data set, take anywhere from 3.2 RUs through to 4.8 RUs to execute. This is in contrast to the rest of Cosmos DB, where request unit usage is very predictable.
Nevertheless for some scenarios, such as bulk inserts of multiple documents, or generating sample data, it may take fewer request units to run code server-side than client side. It is important to benchmark your code and compare the possible approaches to fully understand the best option for your requirements.
Consistency: Transactions and Indexes
Cosmos DB’s client-side programming model does not provide for transactional consistency across multiple documents. For example, you may have two documents to insert or update, and require that either both operations succeed or – if there is a problem with writing one of them – that both of them should fail. The Cosmos DB server-side programmability model allows for this behaviour to be implemented, because all server-side code runs within an implicit transaction. This means that we get ACID transaction semantics automatically whenever we execute a stored procedure or trigger. Cosmos DB runs stored procedures and triggers on the primary replica that is used to host the data, which allows it to give this level of transactional isolation while still allowing for high performance operations within the transaction.
Note, however, that transactions are not serialised. This means that other transactions may be happening simultaneously on other documents within the collection in parallel with your transaction. This is important because it means that functionality like real-time aggregation of data may not always be looking at a consistent view of the world, and you can get race conditions. This is simply due to the way Cosmos DB works, and is not something that we can easily program around.
A further nuance to be aware of is that, in Cosmos DB, indexes can be updated asynchronously. This means that if you query the collection within a trigger, you may not see the document currently being inserted or updated. Again, this makes it challenging to do certain types of queries (such as aggregations), but is a byproduct of Cosmos DB’s emphasis on enabling high performance and throughput. (Update: the original version of this post stated that indexes are always updated asynchronously, but this is not necessarily true. Indexing in Cosmos DB has its own complexities and is outside of the scope of this series; see here for more information on Cosmos DB’s indexing policies.)
Cosmos DB provides several API models to access data in your databases: SQL to use a SQL-based syntax; MongoDB for using the MongoDB client libraries and tools; Table to use the Azure Storage table API; and Gremlin to use the Gremlin graph protocol. All of these ultimately store data in the same way though, and all of them allow for Cosmos DB’s server-side programmability model.
In this series I will focus purely on the SQL API, but most of the same concepts can be applied to the other API models.
Cosmos DB has placed some restrictions on the types of operations that can be run from within the server. This is mostly to optimise the performance and security of the service.
Time restriction: each server-side operation has a fixed amount of time that it must execute within. The exact amount of time is not documented, but the server-side API provides some features to indicate when your script is approaching its limit. We will discuss this more in later parts of the series. It is important to build in this restriction when designing your stored procedures and triggers, and to avoid writing server-side code that will make high volumes of queries. Instead, if you batch these up across multiple stored procedure calls, you are more likely to have all of your code execute successfully.
One example feature that is missing from the server-side query API is aggregation. The Cosmos DB SQL dialect allows for queries such as
COUNT. These cannot be performed using the fluent server-side query API. However, SQL queries can be executed from within server-side code, so this is not a serious limitation and really just affects the way the code is written, not the functionality that is exposed.
In this series, you will be able to follow along and create each type of server-side programming entity in Cosmos DB: a user-defined function, a stored procedure, and a trigger. We will build up a set of server-side code, and then in parts 5 and 6 of this series we will look at how to get these ready for a production deployment by testing and automatically building and deploying them to Cosmos DB.
You can follow along whether you use Windows, macOS, or Linux to develop. There are just a few prerequisites:
- A good text editor: I use Visual Studio Code, which comes with the TypeScript programming extension, but you can use anything you like.
- Node Package Manager (NPM): you can install this here if you don’t already have it.
- An Azure subscription. Alternatively, you can use the Cosmos DB emulator to run this locally and at no charge, but you will need to adapt the instructions slightly.
A common use for a database is to store information about orders that customers make for products. Orders typically contain some basic overall information, such as an order ID, date, customer ID, and a set of order items – references to products and the quantities ordered. In this series we will work with a simple hypothetical order database implemented in Cosmos DB.
We will use the SQL API, and we will use a non-partitioned collection. Note that partitioned collections behave the same way as non-partitioned collections when it comes to server-side programmability, but they also have a few nuances in their behaviour that we won’t go through here.
Server-side programming in Cosmos DB is extremely powerful. It gives us the ability to write functions, stored procedures, and triggers that execute within the database engine, and allow for features that are simply not possible through the client-side programming model. However, there are limitations and things to be aware of, including the potentially high cost of running some types of operations from the server. The features also do not provide the same degree of flexibility and power as their counterparts in SQL Server and other relational databases. Nevertheless, the server-side programming model in Cosmos DB is enormously useful for certain types of situations.
By using TypeScript to add type safety, and by adding unit tests and good continuous integration and continuous deployment practices, we can build advanced behaviour into our production-grade applications – all while taking advantage of the high performance and scale capabilities of Cosmos DB.
In the next part of this blog series, we will start writing some server-side code – first by building a user-defined function.