Stored procedures, the second type of server-side code that can run within Cosmos DB, provide the ability to execute blocks of functionality from inside the database engine. Typically we use stored procedures for discrete tasks that can be encapsulated within a single invocation. In this post, we will discuss some situations where stored procedures can be used and the actions and queries that they can perform. We’ll then start to work through the server-side API model, and look at how we can work with the incoming stored procedure invocation’s request and response as well as the Cosmos DB collection itself. Then we’ll build a simple stored procedure using TypeScript.
This post is part of a series of posts about server-side programming for Cosmos DB:
- Part 1 gives an overview of the server side programmability model, the reasons why you might want to consider server-side code in Cosmos DB, and some key things to watch out for.
- Part 2 deals with user-defined functions, the simplest type of server-side programming, which allow for adding simple computation to queries.
- Part 3 (this post) talks about stored procedures. These provide a lot of powerful features for creating, modifying, deleting, and querying across documents – including in a transactional way.
- Part 4 (coming soon) introduces triggers. Triggers come in two types – pre-triggers and post-triggers – and allow for behaviour like validating and modifying documents as they are inserted or updated, and creating secondary effects as a result of changes to documents in a collection.
- Part 5 (coming soon) discusses unit testing your server-side scripts. Unit testing is a key part of building a production-grade application, and even though some of your code runs inside Cosmos DB, your business logic can still be tested.
- Finally, part 6 (coming soon) explains how server-side scripts can be built and deployed into a Cosmos DB collection within an automated build and release pipeline, using Microsoft Visual Studio Team Services (VSTS).
Stored procedures are extremely versatile, and can be used for a number of different tasks including:
- Encapsulating a complex set of queries and executing them as one logical operation – we will work with this example below.
- Retrieving data for a report using a complex set of queries, and combining the results into a single response that can be bound to a report UI.
- Generating mock data and inserting it into the collection.
- Doing a batch insert, update, upsert, or delete of multiple documents, taking advantage of the transactional processing of stored procedures.
Of course, if you are building a simple application without the need for complex queries, you may be able to achieve everything you need with just the Cosmos DB client-side SDKs. However, stored procedures do give us some power and flexibility that is not possible with purely client-based development, including transaction semantics.
Transactions and Error Handling
Cosmos DB’s client programming model does not provide for transactional consistency. However, stored procedures do run within an implicit transaction. This means that if you modify multiple documents within a stored procedure, then either all of those changes will be saved or – in the event of an error – none of them will be saved. Transactions provide four guarantees (atomicity, consistency, isolation, and durability, also known as ACID). More information on Cosmos DB transactions is available here.
The Cosmos DB engine handles committing and rolling back transactions automatically. If a stored procedure completes without any errors being thrown then the transaction will be committed. However, if even one unhandled error is thrown, the transaction will be rolled back and no changes will be made.
Working with the Context
Unlike user-defined functions, which we discussed in part 1 of this series, stored procedures allow us to access and make changes to the collection that they run within. We can also return results from stored procedures. Both of these types of actions require that we work with the context object.
Within a stored procedure, we can make a call to the
getContext() function. This returns back an object with three functions.
getContext().getRequest()is used to access the request details. This is mostly helpful for triggers, and we will discuss this in part 4 of this series.
getContext().getResponse()lets us set the response that we should send back. For stored procedures, this is the way that we will return data back to the client if the stored procedure has something to return.
getContext().getCollection()gives us access to the Cosmos DB collection that the stored procedure runs within. In turn, this will let us read and write documents.
Each of the calls above corresponds to a type –
Collection, respectively. Each of these types, in turn, provide a set of functions for interacting with the object. For example,
getContext().getCollection().queryDocuments() lets us run a query against documents in the collection, and
getContext().getResponse().setBody() lets us specify the output that we want the stored procedure to return. We’ll see more of these functions as we go through this series.
Also note that the double-underscore (
__) is automatically mapped to the
getContext().getCollection() function. In this series we won’t use this shortcut because I want to be more explicit, especially to help with testing when we get to part 5.
Human-readable documentation for the types and their members is provided by Microsoft here. Of course, one of the main reasons we’re using TypeScript in this series is so that we get type checking and strong typing against the Cosmos DB object model, so a human-readable description isn’t really sufficient. In TypeScript, this is done through type definitions – descriptions of types and their members that TypeScript can use to power its type system.
While Microsoft doesn’t provide first-party TypeScript type definitions for Cosmos DB, an open-source project named DefinitelyTyped provides and publishes these definitions here. (Note that the type definitions use the old name for Cosmos DB – DocumentDB – but they are still valid and being updated.)
which will map to the following SQL query:
However, there are some limitations to this query syntax. We can’t perform aggregations, and we can’t do queries using user-defined functions. These limitations may be lifted in future, but for now, in this series we will use the SQL syntax instead so that we can get the full power of Cosmos DB’s query engine. We can use this type of syntax to make a SQL-based query:
We can also request a document by ID if we want to as well:
Another important consideration when executing queries is that Cosmos DB limits the amount of time that a stored procedure can run for. We can test whether our stored procedure is approaching that time limit by inspecting the return value we receive when we submit the query. If we receive a
false response, it means the query wasn’t accepted – and that it’s probably time to wrap up our logic before we get forcibly shut off. In some stored procedures, receiving a
false like this may mean that you simply throw an error and consider the whole stored procedure execution to have failed.
In the last section, we discussed using the
collection.queryDocuments function to execute a SQL query against the collection. While this technically works, once we start including parameters in our queries then we’d need to concatenate them into the query string. This is a very, very bad idea – it opens us up to a class of security vulnerabilities called SQL injection attacks.
When we’re writing SQL queries with parameters, we should instead the overload of the
collection.queryDocuments function that accepts an
IParameterizedQuery instead of a
string. By doing this, we pass our parameters explicitly and ensure that they are handled and escaped appropriately by the database engine. Here’s an example of how we can do this from our TypeScript code:
We can also make changes to documents within the collection,. There are several functions on the
Collection type to help with this, including:
createDocumentinserts a new document into the collection.
replaceDocumentupdates an existing document in the collection. You must provide the document link to use this function.
deleteDocumentdeletes a document from the collection.
upsertDocumentThere are also functions to deal with attachments to documents, but we won’t work with those in this series.
These functions that work with existing documents also take an optional parameter to specify the
etag of the document. This allows for us to take advantage of optimistic concurrency. Optimistic concurrency is very useful, but is outside the scope of this series.
Structuring Your Stored Procedure Code
Stored procedures will often become more complex than UDFs, and may incorporate business logic as well as interaction with the Cosmos DB collection inside the code. When we’re writing a production-grade application it is important that we structure our code correctly so that each part is testable, and has a clearly defined responsibility and interactions with other components. In Cosmos DB, we are interacting with the collection in order to execute queries and update documents, and these side effects are important to test as well. We’ll discuss testing in more detail in part 5, but even before we worry about testing, it’s a good idea to structure our code properly.
When I write Cosmos DB stored procedures, I find it’s helpful to have a simple ‘entry point’ function. This entry point does the interaction with the
getContext() function and retrieves the
Response objects as required. These, along with any other parameters, are then passed into an internal implementation function, which in turn may invoke other functions to do other parts of the logic. By structuring the functions in this way we can ensure that each function has a clear purpose, and that the external components can be mocked and/or spied upon during our tests.
Writing our stored procedure in TypeScript also gives us the ability to store our functions in different
outFile property in our
include directive within the
tsconfig.json file, and then having a wildcard
* to catch the remaining files, like this:
Calling Stored Procedures
Once a stored procedure is written, we need to call it in order to check that it’s working, and then to use it in our real applications. There are several ways we can call our stored procedure and pass in the arguments it expects.
- The Azure Portal provides a test interface for invoking stored procedures. This is how we will test the stored procedure we write below.
- The client SDKs provide platform-specific features for invoking stored procedures. For example, the .NET client library for Cosmos DB provides the
DocumentClient.ExecuteStoredProcedureAsyncfunction, which accepts the ID of a stored procedure and any arguments that it might be expecting.
- The Cosmos DB REST API also allows for invoking stored procedures directly.
Again, the exact way you call stored procedures may depend on the Cosmos DB API you are targeting – in this series we are using the SQL API, and the invocation mechanisms for MongoDB, Gremlin, and the other APIs may be different.
Now that we have talked about the different aspects of writing stored procedures, we can think about how we might use a stored procedure in our sample scenario.
Defining our Stored Procedure
In our hypothetical ordering application, we have a Cosmos DB collection of documents representing orders. Each order document contains a customer’s ID (as we saw in part 2 of this series), and also contains a set of items that the customer ordered. An order item consists of the product ID the customer ordered and the quantity of that product.
Because Cosmos DB is a schemaless database, our order documents may coexist with many other documents of different types. Therefore, we also include a
type parameter on the document to indicate that it is an order. This type discriminator pattern is quite common in schemaless databases.
An example of an order document in our collection is:
For this stored procedure we want to pass in a set of product IDs, and get back a grouped list of IDs for customers who have ordered any of those products. This is similar to doing a
GROUP BY in a relational database – but currently Cosmos DB doesn’t provide this sort of grouping feature, and so we are using a stored procedure to fill in the gap. Doing this from the client SDK would require multiple queries against the collection, but by using a stored procedure we can just make one call.
At a high level, our stored procedure logic looks like this:
- Accept a list of product IDs as an argument.
- Iterate through the product IDs.
- For each product ID, run a query to retrieve the customer IDs for the customers that have ordered that product, filtering to only query on order documents. The query we’ll run looks like this:
- Once we have all of the customer IDs for each product in our list, create a JSON object to represent the results like this:
Preparing a Folder
Now we can start writing our stored procedure. If you want to compare against my completed stored procedure, you can access it on GitHub.
In part 2 of this series we covered how to set up the Cosmos DB account and collection, so I won’t go through that again. We also will reuse the same folder structure as we did in part 2, so you can refer to that post if you’re following along.
There’s one major difference this time though. In our
package.json file, we need to add a second entry into the
devDependencies list to tell NPM that we want to include the TypeScript type definitions for Cosmos DB. Our
package.json file will look like this:
Open a command prompt or terminal window, and run
npm install within this folder. This will initialise TypeScript and the type definitions.
We’ll also adjust the
tsconfig.json file to emit a file with the name
Writing the Stored Procedure
Now let’s create a file named
src/getGroupedOrders.ts. This is going to contain our stored procedure code. Let’s start with adding a basic function that will act as our entry point:
As discussed above, this is a pattern I try to follow when I write Cosmos DB server-side code. It helps to keep the interaction with the
getContext() function isolated to this one place, and then all subsequent functions will work with explicit objects that we’ll pass around. This will help when we come to test this code later. You can see that this function calls the
getGroupedOrdersImpl function, which does the actual work we need done – we’ll write this shortly.
Before then, though, let’s write a basic interface that will represent our response objects:
getGroupedOrdersImpl function will accept an array of product IDs and the collection in which to query them, and it will return an array of these
CustomersGroupedByProducts. Of course, since
Now we can add a basic shell of an implementation for our
getGroupedOrdersImpl function. As you type this, notice that (if you’re in an editor that supports it, like Visual Studio Code) you get IntelliSense and statement completion thanks to the TypeScript definitions:
This function prepares a variable called
outputArray, which will contain all of our product/customer groupings. Then we have some placeholder code to perform our actual queries, which we’ll fill in shortly. Finally, this function returns the output array.
Now we can fill in the placeholder code. Where we have
REPLACEME in the function, replace it with this:
There’s a lot going on here, so let’s break it down:
- The first part (lines 1-6) sets up a new
IParameterizedQuery, which lets us execute a SQL query using parameters. As discussed above, this is a much more secure way to handle parameters than string concatenation. The query will find all orders containing the product ID we’re looking for, and will return back the customer ID.
- Next, the query callback function is prepared (lines 7-18). This is what will be called when the query results are available. In this function we pull out the results and push them onto our
outputArray, ready to return to the calling function.
- Then we try to execute the query against the collection by using the
collection.queryDocuments()function (line 19). This function returns a boolean to indicate whether the query was accepted (line 20). If it wasn’t, we consider this to be an error and immediately throw an error ourselves (line 22).
That’s it! The full stored procedure file looks like this:
Here’s what the your folder structure should now look like:
Compiling the Stored Procedure
npm run build. You should see that a new
output folder has been created, and inside that is a file named
Deploying the Stored Procedure
Now let’s deploy the stored procedure to our Cosmos DB collection. Open the Azure Portal, browse to the Cosmos DB account, and then go to
Script Explorer. Click
Create Stored Procedure.
getGroupedOrders as the ID, and then paste the contents of the compiled
Save to install the stored procedure to Cosmos DB. (Once again, this isn’t the best way to install a stored procedure – we’ll look at better ways in part 6 of this series.)
Testing the Stored Procedure
Now let’s insert some sample data so that we can try the stored procedure out. Let’s insert these sample documents using
Document Explorer, as described in part 2.
Here are the three sample documents we’ll use:
Now go back into
Script Explorer, open the stored procedure, and notice that there is a test panel below the script body textbox. We can enter our stored procedures parameters into the
Inputs field. Let’s do that now. Enter
[["P1", "P2", "P10"]] – be careful to include the double square brackets around the array. Then click the
Save & Execute button, and you should see the results.
If we reformat them, our results look like the following. We can see that we have an array containing an object for each product ID we passed into the query, and each object has a list of customer IDs who ordered that product:
So our stored procedure works! We’ve now successfully encapsulated the logic involved in querying for customers that have ordered any of a set of products.
Stored procedures give us a way to encapsulate queries and operations to be performed on a Cosmos DB collection, and to invoke them as a single unit. Stored procedures run within an implicit transaction, so any unhandled errors will result in the changes being rolled back. Unlike in UDFs, we are also able to access the collection within a stored procedure by using the
getContext() function, and by retrieving the
Collection objects. This allows us to return rich data, including objects and arrays, as well as to interact with the collection and its documents. In the next part of this series we will discuss triggers, the third type of server-side programming available in Cosmos DB, which allow us to intercept changes happening to documents within the collection.
- Stored procedures encapsulate logic, queries, and actions upon documents within the collection.
- Stored procedures provide transactional isolation, and all stored procedures run within a transaction scope.
getContext()function allows us to access the
- TypeScript definitions are available to help when writing code against these objects.
- Cosmos DB provides an integrated query syntax, which is great for simple queries, but doesn’t cover all possible queries that can be executed against the collection.
- Arbitrary SQL queries can also be executed. If these contain parameters then the
IParameterizedQueryinterface should be used to ensure safe coding practices are adhered to.
- The order of functions inside the stored procedure’s file matters. The first function will be the one that Cosmos DB treats as the entry point.
- You can view the code for this post on GitHub.