User-defined functions (UDFs) in Cosmos DB allow for simple calculations and computations to be performed on values, entities, and documents. In this post I will introduce UDFs, and then provide detailed steps to set up a basic UDF written in TypeScript. Many of these same steps will be applicable to stored procedures and triggers, which we’ll look at in future posts.

This is the second part of a series of blog posts on server-side development using Cosmos DB with TypeScript.

  • Part 1 gives an overview of the server side programmability model, the reasons why you might want to consider server-side code in Cosmos DB, and some key things to watch out for.
  • Part 2 (this post) deals with user-defined functions, the simplest type of server-side programming, which allow for adding simple computation to queries.
  • Part 3 talks about stored procedures. These provide a lot of powerful features for creating, modifying, deleting, and querying across documents – including in a transactional way.
  • Part 4 introduces triggers. Triggers come in two types – pre-triggers and post-triggers – and allow for behaviour like validating and modifying documents as they are inserted or updated, and creating secondary effects as a result of changes to documents in a collection.
  • Part 5 discusses unit testing your server-side scripts. Unit testing is a key part of building a production-grade application, and even though some of your code runs inside Cosmos DB, your business logic can still be tested.
  • Finally, part 6 explains how server-side scripts can be built and deployed into a Cosmos DB collection within an automated build and release pipeline, using Microsoft Visual Studio Team Services (VSTS).

User-Defined Functions

UDFs are the simplest type of server-side development available for Cosmos DB. UDFs generally accept one or more parameters and return a value. They cannot access Cosmos DB’s internal resources, and cannot read or write documents from the collection, so they are really only intended for simple types of computation. They can be used within queries, including in the SELECT and WHERE clauses.

UDFs are simple enough that types are almost not necessary, but for consistency we will use TypeScript for these too. This will also allow us to work through the setup of a TypeScript project, which we’ll reuse for the next parts of this series.

One note on terminology: the word function can get somewhat overloaded here, since it can refer to the Cosmos DB concept of a UDF, or to the TypeScript and JavaScript concept of a function. This can get confusing, especially since a UDF can contain multiple JavaScript functions within its definition. For consistency I will use UDF when I’m referring to the Cosmos DB concept, and function when referring to the JavaScript or TypeScript concept.

Parameters

UDFs can accept zero or more parameters. Realistically, though, most UDFs will accept at least one parameter, since UDFs almost always operate on a piece of data of some kind. The UDF parameters can be of any type, and since we are running within Cosmos DB, they will likely be either a primitive type (e.g. a single string, number, array, etc), a complex type (e.g. a custom JavaScript object, itself comprised of primitive types and other complex types), or even an entire document (which is really just a complex type). This gives us a lot of flexibility. We can have UDFs that do all sorts of things, including:

  • Accept a single string as a parameter. Do some string parsing on it, then return the parsed value.
  • Accept a single string as well as another parameter. Based on the value of the second parameter, change the parsing behaviour, then return the parsed value.
  • Accept an array of values as a parameter. Combine the values using some custom logic that you define, then return the combined value.
  • Accept no parameters. Return a piece of custom data based on the current date and time.
  • Accept a complex type as a parameter. Do some parsing of the document and then return a single output.

Invoking a UDF

A UDF can be invoked from within the SELECT and WHERE clauses of a SQL query. To invoke a UDF, you need to include the prefix udf. before the function name, like this:
[sourcecode language=”sql” wraplines=”true” collapse=”false” gutter=”false”]
SELECT udf.parseString(c.stringField) FROM c
[/sourcecode]
In this example, udf. is a prefix indicating that we want to call a UDF, and parseString is the name of the UDF we want to call. Note that this is identifier that Cosmos DB uses for the UDF, and is not necessarily the name of the JavaScript function that implements the UDF. (However, I strongly recommend that you keep these consistent, and will do so throughout this series.)
You can pass in multiple parameters to the UDF by comma delimiting them, like this:
[sourcecode language=”sql” wraplines=”true” collapse=”false” gutter=”false”]
SELECT udf.parseString(c.stringField, 1234) FROM c
SELECT udf.parseString(c.stringField1, c.stringField2) FROM C
[/sourcecode]
To pass a hard-coded array into a UDF, you can simply use square brackets, like this:
[sourcecode language=”sql” wraplines=”true” collapse=”false” gutter=”false”]
SELECT udf.parseArray([“arrayValue1”, “arrayValue2”, “arrayValue3”])
[/sourcecode]
Now that we’ve talked through some of the key things to know about UDFs let’s try writing one, using our sample scenario from part 1.

Defining our UDF

Let’s imagine that our order system was built several years ago, and our business has now evolved significantly. As a result, we are in the middle of changing our order schema to represent customer IDs in different ways. Cosmos DB makes this easy by not enforcing a schema, so we can simply switch to the new schema when we’re ready.

Our old way of representing a customer ID was like this:

Now, though, we are representing customers with additional metadata, like this:

However, we still want to be able to easily use a customer’s ID within our queries. We need a way to dynamically figure out the customer’s ID for an order, and this needs to work across our old and new schemas. This is a great candidate for a UDF. Let’s deploy a Cosmos DB account and set up this UDF.

Setting Up a Cosmos DB Account and Collection

First, we’ll deploy a Cosmos DB account and set up a database and collection using the Azure Portal. (Later in this series, we will discuss how this can be made more automatable.) Log into the Azure Portal and click New, then choose Cosmos DB. Click Create.

cosmos-udf-1

We need to specify a globally unique name for our Cosmos DB account – I have used johnorders, but you can use whatever you want. Make sure to select the SQL option in the API drop-down list. You can specify any subscription, resource group, and location that you want. Click Create, and Cosmos DB will provision the account – this takes around 5-10 minutes.

Once the account is created, open it in the Portal. Click Add Collection to add a new collection.

cosmos-udf-2

Let’s name the collection Orders, and it can go into a database also named Orders. Provision it with a fixed (10GB) capacity, and 400 RU/s throughput (the minimum).

cosmos-udf-3

Note that this collection will cost money to run, so you’ll want to remember to delete the collection when you’re finished. You can leave an empty Cosmo DB account for no charge, though.

Preparing a Folder

TypeScript requires that we provide it with some instructions on how to compile our code. We’ll also use Node Package Manager (NPM) to tie all of our steps together, and so need to prepare a few things before we can write our UDF’s code.

Note that in parts 2, 3, and 4 of this series, we will write each server-side component as if it was its own independent application. This is to keep each of these posts easy to follow in a standalone way. However, in parts 5 and 6, we will combine these into a single folder structure. This will more accurately represent a real-world application, in which you are likely to have more than one server-side component.

Create a new folder on your local machine and add a file named package.json into it. This contains the NPM configuration we need. The contents of the file should be as follows:

The project.json file does the following:

  • It defines the package dependencies we have when we develop our code. Currently we only have one – typescript.
  • It also defines a script that we can execute from within NPM. Currently we only have one – build, which will build our UDF into JavaScript using TypeScript’s tsc command.

At a command prompt, open the folder we’ve been working in. Run npm install, which will find and install the TypeScript compiler. If you don’t have NPM installed, install it by following the instructions here.

Next, create a file named tsconfig.json. This is the TypeScript project configuration. This should contain the following:

The tsconfig.json instructs TypeScript to do the following:

  • Find and compile all of the files with the .ts extension inside the src folder (we haven’t created this yet!).
  • Target ECMAScript 2015, which is the version of JavaScript that Cosmos DB supports. If we use more modern features of TypeScript, it will handle the details of how to emit these as ECMAScript 2015-compatible code.
  • Save the output JavaScript to a single file named output/udf-getCustomerId.js. This filename is arbitrary and it could be any name we want, but I find it helpful to use the convention of  {kind}-{objectName}.js, especially as we add more code in later parts of this series.
    Note that the outFile directive means that, even if we included multiple source TypeScript files, TypeScript will save the complete compiled script into a single output file. This is in keeping with the requirement that Cosmos DB imposes that a server-side component has to be specified in a single file.

Writing the UDF

Now we can write our actual UDF code! If you want to compare against my completed UDF, you can access it on GitHub.
Create a folder named src, and within that, create a file named getCustomerId.ts. (Once again, this file doesn’t need to be named this way, but I find it helpful to use the UDF’s name for the filename.) Here’s the contents of this file:

Briefly, here’s an explanation of what this file does:
  • It declares a function named getCustomerId, which accepts a single parameter of type OrderDocument and returns a string. This is the function that represents our UDF.
  • The function inspects the document provided, and depending on which version of the schema it follows, it pulls the customer ID out of the appropriate field.
  • If the customer ID isn’t in either of the places it expects to find them, it throws an error. This will be further thrown up to the client by Cosmos DB.
  • Finally, it declares an interface named OrderDocument. This represents the shape of the data we’re expecting to store in our collection, and it has both of the ways of representing customer IDs.
    Note that we are using an interface and not a class, because this data type has no meaning to Cosmos DB – it’s only for use at development and build time.
    Also note that we could put this into its own orderDocument.ts file if wanted to keep things separated out.

At the end of this, your folder should look something like this:

  • /
    • package.json
    • tsconfig.json
    •  src/
      • getCustomerId.ts

You can access a copy of this as a GitHub repository here.

We have now written our first UDF! We’re almost ready to run it – but before then, we need to compile it.

Compiling the UDF

At the command line run npm run build. This will run the build script we defined inside the package.json file, which in turn simply runs the tsc (TypeScript compiler) command-line application. tsc will find the tsconfig.json file and knows what to do with it.

Once it’s finished, you should see a new output folder containing a file named udf-getCustomerId.js. This is our fully compiled UDF! It should look like the following:

If you compare this to the code we wrote in TypeScript, you’ll see that it is almost the same – except all of the type information (variable types and the interface) have been stripped away. This means that we get the type safety benefits of TypeScript at authoring and compilation time, but the file we provide to Cosmos DB is just a regular JavaScript file.

Deploying the UDF

Now we can deploy the UDF. Back in the Azure Portal, open Script Explorer under the Collections section, and then click Create User Defined Function.

cosmos-udf-4.png

Enter getCustomerId in the ID field. This will be the name we address the UDF by when we start to call it from our queries. Note that you don’t have to use the same ID field here as the JavaScript function name – in fact, the JavaScript function can be named anything you like. For clarify, though, I find it helpful to keep everything in sync.

Now we can copy and paste the contents of the udf-getCustomerId.js file into the large script text box.

cosmos-udf-5

Click Save to install the UDF to Cosmos DB.

Testing the UDF

Finally, let’s test the UDF! We’ll need to add a couple of pieces of sample data. Click Document Explorer under the Collections section, and then click the Create button. Paste in this sample document:

Click Save, and then close the blade and create a second document with the following contents:

This gives us enough to test with. Now click Query Explorer under the Collections section. Enter the following query:

[sourcecode language=”sql” wraplines=”true” collapse=”false” gutter=”false”]
SELECT c.id, udf.getCustomerId(c) AS customerId FROM c
[/sourcecode]

This query does the following:

  • Refers to each document within the current collection (c).
  • Runs the getCustomerId UDF, passing in the document contents. Note that to refer to a UDF, you must prefix the name of the UDF with udf..
  • Projects out the document ID (id) and the customerId as customerId.

You should see the following output:

That’s exactly what we wanted to see – the UDF has pulled out the correct field for each document.

As a point of interest, notice the Request Charge on the query results. Try running the query a few times, and you should see that it fluctuates a little – but is generally around 3.5 RUs.

Now let’s try passing in an invalid input into our UDF. Run this query:

[sourcecode language=”sql” wraplines=”true” collapse=”false” gutter=”false”]
SELECT udf.getCustomerId(‘123’) FROM c
[/sourcecode]

Cosmos DB will give you back the error that our UDF threw because the input data (123) didn’t match either of the schemas it expected:

Encountered exception while executing Javascript.
  Exception = Error: Document with id undefined does not contain customer ID in recognised format.
  Stack trace: Error: Document with id undefined does not contain customer ID in recognised format.
    at getCustomerId (getCustomerId.js:11:5)
    at __docDbMain (getCustomerId.js:15:5)
    at Global code (getCustomerId.js:1:2)

So we’ve now tested out the old customer ID format, the new customer ID format, and some invalid input, and the UDF behaves as we expect.

Summary

UDFs provide us with a way to encapsulate simple computational logic, and to expose this within queries. Although we can’t refer to other documents or external data sources, UDFs are a good way to expose certain types of custom business logic. In this post, we’ve created a simple UDF for Cosmos DB, and tested it using a couple of simple documents. In the next part of this series we’ll move on to stored procedures, which allow for considerably more complexity in our server-side code.

Key Takeaways

  • UDFs are intended for simple computation.
  • They can be used within queries, including in the SELECT and WHERE clauses.
  • UDFs cannot access anything within the Cosmos DB collection, nor can they access any external resources.
  • They can accept one or more parameters, which must be provided when calling the UDF.
  • The name of the JavaScript function does not have to match the name of the UDF, but to keep your sanity, I highly recommend keeping them consistent.
  • UDFs can be invoked from within a query by using the udf. prefix, e.g. SELECT udf.getCustomerId(c) FROM c.
  • You can view the code for this post on GitHub.

 

Category:
Azure Platform
Tags:
, , , ,