
Multi-Model Azure Cosmos DB – Running SQL (Geospatial) Queries On a Graph

One of the major selling points for Microsoft’s Azure Cosmos DB is that it’s a multi-model database – you can use it as a:

  • Simple key/value pair store through the Table API
  • Document database through the SQL API
  • Graph database through the Graph (Gremlin) API
  • MongoDB database
  • Cassandra database

Underneath these APIs Cosmos uses its Atom-Record-Sequence (ARS) type system to manage and store the data, which has the less well publicized benefit of allowing you to use different APIs to access the same data.

While I’ve not explored all the possibilities, I’d like to demonstrate the potential this approach opens up by running geospatial queries (longitude and latitude based) against a graph database that is primarily used to model social network relationships.

Let’s assume you’re building a social network to connect individuals in local communities and you’re using a graph database to connect people together. In this graph you have vertexes of type ‘person’ and edges of type ‘follows’, and when someone new joins your network you want to suggest people who live within a certain radius of their location (a longitude and latitude derived from a GPS or geocoded address). In this example we’re going to construct the graph of social connections using Gremlin.NET and use the geospatial queries available to the SQL API (specifically ST_DISTANCE) to interrogate it based on location.

Given a person ID, longitude and latitude, let’s first look at how we add someone into the graph (this assumes you’ve added the Gremlin.NET package to your solution):

public async Task CreateNode(string personId)
{
    const string cosmosEndpoint = "yourcosmosdb.gremlin.cosmosdb.azure.com";
    const string username = "/dbs/yoursocialnetwork/colls/relationshipcollection"; // the form of /dbs/DATABASE/colls/COLLECTION
    const string cosmosAuthKey = "yourauthkey";

    GremlinServer gremlinServer = new GremlinServer(cosmosEndpoint, 443, true, username, cosmosAuthKey);
    using (GremlinClient client = new GremlinClient(gremlinServer, new GraphSON2Reader(),
        new GraphSON2Writer(), GremlinClient.GraphSON2MimeType))
    {
        // personId is supplied as a binding rather than concatenated into the query
        Dictionary<string, object> arguments = new Dictionary<string, object>
        {
            {"personId", personId}
        };

        await client.SubmitAsync("g.addV('person').property('id', personId)", arguments);
    }
}

If you run this code against a Cosmos DB Graph collection it will create a vertex with the label of person and assign it the specified id.

We’ve not yet got any location data attached to this vertex, and that’s because Gremlin.NET and Cosmos do not (to the best of my knowledge) allow us to set a property with the specific structure required by the geospatial queries supported by the SQL API. That being the case, we’re going to use the SQL API to load this vertex as a document and attach a longitude and latitude. The key to this is to use the document endpoint. If you look in the code above you can see we have a Gremlin endpoint of:

yourcosmosdb.gremlin.cosmosdb.azure.com

If you attempt to run SQL API queries against that endpoint you’ll encounter errors; the endpoint we need to use is:

yourcosmosdb.documents.azure.com
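
For example, a SQL API client pointed at the document endpoint can be created like this (a minimal sketch – the placeholder account name and auth key are the same ones used in the Gremlin example above):

// The same account and master key as the Gremlin connection, but via the document endpoint
DocumentClient documentClient = new DocumentClient(
    new Uri("https://yourcosmosdb.documents.azure.com"),
    "yourauthkey");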

To keep things simple we’ll use the LINQ support provided by the DocumentDB client, but rather than define a class that represents the document we’ll use a dictionary of strings and objects. In the multi-model world we have to be very careful not to lose properties required by the other model, and by taking this approach we don’t need to inspect and carefully account for each property (and ongoing change!). If we didn’t do this we’d be liable to lose data at some point.

All that being the case we can attach our location property using code like the below:

public static async Task AddLocationToDocument(this DocumentClient documentClient, string databaseId, string collectionId, string id, double longitude, double latitude)
{
    Uri collectionUri = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
    // Load the vertex as a plain dictionary so no graph properties are lost on the round trip
    IQueryable<Dictionary<string,object>> documentQuery = documentClient.CreateDocumentQuery<Dictionary<string,object>>(
            collectionUri,
            new FeedOptions { MaxItemCount = 1 })
        .Where(x => (string)x["id"] == id);
    Dictionary<string,object> document = documentQuery.ToArray().SingleOrDefault();
    // Point serializes to the GeoJSON structure the SQL API geospatial functions expect
    // (GeoJSON coordinates are longitude first, then latitude)
    document["location"] = new Point(longitude, latitude);
    Uri documentUri = UriFactory.CreateDocumentUri(databaseId, collectionId, id);
    await documentClient.ReplaceDocumentAsync(documentUri, document);
}
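
Calling it with the example database and collection names from the Gremlin connection string above would then look something like:

await documentClient.AddLocationToDocument("yoursocialnetwork", "relationshipcollection", "1234", -6.45436, 49.12892);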

If you now inspect this vertex in the Azure portal you won’t see the location property in the graph explorer – its property format is not one known to Gremlin.

However if you open the collection in Azure Storage Explorer (another example of multi-model usage) you will see the vertexes (and edges) exposed as documents and will see the location attached in the JSON:

{
    "location": {
        "type": "Point",
        "coordinates": [
            -6.45436,
            49.12892
        ]
    },
    "id": "1234",
    "label": "person",
    "_rid": "...",
    "_self": "...",
    "_etag": "\"...\"",
    "_attachments": "attachments/",
    "_ts": ...
}

OK, so at this point we’ve got a person in our social network graph. Let’s imagine we’re about to add another person and we want to search for nearby people. We’ll do this using a SQL API query to find people within 30km of a given location (I’ve stripped out some of the boilerplate from this):

const double rangeInMeters = 30000;
// Distance is expressed in meters and translates to the SQL API's ST_DISTANCE function
IQueryable<Dictionary<string,object>> query = _documentClient.Value.CreateDocumentQuery<Dictionary<string, object>>(
    collectionUri,
    new FeedOptions { MaxItemCount = 1000 })
    .Where(x => (string)x["label"] == "person" &&
                ((Point)x["location"]).Distance(new Point(location.Longitude, location.Latitude)) < rangeInMeters);
var inRange = query.ToArray();
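
That LINQ Distance call is translated into the SQL API’s ST_DISTANCE function; the equivalent raw SQL would look roughly like this (using the example coordinates from the JSON above):

SELECT * FROM c
WHERE c.label = 'person'
AND ST_DISTANCE(c.location, {'type': 'Point', 'coordinates': [-6.45436, 49.12892]}) < 30000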

From here we can iterate our inRange array and build a Gremlin query to attach our edges roughly of the form:

g.V('1234').addE('follows').to(g.V('5678'))
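
Putting that together with the Gremlin.NET client, the edge creation might look something like the sketch below – newPersonId is assumed to be the id of the newly added person, and client a GremlinClient constructed as in the first example:

foreach (Dictionary<string, object> person in inRange)
{
    // Bind both vertex ids rather than concatenating them into the query
    Dictionary<string, object> arguments = new Dictionary<string, object>
    {
        {"newPersonId", newPersonId},
        {"existingPersonId", (string)person["id"]}
    };
    await client.SubmitAsync("g.V(newPersonId).addE('follows').to(g.V(existingPersonId))", arguments);
}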

Closing Thoughts

Being able to use multiple models to query the same data brings a lot of expressive power to Cosmos DB, though it’s still important to choose the model that best represents the primary workload for your data and to understand how the different queries behave with respect to RUs and data volumes.

It’s also a little clunky working in the two models and it’s quite easy, particularly if performing updates, to break things. I expect that as Cosmos matures this is going to be cleaned up, and I think Microsoft will talk about these capabilities more.

Finally – it’s worth opening up a graph database with Azure Storage Explorer and inspecting the documents; you can see how Cosmos manages vertexes and edges, and it gives a more informed view as to how graph queries consume RUs.


Cosmos DB x-ms-partitionkey Error

A small tip but one that might save you some time – if you’re trying to run Cosmos DB queries via the REST API you might encounter the following Bad Request error:

The partition key supplied in x-ms-partitionkey header has fewer components than defined in the the collection.

What you actually need to specify is the partition key for your collection using the x-ms-documentdb-partitionkey header.
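
For example, using HttpClient the header would be set something like the sketch below (the authorization and version headers required by the REST API are omitted for brevity; the value is a JSON array containing the document’s partition key value):

HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get,
    "https://yourcosmosdb.documents.azure.com/dbs/yoursocialnetwork/colls/relationshipcollection/docs/1234");
// Note the header name: x-ms-documentdb-partitionkey, not x-ms-partitionkey
request.Headers.Add("x-ms-documentdb-partitionkey", "[\"1234\"]");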

An annoyingly misleading error message!

Game of the Year 2014

Winner: Forza Horizon 2
Runner Up: The Last of Us Remastered

There’s been a lot of negativity in the gaming world this year about the next-gen consoles and how they are yet to deliver, and while I’ve not played anything game-changing (pun not intended) I’ve had plenty of great gaming experiences over the last 12 months, including Wolfenstein, Elite: Dangerous, Monument Valley, Far Cry 4, Alien: Isolation, Diablo III, Dragon Age and Mario Kart.

However my winner and runner-up stood head and shoulders above those other games, with only a hair’s breadth between them. Forza Horizon 2 is the first driving game to make me whoop and holler since Daytona in the arcade, and the most satisfying since the Project Gotham series, to which this feels, to me, like a spiritual successor. For me it strikes the perfect balance between arcade and simulation, with a driving model that sends me into a trance-like state while hours slip by. I’ve recently tucked into the Storm Island DLC and some of the tracks are incredible, with masses of air and dirty rallying to go at. It all looks gorgeous too, with beautiful scenery, incredible car models, and a steady framerate – it runs at 30fps but I’ve never seen it drop a beat, and although I prefer racers to run at 60fps the nature of the driving means this isn’t particularly detrimental to the experience and I soon got used to it.

Despite owning a PS3 I somehow missed The Last of Us on that console and eagerly awaited it in remastered form – I was not disappointed. I can’t remember ever being so gripped by a video game’s story and characters before; from beginning to end I was desperate to see what would happen next yet dreading something bad happening to Ellie and Joel. The pacing was fantastic, mixing up quiet and frantic moments and the occasional scare to get the pulse going. When the game finished I was both sad to see the story end and satisfied that it was complete – I hope Naughty Dog doesn’t cheapen the experience by revisiting Ellie and Joel. The DLC was also fantastic and I loved the parts without combat; they really pulled on the heartstrings, as throughout you know where it’s all heading. It’s a beautiful game throughout, both technically solid and visually arresting with great art direction, and the gameplay was, for me, just the right balance of stealth and combat.

As I say, only a hair’s breadth separates those two games, and on another day in a different mood I could easily swap them round. I think Forza Horizon 2 just pips it because of its sheer scale – I’ve driven a ridiculous number of miles and am still enjoying it as much as I did when I first started with it.

Special mentions also need to go to Alien: Isolation for the sheer level of terror it generates (I still haven’t completed it due to fear and dread!) and Elite: Dangerous for being mesmerising even while in beta.

Azure by Default

When I first started with Azure it only existed in PaaS form and had a very limited set of services compared to the rich variety available now. Adopting Azure at the time was seen as something of a risk, even within a heavy C# / .NET shop, and my first use of it was on a carefully targeted project – one on which I wasn’t betting the bank, so to speak.

Over the last few years the platform has matured significantly, adding PaaS and IaaS features along the way, and has proven to be robust, reliable, cost effective and flexible in the development and operation of real systems serving real customers. It’s done what it says on the tin, and the people I have worked with who also have significant Azure experience largely say the same.

As such it’s been interesting to observe my own corresponding shift in behaviour over the last 12 months, and throughout 2014 in particular. When I started on this journey back in 2011 I would have spoken of it in terms of interest and caution. Throughout late 2012 and 2013 I would have spoken of it as an excellent option to be considered for many systems. All of which leads me to today, where in the last few weeks I have found myself recommending it as the “default choice”.

By this I don’t mean it’s the only tool for every job, but it’s the platform I now look to first for greenfield development; I then look for reasons why it might not be a good fit, drilling into those reasons hard because the benefits of the platform are so great. The kinds of things that can make me look elsewhere are regulatory or compliance considerations, or a peculiar or edge-case technical requirement.

It’s been a fascinating journey and still is. At this point I consider Azure to be amongst the best things Microsoft has done, right up there with C# – it’s a massively enabling technology. If you’ve not looked at it yet, and particularly if you’re a .NET / Microsoft developer, you really should.

Thanks, and a tip for people looking for help

Firstly let me just thank those who have taken the time to contribute to these projects on GitHub and also to those who have participated in discussions here. I really appreciate it.

And a quick tip for those looking to get help on any of the open source code related to this blog: the best thing is to log a ticket on GitHub. That’s more likely to get my attention, and there are also others chipping in on GitHub with help and advice and posting fixes and upgrades to the code in response.

I’ll keep checking here, but I have an approval process in place due to spam, and if I go on holiday (as I just did!) that can lead to lengthy delays.

Hope that helps and thanks again – really appreciate the support, help and interest.
