Category: Azure Cosmos DB

Avoiding Gremlin Injection Attacks with Azure Cosmos DB

I’ve written previously about some of the issues with using Cosmos DB as a graph database from .NET. One of the more serious issues, I think, is that the documentation doesn’t really demonstrate how to avoid an injection attack when using Gremlin as it presents examples using hard coded strings which are then just picked up and run through the Gremlin.NET library:

// Gremlin queries that will be executed.
private static Dictionary<string, string> gremlinQueries = new Dictionary<string, string>
    { "Cleanup",        "g.V().drop()" },
    { "AddVertex 1",    "g.addV('person').property('id', 'thomas').property('firstName', 'Thomas').property('age', 44)" },
    { "AddVertex 2",    "g.addV('person').property('id', 'mary').property('firstName', 'Mary').property('lastName', 'Andersen').property('age', 39)" },
    { "AddVertex 3",    "g.addV('person').property('id', 'ben').property('firstName', 'Ben').property('lastName', 'Miller')" },
    { "AddVertex 4",    "g.addV('person').property('id', 'robin').property('firstName', 'Robin').property('lastName', 'Wakefield')" },
    { "AddEdge 1",      "g.V('thomas').addE('knows').to(g.V('mary'))" },
    { "AddEdge 2",      "g.V('thomas').addE('knows').to(g.V('ben'))" },
    { "AddEdge 3",      "g.V('ben').addE('knows').to(g.V('robin'))" },
    { "UpdateVertex",   "g.V('thomas').property('age', 44)" },
    { "CountVertices",  "g.V().count()" },
    { "Filter Range",   "g.V().hasLabel('person').has('age', gt(40))" },
    { "Project",        "g.V().hasLabel('person').values('firstName')" },
    { "Sort",           "g.V().hasLabel('person').order().by('firstName', decr)" },
    { "Traverse",       "g.V('thomas').out('knows').hasLabel('person')" },
    { "Traverse 2x",    "g.V('thomas').out('knows').hasLabel('person').out('knows').hasLabel('person')" },
    { "Loop",           "g.V('thomas').repeat(out()).until(has('id', 'robin')).path()" },
    { "DropEdge",       "g.V('thomas').outE('knows').where(inV().has('id', 'mary')).drop()" },
    { "CountEdges",     "g.E().count()" },
    { "DropVertex",     "g.V('thomas').drop()" },

Focusing in on one of the add vertex examples and how it might be executed with the Gremlin.NET library:

private async Task BadExample()
    using (GremlinClient client = new GremlinClient(_gremlinServer, new GraphSON2Reader(),
        new GraphSON2Writer(), GremlinClient.GraphSON2MimeType))
        await client.SubmitAsync(
            "g.addV('person').property('id', 'thomas').property('firstName', 'Thomas').property('age', 44)");

We know from years of SQL that examples like this quickly become widespread injection prone pieces of code like the below, particularly if people are new to working with a new database (and in the case of graph databases and Gremlin – that’s most people):

private async Task BadExample(string firstName, int age)
    using (GremlinClient client = new GremlinClient(_gremlinServer, new GraphSON2Reader(),
        new GraphSON2Writer(), GremlinClient.GraphSON2MimeType))
        await client.SubmitAsync(
            $"g.addV('person').property('id', '{firstName.ToLower()}').property('firstName', '{firstName}').property('age', {age})");

The issue, if you’re not familiar with injection attacks, is that as a user I can enter a ‘ character in the input and break out to add my own code through a user interface that executes on the server – for example I could supply a firstName of:

James').property('myinjectedproperty','hahaha got ya

And I’ve managed to attach some data of my own choosing. Obviously I could do more nefarious things too.

Fortunately Gremlin does support parameterised queries and we can rewrite the code above more safely to look like this and leave the libraries and database to take care of this:

private async Task BetterExample(string firstName, int age)
    using (GremlinClient client = new GremlinClient(_gremlinServer, new GraphSON2Reader(),
        new GraphSON2Writer(), GremlinClient.GraphSON2MimeType))
        Dictionary<string, object> arguments = new Dictionary<string, object>
            { "firstNameAsId", firstName.ToLower() },
            { "firstName", firstName },
            { "age", age }
        await client.SubmitAsync(
            $"g.addV('person').property('id', firstNameAsId).property('firstName', firstName).property('age', age)", arguments);

With the uptake of Cosmos and Graph databases being new to most people I really wish the Cosmos team would update these docs with a security first mindset and its something I’ve fed back to them previously. Leaving the documentation as it stands is almost certainly going to lead to more insecure code being written than would otherwise be the case.

I’ll probably drop out a few short Cosmos posts over the next few days – I’m doing a lot of (quite interesting!) work with it at the moment.

Azure Cosmos DB and its Perplexing Pricing Problems

I’ve had a lot of success with Azure Cosmos DB and have regularly tweeted about how “it does what it says on the tin” – and while the Graph support has been a bit of a bumpy .NET developer experience other than that it, so far (admittedly at limited volume in real world scenarios), looks good too. It’s certainly looking like a success for Microsoft though its had a massive marketing push over the last year.

However Microsoft seem to be continually undermining it with ridiculous pricing levels that put it out of reach of smaller more cost conscious customers that prevent it from being a truly end to end scalable database solution for the cloud which is disappointing given how it is usually pitched as the defacto database for Azure.

Until this weeks Build event pricing was governed by the amount of resource units (RUs) you applied to a collection (or Graph). Collections essentially being sets of documents that share a partition key.

There are two types of collection: fixed and unlimited. Fixed is:

  • Un-partitioned (no partition key is specified).
  • The minimum number of RUs you can allocate is 400.
  • The maximum is 10000 RUs.
  • Maximum storage of 10Gb.

Whereas unlimited is:

  • Partitioned – you need to supply a partition key when you create the collection and structure your code to make use of it.
  • Pricing starts at 1000RUs (down from its previous entry point of 2500RUs).
  • There is no maximum limit although if you want to go beyond 100,000 you might need to call Azure Support.
  • There is no maximum limit for storage.

To put those into financial context fixed pricing begins at around $23 per month per collection and unlimited at $58 per month per collection.

On its own that’s pretty reasonable but remember this is per collection so if you have multiple types of document that’s potentially multiple collections. I’ve heard two workarounds suggested for this. The first is to store multiple document types inside a single collection – the problem with this is that when you design a storage strategy for a solution often you know you are going to outgrow this kind of bodge so you’re essentially being forced to incur technical debt from day one (you’re just trading hosting costs for development cost). The second is to “take advantage of our free plans” and while its great to see free plans available its hard not to feel like you’ve just set a timer running on a small explosive and have sat on it, plus combined with other service usage those plans might not be sufficient: Cosmos is unlikely to be running on its own. (As an side – why is this such an issue? Different document types often require different partition keys and strategies based on both their content and the characteristics of the queries you want to run against them – its one of the most important considerations for the storage strategy with a database like Cosmos)

Now, with the announcement at Build, we’re able to provision RUs at the database level allowing collections in that database to share them. To me that sounded great – being able to provision, say, a 2500RU database and have it sit over 10 collections that I can later pull out and provision with their own RU allocations as my needs grow. Unfortunately the devil was in the detail on this one and visiting the Azure pricing page the minimum RU level for a database has been set at 50000. That’s approximately $3000 per month and as Howard van Rooijen noted on Twitter “feels like you’re renting the cluster”. I completely agree and its not the first time in the last few months I’ve felt that Azure seemingly has an internal inability to operate at a granular level and that that is limiting its service.

Maybe Microsoft feel ignoring the cost conscious segment of the market is ok but that’s running the risk of alienating up and coming startups or turning them to competing services – the cloud is an increasingly competitive market and Microsoft are still the underdog. And with the advent of .NET Core and its cross platform capabilities it’s getting easier than ever to run .NET code on other vendors – and sometimes even better than on Azure. For example Azure Functions have come on in leaps and bounds but are still beset by problems that are not an issue on AWS Lambda.

In fact in the same space Cosmos plays, AWS have DynamoDB as a document database and Neptune as a graph database. I’m no expert in either but they seem to have far more proportional pricing models than Cosmos and increasingly look worth investigating. And that’s dangerous to Microsoft – I’ve been using both .NET and Azure since they respectively launched and so am pretty much in their developer heartland – but the more I’m pushed away by decisions like this the more likely it is that I, and others, jump in a more wholesale manner. Its really not the message they want to be leaving us with at one of their premier events.

I guess its a case of watch this space. This pricing model is in preview and there’s track record for back tracking on RU limits but getting down from 50000 RUs to something affordable – that’s going to be a massive change.

I’d love to see it happen.

Cosmos, Oh Cosmos – Graph API as a .NET Developer

I’ve done a lot of work with Cosmos over the last year as a document database and generally found it to be a rock solid experience, it does what it says on the tin, and so when I found myself with a project that was a great fit for a graph database my first port of call was Cosmos DB.

I’d done a little work with it as a Graph API but as this was a new project I visited the Azure website to refresh myself on using it as a Graph from .NET and found that Microsoft are now recommending that the Tinkerpop Gremlin .NET library be used from .NET. There are some pieces on the old Microsoft.Azure.Graphs package but it never made it out of preview and the direction of travel looks to be elsewhere.

While Gremlin .NET is easy to get started with if you try and use this in a realistic sense you quickly run into a couple of serious limitations due to it’s current design around error and response handling. It seems to be designed to support console applications rather than real world services:

  1. Vendor specific attributes of the responses such as RU costs, communicated as header values, are hidden from you.
  2. Errors from the server are presented only as text messages. Rather than expose status codes and interpretable values the Gremlin .NET library first converts these into messages designed for consumption by people. To interpret the server error you need to parse these strings.

The above two issues combine into a very unfortunate situation for a real world Cosmos Graph API application: when Cosmos rate limits you it returns a 500 error to the caller with the 429 error being communicated in a x-ms-status-code header. This doesn’t play well with resilience libraries such as Polly as you end up having to fish around in the response text for keywords.

I initially raised this as a documentation issue on GitHub and Microsoft have confirmed they are moving to open source libraries and working to improve them but best I can tell, today, for the moment you’ve got two choices:

  1. Continue to use the Microsoft.Azure.Graphs package – I’ve not used this in anger but I understand it has issues of its own related to client side performance and is a bit of a dead end.
  2. Use Gremlin.NET and work around the issues.

For the time being I’ve opted to go with option (2) as I don’t want to unpick an already obsolete package from new code. To support this I’ve forked the Gremlin.NET library and introduced a couple of changes that allow attributes and response codes to be inspected for regular requests and for exceptions. I’ve done this in a none breaking way – you should be able to replace the official Gremlin.NET package with this replacement and your code should continue to work just fine but you can more easily implement resilience patterns. You can find it on GitHub here.

If I was designing the API from fresh to work in a real world situation I would probably expose a different API surface – at the moment I really have followed the path of least none-breaking resistance in terms of getting these things visible. That makes me uncertain as to whether or not this is an appropriate Pull Request to submit – I probably will, if nothing else hopefully that will start a conversation.


  • If you're looking for help with C#, .NET, Azure, Architecture, or would simply value an independent opinion then please get in touch here or over on Twitter.

Recent Posts

Recent Tweets

Recent Comments




GiottoPress by Enrique Chavez