I’ve done a lot of work with Cosmos DB as a document database over the last year and have generally found it to be a rock solid experience: it does what it says on the tin. So when I found myself with a project that was a great fit for a graph database, my first port of call was Cosmos DB.
I’d done a little work with its Graph API, but as this was a new project I visited the Azure website to refresh myself on using it as a graph database from .NET and found that Microsoft are now recommending the TinkerPop Gremlin.NET library. There are still some pieces on the old Microsoft.Azure.Graphs package, but it never made it out of preview and the direction of travel looks to be elsewhere.
While Gremlin.NET is easy to get started with, if you try to use it in a realistic setting you quickly run into a couple of serious limitations due to its current design around error and response handling. It seems to be designed to support console applications rather than real world services:
- Vendor specific attributes of the responses such as RU costs, communicated as header values, are hidden from you.
- Errors from the server are presented only as text messages. Rather than exposing status codes and interpretable values, the Gremlin.NET library first converts these into messages designed for consumption by people. To interpret the server error you need to parse these strings.
The above two issues combine into a very unfortunate situation for a real world Cosmos Graph API application: when Cosmos rate limits you, it returns a 500 error to the caller, with the 429 status communicated in an x-ms-status-code header. Because Gremlin.NET hides the headers and reduces errors to text, this doesn’t play well with resilience libraries such as Polly: you end up having to fish around in the response text for keywords.
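To make the "fishing around for keywords" workaround concrete, here is a minimal sketch of the logic, written in Python for brevity rather than as a Polly policy. Everything named here is an assumption for illustration: `GremlinServerError` is a hypothetical stand-in for a client exception that carries only text, and the keywords in `RATE_LIMIT_PATTERN` are guesses at what a throttling message might contain, not a documented format.

```python
import re
import time

# Assumed keywords: the exact wording of the server's error text is not
# specified here, so treat this pattern as a placeholder to adjust.
RATE_LIMIT_PATTERN = re.compile(
    r"RequestRateTooLarge|\b429\b|request rate is large", re.IGNORECASE
)


class GremlinServerError(Exception):
    """Hypothetical stand-in for a client exception that only carries text."""


def is_rate_limited(error: Exception) -> bool:
    # With no status code exposed, the only option is to search the message.
    return bool(RATE_LIMIT_PATTERN.search(str(error)))


def execute_with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run operation, retrying with exponential backoff only when throttled."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except GremlinServerError as error:
            # Give up on the last attempt or on any error that is not throttling.
            if attempt == max_attempts or not is_rate_limited(error):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The fragile part is `is_rate_limited`: it breaks silently if the message wording changes, which is exactly why having the status code and headers exposed as values would be preferable.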
I initially raised this as a documentation issue on GitHub, and Microsoft have confirmed they are moving to open source libraries and working to improve them but, as best I can tell, for the moment you’ve got two choices:
- Continue to use the Microsoft.Azure.Graphs package – I’ve not used this in anger but I understand it has issues of its own related to client side performance and is a bit of a dead end.
- Use Gremlin.NET and work around the issues.
For the time being I’ve opted to go with the second option as I don’t want to unpick an already obsolete package from new code. To support this I’ve forked the Gremlin.NET library and introduced a couple of changes that allow attributes and response codes to be inspected for regular requests and for exceptions. I’ve done this in a non-breaking way: you should be able to swap the official Gremlin.NET package for this fork and your code should continue to work just fine, but you can more easily implement resilience patterns. You can find it on GitHub here.
If I were designing the API afresh to work in a real world situation I would probably expose a different API surface. At the moment I really have followed the path of least resistance that avoids breaking changes in terms of getting these things visible. That makes me uncertain as to whether or not this is an appropriate Pull Request to submit. I probably will; if nothing else, hopefully that will start a conversation.