
Azure Data Factory – Mapping Bug

This is something to be aware of as its effects can be very, very subtle.

As part of a project for a client I have an Azure Data Factory that picks up data from various sources and moves it into a reporting database. There are various types of copy operation going on, but a common one is to take a data source and execute it via a stored procedure, passing the data in as a table-valued parameter. You set up column mappings from the source to the target. I had this table type defined for a parameter to my ingestion stored procedure:

create type SessionType as table(
    [Id]                uniqueidentifier not null primary key nonclustered,
    [ProgrammeId]       UNIQUEIDENTIFIER not null,
    [StartDateTimeUtc]  datetime2        not null,
    [VolunteerId]       uniqueidentifier not null,
    [SessionCancelled]  bit              not null,
    [LastModified]      binary(8)        not null,
    [VolunteerRating]   int              null,
    [VolunteerComments] nvarchar(1025)   null
)

And a mapping set up from the source to this table as follows:

My ingestion procedure ran ok (it does a merge) but I was getting weird downstream results: data didn’t seem to be correlating as we expected. After going back and forth between the data sources I realised that the VolunteerId and the ProgrammeId were switched – the target VolunteerId was getting the source ReadingProgrammeId and the target ProgrammeId was getting the source VolunteerId.

I’d edited this so wondered if there was some weird caching going on, or if the publish hadn’t really published, so I made a change to try and force things – plus I’d run out of ideas. I couldn’t see a thing wrong with any of the SQL. I removed the two mappings and added them at the end:

On rerunning my data factory I found I now got an error: an issue trying to insert a datetime2 type into a uniqueidentifier column. The penny dropped. Despite the GUI, despite the tooling, despite the ARM definition, the data factory is not using the column names – it’s merrily ignoring them and using the order of the columns in the target table type definition.

I verified this by setting up a mapping based on the order of columns in the table type:

That fixed things and my downstream systems can now make sense of the data.

Hopefully they’ll get this fixed as, unless you get a type clash, it’s pretty dangerous.

Function Monkey for F#

Over the last couple of weeks I’ve been working on adapting Function Monkey so that it feels natural to work with in F#. The driver for this is that I find myself writing more and more F# and want to develop the backend for a new app in it and run it on Azure Functions.

I’m not going to pretend it’s pretty under the covers but it’s starting to take shape and I’m beginning to use it in my new backend, so now seemed like a good time to write a little about it by walking through putting together a simple ToDo-style API that saves data to Cosmos DB.

Declaring a Function App

As ever you’ll need to begin by creating a new Azure Function app in the IDE / editor of your choice. Once you’ve got that empty starting point you’ll need to add two NuGet packages to the project (either with Paket or NuGet):

FunctionMonkey.FSharp
FunctionMonkey.Compiler

This is currently in alpha and so you’ll need to enable pre-release packages and add the following NuGet repository:

https://www.myget.org/F/functionmonkey-beta/api/v3/index.json
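
If you’re using NuGet rather than Paket, one way to register the feed is with a NuGet.config file at the root of the solution (a minimal sketch – the key name here is just an example):

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <!-- MyGet feed hosting the pre-release Function Monkey packages -->
    <add key="functionmonkey-beta" value="https://www.myget.org/F/functionmonkey-beta/api/v3/index.json" />
  </packageSources>
</configuration>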

Next create a new module called EntryPoint that looks like this:

namespace FmFsharpDemo
open AccidentalFish.FSharp.Validation
open System.Security.Claims
open FunctionMonkey.FSharp.Configuration
open FunctionMonkey.FSharp.Models

module EntryPoint =
    exception InvalidTokenException
    
    let validateToken (bearerToken:string) =
        match bearerToken.Length with
        | 0 -> raise InvalidTokenException
        | _ -> new ClaimsPrincipal(new ClaimsIdentity([new Claim("userId", "2FF4D861-F9E3-4694-9553-C49A94D7E665")]))
    
    let isResultValid (result:ValidationState) =
        match result with
        | Ok -> true
        | _ -> false
                                    
    let app = functionApp {
        // authorization
        defaultAuthorizationMode Token
        tokenValidator validateToken
        claimsMappings [
            claimsMapper.shared ("userId", "userId")
        ]
        // validation
        isValid isResultValid
        // functions
        httpRoute "version" [
            azureFunction.http (Handler(fun () -> "1.0.0"), Get, authorizationMode=Anonymous)
        ]
    }

Ok. So what’s going on here? We’ll break it down block by block. We’re going to demonstrate authorisation using a (pretend) bearer token and so we begin by creating a function that can validate a token:

exception InvalidTokenException

let validateToken (bearerToken:string) =
    match bearerToken.Length with
    | 0 -> raise InvalidTokenException
    | _ -> new ClaimsPrincipal(new ClaimsIdentity([new Claim("userId", "2FF4D861-F9E3-4694-9553-C49A94D7E665")]))

This is our F# equivalent of the ITokenValidator interface in the C# version. In this case we take valid to mean any non-empty string in the authorization header, and if the token is valid we return a ClaimsPrincipal – here just a made-up principal. In the case of an invalid token we simply raise an exception – Function Monkey will translate this to a 401 HTTP status.

We’re going to validate the inputs to our functions using my recently released validation framework. Function Monkey for F# supports any validation framework, but because of that you need to tell it what constitutes a validation failure, so next we create a function that can do this:

let isResultValid result = match result with | Ok -> true | _ -> false

Finally we declare our Function App itself:

let app = functionApp {
    // authorization
    defaultAuthorizationMode Token
    tokenValidator validateToken
    claimsMappings [
        claimsMapper.shared ("userId", "userId")
    ]
    // validation
    isValid isResultValid
    // functions
    httpRoute "version" [
        azureFunction.http (Handler(fun () -> "1.0.0"), Get, authorizationMode=Anonymous)
    ]
}

We declare our settings (and optionally functions) inside a functionApp block that we have to assign to a public member on the module so that the Function Monkey compiler can find the declaration.

Within the block we start by setting up our authorisation to use token validation (line 3) and instruct it to use the token validator function we created earlier (line 4). In lines 5 to 7 we then set up a claims mapping which will set userId on any of our record types associated with functions to the value of the userId claim. You can also set mappings to a specific command type property, as in the C# version.

On line 9 we tell Function Monkey to use our isResultValid function to determine whether a validation result constitutes success or failure.

Then finally on line 11 we declare an HTTP route and a function within it. If you’re familiar with the C# version you can see here that we no longer use commands and command handlers – instead we use functions: their input parameter determines the type of the model being passed into the Azure Function and their return value determines the output of the Azure Function. In this case the function has no parameters and returns a string – a simple API version. We set this specific function to not require authorisation.

Finally let’s add a host.json file to remove the auto-prefixing of api to routes (the prefix causes problems with things like Open API output):

{
  "version": "2.0",
  "extensions": {
    "http": {
      "routePrefix": ""
    }
  }
}

If we run this now then in Postman we should be able to call the endpoint http://localhost:7071/version and receive the response “1.0.0”.

Building our ToDo API

If you’re familiar with Function Monkey for C# then at this point you might be wondering where the rest of the functions are. We could declare them all here like we would in C#, but the F# version of Function Monkey allows functions to be declared in multiple modules so that the functions can be located close to the domain logic and to avoid a huge function declaration block.

To get started create a new module called ToDo and we’ll begin by creating a type to model our to do items – we’ll also use this type for updating our to do items:

type ToDoItem =
    {
        id: string
        owningUserId: string
        title: string
        isComplete: bool
    }

Next we’ll declare a type for adding a to do item:

type AddToDoItemCommand =
    {
        userId: string
        title: string
        isComplete: bool
    }

And finally a type that represents querying to find an item:

type GetToDoItemQuery =
    {
        id: string
    }

Next we’ll declare our validations for these models:

let withIdValidations = [
    isNotEmpty
    hasLengthOf 36
]

let withTitleValidations = [
    isNotEmpty
    hasMinLengthOf 1
    hasMaxLengthOf 255
]

let validateGetToDoItemQuery = createValidatorFor<GetToDoItemQuery>() {
    validate (fun q -> q.id) withIdValidations
}
    
let validateAddToDoItemCommand = createValidatorFor<AddToDoItemCommand>() {
    validate (fun c -> c.userId) withIdValidations
    validate (fun c -> c.title) withTitleValidations
}

let validateToDoItem = createValidatorFor<ToDoItem>() {
    validate (fun c -> c.id) withIdValidations
    validate (fun c -> c.title) withTitleValidations
    validate (fun c -> c.owningUserId) withIdValidations
}

Ok. So now we need to create a function for adding an item to the database and another for getting one from it. We’ll use Azure Cosmos DB as a data store and I’m going to assume you’ve set one up. Our add function needs to accept a record of type AddToDoItemCommand and return a new record of type ToDoItem, assigning properties as appropriate:

let addToDoItem command =
    {
        id = Guid.NewGuid().ToString()
        owningUserId = command.userId
        title = command.title
        isComplete = command.isComplete
    }

The user ID on our command will have been populated by the claims binding. We don’t write the item to Cosmos here; instead we’re going to use an output binding shortly.

Next our function for reading a to do item from Cosmos:

let getToDoItem query =
    CosmosDb.reader<ToDoItem> <| query.id

CosmosDb.reader is a super simple helper function I created:

namespace FmFsharpDemo
open Microsoft.Azure.Cosmos
open System

module CosmosDb =
    let cosmosDatabase = "testdatabase"
    let cosmosCollection = "colToDoItems"
    let cosmosConnectionString = Environment.GetEnvironmentVariable("cosmosConnectionString")
    
    let reader<'t> id =
        async {
            use client = new CosmosClient(cosmosConnectionString)
            let container = client.GetContainer(cosmosDatabase, cosmosCollection)
            let! response = container.ReadItemAsync<'t>(id, new PartitionKey(id)) |> Async.AwaitTask
            return response.Resource
        }
    

If we inspect the signatures for our two functions we’ll find that addToDoItem has a signature of AddToDoItemCommand -> ToDoItem and getToDoItem has a signature of GetToDoItemQuery -> Async<ToDoItem>. One of them is asynchronous and the other is not – Function Monkey for F# supports both forms. We’re not going to write a handler for updating an existing item – instead we’ll use that to demonstrate handler-less functions (though as we’ll see we’ll duck a slight issue for the time being!).

There is one last step we’re going to take before we declare our functions and that’s to create a curried output binding function:

let todoDatabase =
    cosmosDb cosmosCollection cosmosDatabase

In the above cosmosDb is a function that is part of the Function Monkey output binding set and it takes three parameters – the collection / container name, the database name and finally the function that the output binding is being applied to. We’re going to use it multiple times so we create this curried function to make our code less repetitive and more readable.

With all that we can now declare our functions block:

let toDoFunctions = functions {
    httpRoute "api/v1/todo" [
        azureFunction.http (AsyncHandler(getToDoItem),
                            verb=Get, subRoute="/{id}",
                            validator=validateGetToDoItemQuery)
        azureFunction.http (Handler(addToDoItem),
                            verb=Post,
                            validator=validateAddToDoItemCommand,
                            returnResponseBodyWithOutputBinding=true)
            |> todoDatabase
        azureFunction.http (NoHandler, verb=Put, validator=validateToDoItem)
            |> todoDatabase
    ]
}

The functions block is a subset of the functionApp block we saw earlier and can only be used to define functions – shared configuration must go in the functionApp block.

Hopefully the first, GET verb, function is reasonably self-explanatory. The AsyncHandler case instructs Function Monkey that this is an async function and we assign a validator with the validator option.

The second function, for our POST verb, introduces a new concept – output bindings. We pipe the output of azureFunction.http to our curried output binding and this will result in a function being created that outputs to Cosmos DB. Because we’re using the Cosmos output binding we also need to add the Microsoft.Azure.WebJobs.Extensions.CosmosDB package to our Functions project. We set the option returnResponseBodyWithOutputBinding to true so that as well as sending the output of our function to the output binding we also return it as part of the HTTP response (this is optional, as you can imagine that in a more complex scenario it could leak data).

Finally for the third function our PUT verb also uses an output binding but this doesn’t have a handler at all, hence the NoHandler case. In this scenario the command that is passed in, once validated, is simply passed on as the output of the function. And so in this instance we can PUT a to do item to our endpoint and it will update the appropriate entry in Cosmos. (Note that for the moment I have not answered the question as to how to prevent one user from updating another user’s to do items – our authorisation approach is currently limited and I’ll come back to that in a future post.)

Trying It Out

With all that done we can try this function app out in Postman. If we begin by attempting to send an invalid payload to our POST endpoint, say with an empty title, we’ll get a 400 status code returned and a response as follows:

{
  "case": "Errors",
  "fields": [
    [
      {
        "message": "Must not be empty",
        "property": "title",
        "errorCode": "isNotEmpty"
      },
      {
        "message": "Must have a length no less than 1",
        "property": "title",
        "errorCode": "hasMinLengthOf"
      }
    ]
  ]
}

Now if we run it with a valid payload we will get:

{
  "id": "09482e8d-41aa-4c25-9552-b7b05bf0a787",
  "owningUserId": "2FF4D861-F9E3-4694-9553-C49A94D7E665",
  "title": "Buy underpants",
  "isComplete": false
}

Next Steps

These are down to me really – I need to continue to flesh out the functionality, which at this point essentially boils down to expanding out the computation expression and its helpers. I also need to spend some time refactoring aspects of Function Monkey. I’ve had to dig up and change quite a few things so that it can work in this more functional manner as well as continue to support the more typical C# patterns.

Then of course there is documentation!

Multi-Model Azure Cosmos DB – Running SQL (Geospatial) Queries On a Graph

One of the major selling points for Microsoft’s Azure Cosmos DB is that it’s a multi-model database – you can use it as a:

  • Simple key/value pair store through the Table API
  • Document database through the SQL API
  • Graph database through the Graph (Gremlin) API
  • MongoDB database
  • Cassandra database

Underneath these APIs Cosmos uses its Atom-Record-Sequence (ARS) type system to manage and store the data, which has the less well publicized benefit of allowing you to use different APIs to access the same data.

While I’ve not explored all the possibilities I’d like to demonstrate the potential that this approach opens up by running geo-spatial queries (longitude and latitude based) against a Graph database that is primarily used to model social network relationships.

Let’s assume you’re building a social network to connect individuals in local communities and you’re using a graph database to connect people together. In this graph you have vertexes of type ‘person’ and edges of type ‘follows’, and when someone new joins your network you want to suggest people to them who live within a certain radius of their location (a longitude and latitude derived from a GPS or geocoded address). In this example we’re going to do this by constructing the graph based around social connections using Gremlin .NET and using the geospatial queries available to the SQL API (we’ll use ST_DISTANCE) to interrogate it based on location.

Given a person ID, longitude and latitude, firstly let’s look at how we add someone into the graph (this assumes you’ve added the Gremlin .NET package to your solution):

public async Task CreateNode(string personId)
{
    const string cosmosEndpoint = "yourcosmosdb.gremlin.cosmosdb.azure.com";
    string username = "/dbs/yoursocialnetwork/colls/relationshipcollection"; // the form of /dbs/DATABASE/colls/COLLECTION
    const string cosmosAuthKey = "yourauthkey";

    GremlinServer gremlinServer = new GremlinServer(cosmosEndpoint, 443, true, username, cosmosAuthKey);
    using (GremlinClient client = new GremlinClient(gremlinServer, new GraphSON2Reader(),
        new GraphSON2Writer(), GremlinClient.GraphSON2MimeType))
    {
        Dictionary<string, object> arguments = new Dictionary<string, object>
        {
            {"personId", personId}
        };

        await client.SubmitAsync("g.addV('person').property('id', personId)", arguments);
    }
}

If you run this code against a Cosmos DB Graph collection it will create a vertex with the label of person and assign it the specified id.

We’ve not yet got any location data attached to this vertex and that’s because Gremlin.NET and Cosmos do not (to the best of my knowledge) allow us to set a property with the specific structure required by the geospatial queries supported by SQL API. That being the case we’re going to use the SQL API to load this vertex as a document and attach a longitude and latitude. The key to this is to use the document endpoint. If you look in the code above you can see we have a Gremlin endpoint of:

yourcosmosdb.gremlin.cosmosdb.azure.com

If you attempt to run SQL API queries against that endpoint you’ll encounter errors; the endpoint we need to use is:

yourcosmosdb.documents.azure.com

To keep things simple we’ll use the LINQ support provided by the DocumentDB client but rather than define a class that represents the document we’ll use a dictionary of strings and objects. In the multi-model world we have to be very careful not to lose properties required by the other model and by taking this approach we don’t need to inspect and carefully account for each property (and ongoing change!). If we don’t do this we are liable to lose data at some point.

All that being the case we can attach our location property using code like the below:

public static async Task AddLocationToDocument(this DocumentClient documentClient, Uri documentCollectionUri, string id, double longitude, double latitude)
{
    // query the vertex as a plain dictionary-backed document so that no graph properties are lost
    IQueryable<Dictionary<string,object>> documentQuery = documentClient.CreateDocumentQuery<Dictionary<string,object>>(
            documentCollectionUri,
            new FeedOptions { MaxItemCount = 1 })
        .Where(x => (string)x["id"] == id);
    Dictionary<string,object> document = documentQuery.ToArray().SingleOrDefault();
    document["location"] = new Point(longitude, latitude);
    Uri documentUri = UriFactory.CreateDocumentUri(
        Constants.Storage.Cosmos.Database,
        Constants.Storage.Cosmos.RelationshipsCollection,
        id);
    await documentClient.ReplaceDocumentAsync(documentUri, document);
}

If you now inspect this vertex in the Azure portal you won’t see the location property in the graph explorer – its property format is not one known to Gremlin.

However if you open the collection in Azure Storage Explorer (another example of multi-model usage) you will see the vertexes (and edges) exposed as documents and will see the location attached in the JSON:

{
    "location": {
        "type": "Point",
        "coordinates": [
            -6.45436,
            49.12892
        ]
    },
    "id": "1234",
    "label": "person",
    "_rid": "...",
    "_self": "...",
    "_etag": "\"...\"",
    "_attachments": "attachments/",
    "_ts": ...
}

Ok. So at this point we’ve got a person in our social network graph. Let’s imagine we’re about to add another person and we want to search for nearby people. We’ll do this using a SQL API query to find people within 30km of a given location (I’ve stripped out some of the boilerplate from this):

const double rangeInMeters = 30000;
IQueryable<Dictionary<string,object>> query = _documentClient.Value.CreateDocumentQuery<Dictionary<string, object>>(
    collectionUri,
    new FeedOptions { MaxItemCount = 1000 })
    .Where(x => (string)x["label"] == "person" &&
                ((Point)x["location"]).Distance(new Point(location.Longitude, location.Latitude)) < rangeInMeters);
var inRange = query.ToArray();

From here we can iterate our inRange array and build a Gremlin query to attach our edges roughly of the form:

g.V('1234').addE('follows').to(g.V('5678'))
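
As a rough sketch of that step (this reuses the GremlinClient setup from the CreateNode method earlier; the helper name is mine and it assumes each returned document’s id is also the person’s vertex id):

private static async Task CreateFollowsEdges(GremlinClient client, string newPersonId, Dictionary<string, object>[] inRange)
{
    foreach (Dictionary<string, object> nearbyPerson in inRange)
    {
        Dictionary<string, object> arguments = new Dictionary<string, object>
        {
            {"newPersonId", newPersonId},
            {"existingPersonId", (string)nearbyPerson["id"]}
        };
        // connect the new person to each nearby person with a 'follows' edge
        await client.SubmitAsync("g.V(newPersonId).addE('follows').to(g.V(existingPersonId))", arguments);
    }
}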

Closing Thoughts

Being able to use multiple models to query the same data brings a lot of expressive power to Cosmos DB, though it’s still important to choose the model that best represents the primary workload for your data and to understand how the different queries behave with respect to RUs and data volumes.

It’s also a little clunky working in the two models and it’s quite easy, particularly if performing updates, to break things. I expect as Cosmos becomes more mature that this is going to be cleaned up and I think Microsoft will talk about these capabilities more.

Finally – it’s worth opening up a graph database with Azure Storage Explorer and inspecting the documents; you can see how Cosmos manages vertexes and edges and it gives a more informed view as to how graph queries consume RUs.


Cosmos DB x-ms-partitionkey Error

A small tip but one that might save you some time – if you’re trying to run Cosmos DB queries via the REST API you might encounter the following Bad Request error:

The partition key supplied in x-ms-partitionkey header has fewer components than defined in the the collection.

What you actually need to specify is the partition key for your collection using the x-ms-documentdb-partitionkey header.
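
For illustration, here’s a minimal sketch of setting that header with HttpClient (the URI and partition key value are made up, and the other required headers – authorization, x-ms-date, x-ms-version and so on – are omitted):

HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Post,
    "https://yourcosmosdb.documents.azure.com/dbs/yourdatabase/colls/yourcollection/docs");
// the value must be a JSON array containing the partition key value of the documents being queried
request.Headers.Add("x-ms-documentdb-partitionkey", "[\"yourPartitionKeyValue\"]");
// the request is then sent as usual, e.g. await httpClient.SendAsync(request)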

An annoyingly misleading error message!

Game of the Year 2014

Winner: Forza Horizon 2
Runner Up: The Last of Us Remastered


There’s been a lot of negativity in the gaming world this year about the next-gen consoles and how they are yet to deliver and while I’ve not played anything game-changing (pun not intended) I’ve had plenty of great gaming experiences over the last 12 months including Wolfenstein, Elite: Dangerous, Monument Valley, Far Cry 4, Alien: Isolation, Diablo III, Dragon Age and Mario Kart.

However my winner and runner up stood head and shoulders above those other games with only a hair’s breadth between them. Forza Horizon 2 is the first time I’ve whooped and hollered while playing a driving game since Daytona in the arcade and the most satisfying since the Project Gotham series, to which this feels, to me, like a spiritual successor. For me it strikes the perfect balance between arcade and simulation with a driving model that sends me into a trance-like state while hours slip by. I’ve recently tucked into the Storm Island DLC and some of the tracks are incredible with masses of air and dirty rallying to go at. It all looks gorgeous too with beautiful scenery, incredible car models, and a steady framerate – it runs at 30fps but I’ve never seen it drop a beat and although I prefer racers to run at 60fps the nature of the driving means this isn’t particularly detrimental to the experience and I soon got used to it.

Despite owning a PS3 somehow I missed The Last of Us on that console and eagerly awaited it in remastered form – I was not disappointed. I can’t remember ever being so gripped by a videogame’s story and characters before, and from beginning to end I was desperate to see what would happen next yet dreading something bad happening to Ellie and Joel. The pacing was fantastic, mixing up quiet and frantic moments and the occasional scare to get the pulse going. When the game finished I was both sad to see the story finish and satisfied that it was complete – I hope Naughty Dog doesn’t cheapen the experience by revisiting Ellie and Joel. The DLC was also fantastic and I loved the parts without combat; it really pulled on the heartstrings as throughout you know where that’s going. It’s a beautiful game throughout, being both technically solid and visually arresting with great art direction, and the gameplay was, for me, just the right balance of stealth and combat.

As I say, literally only a hair’s breadth separates those two games and on another day in a different mood I could easily swap them round. I think Forza Horizon 2 just pips it because of its sheer scale – I’ve driven a ridiculous number of miles and am still enjoying it as much as I did when I first started with it.

Special mentions also need to go to Alien: Isolation for the sheer level of terror it generates (I still haven’t completed it due to fear and dread!) and Elite: Dangerous for being mesmerising even while in beta.

Azure by Default

When I first started with Azure it only existed in PaaS form and had a very limited set of services compared to the rich variety available now. Adopting Azure at the time was seen as something of a risk, even within a heavy C# / .NET shop, and my first use of it was on a carefully targeted project – one on which I wasn’t betting the bank, so to speak.

Over the last few years the platform has matured significantly, adding additional PaaS and IaaS features along the way, and has proven to be robust, reliable, cost effective and flexible in the development and operation of real systems serving real customers. It’s done what it says on the tin, and the people I have worked with who also have significant Azure experience largely say the same.

As such it’s been interesting to observe my own corresponding shift in behaviour over the last 12 months and throughout 2014 in particular. When I started on this journey back in 2011 I would have spoken of it in terms of interest and caution. Throughout late 2012 and 2013 I would have spoken of it as being an excellent option to be considered for many systems. That has led me to today, where in the last few weeks I have found myself recommending it as the “default choice”.

By this I don’t mean it’s the only tool for every job, but it’s the platform I now look to first for greenfield development; I then look for reasons why it might not be a good fit, drilling into those reasons hard as the benefits of the platform are so great. The kinds of things that can make me look elsewhere are regulatory or compliance considerations, or a peculiar or edge-case technical requirement.

It’s been a fascinating journey and still is. At this point I consider Azure to be amongst the best things Microsoft have done, right up there with C# – it’s a massively enabling technology. If you’ve not looked at it yet, and particularly if you’re a .NET / Microsoft developer, you really should.

Thanks, and a tip for people looking for help

Firstly let me just thank those who have taken the time to contribute to these projects on GitHub and also those who have participated in discussions here. I really appreciate it.

And a quick tip for those looking to get help on any of the open source code related to this blog: the best thing is to log a ticket on GitHub. That’s more likely to get my attention, but there are also others chipping in on GitHub with help and advice and posting fixes and upgrades to the code in response.

I’ll keep checking here but I have an approval process in place due to spam and if I go on holiday (as I just did!) that can lead to lengthy delays.

Hope that helps and thanks again – really appreciate the support, help and interest.
