Month: November 2019

Azure Data Factory – Mapping Bug

This is something to be aware of as its effects can be very very subtle.

As part of a project for a client I have an Azure Data Factory that picks up data from various sources and moves it into a reporting database. Their are various types of copy operation going on but a common one is to take a data source and execute it via a stored procedure passing it in as a table parameter. You set up column mappings from the source to the target. I had this table type defined for a parameter to my ingestion stored procedure:

create type SessionType as table(
    [Id]                uniqueidentifier not null primary key nonclustered,
    [ProgrammeId]       UNIQUEIDENTIFIER not null,
    [StartDateTimeUtc]  datetime2        not null,
    [VolunteerId]       uniqueidentifier not null,
    [SessionCancelled]  bit              not null,
    [LastModified]      binary(8)        not null,
    [VolunteerRating]   int              null,
    [VolunteerComments] nvarchar(1025)   null
)

And a mapping set up from the source to this table as follows:

My ingestion procedure ran ok (it does a merge) but I was getting weird downstream results: data didn’t seem to be correlating as we expected. On back and forthing between the data sources I realised that the VolunteerId and the ProgrammeId were switched – the target VolunteerId was getting the source ReadingProgrammeId and the target ProgrammeId was getting the source VolunteerId.

I’d edited this so wandered if their was some weird caching going on or if the publish hadn’t really published so I made a change to try and force things – plus I’d run out of ideas. I couldn’t see a thing wrong with any of the SQL. I removed the two mappings and added them at the end:

On rerunning my data factory I found I now got an error. An issue trying to insert a datetime2 type into a uniqueidentifier column. The penny dropped. Despite the GUI, despite the tooling, despite the ARM definition the data factory is not using the column names – its merrily ignoring them and using order of the columns in the schema table type definition for targets.

I verified this by setting up a mapping based on the order of columns in the table type:

That fixed things and my downstream systems can now make sense of the data.

Hopefully they’ll get this fixed as unless you get a type clash its pretty dangerous.

Contact

  • If you're looking for help with C#, .NET, Azure, Architecture, or would simply value an independent opinion then please get in touch here or over on Twitter.

Recent Posts

Recent Tweets

Recent Comments

Archives

Categories

Meta

GiottoPress by Enrique Chavez