Author: James

C# Cloud Application Architecture – Commanding via a Mediator (Part 5)

Over the last 4 parts of this series we’ve taken a simple application built around a layered architecture and restructured it into an application based around dispatching queries and commands as state through a mediator.

We’ve seen many of the advantages this can bring to a codebase, reducing repetition and allowing for a clear decomposition into business, or service, oriented modules.

In this final part I’ll demonstrate how this pattern can support an application through the various stages of its lifecycle. The early stages of a software development project are often susceptible to a high degree of change. If it’s a new product under development then the challenge is often around establishing market fit (be that internal or external) without burning through the entire budget. Additionally, if the problem domain is new, it’s likely that the first attempt at drawing out bounded contexts will contain errors, and if the system is built as fully isolated components change can be expensive. In either case keeping the cost of development and change low in the early phases of the project can lead to much more effective use of a project’s budget.

Over the course of this series we’ve built three sub-systems: a checkout, a shopping cart and a product store – essentially we have a modular monolith.

In this part we’re going to assume that we’re finding that our product store is coming under a lot of strain and we are going to pull it out into a micro-service so that we can scale it independently. And we’re going to make this change without altering any consuming business logic code at all.

In our system we make use of the store in two places through the dispatch of GetStoreProductQuery queries. Firstly it is represented in the primary API as an endpoint that can be called by clients in the ProductController class:

[Route("api/[controller]")]
public class ProductController : AbstractCommandController
{
    public ProductController(ICommandDispatcher dispatcher) : base(dispatcher)
    {
            
    }

    [HttpGet("{productId}")]
    [ProducesResponseType(typeof(StoreProduct), 200)]
    public async Task<IActionResult> Get([FromRoute] GetStoreProductQuery query) => await ExecuteCommand(query);
}

Secondly it is also used to provide validation of products within the handler for the AddToCartCommand in the AddToCartCommandHandler class:

public async Task<CommandResponse> ExecuteAsync(AddToCartCommand command, CommandResponse previousResult)
{
    Model.ShoppingCart cart = await _repository.GetActualOrDefaultAsync(command.AuthenticatedUserId);

    StoreProduct product = (await _dispatcher.DispatchAsync(new GetStoreProductQuery{ProductId = command.ProductId})).Result;

    if (product == null)
    {
        _logger.LogWarning("Product {0} can not be added to cart for user {1} as it does not exist", command.ProductId, command.AuthenticatedUserId);
        return CommandResponse.WithError($"Product {command.ProductId} does not exist");
    }
    List<ShoppingCartItem> cartItems = new List<ShoppingCartItem>(cart.Items);
    cartItems.Add(new ShoppingCartItem
    {
        Product = product,
        Quantity = command.Quantity
    });
    cart.Items = cartItems;
    await _repository.UpdateAsync(cart);
    return CommandResponse.Ok();
}

To make our change the first thing we need to do is to be able to execute our command inside a different host – we’ll use an Azure Function that accepts the ProductId required by our GetStoreProductQuery query. The code for this function is shown below:

public static class GetStoreProduct
{
    private static readonly IServiceProvider ServiceProvider;
    private static readonly AsyncLocal<ILogger> Logger = new AsyncLocal<ILogger>();
        
    static GetStoreProduct()
    {
        IServiceCollection serviceCollection = new ServiceCollection();
        MicrosoftDependencyInjectionCommandingResolver resolver = new MicrosoftDependencyInjectionCommandingResolver(serviceCollection);
        ICommandRegistry registry = resolver.UseCommanding();
        serviceCollection.UseCoreCommanding(resolver);
        serviceCollection.UseStore(() => ServiceProvider, registry, ApplicationModeEnum.Server);
        serviceCollection.AddTransient((sp) => Logger.Value);
        ServiceProvider = resolver.ServiceProvider = serviceCollection.BuildServiceProvider();
    }

    [FunctionName("GetStoreProduct")]
    public static async Task<IActionResult> Run([HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)]HttpRequest req, ILogger logger)
    {
        Logger.Value = logger;
        logger.LogInformation("C# HTTP trigger function processed a request.");
            
        IDirectCommandExecuter executer = ServiceProvider.GetService<IDirectCommandExecuter>();

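        // Return a 400 rather than an unhandled exception (500) if ProductId is missing or malformed
        if (!req.GetQueryParameterDictionary().TryGetValue("ProductId", out string productIdValue) ||
            !Guid.TryParse(productIdValue, out Guid productId))
        {
            return new BadRequestObjectResult("A valid ProductId query parameter is required");
        }

        GetStoreProductQuery query = new GetStoreProductQuery
        {
            ProductId = productId
        };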
        CommandResponse<StoreProduct> result = await executer.ExecuteAsync(query);
        return new OkObjectResult(result);
    }
}

Our static constructor sets up our IoC container and should be fairly familiar code by now. (Azure Functions actually run on App Service instances and you can share state between invocations, though there are few guarantees and you can debate at length how “serverless” this makes things; AWS Lambda is much the same.)
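The static container combined with AsyncLocal<ILogger> is what lets a container built once per host hand each invocation its own logger: the transient registration reads Logger.Value at resolve time, and AsyncLocal keeps that value scoped to the current asynchronous flow. The sketch below illustrates the mechanism with BCL types only, assuming a string can stand in for ILogger; FunctionHostSketch and its members are hypothetical names, not part of the Functions runtime or the commanding framework.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the function class: the "container" is built once (static)
// while the per-invocation value flows through AsyncLocal<T>.
public static class FunctionHostSketch
{
    // Shared by every invocation on this host instance, like the static constructor's container.
    private static readonly AsyncLocal<string> CurrentLogger = new AsyncLocal<string>();

    // Plays the role of serviceCollection.AddTransient(sp => Logger.Value):
    // evaluated at resolve time, so it returns whatever the current async flow stored.
    private static readonly Func<string> ResolveLogger = () => CurrentLogger.Value;

    public static async Task<string> RunAsync(string invocationLogger)
    {
        CurrentLogger.Value = invocationLogger; // mirrors Logger.Value = logger;
        await Task.Delay(10);                   // simulated async work; the value flows across the await
        return ResolveLogger();                 // code deeper in the call chain sees this invocation's value
    }
}
```

Two overlapping calls such as RunAsync("logger-A") and RunAsync("logger-B") each get their own value back, even though the resolver delegate is a single shared static.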

Our function entry point does something different. It creates an instance of our GetStoreProductQuery from the supplied query parameters but, rather than dispatching it through the ICommandDispatcher interface we’ve seen before, it executes it using an IDirectCommandExecuter resolved from our IoC container. This instructs the command framework to execute the command without any dispatch semantics: any logging in the dispatch portions of the command flow won’t be replicated by this function, and execution is slightly more efficient. (It’s worth noting that you can dispatch again here if you need to, though generally you would take the approach I am showing here.)
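To make the distinction between dispatching and direct execution concrete, here is a framework-agnostic sketch. The types are hypothetical and deliberately tiny (they are not the AzureFromTheTrenches.Commanding API): the dispatcher runs hooks around the handler, while the direct executer goes straight to the handler, which is why the remote side avoids repeating dispatch-level logging.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; not the commanding framework's API.
public sealed class GetProductQuery
{
    public Guid ProductId { get; set; }
}

public static class ProductQueryHandler
{
    // The business logic: identical whichever path invokes it.
    public static string Execute(GetProductQuery query) => $"product:{query.ProductId}";
}

public sealed class SketchDispatcher
{
    public List<string> DispatchLog { get; } = new List<string>();

    // Dispatch semantics: hooks run around execution (here, simple audit logging).
    public string Dispatch(GetProductQuery query)
    {
        DispatchLog.Add($"dispatching {query.ProductId}");
        string result = ProductQueryHandler.Execute(query);
        DispatchLog.Add($"dispatched {query.ProductId}");
        return result;
    }
}

public static class SketchDirectExecuter
{
    // Direct execution: straight to the handler with no dispatch hooks, so work the caller
    // already did on its side of the wire is not repeated here.
    public static string Execute(GetProductQuery query) => ProductQueryHandler.Execute(query);
}
```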

To support this new approach I’ve also changed the UseStore registration method in IServiceCollectionsExtensions so that it can be supplied an enum that determines how our command should be handled: in process (as we’ve been doing up until now), as a client of a remote service, or as a server (as we have done above). The enum is used to register the command in one of two ways, and this is the key change to the existing code that enables us to remote the command:

if (applicationMode == ApplicationModeEnum.InProcess || applicationMode == ApplicationModeEnum.Server)
{
    commandRegistry.Register<GetStoreProductQueryHandler>();
}
else if (applicationMode == ApplicationModeEnum.Client)
{
    // this configures the command dispatcher to send the command over HTTP and wait for the result
    Uri functionUri = new Uri("http://localhost:7071/api/GetStoreProduct");
    commandRegistry.Register<GetStoreProductQuery, CommandResponse<StoreProduct>>(() =>
    {
        IHttpCommandDispatcherFactory httpCommandDispatcherFactory = serviceProvider().GetService<IHttpCommandDispatcherFactory>();
        return httpCommandDispatcherFactory.Create(functionUri, HttpMethod.Get);
    });
}

Both the in-process and server modes continue to register the handler as they have done before; however, when the application mode is set to client the registration takes a different form. Rather than registering the handler, we supply the type of the command and the type of the result as generic type parameters and then set up a lambda that resolves an instance of IHttpCommandDispatcherFactory and creates an HTTP dispatcher with the URI of the function and the HTTP verb to use. These interfaces can be found in the NuGet package AzureFromTheTrenches.Commanding.Http, which I’ve added to the Store.Application project.

Registering in this way instructs the commanding system to dispatch the command using, in this case, the HTTP dispatcher rather than attempting to execute it locally. All the other framework features around the dispatch process continue to behave as usual and, as we saw earlier, you can pick the command up on the other side of the HTTP call with the IDirectCommandExecuter.
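The shape of that registration switch can be sketched without the framework. In this hypothetical version a dictionary maps each command type to a delegate that runs it; in-process and server modes route to a local handler, while client mode routes to a transport delegate that, in a real system, would be an HTTP (or queue) dispatcher. SketchRegistry, StoreRegistration and ApplicationMode are illustrative names only, and the transport is faked so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch; not the commanding framework's registration API.
public enum ApplicationMode { InProcess, Client, Server }

public sealed class StoreProductQuery
{
    public Guid ProductId { get; set; }
}

public sealed class SketchRegistry
{
    // Maps a command type to "how to run it": locally or via some transport.
    private readonly Dictionary<Type, Func<object, Task<object>>> _routes =
        new Dictionary<Type, Func<object, Task<object>>>();

    public void Register<TCommand>(Func<TCommand, Task<object>> route) =>
        _routes[typeof(TCommand)] = command => route((TCommand)command);

    // Callers dispatch the same way regardless of where the command ends up running.
    public Task<object> DispatchAsync<TCommand>(TCommand command) => _routes[typeof(TCommand)](command);
}

public static class StoreRegistration
{
    // Stand-in for the real query handler.
    private static Task<object> HandleLocally(StoreProductQuery query) =>
        Task.FromResult<object>($"local:{query.ProductId}");

    // remoteTransport stands in for an HTTP dispatcher that ships the query to another host.
    public static void UseStore(SketchRegistry registry, ApplicationMode mode,
        Func<StoreProductQuery, Task<object>> remoteTransport)
    {
        if (mode == ApplicationMode.InProcess || mode == ApplicationMode.Server)
        {
            registry.Register<StoreProductQuery>(HandleLocally);
        }
        else
        {
            registry.Register<StoreProductQuery>(remoteTransport);
        }
    }
}
```

Calling code only ever uses DispatchAsync, so switching a deployment from InProcess to Client changes where the query runs without touching the caller.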

I have shifted some other code around inside the solution to support code sharing with the Azure Function but that is really the extent of the code change. We’ve changed no business logic or consuming application code – we’ve simply moved where the command runs and the calling semantics are seamless – and essentially split the store out as a micro-service running inside an Azure Function. As long as you build your sub-systems as isolated units as we have here this same approach can be used with queues and other forms of remote call.
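For the queue case, the client side serializes the command and enqueues it, and a worker on the other side dequeues it and runs it with the same handler code. The sketch below assumes an in-memory ConcurrentQueue standing in for a real queue service and System.Text.Json for serialization; InMemoryCommandQueue and ReserveStockCommand are hypothetical. This suits commands that don't need an immediate result; returning results over a queue would need a reply channel, which the sketch omits.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.Json;

// Hypothetical command used for illustration.
public sealed class ReserveStockCommand
{
    public Guid ProductId { get; set; }
    public int Quantity { get; set; }
}

// Hypothetical in-memory stand-in for a real queue; assumes one command type per queue.
public sealed class InMemoryCommandQueue
{
    private readonly ConcurrentQueue<string> _messages = new ConcurrentQueue<string>();

    // Client side: serialize and enqueue; the caller does not wait for the command to run.
    public void Dispatch<TCommand>(TCommand command) =>
        _messages.Enqueue(JsonSerializer.Serialize(command));

    // Server side: dequeue, deserialize and execute each message with a local handler.
    public int Drain<TCommand>(Action<TCommand> handler)
    {
        int processed = 0;
        while (_messages.TryDequeue(out string json))
        {
            handler(JsonSerializer.Deserialize<TCommand>(json));
            processed++;
        }
        return processed;
    }
}
```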

I’ve found this approach to be massively powerful. In the early stages of a project you can make changes within a single codebase, with an operational environment that is fairly simple, easy to manage and well supported by tooling, and as long as you have the tests to go with it refactoring a solution like this is really simple and is supported by tools like ReSharper. Then, as you begin to lock things down or the solution grows, you can pull the sub-systems out into fully independent micro-services without significant code change – it’s largely just configuration, as we’ve seen above.

I wrote the commanding framework I’ve been using specifically to enable this approach and you can find it, and documentation, on GitHub here.

I hope this series has been interesting and presented (or refreshed) a different way of thinking about C# application architecture. There’s a fair chance I’ll swing back round and talk a bit about commanding result caching and some other scenarios that this approach enables so watch this space.

In the meantime if you have any questions about the approach or my commanding framework please do get in touch over on Twitter.

Finally the code for this final part can be found on GitHub here:

https://github.com/JamesRandall/CommandMessagePatternTutorial/tree/master/Part5

AzureFromTheTrenches.Commanding 6.1.0 – 10x Performance Improvement

I spent some time today looking at the performance of my commanding / mediator framework. Although I did a little performance work early on I’ve made a lot of changes since then and been very focused on getting the feature set and API where I want it.

As a target I wanted to get near to the performance of Mediatr – an excellent framework that describes itself as a “simple, unambitious mediator implementation”. When I began work on my framework I had flexibility as a key goal: I wanted it to support persistent event based models (event sourcing) and an evolutionary approach to architecture and development, enabling seamless movement between command handlers that run locally and remotely. There’s usually a performance price to pay for flexibility and features, and so although I’d used some performance-focused techniques in the code it seemed unlikely I’d be able to equal the performance of a smaller, simpler framework. I decided getting within 20% of the performance of Mediatr would be a reasonable price to pay for the additional functionality and flexibility.

Despite starting off in a pretty dismal place – nearly 10x slower than Mediatr – I’ve improved the performance of the framework so it is now about 10% faster than Mediatr as can be seen below (the numbers are from running large numbers of commands through both frameworks):

Framework                               Commands     Time Taken (ms)   Per Command (ms)
AzureFromTheTrenches.Commanding 6.1.0   10,000,000   11,695            0.0011695
Mediatr 4.0.1                           10,000,000   12,818            0.0012818
AzureFromTheTrenches.Commanding 6.0.0   10,000,000   127,709           0.0127709
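The post doesn't show the benchmark harness, so the following is only a minimal sketch, assuming the measurement is a simple sequential loop of dispatches timed with Stopwatch. The dispatch delegate is a hypothetical stand-in for a call into either framework. It also shows that the Per Command column is just total time divided by the command count (11,695 ms / 10,000,000 = 0.0011695 ms).

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical harness; the real benchmark code is not shown in the post.
public static class DispatchBenchmark
{
    // Times `count` sequential dispatches and returns total and per-command milliseconds.
    public static (double TotalMs, double PerCommandMs) Measure(Func<Task> dispatch, int count)
    {
        // One warm-up call so JIT compilation is not included in the timing.
        dispatch().GetAwaiter().GetResult();

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            dispatch().GetAwaiter().GetResult();
        }
        stopwatch.Stop();

        double totalMs = stopwatch.Elapsed.TotalMilliseconds;
        return (totalMs, totalMs / count);
    }

    // The table's per-command column: total time divided by the number of commands.
    public static double PerCommand(double totalMs, int count) => totalMs / count;
}
```

Results from a loop like this will vary with hardware and runtime version, so they are only meaningful when both frameworks are measured the same way on the same machine.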

 

I’m really pleased by that, but the numbers are sufficiently close that unless you have an extreme scenario you would be better off choosing between the two frameworks based on other factors – predominantly how well they address your specific domain.

For those interested in how I improved the performance of the framework I’ll be documenting my process in an upcoming post (as well as highlighting a blooper that illustrates the need to always test performance in code where it is important).

Fixing a Common IoC Container Anti-pattern – the every class is public problem

An anti-pattern I’ve seen a lot over the last few years involves the registration of dependencies in an IoC container at the root of a project (or in a dedicated “IoC” project) – an approach enabled by making every single class in every assembly in the codebase public. It’s amazing how common it is and you see it in codebases that are poor in general and codebases that are otherwise well constructed. As such I find myself talking about it frequently and so it seemed a ripe topic for a blog post.

There are numerous issues with the “every class is public” approach:

  1. As someone reading or using the code I can no longer differentiate between the public API of a subsystem and the interfaces and classes designed for internal consumption.
  2. The registering project (for example an ASP.Net application) is making decisions about the lifecycle of components in another assembly and sub-system – and therefore about the internal implementation of that sub-system. This often leads to things getting out of sync and the issues arising from this kind of lifecycle registration / implementation mismatch can be subtle.
  3. The registering project has to be aware of every single thing in the system and reference every subsystem. One of the effective techniques to police code architecture is by looking at the dependency map and this is heavily polluted if you’re doing this.
  4. The scope of a code change is often larger than it should be and spans sub-systems when it doesn’t need to – if a project takes the root registration approach then adding a class and interface for internal use means I also have to visit the root project.
  5. If sub-systems are run within multiple hosts (for example a Web API and a queue processor) then registration is either duplicated in both root projects or an “IoC configuration” project is introduced: we’ve got ourselves in such a pickle that we now need a whole project dedicated to understanding both internal and external dependencies of sub-systems.

Encapsulation is a good thing – it shouldn’t be thrown away when moving from the class to the assembly level. It’s just as important there – perhaps even more so in modern codebases which are formed of many small classes with few methods rather than large classes with many methods.

I’ve provided a simple example of this common issue in the project you can find here:

https://github.com/JamesRandall/IoCAntipatternFix/tree/master/TheProblem

Conceptually in this project we’ve got three assemblies:

  1. A console app (ConsoleApp) that depends on (2)
  2. An assembly (Calendar) providing calendar functionality to the console app that depends on (3)
  3. An assembly (Notifications) providing notification functionality to the calendar assembly

From a required dependency point of view it looks like this: ConsoleApp depends on Calendar, and Calendar depends on Notifications.

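But because of the every class is public issue it is actually implemented like this: ConsoleApp references both Calendar and Notifications directly, in addition to Calendar referencing Notifications.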

You can see the anti-pattern manifest itself in code in the RegisterDependencies method of Program.cs in the console app:

static IServiceProvider RegisterDependencies()
{
    IServiceCollection services = new ServiceCollection();
    services.AddTransient<Calendar.DataAccess.ICalendarRepository, Calendar.DataAccess.CalendarRepository>();
    services.AddTransient<Calendar.ICalendarManager, Calendar.CalendarManager>();
    services.AddSingleton<Notifications.INotifier, Notifications.Notifier>();
    services.AddTransient<Notifications.Channel.IEmail, Notifications.Channel.Email>();

    return services.BuildServiceProvider();
}

Does the console app have any business knowing that the ICalendarRepository is implemented by the CalendarRepository class? Should it even know about the email channel? Can it safely register the INotifier implementation as a singleton? The answer to all of those questions is no. Absolutely not.

The fix for this is pretty simple and it was great to see Microsoft adopt a version of it in ASP.NET Core as part of their formalisation of dependency inversion in that framework. All you need to do is encapsulate the registration logic inside your sub-systems and, if you need to conditionally configure the registration, pass through an options block (an example of this can be seen in my commanding framework).
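A minimal sketch of the options-block idea follows. It assumes a hypothetical CalendarOptions class with a single SendEmailReminders flag, and to stay free of any container package it takes a registration delegate rather than IServiceCollection. The point is that the host only states what it wants; the choice of implementation, and the implementation types themselves, stay internal to the sub-system.

```csharp
using System;

namespace CalendarSketch
{
    // Public surface: only the interfaces and the options type.
    public interface ICalendarManager { }
    public interface IReminderChannel { }

    // Implementations stay internal to the sub-system.
    internal sealed class CalendarManager : ICalendarManager { }
    internal sealed class EmailReminderChannel : IReminderChannel { }
    internal sealed class NullReminderChannel : IReminderChannel { }

    // Hypothetical options block: the host says *what* it wants, not *how* it is built.
    public sealed class CalendarOptions
    {
        public bool SendEmailReminders { get; set; } = true;
    }

    public static class Dependencies
    {
        public static void AddCalendar(Action<Type, Type> addTransient, Action<CalendarOptions> configure = null)
        {
            var options = new CalendarOptions();
            configure?.Invoke(options);

            addTransient(typeof(ICalendarManager), typeof(CalendarManager));

            // The conditional choice of implementation is made inside the sub-system.
            addTransient(typeof(IReminderChannel),
                options.SendEmailReminders ? typeof(EmailReminderChannel) : typeof(NullReminderChannel));
        }
    }
}
```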

I’m going to show two versions of the fix: one based on using the container’s registration interface, which has the by-product of tying your assemblies to an IoC container, and another that doesn’t require this.

Solution with a container interface

The approach adopted by Microsoft in the ASP.NET Core assemblies and the related packages is to use extension methods on the container interface (in the Microsoft case that’s IServiceCollection). If we take this approach the registration in our console app now looks like this:

static IServiceProvider RegisterDependencies()
{
    IServiceCollection services = new ServiceCollection();
    services.AddCalendar();

    return services.BuildServiceProvider();
}

Additionally our console app no longer has a reference to the notification sub-system as this is now dealt with by the calendar’s AddCalendar registration method:

public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddCalendar(this IServiceCollection serviceCollection)
    {
        serviceCollection.AddTransient<ICalendarRepository, CalendarRepository>();
        serviceCollection.AddTransient<ICalendarManager, CalendarManager>();

        serviceCollection.AddNotifications();

        return serviceCollection;
    }
}

Inside the calendar project only the interfaces intended for external consumption are marked as public, with the rest moving to internal. It’s no longer possible to access the assembly’s private implementation from the outside, and we’ve moved the lifecycle and registration logic closer to the code that is written in line with those expectations.

And finally the notification assembly takes the same approach:

public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddNotifications(this IServiceCollection serviceCollection)
    {
        serviceCollection.AddSingleton<INotifier, Notifier>();
        serviceCollection.AddTransient<IEmail, Email>();
        return serviceCollection;
    }
}

With this approach we’ve addressed the concerns I raised at the start of this piece and have moved back to a place where encapsulation helps us both read the code and use it safely.

It could well be argued that having your sub-systems reference and be aware of the specific IoC container in use is itself another anti-pattern. I’d tend towards agreeing, but it can be a pragmatic choice for an internal codebase; it’s flawed, though, if you are creating packages for others to use, as you’ve built in a hard dependency on a specific IoC container. You can solve this by defining your own interface for proxying over a container and having people implement it, or by using a functional approach, which we’ll look at next.
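One way to realise the “own interface” option is sketched below. IDependencyRegistrar, RecordingRegistrar and NotificationDependencies are hypothetical names: the sub-system depends only on the interface, and each host supplies an adapter. The adapter here simply records registrations so the example is self-contained; an adapter over Microsoft.Extensions.DependencyInjection would forward to services.AddTransient(service, implementation) and services.AddSingleton(service, implementation).

```csharp
using System;
using System.Collections.Generic;

namespace RegistrarSketch
{
    // Hypothetical container-neutral abstraction owned by your code, not a container vendor.
    public interface IDependencyRegistrar
    {
        void AddTransient(Type service, Type implementation);
        void AddSingleton(Type service, Type implementation);
    }

    public interface INotifier { }
    internal sealed class Notifier : INotifier { }

    public static class NotificationDependencies
    {
        // The sub-system only knows about IDependencyRegistrar, never a concrete container.
        public static void AddNotifications(IDependencyRegistrar registrar) =>
            registrar.AddSingleton(typeof(INotifier), typeof(Notifier));
    }

    // Each host supplies an adapter for its container; this one just records the calls.
    public sealed class RecordingRegistrar : IDependencyRegistrar
    {
        public List<string> Registrations { get; } = new List<string>();

        public void AddTransient(Type service, Type implementation) =>
            Registrations.Add($"transient {service.Name} -> {implementation.Name}");

        public void AddSingleton(Type service, Type implementation) =>
            Registrations.Add($"singleton {service.Name} -> {implementation.Name}");
    }
}
```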

The code for the above approach can be found here:

https://github.com/JamesRandall/IoCAntipatternFix/tree/master/ContainerInterfaceSolution

Solution with functions

An alternative to the interface approach is to use a functional style passing down lambda expressions. If we take this approach our console application’s registration method now looks like this:

static IServiceProvider RegisterDependencies()
{
    IServiceCollection services = new ServiceCollection();
    Dependencies.AddCalendar(
        (iface, impl) => services.AddTransient(iface, impl),
        (iface, impl) => services.AddSingleton(iface, impl));

    return services.BuildServiceProvider();
}

We simply wrap the relevant lifecycle registration methods on IServiceCollection inside lambda expressions and pass them down to a registration method in our calendar sub-system:

public static class Dependencies
{
    public static void AddCalendar(
        Action<Type, Type> addTransient,
        Action<Type, Type> addSingleton)
    {
        addTransient(typeof(ICalendarRepository), typeof(CalendarRepository));
        addTransient(typeof(ICalendarManager), typeof(CalendarManager));

        Notifications.Dependencies.AddNotifications(addTransient, addSingleton);
    }
}

This registers our types using the lambda expressions and passes them on to the notification dependency:

public static class Dependencies
{
    public static void AddNotifications(
        Action<Type, Type> addTransient,
        Action<Type, Type> addSingleton)
    {
        addSingleton(typeof(INotifier), typeof(Notifier));
        addTransient(typeof(IEmail), typeof(Email));
    }
}

Again this approach addresses the concerns that arise when implementation classes are made public and registration is centralised but with the added advantage that the sub-systems are independent of any specific IoC container. In my experience this also discourages people from misusing many of the “advanced” capabilities that can be found on IoC containers – but that’s a topic for another post.
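To illustrate that container independence, the sketch below plugs the same style of AddNotifications method into a deliberately tiny, hypothetical container instead of IServiceCollection. TinyContainer supports only parameterless constructors and is for illustration, not production use; the sub-system code is unchanged, and the singleton versus transient decision is still made inside the notification sub-system.

```csharp
using System;
using System.Collections.Generic;

namespace FunctionalSketch
{
    public interface INotifier { }
    internal sealed class Notifier : INotifier { }
    public interface IEmail { }
    internal sealed class Email : IEmail { }

    public static class NotificationDependencies
    {
        // Same shape as AddNotifications in the post: no container types appear here.
        public static void AddNotifications(Action<Type, Type> addTransient, Action<Type, Type> addSingleton)
        {
            addSingleton(typeof(INotifier), typeof(Notifier));
            addTransient(typeof(IEmail), typeof(Email));
        }
    }

    // Hypothetical minimal container (parameterless constructors only).
    public sealed class TinyContainer
    {
        private readonly Dictionary<Type, Func<object>> _factories = new Dictionary<Type, Func<object>>();

        // Transient: a new instance on every resolve.
        public void AddTransient(Type service, Type implementation) =>
            _factories[service] = () => Activator.CreateInstance(implementation);

        // Singleton: created lazily once, then the same instance on every resolve.
        public void AddSingleton(Type service, Type implementation)
        {
            var instance = new Lazy<object>(() => Activator.CreateInstance(implementation));
            _factories[service] = () => instance.Value;
        }

        public T Resolve<T>() => (T)_factories[typeof(T)]();
    }
}
```

Usage mirrors the console app: `NotificationDependencies.AddNotifications(container.AddTransient, container.AddSingleton);` passes the container's methods down as the two delegates.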

The code for this approach can be found here:

https://github.com/JamesRandall/IoCAntipatternFix/tree/master/FunctionalSolution

Wrap up

Hopefully in the above I’ve highlighted a common pitfall and demonstrated two solutions to it. There are of course many other variants you can apply depending on your specific project. If you disagree or have any questions please feel free to reach out on Twitter.

Azure Functions – Microsoft Feedback on HTTP Trigger Scaling

Following the analysis I published on Azure Functions and the latency in scaling HTTP triggered functions the Microsoft development team got in touch to discuss my findings and provide some information about the future which they were happy for me to share.

Essentially the team are already at work making improvements in this area. Understandably they were unable to commit to timescales or make specific claims as to how significant those improvements would be, but my sense is we’re looking at a handful of months and so, hopefully, the first half of this year. They are going to get in touch with me once something is available and I’ll rerun my tests.

I must admit I’m slightly sceptical as to whether they’ll be able to match the scaling capability of AWS Lambda (and to be clear they did not make any such claim), which is what I’d like to see, as that looks to me as if it would require a radical uprooting of the Functions runtime model rather than an evolution. Ultimately, though, I’m just a random, slightly informed, punter. Hopefully they can at least get close enough that Azure Functions can be used in more latency-critical and spiky scenarios.

I’d like to thank @jeffhollan and the team for the call – as a predominantly Azure and .NET developer it’s both helpful and encouraging to be able to have these kinds of dialogues around the platform so critical to our success.

In the interim I’m still finding I can use HTTP functions – I just have to be mindful of their current limitations – and have some upcoming blog posts on patterns that make use of them.

Azure Functions – Scaling with a Dedicated App Service Plan

After my last few posts on the scaling of Azure Functions I was intrigued to see if they would perform any better running on a dedicated App Service Plan. Hosting them in this way allows the functions to take full advantage of App Service features but, to my mind, is no longer a serverless approach: rather than being billed based on usage you are essentially renting servers and are fully responsible for scaling.

I conducted a single test scenario: an immediate load of 400 concurrent users running for 5 minutes against the “stock” JavaScript function (no external dependencies, just returns a string) on 4 configurations:

  1. Consumption Plan – billed based on usage – approximately $130 per month
    (based on running constantly at the tested throughput, which is around 648 million function executions per month)
  2. Dedicated App Service Plan with 1 x S1 server – $73.20 per month
  3. Dedicated App Service Plan with 2 x S1 servers – $146.40 per month
  4. Dedicated App Service Plan with 4 x S1 servers – $292.80 per month
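As a quick check on the figures above: 648 million executions a month is consistent with roughly 250 requests per second sustained over a 30-day month (250 × 2,592,000 seconds), and dividing a monthly price by that volume gives a cost per million requests. The sketch assumes a 30-day month, and any comparison with the App Service Plans assumes each configuration could sustain the same rate, which may not hold for every configuration. PlanCostSketch is a hypothetical helper.

```csharp
using System;

// Hypothetical helper for the back-of-envelope arithmetic.
public static class PlanCostSketch
{
    // Seconds in a 30-day month (an assumption; a 31-day month gives slightly more).
    public const double SecondsPerMonth = 30 * 24 * 60 * 60;

    public static double RequestsPerMonth(double requestsPerSecond) => requestsPerSecond * SecondsPerMonth;

    // Monthly cost divided by monthly volume, expressed per million requests.
    public static double CostPerMillion(double monthlyCostUsd, double requestsPerMonth) =>
        monthlyCostUsd / (requestsPerMonth / 1_000_000);
}
```

On these assumptions the Consumption Plan works out at roughly $0.20 per million requests; the same calculation for the App Service Plans only makes sense alongside the throughput each one actually achieved.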

I also included AWS Lambda as a reference point.

The results were certainly interesting:

With immediately available resource, all 3 App Service Plan configurations begin with response times slightly ahead of the Consumption Plan. At around the 1 minute mark the Consumption Plan overtakes our single instance configuration, at 2 minutes it creeps ahead of the double instance configuration and, while the advantage is slight, at 3 minutes it begins to consistently outperform our 4 instance configuration. However AWS Lambda remains some way out in front.

From a throughput perspective the story is largely the same, with the Consumption Plan taking time to scale up and address the demand but ultimately proving more capable than even the 4 x S1 instance configuration and knocking on the door of AWS Lambda. What I did find particularly notable is how little throughput improves when moving from 2 to 4 instances, which is massively disappointing: for twice the cost we are barely getting 50% more throughput. I have insufficient data to understand why this is happening but do have some tests in mind that, time allowing, I will run to see if I can provide further information.

At this kind of load (650 million requests per month), from a bang-per-buck point of view, Azure Functions on the Consumption Plan come out strongly compared to App Service instances, even before allowing for quiet periods when Functions would incur less cost. If your scale profile falls within the capabilities of the service it’s worth considering, though it’s worth remembering there isn’t really an SLA around Functions running on the Consumption Plan at the moment (and to be fair the same applies to AWS Lambda).

Unless you want to take advantage of the additional features that come with a dedicated App Service Plan, dedicated plans are expensive in comparison, even though they can be provisioned to avoid the slow ramp-up of the Consumption Plan.

Azure Functions vs AWS Lambda vs Google Cloud Functions – JavaScript Scaling Face Off

I had a lot of interesting conversations and feedback following my recent post on scaling a serverless .NET application with Azure Functions and AWS Lambda. A common request was to also include Google Cloud Functions, and a common comment was that the runtimes were not the same: .NET Core on AWS Lambda and .NET 4.6 on Azure Functions. In regard to the latter point I certainly agree this is not ideal, but I continue to contend that, as these are your options for .NET and each is fully supported and stated by its vendor to be a scalable serverless runtime, it’s worth understanding and comparing these platforms, as that is your choice as a .NET developer. I’m also fairly sure that although the different runtimes might make a difference to outright raw response time, and therefore to throughput and the ultimate amount of resource required, the scaling issues with Azure had less to do with the runtime and more to do with the surrounding serverless implementation.

Do I think a .NET Core function in a well architected serverless host will outperform a .NET Framework based function in a well architected serverless host? Yes. Do I think .NET Framework is the root cause of the scaling issues on Azure? No. In my view AWS Lambda currently has a superior way of managing HTTP triggered functions when compared to Azure and Azure is hampered by a model based around App Service plans.

Taking all that on board, and wanting to better evidence or refute my belief that the scaling issues are more host than framework related, I’ve rewritten the test subject as a tiny Node / JavaScript application and retested the platforms on this runtime – Node is supported by all three platforms and all three are currently running Node.js 6.x.

My primary test continues to be a light mixed workload of CPU and IO (load three blobs from the vendor’s storage offering and then compile and run a Handlebars template), the kind of workload it’s fairly typical to find in an HTTP function / public-facing API. However I’ve also run some tests against “stock” functions – the vendor samples that simply return strings. Finally I’ve also included some percentile-based data, which I obtained using Apache Benchmark, and I’ve covered cold start scenarios.

I’ve also managed to normalise the axes this time round for a clearer comparison and the code and data can all be found on GitHub:

https://github.com/JamesRandall/serverlessJsScalingComparison

(In the last week AWS have also added full support for .NET Core 2.0 on Lambda – expect some data on that soon)

Gradual Ramp Up

This test case starts with 1 user and adds 2 users per second up to a maximum of 500 concurrent users to demonstrate a slow and steady increase in load.

The AWS and Azure results for JavaScript are very similar to those seen for .NET with Azure again struggling with response times and never really competing with AWS when under load. Both AWS and Azure exhibit faster response times when using JavaScript than .NET.

Google Cloud Functions run fairly close to AWS Lambda but can’t quite match it for response time, and they fall behind on overall throughput, where they sit closer to Azure’s results. Given the difference in response time this suggests Azure is processing more concurrent incoming requests than Google, allowing it to reach a similar throughput after the dip Azure encounters at around the 2:30 mark – presumably Azure allocates more resource at that point. That dip deserves further attention and is something I will come back to in a future post.

Rapid Ramp Up

This test case starts with 10 users and adds 10 users every 2 seconds up to a maximum of 1000 concurrent users to demonstrate a more rapid increase in load and a higher peak concurrency.
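The three load profiles are simple step functions of elapsed time. The sketch below encodes them, assuming users are added in whole steps at each interval (the load testing tool may schedule them slightly differently); LoadProfiles is a hypothetical helper.

```csharp
using System;

// Hypothetical encoding of the three test profiles as users-at-time-t functions.
public static class LoadProfiles
{
    // Gradual ramp: 1 user at t = 0, +2 users per second, capped at 500.
    public static int GradualUsers(double seconds) =>
        (int)Math.Min(500, 1 + 2 * Math.Floor(seconds));

    // Rapid ramp: 10 users at t = 0, +10 users every 2 seconds, capped at 1000.
    public static int RapidUsers(double seconds) =>
        (int)Math.Min(1000, 10 + 10 * Math.Floor(seconds / 2));

    // Immediate high demand: 400 users for the whole 5-minute run.
    public static int ImmediateUsers(double seconds) => seconds <= 300 ? 400 : 0;
}
```

Worked through, the gradual profile reaches its 500-user cap at 250 seconds (about 4:10) and the rapid profile reaches 1,000 users at 198 seconds (about 3:18); at the 2:30 mark discussed for the gradual test, roughly 300 users are active.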

Again AWS handles the increase in load very smoothly maintaining a low response time throughout and is the clear leader.

Azure struggles to keep up with this rate of request increase. Response times hover around the 1.5 second mark throughout the growth stage and gradually decrease towards something acceptable over the next 3 minutes. Throughput continues to climb over the full duration of the test run matching and perhaps slightly exceeding Google by the end but still some way behind Amazon.

Google has two quite distinctively sharp drops in response time early in the growth stage as the load increases, before quickly stabilising with a response time around 140ms, and its throughput levels off in line with the demand at the end of the growth phase.

I didn’t run this test with .NET, instead hitting the systems with an immediate 1000 users, but nevertheless the results are in line with that test, particularly once the growth phase is over.

Immediate High Demand

This test case starts immediately with 400 concurrent users and stays at that level of load for 5 minutes demonstrating the response to a sudden spike in demand.

Both AWS and Google scale quickly to deal with the sudden demand both hitting a steady and low response time around the 1 minute mark but AWS is a clear leader in throughput – it is able to get through many more requests per second than Google due to its lower response time.

Azure again brings up the rear – it takes nearly 2 minutes to reach a steady response time that is markedly higher than both Google and AWS. Throughput continues to increase to the end of the test where it eventually peaks slightly ahead of Google but still some way behind AWS. It then experiences a fall off which is difficult to explain from the data available.

Stock Functions

This test uses the stock “return a string” function provided by each platform (I’ve captured the code in GitHub for reference) with the immediate high demand scenario: 400 concurrent users for 5 minutes.

With the functions essentially doing no work and no IO the response times are, as you would expect, smaller across the board but the scaling patterns are essentially unchanged from the workload function under the same load. AWS and Google respond quickly while Azure ramps up more slowly over time.

Percentile Performance

I was unable to obtain this data from VSTS and so resorted to running ApacheBench (ab). For this test I used settings of 100 concurrent requests for a total of 10000 requests, collected the raw data, and processed it in Excel. It should be noted that the network conditions were less predictable for these tests and I wasn’t always as geographically close to the cloud function as I was in other tests, though repeated runs yielded similar patterns:

AWS maintains a pretty steady response time up to and including the 98th percentile but then shows marked dips in performance in the 99th and 100th percentiles with a worst case of around 8.5 seconds.

Google dips in performance after the 97th percentile, with its 99th percentile roughly equivalent to AWS’s 100th percentile and its own 100th percentile being twice as slow.

Azure exhibits a significant dip in performance at the 96th percentile, with response time jumping suddenly from a not great 2.5 seconds to 14.5 seconds – in AWS’s 100th percentile territory. Beyond the 96th percentile there is a fairly steady decline in performance, with response time growing by around 2.5 seconds per percentile.
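For anyone wanting to reproduce the percentile figures from raw ApacheBench timings without Excel, here is a minimal C# sketch of a nearest-rank percentile. This is a swapped-in definition: Excel’s PERCENTILE functions interpolate between ranks, so on small samples the two can differ slightly. The Percentiles class name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper: nearest-rank percentile over raw response times.
public static class Percentiles
{
    public static double NearestRank(double[] samples, double percentile)
    {
        if (samples == null || samples.Length == 0)
            throw new ArgumentException("At least one sample is required", nameof(samples));
        if (percentile <= 0 || percentile > 100)
            throw new ArgumentOutOfRangeException(nameof(percentile));

        double[] sorted = samples.OrderBy(s => s).ToArray();
        // 1-based rank of the smallest sample that covers the requested percentile.
        int rank = (int)Math.Ceiling(percentile * sorted.Length / 100.0);
        // Guard against a rank of 0 for vanishingly small percentiles.
        return sorted[Math.Max(rank, 1) - 1];
    }
}
```

Fed the 10000 timings from a run, NearestRank(samples, 96) returns the 9600th fastest request.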

Cold Starts

All the vendors’ solutions go “cold” after a period of inactivity, leading to a delay when they next start. To get a sense of this I left each vendor idle overnight and then had 1 user make repeated requests for 1 minute, to illustrate the cold start time but also to get a visual sense of request rate and variance in response time:

Again we have some quite striking results. AWS has the lowest cold start time of around 1.5 seconds, Google is next at 2.5 seconds and Azure again the worst performer at 9 seconds. All three systems then settle into a fairly consistent response time but it’s striking in these graphs how AWS Lambda’s significantly better performance translates into nearly 3x as many requests as Google and 10x more requests than Azure over the minute.

It’s worth noting that the cold start time for the stock functions is almost exactly the same as for my main test case – the startup is function related and not connected to storage IO.
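The single-user cold start procedure can be sketched in C# as follows. SequentialProbe is a hypothetical helper, not the load test tool I used; it assumes the request is supplied as a Func&lt;Task&gt; so it can wrap an HttpClient call or anything else.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: one simulated user issuing requests back to back.
public static class SequentialProbe
{
    public static async Task<List<TimeSpan>> RunAsync(Func<Task> sendRequest, TimeSpan duration)
    {
        var latencies = new List<TimeSpan>();
        Stopwatch total = Stopwatch.StartNew();
        // Only one request is ever in flight, so the next starts when the last completes.
        while (total.Elapsed < duration)
        {
            Stopwatch request = Stopwatch.StartNew();
            await sendRequest();
            latencies.Add(request.Elapsed);
        }
        // latencies[0] after an idle period approximates the cold start;
        // latencies.Count is how many requests fitted into the duration.
        return latencies;
    }
}
```

For example, RunAsync(() => http.GetStringAsync(functionUrl), TimeSpan.FromMinutes(1)), where http is an HttpClient and functionUrl is a placeholder for the function’s endpoint.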

Conclusions

AWS Lambda is the clear leader for HTTP triggered functions – on all the runtimes I’ve tried it has the lowest response times and, at least within the volumes tested, the best ability to deal with scale and the most consistent performance. Google Cloud Functions are not far behind and it will be interesting to see if they can close the gap with optimisation work over the coming year – if they can reduce their flat out response times they will probably pull level with AWS. The results are similar enough in their characteristics that my suspicion is Google and AWS have similar underlying approaches.

Unfortunately, like with the .NET scenarios, Azure is poor at handling HTTP triggered functions with very similar patterns on show. The Azure issues are not framework based but due to how they are hosting functions and handling scale. Hopefully over the next few months we’ll see some improvements that make Azure a more viable host for HTTP serverless / API approaches when latency matters.

By all means use the above as a rough guide but ultimately whatever platform you choose I’d encourage you to build out the smallest representative vertical slice of functionality you can and test it.

Thanks for reading – hopefully this data is useful.

C# Cloud Application Architecture – Commanding via a Mediator (Part 4)

In the last post we added validation to our solution. This time we’re going to clean up our command handlers so that they focus on business / domain concerns, pulling out the infrastructural concerns. After that we’ll add some telemetry to the system to further reinforce some of the benefits of the pattern. As the handlers currently stand they mix up logging with their business concerns, for example our add to cart command handler:

public async Task<CommandResponse> ExecuteAsync(AddToCartCommand command, CommandResponse previousResult)
{
    Model.ShoppingCart cart = await _repository.GetActualOrDefaultAsync(command.AuthenticatedUserId);

    StoreProduct product = (await _dispatcher.DispatchAsync(new GetStoreProductQuery{ProductId = command.ProductId})).Result;

    if (product == null)
    {
        _logger.LogWarning("Product {0} can not be added to cart for user {1} as it does not exist", command.ProductId, command.AuthenticatedUserId);
        return CommandResponse.WithError($"Product {command.ProductId} does not exist");
    }
    List<ShoppingCartItem> cartItems = new List<ShoppingCartItem>(cart.Items);
    cartItems.Add(new ShoppingCartItem
    {
        Product = product,
        Quantity = command.Quantity
    });
    cart.Items = cartItems;
    await _repository.UpdateAsync(cart);
    _logger.LogInformation("Updated basket for user {0}", command.AuthenticatedUserId);
    return CommandResponse.Ok();
}

There are a number of drawbacks with this:

  • As the author of a handler you have to remember to add all this logging code, with the best will and review cycle in the world people do forget these things or drop them due to time pressure. In fact, as a case in point, here I’ve missed the entry point logger.
  • It’s hard to tell what is exceptional and what is business as usual – for example the warning log is specific to this business process while the top-and-tail logging is not.
  • It adds to unit test cost and complexity.
  • It’s more code and more code equals more bugs.
  • It makes the code more difficult to read.

This is all compounded further if we begin to add additional infrastructural work such as counters and further error handling:

public async Task<CommandResponse> ExecuteAsync(AddToCartCommand command, CommandResponse previousResult)
{
    Measure measure = _telemetry.Start<AddToCartCommandHandler>();
    try
    {
        _logger.LogInformation("Updating basket for user {0}", command.AuthenticatedUserId);
        Model.ShoppingCart cart = await _repository.GetActualOrDefaultAsync(command.AuthenticatedUserId);

        StoreProduct product =
            (await _dispatcher.DispatchAsync(new GetStoreProductQuery {ProductId = command.ProductId})).Result;

        if (product == null)
        {
            _logger.LogWarning("Product {0} can not be added to cart for user {1} as it does not exist",
                command.ProductId, command.AuthenticatedUserId);
            return CommandResponse.WithError($"Product {command.ProductId} does not exist");
        }
        List<ShoppingCartItem> cartItems = new List<ShoppingCartItem>(cart.Items);
        cartItems.Add(new ShoppingCartItem
        {
            Product = product,
            Quantity = command.Quantity
        });
        cart.Items = cartItems;
        await _repository.UpdateAsync(cart);
        _logger.LogInformation("Updated basket for user {0}", command.AuthenticatedUserId);
        return CommandResponse.Ok();
    }
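    catch(Exception ex)
    {
        // ILogger's LogError takes the exception first, then the message template and its arguments
        _logger.LogError(ex, "Error updating basket for user {0}", command.AuthenticatedUserId);
        // Every code path must return a response
        return CommandResponse.WithError("Unable to update basket");
    }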
    finally
    {
        measure.Complete();
    }
}

This is getting messier and messier but, sadly, it’s not uncommon to see code like this in applications (we’ve all seen it, right?). While the traditional layered architecture can help with sorting this out (it’s common to see per-service decorators, for example), it tends to lead to code repetition, and if you want to add measures like those shown above to the system you have to visit every command handler or service. Essentially you are limited in the degree to which you can tidy this up because you’re dealing with operations expressed and executed using compiled calling semantics.

However now we’ve moved our operations over to being expressed as state and executed through a mediator we have a lot more flexibility in how we can approach this and I’d like to illustrate this through two approaches to the problem. Firstly by using the decorator pattern and then by using a feature of the AzureFromTheTrenches.Commanding framework.

Before we get into the meat of this it’s worth quickly noting that this isn’t the only approach to such a problem. One alternative, for example, is to take a strongly functional approach – something that I may explore in a future post.

Decorator Approach

Before we get started the code for this approach can be found here:

https://github.com/JamesRandall/CommandMessagePatternTutorial/tree/master/Part4-decorator

The decorator pattern allows us to add new behaviours to an existing implementation. In the system here all of our controllers make use of the ICommandDispatcher interface to execute commands and queries; for example our shopping cart controller looks like this:

[Route("api/[controller]")]
public class ShoppingCartController : AbstractCommandController
{
    public ShoppingCartController(ICommandDispatcher dispatcher) : base(dispatcher)
    {
            
    }

    [HttpGet]
    [ProducesResponseType(typeof(ShoppingCart.Model.ShoppingCart), 200)]
    public async Task<IActionResult> Get() => await ExecuteCommand<GetCartQuery, ShoppingCart.Model.ShoppingCart>();

    [HttpPut("{productId}/{quantity}")]
    public async Task<IActionResult> Put([FromRoute] AddToCartCommand command) => await ExecuteCommand(command);
        

    [HttpDelete]
    public async Task<IActionResult> Delete() => await ExecuteCommand<ClearCartCommand>();
}

If we can extend the ICommandDispatcher to handle our logging in a generic sense and replace its implementation in the IoC container then we no longer need to handle logging on a per command handler basis. Happily we can! First let’s take a look at the declaration of the interface:

public interface ICommandDispatcher : IFrameworkCommandDispatcher
{
        
}

Interestingly it doesn’t contain any declarations itself but it does derive from something called IFrameworkCommandDispatcher that does:

public interface IFrameworkCommandDispatcher
{
    Task<CommandResult<TResult>> DispatchAsync<TResult>(ICommand<TResult> command, CancellationToken cancellationToken = default(CancellationToken));

    Task<CommandResult> DispatchAsync(ICommand command, CancellationToken cancellationToken = default(CancellationToken));

    ICommandExecuter AssociatedExecuter { get; }
}

And so to decorate the default implementation of ICommandDispatcher with new behaviour we need to provide an implementation of this interface that adds the new functionality and calls down to the original dispatcher. For example a first cut of this might look like this:

internal class LoggingCommandDispatcher : ICommandDispatcher
{
    private readonly ICommandDispatcher _underlyingDispatcher;
    private readonly ILogger<LoggingCommandDispatcher> _logger;

    public LoggingCommandDispatcher(ICommandDispatcher underlyingDispatcher,
        ILoggerFactory loggerFactory)
    {
        _underlyingDispatcher = underlyingDispatcher;
        _logger = loggerFactory.CreateLogger<LoggingCommandDispatcher>();
    }

    public async Task<CommandResult<TResult>> DispatchAsync<TResult>(ICommand<TResult> command, CancellationToken cancellationToken)
    {
        try
        {
            _logger.LogInformation("Executing command {commandType}", command.GetType().Name);
            CommandResult<TResult> result = await _underlyingDispatcher.DispatchAsync(command, cancellationToken);
            _logger.LogInformation("Successfully executed command {commandType}", command.GetType().Name);
            return result;
        }
        catch (Exception ex)
        {
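            // Log the failure and convert it into an error result for the caller
            _logger.LogError(ex, "Error executing command {commandType}", command.GetType().Name);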
            return new CommandResult<TResult>(CommandResponse<TResult>.WithError($"Error occurred performing operation {command.GetType().Name}"), false);
        }
    }

    public Task<CommandResult> DispatchAsync(ICommand command, CancellationToken cancellationToken = new CancellationToken())
    {
        throw new NotSupportedException("All commands must return a CommandResponse");
    }

    public ICommandExecuter AssociatedExecuter => _underlyingDispatcher.AssociatedExecuter;
}

We’ve wrapped the underlying dispatch mechanism with loggers and a try/catch block that deals with errors, and by doing so we no longer have to implement this in each and every command handler. As our pattern and convention approach requires all our commands to be declared with a CommandResponse result, I’ve also updated the resultless DispatchAsync implementation to throw an exception: it should never be called, but if it is we know we’ve missed something on a command.
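To show the shape of the decorator in isolation, here is a stripped-down, self-contained C# sketch. IDispatcher, InnerDispatcher, LoggingDispatcher, SampleCommand and BrokenCommand are hypothetical stand-ins rather than the AzureFromTheTrenches.Commanding types, and results are simplified to strings; the point is only that the decorator implements the same interface, delegates to the wrapped instance, and converts exceptions into an error result.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins, not the AzureFromTheTrenches.Commanding types.
public interface IDispatcher
{
    Task<string> DispatchAsync(object command);
}

public sealed class SampleCommand { }
public sealed class BrokenCommand { }

// Plays the role of the framework's dispatcher; throws for BrokenCommand.
public sealed class InnerDispatcher : IDispatcher
{
    public Task<string> DispatchAsync(object command)
    {
        if (command is BrokenCommand)
            throw new InvalidOperationException("handler failed");
        return Task.FromResult("ok");
    }
}

// The decorator: same interface, wraps another instance, adds the cross-cutting work.
public sealed class LoggingDispatcher : IDispatcher
{
    private readonly IDispatcher _inner;
    public List<string> Log { get; } = new List<string>();

    public LoggingDispatcher(IDispatcher inner) => _inner = inner;

    public async Task<string> DispatchAsync(object command)
    {
        string name = command.GetType().Name;
        Log.Add($"Executing command {name}");
        try
        {
            string result = await _inner.DispatchAsync(command);
            Log.Add($"Successfully executed command {name}");
            return result;
        }
        catch (Exception ex)
        {
            // Handlers no longer need their own try/catch: failures become an error result here.
            Log.Add($"Error executing command {name}: {ex.Message}");
            return $"error: {name}";
        }
    }
}
```

Because LoggingDispatcher accepts any IDispatcher, further behaviours could be layered on as additional decorators without touching the handlers.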

Before updating all our handlers we need to replace the registration inside our IoC container so that a resolution of ICommandDispatcher will return an instance of our new decorated version. And so in our Startup.cs for the API project we need to add this line to the end of the ConfigureServices method:

services.Replace(
    new ServiceDescriptor(typeof(ICommandDispatcher), typeof(LoggingCommandDispatcher),
    ServiceLifetime.Transient));

This removes the previous implementation and replaces it with ours. However, if we try to run this we’ll quickly run into exceptions because we’ve got a problem, and it’s a fairly common one when implementing decorators over interfaces whose implementation is defined in a third-party library. When the IoC container attempts to resolve ICommandDispatcher it will correctly locate the LoggingCommandDispatcher class, but its constructor expects to be passed (and here’s the problem) an instance of ICommandDispatcher, which will cause the IoC container to attempt to instantiate another instance of LoggingCommandDispatcher, and so on. At best the IoC container will recognise this as a problem and throw a specific exception; at worst it will result in a stack overflow exception.

The framework supplied implementation of ICommandDispatcher is an internal class and not something we can get access to and so the question is how do we solve it? We could declare a new interface type derived from ICommandDispatcher for our new class and update all our references but I prefer to keep the references somewhat generic – as the classes making references to ICommandDispatcher aren’t themselves asserting the capabilities they require ICommandDispatcher seems a better fit.

One option is to resolve an instance of ICommandDispatcher before we register our LoggingCommandDispatcher and then use a factory to instantiate our class, or to take a similar approach but register the LoggingCommandDispatcher as a singleton. However, in either case we’ve forced a lifecycle change (in the former the underlying ICommandDispatcher has effectively been made a singleton, in the latter the LoggingCommandDispatcher), which is a fairly significant change in behaviour. While that would work as things stand, it is likely to lead to issues and limitations later on.

We do, however, have an alternative approach, and it’s the reason that the AzureFromTheTrenches.Commanding package defines the IFrameworkCommandDispatcher interface we saw earlier. It’s not designed to be referenced directly by command dispatching code; instead it’s provided to support exactly the scenario we have here: decorating an internal class with an IoC container without changing its lifecycle. The internal implementation of the command dispatcher is registered against both ICommandDispatcher and IFrameworkCommandDispatcher. We can take advantage of this by updating our decorated dispatcher to accept this interface as a constructor parameter:

internal class LoggingCommandDispatcher : ICommandDispatcher
{
    private readonly IFrameworkCommandDispatcher _underlyingDispatcher;
    private readonly ILogger<LoggingCommandDispatcher> _logger;

    public LoggingCommandDispatcher(IFrameworkCommandDispatcher underlyingDispatcher,
        ILoggerFactory loggerFactory)
    {
        _underlyingDispatcher = underlyingDispatcher;
        _logger = loggerFactory.CreateLogger<LoggingCommandDispatcher>();
    }

    // ... rest of the code is the same
}
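Why this avoids the recursion can be sketched with a deliberately tiny, hand-rolled container in C#. TinyContainer and the dispatcher types below are hypothetical illustrations, not Microsoft.Extensions.DependencyInjection or the framework’s classes. The framework-style implementation is registered under both interfaces, the decorator asks only for the framework-only one, and so resolving the application-facing interface never loops back to the decorator. Every resolution is transient, so no lifecycle changes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for IFrameworkCommandDispatcher / ICommandDispatcher.
public interface IFrameworkDispatcher { string Dispatch(string command); }
public interface IAppDispatcher : IFrameworkDispatcher { }

// Plays the role of the framework's internal dispatcher.
public sealed class FrameworkDispatcher : IAppDispatcher
{
    public string Dispatch(string command) => "handled:" + command;
}

// The decorator depends on the framework-only interface, not on IAppDispatcher.
public sealed class LoggingAppDispatcher : IAppDispatcher
{
    private readonly IFrameworkDispatcher _inner;
    public LoggingAppDispatcher(IFrameworkDispatcher inner) => _inner = inner;
    public string Dispatch(string command) => "logged:" + _inner.Dispatch(command);
}

// A deliberately tiny container: every Resolve call runs the factory (transient).
public sealed class TinyContainer
{
    private readonly Dictionary<Type, Func<TinyContainer, object>> _factories =
        new Dictionary<Type, Func<TinyContainer, object>>();

    public void Register<TService>(Func<TinyContainer, TService> factory) where TService : class
        => _factories[typeof(TService)] = c => factory(c);

    public TService Resolve<TService>() => (TService)_factories[typeof(TService)](this);
}

public static class Composition
{
    public static TinyContainer Build()
    {
        var container = new TinyContainer();
        // The framework registers its implementation against both interfaces.
        container.Register<IFrameworkDispatcher>(c => new FrameworkDispatcher());
        container.Register<IAppDispatcher>(c => new FrameworkDispatcher());
        // Replace the app-facing registration with the decorator. Had the decorator
        // asked for IAppDispatcher here, Resolve would call itself forever.
        container.Register<IAppDispatcher>(c => new LoggingAppDispatcher(c.Resolve<IFrameworkDispatcher>()));
        return container;
    }
}
```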

With that taken care of we can revisit our handlers and remove the logging code and so our add to cart command handler, which had become quite bloated, now looks like this:

public async Task<CommandResponse> ExecuteAsync(AddToCartCommand command, CommandResponse previousResult)
{
    Model.ShoppingCart cart = await _repository.GetActualOrDefaultAsync(command.AuthenticatedUserId);

    StoreProduct product = (await _dispatcher.DispatchAsync(new GetStoreProductQuery{ProductId = command.ProductId})).Result;

    if (product == null)
    {
        _logger.LogWarning("Product {0} can not be added to cart for user {1} as it does not exist", command.ProductId, command.AuthenticatedUserId);
        return CommandResponse.WithError($"Product {command.ProductId} does not exist");
    }
    List<ShoppingCartItem> cartItems = new List<ShoppingCartItem>(cart.Items);
    cartItems.Add(new ShoppingCartItem
    {
        Product = product,
        Quantity = command.Quantity
    });
    cart.Items = cartItems;
    await _repository.UpdateAsync(cart);
    return CommandResponse.Ok();
}

As another example the GetCartQueryHandler previously looked like this:

public async Task<CommandResponse<Model.ShoppingCart>> ExecuteAsync(GetCartQuery command, CommandResponse<Model.ShoppingCart> previousResult)
{
    _logger.LogInformation("Getting basket for user {0}", command.AuthenticatedUserId);
    try
    {
        Model.ShoppingCart cart = await _repository.GetActualOrDefaultAsync(command.AuthenticatedUserId);
        _logger.LogInformation("Retrieved cart for user {0} with {1} items", command.AuthenticatedUserId, cart.Items.Count);
        return CommandResponse<Model.ShoppingCart>.Ok(cart);
    }
    catch (Exception e)
    {
        _logger.LogError(e, "Unable to get basket for user {0}", command.AuthenticatedUserId);
        return CommandResponse<Model.ShoppingCart>.WithError("Unable to get basket");
    }
}

And now looks like this:

public async Task<CommandResponse<Model.ShoppingCart>> ExecuteAsync(GetCartQuery command, CommandResponse<Model.ShoppingCart> previousResult)
{
    Model.ShoppingCart cart = await _repository.GetActualOrDefaultAsync(command.AuthenticatedUserId);
    return CommandResponse<Model.ShoppingCart>.Ok(cart);
}

Much cleaner, less code, we can’t forget to do things and it’s much simpler to test!

Earlier I mentioned adding some basic telemetry to track the execution time of our commands. Now that we’ve got our decorated command dispatcher in place, adding this to all our commands requires just a few lines of code:

public async Task<CommandResult<TResult>> DispatchAsync<TResult>(ICommand<TResult> command, CancellationToken cancellationToken)
{
    IMetricCollector metricCollector = _metricCollectorFactory.Create(command.GetType());
    try
    {   
        LogPreDispatchMessage(command);
        CommandResult<TResult> result = await _underlyingDispatcher.DispatchAsync(command, cancellationToken);
        LogSuccessfulPostDispatchMessage(command);
        metricCollector.Complete();
        return result;
    }
    catch (Exception ex)
    {
        LogFailedPostDispatchMessage(command, ex);
        metricCollector.CompleteWithError();
        return new CommandResult<TResult>(CommandResponse<TResult>.WithError($"Error occurred performing operation {command.GetType().Name}"), false);
    }
}

Again by taking this approach we don’t have to revisit each of our handlers, we don’t have to remember to do so in future, our code volume is kept low, and we don’t have to revisit all our handler unit tests.

The collector itself is just a simple wrapper for Application Insights and can be found in the full solution. Additionally in the final solution code I’ve further enhanced the LoggingCommandDispatcher to demonstrate how richer log syntax can still be output – there are a variety of ways this could be done, I’ve picked a simple to implement one here. The code for this approach can be found here:

https://github.com/JamesRandall/CommandMessagePatternTutorial/tree/master/Part4-decorator
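The collector isn’t reproduced in this post, so here is a hypothetical in-memory stand-in that matches the Create / Complete / CompleteWithError shape used by the decorated dispatcher above, timing each command with a Stopwatch instead of sending data to Application Insights. InMemoryMetricCollectorFactory, InMemoryMetricCollector and MetricRecord are my own names; the real collector in the solution may look different.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

// Hypothetical in-memory stand-in for the Application Insights based collector.
public sealed class MetricRecord
{
    public MetricRecord(string name, double elapsedMs, bool succeeded)
    {
        Name = name;
        ElapsedMs = elapsedMs;
        Succeeded = succeeded;
    }

    public string Name { get; }
    public double ElapsedMs { get; }
    public bool Succeeded { get; }
}

public sealed class InMemoryMetricCollector
{
    private readonly string _name;
    private readonly ConcurrentQueue<MetricRecord> _sink;
    // Timing starts when the collector is created, i.e. just before dispatch.
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();

    public InMemoryMetricCollector(string name, ConcurrentQueue<MetricRecord> sink)
    {
        _name = name;
        _sink = sink;
    }

    public void Complete() => Record(succeeded: true);
    public void CompleteWithError() => Record(succeeded: false);

    private void Record(bool succeeded)
    {
        _stopwatch.Stop();
        _sink.Enqueue(new MetricRecord(_name, _stopwatch.Elapsed.TotalMilliseconds, succeeded));
    }
}

public sealed class InMemoryMetricCollectorFactory
{
    // Thread-safe because commands may be dispatched concurrently.
    public ConcurrentQueue<MetricRecord> Records { get; } = new ConcurrentQueue<MetricRecord>();

    public InMemoryMetricCollector Create(Type commandType) =>
        new InMemoryMetricCollector(commandType.Name, Records);
}
```

A stand-in like this can also be handy in unit tests of the decorator, where there is no Application Insights connection to record against.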

The decorator approach is a powerful pattern that can be used to extend functionality of third party components but now we’ll look at using a built-in feature of the commanding framework we’re using to achieve the same results in a different way.

Framework Approach

The code for the below can be found at:

https://github.com/JamesRandall/CommandMessagePatternTutorial/tree/master/Part4-framework

The framework we are using contains functionality designed to support logging and event store type scenarios that we can make use of to record log entries similar to those we recorded using our decorator approach and collect our timing metrics. It does this by allowing us to attach auditor classes to three hook points:

  • Pre-dispatch – occurs as soon as a command is received by the framework
  • Post-dispatch – occurs once a command has been dispatched, this only really applies when execution happens on the other side of a network or process boundary (we’ll come back to this in the next part)
  • Post-execution – occurs once a command has been executed
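To make the ordering of the hooks concrete, here is a simplified C# model. It is a hypothetical sketch of the concept, not the framework’s implementation: AuditEntry, AuditingPipeline and SampleUserCommand are my own names, there is no post-dispatch stage because nothing crosses a process boundary, and enrichers run before each auditor sees an entry, in the spirit of the enricher described further on.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical model of the hook points, not the framework's implementation.
public sealed class AuditEntry
{
    public string Stage { get; set; } = "";
    public string CommandType { get; set; } = "";
    public bool? ExecutedSuccessfully { get; set; }
    public long? ExecutionTimeMs { get; set; }
    public Dictionary<string, string> AdditionalProperties { get; } = new Dictionary<string, string>();
}

public sealed class SampleUserCommand { public string UserId { get; set; } = ""; }

public sealed class AuditingPipeline
{
    private readonly List<Action<Dictionary<string, string>, object>> _enrichers = new List<Action<Dictionary<string, string>, object>>();
    private readonly List<Func<AuditEntry, Task>> _preDispatch = new List<Func<AuditEntry, Task>>();
    private readonly List<Func<AuditEntry, Task>> _postExecution = new List<Func<AuditEntry, Task>>();

    public AuditingPipeline UseEnricher(Action<Dictionary<string, string>, object> enricher) { _enrichers.Add(enricher); return this; }
    public AuditingPipeline UsePreDispatch(Func<AuditEntry, Task> auditor) { _preDispatch.Add(auditor); return this; }
    public AuditingPipeline UsePostExecution(Func<AuditEntry, Task> auditor) { _postExecution.Add(auditor); return this; }

    public async Task ExecuteAsync(object command, Func<object, Task> handler)
    {
        // Pre-dispatch: runs as soon as the command is received.
        foreach (var auditor in _preDispatch) await auditor(CreateEntry("PreDispatch", command));

        Stopwatch stopwatch = Stopwatch.StartNew();
        bool succeeded = false;
        try
        {
            await handler(command);
            succeeded = true;
        }
        finally
        {
            // Post-execution: runs whether the handler succeeded or threw, with timing attached.
            stopwatch.Stop();
            AuditEntry entry = CreateEntry("PostExecution", command);
            entry.ExecutedSuccessfully = succeeded;
            entry.ExecutionTimeMs = stopwatch.ElapsedMilliseconds;
            foreach (var auditor in _postExecution) await auditor(entry);
        }
    }

    private AuditEntry CreateEntry(string stage, object command)
    {
        var entry = new AuditEntry { Stage = stage, CommandType = command.GetType().Name };
        // Enrichers see the command; auditors only see the enriched entry.
        foreach (var enrich in _enrichers) enrich(entry.AdditionalProperties, command);
        return entry;
    }
}
```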

For the moment we’re only going to make use of the pre-dispatch and post-execution hooks. Auditors need to implement the ICommandAuditor interface as shown in our pre-dispatch auditor below:

internal class LoggingCommandPreDispatchAuditor : ICommandAuditor
{
    private readonly ILogger<LoggingCommandPreDispatchAuditor> _logger;

    public LoggingCommandPreDispatchAuditor(ILogger<LoggingCommandPreDispatchAuditor> logger)
    {
        _logger = logger;
    }

    public Task Audit(AuditItem auditItem, CancellationToken cancellationToken)
    {
        if (auditItem.AdditionalProperties.ContainsKey("UserId"))
        {
            _logger.LogInformation("Executing command {commandType} for user {userId}",
                auditItem.CommandType,
                auditItem.AdditionalProperties["UserId"]);
        }
        else
        {
            _logger.LogInformation("Executing command {commandType}",
                auditItem.CommandType);
        }
        return Task.FromResult(0);
    }
}

The framework constructs them using the IoC container and so dependencies can be injected as appropriate.

The auditors don’t have direct access to commands but in the code above we can see that we are looking at the UserId as part of our logging logic. The framework also provides the ability for audit items to be enriched using an IAuditItemEnricher instance that does have access to the command and so we pull this information out and make it available as a property on the AuditItem using one of these enrichers:

public class AuditItemUserIdEnricher : IAuditItemEnricher
{
    public void Enrich(Dictionary<string, string> properties, ICommand command, ICommandDispatchContext context)
    {
        if (command is IUserContextCommand userContextCommand)
        {
            properties["UserId"] = userContextCommand.AuthenticatedUserId.ToString();
        }
    }
}

It’s also possible to use the same audit class for all three hook types and inspect a property of the AuditItem parameter to determine the stage, but I find that if that sort of branching is needed it’s often neater to have discrete classes, and so I have separated the post-execution logic into another auditor class:

internal class LoggingCommandExecutionAuditor : ICommandAuditor
{
    private readonly ILogger<LoggingCommandExecutionAuditor> _logger;
    private readonly IMetricCollector _metricCollector;

    public LoggingCommandExecutionAuditor(ILogger<LoggingCommandExecutionAuditor> logger,
        IMetricCollector metricCollector)
    {
        _logger = logger;
        _metricCollector = metricCollector;
    }

    public Task Audit(AuditItem auditItem, CancellationToken cancellationToken)
    {
        Debug.Assert(auditItem.ExecutedSuccessfully.HasValue);
        Debug.Assert(auditItem.ExecutionTimeMs.HasValue);

        if (auditItem.ExecutedSuccessfully.Value)
        {
            if (auditItem.AdditionalProperties.ContainsKey("UserId"))
            {
                _logger.LogInformation("Successfully executed command {commandType} for user {userId}",
                    auditItem.CommandType,
                    auditItem.AdditionalProperties["UserId"]);
            }
            else
            {
                _logger.LogInformation("Successfully executed command {commandType}",
                    auditItem.CommandType);
            }
            _metricCollector.Record(auditItem.CommandType, auditItem.ExecutionTimeMs.Value);
        }
        else
        {
            // Failures are logged at error level rather than information level
            if (auditItem.AdditionalProperties.ContainsKey("UserId"))
            {
                _logger.LogError("Error executing command {commandType} for user {userId}",
                    auditItem.CommandType,
                    auditItem.AdditionalProperties["UserId"]);
            }
            else
            {
                _logger.LogError("Error executing command {commandType}",
                    auditItem.CommandType);
            }
            _metricCollector.RecordWithError(auditItem.CommandType, auditItem.ExecutionTimeMs.Value);
        }
            
        return Task.FromResult(0);
    }
}

This class also collects the metrics based on the time to execute the command – the framework handily collects these for us as part of its own infrastructure.

At this point we’ve got all the components we need to perform our logging so all that is left is to register the components with the command system which is done in Startup.cs at the end of the ConfigureServices method:

CommandingDependencyResolver
    .UsePreDispatchCommandingAuditor<LoggingCommandPreDispatchAuditor>()
    .UseExecutionCommandingAuditor<LoggingCommandExecutionAuditor>()
    .UseAuditItemEnricher<AuditItemUserIdEnricher>();

Next Steps

With both approaches we’ve cleaned up our command handlers so that all they are concerned with is functional / business requirements. Though I’ve presented the above as an either/or choice, it’s quite possible to use a combined approach if neither quite meets your requirements. However, we’re going to move forward with the framework auditor approach as it will support our next step quite nicely: we’re going to take our GetStoreProductQuery command and essentially turn it into a standalone microservice running as an Azure Function without changing any of the code that uses the command.

Azure Functions vs AWS Lambda – Scaling Face Off

Note: since first posting this I’ve published another piece that includes Google Cloud, percentile data, cold start data and uses the JavaScript runtime. It can be found here.

If you’ve been following my blog recently you’ll know I’ve been spending a lot of time with Azure Functions, Microsoft’s implementation of a serverless platform. The idea behind serverless appeals to me massively and seems like the natural next evolution of compute on the cloud, with scaling and pricing being, so the premise goes, fully dynamic and consumption based.

The use of App Service Plans (more later) as a host mechanism for Azure Functions gave me some concern about how “serverless” Azure Functions might actually be and so to verify suitability for my use cases I’ve been running a range of different tests around response time and latency that culminated in the “real” application I described in my last blog post and some of the performance tests I ran along the way. I quickly learned that the hosting implementation is not particularly dynamic and so wanted to run comparable tests on AWS Lambda.

To do this I’ve ported the serverless blog over to AWS Lambda, S3 and DynamoDB (the, rather scruffy, code is in a branch on GitHub – I will tidy this up but the aim was to get the tests running) and then I’ve run a number of user volume scenarios against a single test case: loading the homepage. The operations involved in this are:

  1. A GET request to a serverless HTTP endpoint that:
    1. Loads 3 resources from storage (Blob Storage on Azure, S3 on AWS) in an asynchronous batch.
    2. Combines them together using a Handlebars template
    3. Returns the response as a string of type text/html.
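The shape of that homepage operation can be sketched in C# as below. This is a hypothetical illustration, not the blog’s actual code: HomepageRenderer, the loadResource delegate and the resource names are my own assumptions, and simple placeholder replacement stands in for Handlebars.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of the homepage function's work, not the blog's actual code.
public static class HomepageRenderer
{
    // loadResource stands in for a Blob Storage / S3 read.
    public static async Task<string> RenderAsync(Func<string, Task<string>> loadResource)
    {
        // 1. Start all three loads before awaiting any of them (the asynchronous batch).
        Task<string> template = loadResource("template.html");
        Task<string> header = loadResource("header.html");
        Task<string> posts = loadResource("posts.html");
        await Task.WhenAll(template, header, posts);

        // 2. Combine them. Plain placeholder replacement stands in for Handlebars here.
        string html = template.Result
            .Replace("{{header}}", header.Result)
            .Replace("{{posts}}", posts.Result);

        // 3. The HTTP function would return this string with Content-Type text/html.
        return html;
    }
}
```

Starting all three loads before the first await means the storage latencies overlap rather than add up.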

On Azure I’m using .NET 4.6 on the v1 runtime, while on AWS I’m using the same code running under .NET Core 1.0. It’s worth noting that latency on blob access remained minimal throughout all these tests (6ms on average across all loads), and removing blob access from the tests made little difference to the patterns.

Although the .NET 4.6 and Core runtimes are different (and admittedly may exhibit different behaviours), these are the current general availability options for implementing serverless on the two platforms using .NET and both vendors claim full support for them. In Microsoft’s case some of the languages supported on the v1 Azure Functions runtime, the one tested here (v2 is in preview and has serious performance issues with .NET Core), are experimental and documented as having scale problems, but C# (which runs under full framework .NET) is not one of them. Both vendors have .NET Core 2.0 support on the way and in preview, but given the issues I’m waiting until they reach general availability before I compare them.

The results are, frankly, pretty damning when it comes to Azure Functions ability to scale dynamically and so let’s get into the data and then look at why.

A quick note on the graphs: I’ve pulled these from VSTS and it’s quite hard (or at least I don’t know how!) to equalise the scales, so please do look at the numbers carefully – the difference is quite startling.

Add 2 Users per Second

In this test scenario I’ve started with a single user and then added 2 users per second over a 5 minutes run time up to a maximum of 500 users:

We can see from this test that AWS matches the growth in user load almost exactly: it has no issue dealing with the growing demand and page request times hover around the 100ms mark. Contrast this with Azure, which always lags a little behind the demand, is spikier, and has a much higher response time hovering around the 700ms mark.

This is backed up by the average stats from the run:

It’s interesting to note just how many more requests AWS dealt with as a result of its better performance: 215271 as opposed to Azure’s 84419, well over twice as many.

Constant Load of 400 Concurrent Users

This test hits the application with 400 concurrent users from a standing start and runs over a 10 minute period simulating a sudden spike or influx of traffic and looking at how quickly each serverless environment is able to deal with the load. Neither environment was completely cold as I’d been refreshing the view in the browser but neither had had any significant traffic for some time. The contrast is significant to say the least:

Let’s cover AWS first as it’s so simple: it quickly absorbs the load and hits a steady response time of around 80ms again in under a minute.

Azure, on the other hand, is more complex. Average response time doesn’t fall under a second until the test has been running for 7 minutes and it’s only around then that the system is able to get near the throughput AWS put out in a minute. Pretty disappointing and backed up by the overall stats for the run:

Again it’s striking just how much better the AWS stats are than the Azure figures.

Constant Load of 1000 Concurrent Users

Same scenario as the last test but this time with 1000 users. Let's get into the data:

Again we can see a similar pattern, with Azure slow to scale up to meet the demand while AWS is back to business as usual in under a minute. Interestingly, at this level of concurrency AWS also errored heavily during the early scaling:

It should be noted that AWS specifically instructs you to implement retry and backoff handlers on the client, which I am not doing in the load test. Additionally, at this point I am seeing throttle events in the logging for the AWS function; this is something I will look to come back to in the future. However, it's interesting to note the contrasting approaches of the two systems: Azure inflates its response time while AWS prefers to throw errors.
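For readers wondering what such a client-side handler looks like, here is a minimal C# sketch of retry with exponential backoff. `Retry.WithBackoffAsync` is a hypothetical helper, not part of the AWS SDK: it retries after any exception and doubles the wait after each failure. A production client would more likely retry only transient failures (such as throttling responses) and add random jitter so that many clients don't retry in lockstep.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: not part of the AWS SDK or any other library.
public static class Retry
{
    // Runs the operation, retrying after any exception until maxAttempts is reached.
    // The wait doubles after each failure: baseDelay, 2 x baseDelay, 4 x baseDelay...
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation, int maxAttempts, TimeSpan baseDelay)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        TimeSpan delay = baseDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Back off before the next attempt. On the final attempt the filter
                // no longer matches, so the exception propagates to the caller.
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

Against a throttled endpoint you would wrap each HTTP call in `WithBackoffAsync`, trading a little extra latency on the client for far fewer hard failures during a scale-up.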

The average stats for the run:

Azure Functions

I don't think there's much point dancing around the issue: the above numbers are disappointing. Azure is slow to scale its HTTP triggered functions, and once we get beyond the 100 concurrent users point the response times are never great and the experience is generally uneven. For customer facing API / web serving, where low latency and response time are critical to a smooth user experience, this really rules it out as an option. And it's not just the .NET 4.6 variant that is poor, as can be seen from my previous posts where I stripped test cases down to the most basic scenarios and used a variety of frameworks. The best case for Azure scaling I've found is a CSX function that simply returns a string, but even that lags behind AWS doing real work as the test cases in this post do:

using System.Net;

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    log.Info("C# HTTP trigger function processed a request.");

    var response = req.CreateResponse();
    response.StatusCode = HttpStatusCode.OK;
    response.Content = new StringContent("<html><head><title>Blog</title></head><body>Hello world</body></html>", System.Text.Encoding.UTF8, "text/html");

    return response;
}

With 1000 concurrent users over 5 minutes:

And with the add 2 users per second scenario:

Even in this final case (and remember this Azure Function is only returning a string) we can see the response time creeping up as the user load increases, and the total number of requests served is only 77514 to AWS's 215271 over the same period, with a much lower number of requests per second.

In a further attempt to validate my conclusion that the Azure Functions system is poor at scaling, I pointed the AWS Lambda installation at Azure Blob Storage instead of S3. In this test, other than the function entry point semantics, the code running on AWS takes exactly the same branches as the Azure tests and uses the same underlying storage mechanism, albeit with a hop across the Internet to access the storage. I ran this using the 400 concurrent user scenario:

We can see from this that, other than a slightly increased response time due to the storage being hosted in another data centre, AWS continues to perform well: it scales up almost immediately and response time remains steady and low. We can also see there is no issue with Azure Blob Storage; if there were, we'd expect to see it impact these results.

These additional validation tests (an empty workload and AWS running against Blob Storage) pretty much isolate the issue to the Azure Functions runtime.

And it’s a shame as the developer experience is great, there is solid documentation, and plenty of samples, and the development team on Twitter are ludicrously responsive – to the point that I feel bad saying what I need to say here. I will reach out to them for feedback.

Why is this the case? Well I’d suggest the root of the issue is how the system has been built on top of App Service Plans. It’s not all that, well, serverless and you still find yourself worrying about, well, servers.

On Azure an App Service Plan is essentially a collection of rented servers / reserved compute power of a given spec (CPU, memory) and capabilities. Microsoft have layered what they call a Consumption Plan over this for Azure Functions, which provides automatic scaling and consumption based pricing. Unfortunately, if you track what is going on, your Functions are running on a limited number of these servers, which you can evidence by tracking the instance ID and by sharing state between your functions (to be clear: this is not good!).
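One way to observe this host reuse is to have a function return an instance identifier alongside a static invocation counter. The sketch below is hypothetical: `InstanceProbe` is not part of any Functions SDK, and it assumes the `WEBSITE_INSTANCE_ID` environment variable that App Service sets, falling back to the machine name elsewhere. If the counter keeps climbing for the same ID across requests, those invocations are sharing a process, and therefore its static state.

```csharp
using System;
using System.Threading;

// Hypothetical diagnostic helper, for experiments only; never rely on this state.
public static class InstanceProbe
{
    // Static fields live as long as the host process, so a count that keeps
    // increasing across invocations shows they are landing on the same instance.
    private static int _invocationCount;

    public static string Describe()
    {
        int count = Interlocked.Increment(ref _invocationCount);

        // Assumption: App Service exposes WEBSITE_INSTANCE_ID; use the machine name elsewhere.
        string instanceId = Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID")
                            ?? Environment.MachineName;

        return $"{instanceId}:{count}";
    }
}
```

Calling `InstanceProbe.Describe()` from an HTTP triggered function and logging the result during a load test gives a rough picture of how many hosts are actually serving traffic.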

Essentially the level of granularity for scaling your functions remains, as in a traditional hosting model, at the server level: as your system scales up, instances are slowly added, but this is tightly throttled, presumably to prevent Microsoft's costs from spiralling out of control.

Now, because they run on App Service Plans, you can switch hosting away from the Consumption plan onto a standard plan (which allows additional Azure features to be used), but this, to me, completely defeats the point of serverless. I'm paying for reserved compute again and managing server instance counts. I may as well not have bothered in the first place!

It's hard to escape the feeling that Microsoft had to play catch up with AWS Lambda (it launched as a preview in late 2014 and went into general release in April 2015, whereas Azure Functions launched as a preview in March 2016) and built something they could market as serverless computing as quickly as they could by reusing existing compute and scaling systems on Azure.

Would I still use Azure Functions? Yes sure – in back end scenarios where latency isn’t all that important they’re a great fit. Anything that impacts user experience? No. Definitely not at this point.

It will be interesting to see if Microsoft revise the hosting model. I suspect that if they do it's some time off, as currently they seem focused on the v2 runtime, which isn't a hosting change (as far as I can see) but rather gives Functions the ability to support more languages and .NET Core.

AWS Lambda

I’ll preface this by saying I am absolutely not an AWS expert so it’s harder for me to speculate about the underlying architecture of Lambda however… the numbers don’t lie: AWS manages to respond to changes in demand very quickly and, until I started to hit throttle limits (which I would need to speak to AWS Support to have lifted), is very consistent in response times.

I’ve not tried any state sharing but I would expect it to fail: it looks like Amazon have containerised at the Function level, rather than the host server, and this is what allows them to operate as you’d expect a serverless environment to. Both scaling and billing can then be at the function level.

Would I use AWS Lambda? Yes. But as most of my development work is on Azure I’m really hoping Microsoft bridge the capability gap.

Wrap Up and Next Steps

If you've followed this far – thanks! I'm a big fan of the serverless model, but the Azure implementation of serverless looks like something of a compromised offering at this point, and I'd be cautious of recommending it without understanding the usage requirements in detail, as you will quickly hit choppy water.

I am planning on repeating similar experiments with the queue processing I began some time ago, and if I get any information from Microsoft around this topic I will make corrections as appropriate. This is one of those times I'd love to have got things wrong.

Serverless Blog – Christmas 2017 Project

Happy New Year everyone – I hope everyone had a great break and has a fantastic 2018.

Much like last year I'd set some time aside over the Christmas break to tinker with something fairly left-field and somewhat experimental (algorithmic art) but unfortunately spent a lot of the break ill. This left me with a lot less time than I'd planned for and had based my project around: I'd hoped to spend 4 to 5 days on it plus an additional day writing this blog post, but was left with only around 12 hours for the implementation.

That being the case, I scrabbled around for something smaller that was still interesting and useful to me and that I thought would fit into the reduced time I had available. I decided to attempt a Minimum Viable Product for replacing my WordPress based blog with something that looks and feels the same to the reader but is entirely serverless in its architecture. My aim was to get, in no particular order, something that:

  • Renders using a similar look and feel to my current blog
  • Supports the same URL patterns for posts so that I could port my content, do a DNS change, and wouldn’t cause Google or linking sites a problem
  • Has super-cheap running costs
  • Has high uptime
  • Uses Markdown as its post authoring format
  • Has fast response times (< 100ms for the main payload)
  • Is capable of scaling up to high volumes of concurrent users
  • Supports HTTPS for all content, as my current blog does
  • Was deployed and running on an endpoint at the end of my allotted time (you can try it out here)

Knowing I only had 12 or so hours to spend on this I didn’t expect to be flicking the switch at the end of the second day and migrating my blog to this serverless system but I did want to have it running on my domain name, fairly sound, and be able to prove the points above with a working Minimum Viable Product. From a code quality point of view I wanted it to be testable and reasonably structured but I wasn’t aiming for perfection and expected low to zero automated test coverage.

The challenge here was covering enough ground in 12 hours to demonstrate that the MVP worked and was in a sufficiently developed state that it was clear how the quality could be raised to a high degree with a fairly small amount of additional work.

If you’re interested in seeing the code it can be found on GitHub. If you use this as a basis for your own projects please bear in mind this was put together very rapidly in just over 12 hours – it needs more work (see next steps at the bottom of this post).

https://github.com/JamesRandall/AzureFromTheTrenches.ServerlessBlog

Planning

Normally when I undertake a project like this I’ve had the chance to roll it around in my head for a few days and can hit the ground running. With the late change of direction I didn’t really get the chance to do that and so I really came into this pretty cold.

As I wanted to replace my current WordPress blog with a serverless approach, a good place to start seemed to be looking at its design and my workflows around it. The layout of my blog is pretty simple and every page has the same structure: a title bar, a content panel, and a sidebar:

In addition there are only really 4 types of page: a homepage made up of the most recent posts, posts, category pages, and archives. The category and archive pages simply list the posts within a category and month respectively. The only thing that changes the site is the addition or editing of a post, and that can require all of those pages to be updated.

I do most of my writing on the train and use the Markdown format, which I subsequently import into WordPress for publishing. This means I don't really use the editing capabilities of WordPress (other than to deal with Markdown to WordPress conversion issues!) and so was comfortable simply uploading the Markdown to a blob container for this serverless blog. This left the question of how to get metadata (for example categories) into posts, and I decided on a simple convention based approach where an optional block of JSON could be included at the start of a post. That way the metadata could also be maintained in a text editor.

Given all that, my general approach (at this point best categorised as a harebrained scheme) was to render the components of the site as static HTML snippets using a blob triggered Azure Function and assemble them into the overarching layout when a user visits a given page, with page requests handled by HTTP triggered Azure Functions, one per page type. I toyed with the idea of going fully static and re-rendering the whole website on each update, but felt this “mostly” static approach revolving around the side components might provide a bit more flexibility without much performance impact: all I'm really doing to compose a page is stitching together some strings, and if I were to actually start using this I'd like to add a couple of dynamic components.

In any case having settled on that approach I mapped the architecture out onto Azure services as shown below:

In addition to using Azure Functions as my compute platform for building out the components I picked a toolset I’m either working with day to day or have used in the past:

  • C# and .NET Core
  • Visual Studio 2017 and Visual Studio Code
  • Handlebars for page templating
  • Blob and Table storage

I briefly considered using CosmosDB as a datastore, but my query needs were limited and it would bump up the running cost and add complexity for no real gain, and so I quickly discounted it.

Implementation

With the rough planning complete it was time to knuckle down with the laptop, a quiet room, a large quantity of coffee, and get started on some implementation. Bliss!

In order to make this readable I’ve organised my approach into a linear series of steps but like most development work there was some to-ing and fro-ing and things were iterated on and fleshed out as I moved through the process.

My general approach on a project like this is to prioritise the building out of a vertical slice and so here that meant starting with a markdown file, generating enough of the static assets that I could compose web pages, and then a couple of entry points so I could try it out in the Azure environment.

Step 1 – Replicating the Styling of my Existing Blog

As this project is really about Markdown in and HTML out, I wanted to start by ensuring I had a clearly defined view of that final output, and so I began by creating an HTML file and a CSS file that mimicked the layout of my existing blog. Design is always easier for a non-designer when you have a reference, and so I quite literally opened up my current blog in one tab and my candidate HTML file in another, and iterated over the content and the CSS until I had something that was a reasonable approximation.

While I’m not going to pretend that the CSS is a stunning piece of artistry this didn’t take long and I was sufficiently in the ballpark after just an hour.

Time taken: 1 hour

Step 2 – Creating a Solution and Code Skeleton

Next up was creating a solution skeleton in Visual Studio, establishing the basic coding practices along with the models I expected to use throughout. My previous work with Azure Functions has been for small and quite isolatable parts of a wider system, rather than being the main compute resource for the system, and so I'd not really had to think too hard about how to organise the code.

Something I knew I wanted to carry over as a pattern from my previous work was the concept of “thin” functions. The function methods themselves are, to me, much like actions on an ASP.Net Core / Web API controller: entry points that accept input and return output and should be kept small and focused, handing off to more appropriate implementers that are not aware of the technicalities of the specific host technology (via services, commands etc.). Not doing that mixes concerns and tightly ties your implementation to the Functions runtime.

While I wanted to separate my concerns out I also didn’t want this simple solution to inflate into an overly complex system and so I settled on a fairly traditional layered approach comprised, from an implementation point of view, of 4 assemblies communicating over public C# interfaces but with fully private implementations all written to .NET Standard 2.0:

  • Models – a small set of classes used to pass basic information up and down the stack, but not out of it (by which I mean they are not persisted in a data store directly nor are they returned to the end user)
  • Data Access – simple implementations on top of table storage and blob storage
  • Runtime – the handful of classes that do the actual work
  • Functions – the entry point assembly

Mapped out this ultimately gave me a solution structure like this:

The remaining decision I needed to make was how to handle dependency injection. An equivalent system written with, say, ASP.Net Core would use an IoC container and register the configuration during startup, but that's state that persists for the lifetime of the server, and functions are ideally stateless. Spinning up and configuring an IoC container for each execution of a function seemed needlessly expensive, so I decided to use a “poor man's” approach to dependency injection: the Runtime and Data Access assemblies each expose a static factory class that essentially implements the “Resolve” method for each of my instantiable types and exposes public create methods for the public interfaces of each layer.

For the limited number of classes I have this approach worked pretty well and allowed me to write testable code in the same way as if I was using a fully fledged container.
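To make the “poor man's” approach concrete, here is a minimal sketch of such a factory. The types (`BlogOptions`, `IPostRepository`, `IWebPageComposer`, `RuntimeFactory`) are hypothetical stand-ins rather than the classes in the repository, and the repository implementation is a placeholder. The point is that each create method does the job a container's Resolve would, wiring dependencies by hand, while only the public interfaces escape the assembly.

```csharp
using System;

// Hypothetical configuration passed in by each function on entry.
public sealed class BlogOptions
{
    public string BlogName { get; set; } = "";
    public string StorageConnectionString { get; set; } = "";
}

// Public interfaces are the only types other layers see.
public interface IPostRepository
{
    string GetMarkdown(string urlName);
}

public interface IWebPageComposer
{
    string GetPost(string urlName);
}

// Implementations stay internal to their assembly.
internal sealed class PostRepository : IPostRepository
{
    private readonly string _connectionString;

    public PostRepository(string connectionString) => _connectionString = connectionString;

    // Placeholder: a real implementation would read the post from blob storage.
    public string GetMarkdown(string urlName) => $"# {urlName}";
}

internal sealed class WebPageComposer : IWebPageComposer
{
    private readonly IPostRepository _posts;
    private readonly string _blogName;

    public WebPageComposer(IPostRepository posts, string blogName)
    {
        _posts = posts;
        _blogName = blogName;
    }

    public string GetPost(string urlName) => $"{_blogName}: {_posts.GetMarkdown(urlName)}";
}

// The hand-written "container": one create method per public interface.
public sealed class RuntimeFactory
{
    private readonly BlogOptions _options;

    private RuntimeFactory(BlogOptions options) => _options = options;

    public static RuntimeFactory Instance { get; private set; }

    public static void Create(BlogOptions options) => Instance = new RuntimeFactory(options);

    public IPostRepository CreatePostRepository() =>
        new PostRepository(_options.StorageConnectionString);

    // Resolves the composer's dependency by calling the other create method.
    public IWebPageComposer CreateWebPageComposer() =>
        new WebPageComposer(CreatePostRepository(), _options.BlogName);
}
```

Because consumers depend only on the interfaces, unit tests can construct the internal classes with fakes (via `InternalsVisibleTo`, for example) exactly as they would with a container.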

Time taken: 1 hour

Step 3 – Creating the Layout, Posts and the Homepage

The first step in turning my earlier HTML and CSS work into something that could be used to create a real blog from real posts was to write a Handlebars template for the overall layout that could stitch together the main content and sidebar into a full HTML document. Based on my earlier work this was pretty simple and looked like this:

<html>
    <head>
        <title>{{pageTitle}}</title>
        <link href="{{stylesheetUrl}}" rel="stylesheet" />
        <link rel='stylesheet' href='https://fonts.googleapis.com/css?family=Roboto:regular' type='text/css' media='all' />
        <link href="{{faviconUrl}}" rel="shortcut icon" type="image/x-icon" />
    </head>
    <body>
        <div class="title-panel">
            <div class="container">
                <a class="primary-title" href="/">{{blogName}}</a>
            </div>
        </div>
        <div class="container">
            <div class="content">            
                <div class="reading">                            
                    {{{readingContent}}}
                </div>
                <div class="sidebar">
                    {{{sidebar}}}
                </div>
            </div>
        </div>
        <div class="footer-panel">
            <div class="container">
                Copyright &copy; {{defaultAuthor}}
            </div>
        </div>
    </body>
</html>

Along with this I created a pair of methods in my composition class to bring the components of the site together:

public async Task<string> GetHomepage()
{
    return await GetWrappedContent(() => _outputRepository.GetHomepageContent());
}

private async Task<string> GetWrappedContent(Func<Task<string>> contentFunc)
{
    // Start fetching the layout template, sidebar and page content concurrently
    Task<string> templateTask = _templateRepository.GetLayoutTemplate();
    Task<string> sidebarTask = _outputRepository.GetSidebar();
    Task<string> contentTask = contentFunc();

    await Task.WhenAll(templateTask, sidebarTask, contentTask);

    // All three tasks have completed, so awaiting them again returns immediately
    string template = await templateTask;
    string content = await contentTask;
    string sidebar = await sidebarTask;

    TemplatePayload payload = new TemplatePayload
    {
        BlogName = _blogName,
        DefaultAuthor = _defaultAuthor,
        PageTitle = _blogName,
        ReadingContent = content,
        Sidebar = sidebar,
        StylesheetUrl = _stylesheetUrl,
        FavIconUrl = _favIconUrl
    };
    Func<object, string> compiledTemplate = Handlebars.Compile(template);

    string html = compiledTemplate(payload);
    return html;
}

To generate posts I needed to read a post from an IO stream and then convert the Markdown into un-styled HTML. For that I used the excellent CommonMark.NET, which I hid behind an injected helper to facilitate later testing. After conversion the post is saved to the output blob store:

Post post = await _postRepository.Get(postStream);
string html = _markdownToHtmlConverter.FromMarkdown(post.Markdown, post.UrlName, post.Author, post.PostedAtUtc);
await _outputRepository.SavePost(post.UrlName, html);

Actually deserializing the post took a little more effort as I also needed to parse out the metadata; this can be seen in the PostParser.cs implementation.
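The PostParser.cs implementation isn't reproduced here, but the core of the convention (an optional metadata block at the very start of the file) can be sketched as follows. `FrontMatterSplitter` is a hypothetical helper: it only separates the header text from the Markdown body by matching braces, skipping braces inside single- or double-quoted strings, and leaves deserializing the header to a JSON library. The metadata example later in this post uses single quotes and unquoted property names, which are not strict JSON, so that library needs to be a lenient one (Json.NET accepts both).

```csharp
using System;

// Hypothetical helper: splits an optional leading metadata block from the Markdown body.
public static class FrontMatterSplitter
{
    // Returns Header == null when the post has no metadata block.
    public static (string Header, string Body) Split(string text)
    {
        string trimmed = text.TrimStart();
        if (!trimmed.StartsWith("{"))
        {
            return (null, text);
        }

        int depth = 0;
        char quote = '\0'; // the quote character we are currently inside, if any
        for (int i = 0; i < trimmed.Length; i++)
        {
            char c = trimmed[i];
            if (quote != '\0')
            {
                if (c == '\\') i++;                 // skip the escaped character
                else if (c == quote) quote = '\0';  // string literal closed
                continue;
            }

            switch (c)
            {
                case '\'':
                case '"':
                    quote = c;
                    break;
                case '{':
                    depth++;
                    break;
                case '}':
                    depth--;
                    if (depth == 0)
                    {
                        // Everything up to the matching brace is metadata; the rest is Markdown.
                        return (trimmed.Substring(0, i + 1), trimmed.Substring(i + 1).TrimStart());
                    }
                    break;
            }
        }

        // Unbalanced braces: treat the whole file as Markdown.
        return (null, text);
    }
}
```

One consequence of this convention is that a post whose Markdown genuinely starts with `{` would be misread as metadata, which seems an acceptable trade-off for a personal blog.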

The homepage on my blog is basically the most recent n posts compiled together and so to do this I used another Handlebars template:

{{#each this}}
    {{#if @index}}
        <div class="post-spacer"></div>
    {{/if}}
    {{{this}}}
{{/each}}

To order the posts on the homepage (and later the sidebar) I need to track the “posted at” date of each post. I can't rely on the LastModified property of the blob as that won't deal with updates correctly, and to migrate my content over I need to be able to set the dates as part of that process. To do this I persisted some basic data to an Azure Storage table.
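Ordering by that stored date rather than by the blob's LastModified is then a simple LINQ query. A minimal sketch, assuming a hypothetical `PostIndexEntry` row shape (the real table entity isn't shown in this post):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of the per-post row kept in table storage.
public sealed class PostIndexEntry
{
    public string UrlName { get; set; }
    public DateTime PostedAtUtc { get; set; }
}

public static class HomepageSelector
{
    // Picks the n most recent posts by their recorded posted-at date,
    // deliberately ignoring blob LastModified timestamps.
    public static IReadOnlyList<string> MostRecentUrlNames(IEnumerable<PostIndexEntry> entries, int count)
    {
        return entries
            .OrderByDescending(e => e.PostedAtUtc)
            .ThenBy(e => e.UrlName, StringComparer.Ordinal) // deterministic order for equal dates
            .Take(count)
            .Select(e => e.UrlName)
            .ToList();
    }
}
```

The resulting URL names can then be used to load each post's pre-rendered HTML and feed it to the homepage template above.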

And finally I created a Handlebars template for generating a hard coded sidebar based on my sample.

Time taken: 3 hours

Step 4 – Blob Triggered Post Processing Function

At this point I had a bunch of code written for processing markdown and generating web pages but no way to call it and so the next step was to implement a function that would listen for new and updated blobs and generate the appropriate assets:

public static class ProcessPost
{
    [FunctionName("ProcessPost")]
    public static async Task Run([BlobTrigger("posts/{name}", Connection = "BlogStorage")]Stream myBlob, string name, TraceWriter log)
    {
        log.Info($"ProcessPost triggered\n Blob Name:{name} \n Size: {myBlob.Length} Bytes");

        Factory.Create(ConfigurationOptionsFactory.Create());

        IStaticAssetManager staticAssetManager = Factory.Instance.GetRenderer();
        await staticAssetManager.AddOrUpdatePost(myBlob);
    }
}

This function demonstrates some of the principles and practices I settled on earlier in this process:

  • The Azure Function is small and restricts its actions to that domain: it takes an input, sets up the subsequent environment and hands off.
  • The poor man's dependency injection approach is used to resolve an instance of IStaticAssetManager.

I tested this locally first using the Azure Functions Core Tools and, other than some minor fiddling around with the local tooling, it just worked, which I verified by checking the output blob repository and eyeballing the contents. No great genius on my part: I'm using things I've used before and am familiar with to solve a new problem.

Time taken: 1 hour

Step 5 – Homepage and Post Functions

Next up was to try and render my homepage and for this I wrote a new function following the same principles as before:

[FunctionName("GetHomepage")]
public static async Task<ContentResult> Run([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "home")]HttpRequest req, TraceWriter log)
{
    log.Info("C# getContent HTTP trigger function processing a request.");
            
    Factory.Create(ConfigurationOptionsFactory.Create());

    IResponseRenderer responseRenderer = Manager.Factory.Instance.GetResponseRenderer();
    string content = await responseRenderer.GetHomepage();

    return new ContentResult
    {
        Content = content,
        ContentType = "text/html",
        StatusCode = 200
    };            
}

This worked, but I encountered my first challenge of the day: the function was on a path of https://blog.azurewebsites.net/api/home, which won't allow it to serve as the root page for my website. In fact if I went to the root I would instead see the Azure Functions welcome page:

While this is a perfectly fine page it’s not really going to help my readers view my content. Fortunately Azure Functions also include a capability called Proxies which allow you to take any incoming request, reshape it, and call an alternate backend. I had no idea if this would work on a root path but wrote the simple pass through proxy shown below:

{
  "$schema": "http://json.schemastore.org/proxies",
  "proxies": {
    "HomePageProxy": {
      "matchCondition": {
        "route": "/",
        "methods": [
          "GET"
        ]
      },
      "backendUri": "https://%BlogDomain%/home"
    }
  }
}

That matches on a GET request to the root and sends it on to my home page handler. This works absolutely fine when run on Azure but doesn’t work locally in the Core Tools – they seem to use the root path for something else. I need to do more investigation here but for now, given it works in the target environment and I only have 12 hours, I settled on this and moved on.

To remove the api component of the URI from my future functions I also modified the host.json file used by Azure Functions, setting the HTTP routePrefix option to blank:

{
  "http": {
    "routePrefix": ""
  }
}

Writing this, I'm wondering if this is what's causing my issues with the root proxy in the local tools. Hmm. Something to try later, as I can accomplish the same with another proxy.

Time taken: 2 hours

Step 6 – Load Testing

With my homepage compositor function written and a working system deployed to the cloud with this first fully representative vertical slice I wanted to get a quick handle on how it would cope with a reasonable amount of load.

Visual Studio Team Services is great for quickly throwing lots of concurrent virtual users against a public endpoint. I set up a test with a fairly rapid step up in the number of users going from 0 to 400 concurrent users in around 5 minutes and then staying at that level for another 15 minutes.

I knew from my casual browser testing that the response from the homepage function for a single user page load on a quiet system took between 60 and 100ms which I was fairly pleased about. I expected some divergence from that as the system scaled up but for things essentially to work.

Much to my surprise and horror that was not the case. As the user count increased the response time started running at around 3 to 4 seconds per request and generated an awful lot of errors along the way. The system never scaled up to a point where the load could really be acceptably dealt with as can be seen below:

I blogged about this extensively in my last post and so won’t cover it again here but the short version is that the Azure Functions v2 .NET Core runtime (that is still in preview) was the culprit. To resolve things I migrated my functions over to .NET 4.6.2 and after doing so and running a similar test again I got a much more acceptable result:

Response time averaged 700ms over the run and the system scaled out pretty nicely to deal with the additional users (I pushed this up to 600 on this test). The anecdotal experience (me using the browser with the cache disabled as the test ran) was also excellent and felt consistently snappy throughout, with timings of between 90ms and 900ms and the majority that I saw taking around 300ms. (It's worth noting I'm geographically closer than the test agents to the Azure data centre the blog is running in; VSTS doesn't currently run managed agents from UK South.)

As part of moving to .NET 4.6 I had to make some changes to my functions, an example of this is below:

[FunctionName("GetHomepage")]
public static async Task<HttpResponseMessage> Run([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "home")]HttpRequestMessage req, TraceWriter log)
{
    log.Info("GetHomepage triggered");
    Factory.Create(ConfigurationOptionsFactory.Create());

    IWebPageComposer webPageComposer = Factory.Instance.GetResponseRenderer();
    string content = await webPageComposer.GetHomepage();

    HttpResponseMessage response = req.CreateResponse(HttpStatusCode.OK);
    response.Content = new StringContent(content, Encoding.UTF8, "text/html");            

    return response;            
}

Time taken: 3 hours

Step 7 – Sidebar Content

To build the sidebar I needed to maintain some additional metadata (which posts belong in which categories), which I'm pulling from the optional JSON annotation of the Markdown files I outlined earlier. An example can be seen below:

{
    createdAtUtc: '2017-12-29 10:01:00',
    categories: [
        'C#',
        'Code'
    ],
    urlName: 'aUrlNameForAPost',
    author: 'James Randall'
}

The categories get parsed into a very simple table storage class:

internal class CategoryItem : TableEntity
{
    public string UrlName => PartitionKey;

    public string PostUrlName => RowKey;

    public string DisplayName { get; set; }

    public string PostTitle { get; set; }

    public DateTime PostedAtUtc { get; set; }

    public static string GetPartitionKey(string categoryUrlName)
    {
        return categoryUrlName;
    }

    public static string GetRowKey(string postUrlName)
    {
        return postUrlName;
    }
}

The UrlNames referenced above are just (by default) camelcase alphabetic strings used to identify posts as part of a URI, and as such are unique within the context of a blog. Because all this activity takes place on the backend, away from user requests, I've not bothered with more complex indexing strategies or further storage tables for the unique set of categories. Instead, when I need to organise the categories into a hierarchical structure or get the category names, I simply load them all from table storage and run some simple LINQ:

internal class CategoryListBuilder : ICategoryListBuilder
{
    public IReadOnlyCollection<Category> FromCategoryItems(IEnumerable<CategoryItem> items)
    {
        var result = items.GroupBy(x => x.UrlName, (k, g) => new Category
        {
            UrlName = k,
            DisplayName = g.First().DisplayName,
            Posts = g.OrderByDescending(x => x.PostedAtUtc).Select(x => new PostSummary
            {
                PostedAtUtc = x.PostedAtUtc,
                Title = x.PostTitle,
                UrlName = x.PostUrlName
            }).ToArray()
        }).OrderBy(x => x.DisplayName).ToArray();

        return result;
    }
}

This is something that might need revisiting at some point but this isn’t some uber-content management system, it’s designed to handle simple blogs like mine, and, hey, I only have 12 hours!

I take a similar approach to generating the list of months for the archives section of the sidebar and then create it as a static asset with a Handlebars template:

<h2>Recent Posts</h2>
<ul>
    {{#each recentPosts}}
        <li><a href="/{{urlName}}">{{title}}</a></li>        
    {{/each}}    
</ul>
<h2>Archives</h2>
<ul>
    {{#each archives}}
        <li><a href="/archive/{{year}}/{{month}}">{{displayName}}</a></li>
    {{/each}}    
</ul>
<h2>Categories</h2>
<ul>
    {{#each categories}}
        <li><a href="/category/{{urlName}}">{{displayName}}</a></li>
    {{/each}}    
</ul>
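The month list feeding the archives section can be derived in the same load-everything-and-LINQ style. This is a sketch only: `ArchiveMonth` and `ArchiveListBuilder` are hypothetical names and the “December 2017” display format is an assumption; year and month are kept as integers to suit the /archive/{{year}}/{{month}} links in the template above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical model for one sidebar archive link, e.g. /archive/2017/12.
public sealed class ArchiveMonth
{
    public int Year { get; set; }
    public int Month { get; set; }
    public string DisplayName { get; set; }
    public int PostCount { get; set; }
}

public static class ArchiveListBuilder
{
    // Groups post dates by calendar month, newest month first.
    public static IReadOnlyList<ArchiveMonth> FromPostDates(IEnumerable<DateTime> postedAtUtc)
    {
        return postedAtUtc
            .GroupBy(d => new { d.Year, d.Month })
            .OrderByDescending(g => g.Key.Year)
            .ThenByDescending(g => g.Key.Month)
            .Select(g => new ArchiveMonth
            {
                Year = g.Key.Year,
                Month = g.Key.Month,
                // Invariant culture keeps month names stable regardless of host locale.
                DisplayName = new DateTime(g.Key.Year, g.Key.Month, 1)
                    .ToString("MMMM yyyy", CultureInfo.InvariantCulture),
                PostCount = g.Count()
            })
            .ToList();
    }
}
```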

Time taken: 2 hours

Step 8 – Wrap Up

With most of the system working and problems solved, all that was left was to fill in a couple of the empty pages: post lists for categories and archives. The only new code I needed for this was something to summarise a post. For the moment I've taken a quick and dirty approach based on how my content is structured: I look for the title and the end of the first paragraph in the HTML output.

Having got that, I again simply use another Handlebars template to generate the output and a couple more functions to return the content to a user.
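The quick and dirty summary can be as small as a string search. This hypothetical `PostSummariser` assumes, as with my content, that a post's HTML opens with its title and that paragraphs end with a literal `</p>`; it keeps everything up to and including the first closing paragraph tag. It will produce odd summaries for posts that open with a list or code block, which is part of why it's quick and dirty.

```csharp
using System;

// Hypothetical helper illustrating the "title plus first paragraph" summary.
public static class PostSummariser
{
    // Keeps everything up to and including the first closing </p>, which
    // (assuming the post opens with a heading) captures the title and first
    // paragraph. Falls back to the whole document if no paragraph is found.
    public static string Summarise(string html)
    {
        const string paragraphEnd = "</p>";
        int index = html.IndexOf(paragraphEnd, StringComparison.OrdinalIgnoreCase);
        return index < 0 ? html : html.Substring(0, index + paragraphEnd.Length);
    }
}
```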

Time taken: 2 hours

Conclusions and Next Steps

Did I succeed? Well, I have my MVP, it works, and it ticks off what I wanted! However, I took 14 hours to put this together rather than the 12 I'd allowed. Most of the overrun was due to the performance issue with .NET Core and the Azure Functions v2 runtime: it took a little while to pin down the cause, as the starting point for my investigation was the (generally reasonable!) assumption that I'd done something stupid.

Given that, and as it’s New Year, I’m going to give myself a pass and class this as a resounding success! A few takeaways for me:

  • Azure Functions are very flexible and serverless can be a great model, but there are some definite limitations in the Azure Functions implementation, some of which stem from the underlying hosting model – I’m going to come back to this in a future blog post and, time allowing, contrast them with AWS Lambda.
  • Implementing this using Azure Functions was not really any harder than using ASP.NET Core or Web API.
  • Never underestimate the need to test your code under some load.
  • It’s always worth spending some time identifying the main challenges in a project and focusing your efforts on them. In this case that was covering enough ground, quickly enough, to validate the design without making it a nightmare to move on into a more professional codebase.
  • If you really focus, it’s amazing how much you can get done quickly with modern tools and technologies.
  • Development is fun! I had a great time building this small project.
  • Blogging takes even longer than development. The real overrun on this project was the blog post – I think it’s taken me the best part of 2 days.

If I continue with this project the next steps, in a rough priority order, will be to:

  1. Add unit tests
  2. Introduce fault tolerance strategies and logging
  3. Add a proper deployment script so others can get up and running with it
  4. Test it with more content (extracted from my blog)
  5. Improve code syntax highlighting
  6. Ensure images work

Finally, the code that goes along with this blog post can be found over on GitHub:

https://github.com/JamesRandall/AzureFromTheTrenches.ServerlessBlog

Azure Functions v2 Preview Performance Issues (.NET Core / Standard)

I’ve been spending a little time building out a serverless web application as a small holiday project and, as this is just a side project, I’d taken the opportunity to try out the new .NET Core-based v2 runtime for Azure Functions and the new tooling and support in Visual Studio 2017.

As soon as I had an end-to-end vertical slice I wanted to run some load tests to ensure it would scale up reliably – the short version is that it didn’t. The .NET Core v2 runtime is still in preview (and you are warned not to use this environment for production workloads due to potential breaking changes), so you would hope this will be fixed by general release, but right now there seem to be some serious shortcomings in the scalability and performance of this environment that render it fairly unusable.

I used the VSTS load testing system to hit a single URL, initially with a high volume of users for a few minutes. In isolation (i.e. if I run it from a browser with no other activity) this function runs in less than 100ms, normally around the 70ms mark. However, as the number of users increases, performance quickly takes a serious nosedive, with requests taking seconds to return, as can be seen below:

After things calmed down a little (hitting a system like this from cold with high concurrency is going to cause some chop while things scale out), the average request time ranged from 3 to 9 seconds, and the anecdotal experience (me running it in a browser / Postman while the test was going on) was of highly variable performance. Some requests would take just a few hundred milliseconds while others would take over 20 seconds.

Worryingly, no matter how long the test was run, this never improved.

I began by looking at my code, assuming I’d made a silly mistake, but I couldn’t see anything and so boiled things down to a really simple test case, essentially the one that is created for you by the Visual Studio template:

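using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Azure.WebJobs.Host;

public static class GetString
{
    [FunctionName("GetString")]
    public static IActionResult Run([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = null)]HttpRequest req, TraceWriter log)
    {
        log.Info("C# HTTP trigger function processed a request.");

        // Return a hard-coded string: about as little work as a function can do.
        return new OkObjectResult("hello world");
    }
}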

I expected this to scale and perform much better as it’s as simple as it gets: return a hard-coded string. However, to my surprise, this exhibited very similar issues:

The response time (to return a string!) hovered around the 7 second mark, and the system never really scaled out enough to avoid a small percentage of failures due to the volume.

Having run a fair few tests and racked up a lot of billable virtual user minutes on my credit card, I tweaked the test slightly at this point, moving to a 5 minute test length with step-up concurrent user growth. Running this against the same simple function gave me, again, poor results: average response times of between 1.5 and 2 seconds for 100 concurrent users on a function that is as close to doing nothing as it gets (the response time is hidden by the page time in the performance chart below, as the two track almost exactly). Stepping users up to a fairly low volume eliminates the errors, as you’d expect.

What these graphs don’t show is the variance around this average response time, which still ranged from a few hundred milliseconds up to around 15 seconds.
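The VSTS charts report averages, and a handful of very slow requests can drag an average a long way from what most requests actually experience. If you capture raw per-request timings yourself, a small helper along these lines surfaces that spread. It is a hypothetical sketch, not part of VSTS or the project, and nearest-rank percentiles are just one of several conventions.

```csharp
using System;
using System.Linq;

// Hypothetical helper: summarises raw response times (in milliseconds) so the spread
// around the average is visible, not just the average itself.
public static class LatencySummary
{
    // Nearest-rank percentile over an ascending-sorted array: the smallest sample
    // such that at least `percentile`% of samples are less than or equal to it.
    public static double Percentile(double[] sortedMs, double percentile)
    {
        if (sortedMs.Length == 0) throw new ArgumentException("At least one sample is required.", nameof(sortedMs));
        if (percentile <= 0 || percentile > 100) throw new ArgumentOutOfRangeException(nameof(percentile));
        int rank = (int)Math.Ceiling(percentile / 100.0 * sortedMs.Length);
        return sortedMs[rank - 1];
    }

    public static (double Mean, double Median, double P95, double Max) Summarise(double[] samplesMs)
    {
        if (samplesMs.Length == 0) throw new ArgumentException("At least one sample is required.", nameof(samplesMs));
        double[] sorted = samplesMs.OrderBy(ms => ms).ToArray();
        return (sorted.Average(), Percentile(sorted, 50), Percentile(sorted, 95), sorted[sorted.Length - 1]);
    }
}
```

For example, samples of 200, 250, 300, 350 and 15,000 ms give a mean of 3,220 ms but a median of 300 ms: the same kind of gap between the charted average and individual requests described above.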

At this point I was beginning to suspect the Functions 2.0 preview runtime might be the issue, so I created a Function app on the standard Functions 1.0 runtime and deployed this simple function as a CSX script:

using System.Net;

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    var response = req.CreateResponse();
    response.StatusCode = HttpStatusCode.OK;
    response.Content = new StringContent("hello world", System.Text.Encoding.UTF8, "text/plain");

    return response;
}

Running the same ramp-up test as above shows that this function behaves much more as you’d expect, with average response times in the 300ms to 400ms range when running at 100 concurrent users:

Intrigued, I ran a short 5 minute, 400 concurrent user test with no ramp-up, and again the CSX-based function behaved much more in line with what I think are reasonable expectations: it took a short time to scale up to deal with the sudden demand, but did so without generating errors and eventually settled down to a response time similar to the test above:

Finally, I deployed a .NET 4.6-based function into a new 1.0 runtime Function app. I made a slight mistake when setting up this test and ramped it up to 200 users rather than 100, but it scales much more as you’d expect and holds a fairly steady response time of around 150ms. Interestingly, this gives longer response times than .NET Core for single requests run in isolation: around 170ms for .NET 4.6 vs. 70ms for .NET Core.

At this point I felt fairly confident that the issue I was seeing in my application was due to the v2 Functions runtime, so I made a quick change to target .NET 4.6 instead, spun up a new v1 runtime Function app and ran my initial 400 concurrent user test again:

As the system scales up, giving no errors, this test eventually settles at around a 500ms average response time, which is something I can move ahead with. I’d like to get it closer to 150ms, and it will be interesting to see what I can tweak to get there on the consumption plan, as I think I’m starting to bump up against some of the other limits with Functions (ironically, resolving that involves taking advantage of what is actually going on with the Functions runtime implementation and accepting that it’s a somewhat flawed serverless implementation as it stands today).

As a more general conclusion, the only real takeaway I have from the above (beyond the general point that it’s always worth doing some basic load testing, even on what you assume to be simple code) is that the Azure Functions 2.0 runtime has some way to go before it comes out of preview. What’s running in Azure currently is suitable only for the most trivial of workloads – I wouldn’t feel able to run even a beta system on it today.

Something else I’d like to see from Azure Functions is a more aggressive approach to scaling up/out: for spiky workloads where low latency is important there is a significant drag factor at the moment. While you can run on an App Service Plan and handle the scaling yourself, this kind of flies in the face of the core value proposition of serverless computing – I’m back to renting servers. A reserved throughput or Premium Consumption offering might make more sense.

I do plan on running these tests again once the runtime moves out of preview – I’m confident the issue will be fixed since, to be usable as a service, it basically has to be.
