Category: AWS

Azure Functions Performance – Update on EP1 Results

March 10, 2021 by James

In yesterday’s post comparing Azure Functions to AWS Lambda the EP1 plan was a notable poor performer – to the extent I wondered if it was an anomalous result. For context here are yesterday’s results for a low load test:

I created a new plan this morning with a view to exploring the results further and I think I can provide some additional insight into this.

You shouldn’t have to “warm” a Premium Plan but to be sure and consistent I ran these tests after allowing for an idle / scale down period and then making a single request for a Mandelbrot.

The data here is based around a test making 32 concurrent requests to the Function App for a single Mandelbrot. Here is the graph for the initial run.
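A burst like this can be approximated from a script. The sketch below is not the tool used for these tests; it is a minimal Python illustration that fires a configurable number of calls at the same time and records each latency. The `fetch` callable is a hypothetical stand-in for an HTTP GET against the Function App.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(fetch):
    """Run fetch() once and return its latency in milliseconds."""
    start = time.perf_counter()
    fetch()
    return (time.perf_counter() - start) * 1000.0


def run_burst(fetch, concurrency=32):
    """Issue `concurrency` calls at the same time and return their latencies."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, fetch) for _ in range(concurrency)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Stand-in for a real request, e.g. an HTTP GET of the Mandelbrot endpoint.
    latencies = run_burst(lambda: time.sleep(0.05))
    print(f"{len(latencies)} requests, slowest {max(latencies):.0f} ms")
```

Passing the request in as a callable keeps the sketch independent of any particular HTTP client.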

First, if we consider the overall statistics for the full run – they are still not great. If I pop those into the comparison chart I used yesterday EP1 is still trailing – the blue column is yesterday’s EP1 result and the green line today’s.

It’s improved – but it’s still poor. However if we look at the graph of the run over time we can see it’s something of a graph of two halves and I’ve highlighted two sections of it (with the highlight numbers in the top half):

There is a marked improvement in response time and requests per second between the two halves. Although I’m not tracking the instance IDs I would conclude that Azure Functions scaled out to involve a second App Service instance and that resulted in the improved throughput.

To verify this I immediately ran the test again to take advantage of the increased resource availability in the Function Plan and that result is shown below along with another comparative graph of the run in context.

We can see here that the EP1 plan is now in the same kind of ballpark as Lambda and the EP2 plan. With two EP1 instances in play we are now running with a similar amount of total compute to the EP2 plan – just on two 210 ACU instances rather than one 420 ACU instance.

To obtain this level of performance we are sacrificing consumption based billing and moving to a baseline cost of £0.17 per hour (£125 per month) bursting to £0.34 per hour (£250 per month) to cover this low level of load.
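The monthly figures follow from the hourly rate. A rough sketch of that arithmetic, assuming around 730 billable hours in a month (an assumption for illustration, not a figure taken from the Azure price list):

```python
HOURS_PER_MONTH = 730  # assumption: 365 days * 24 hours / 12 months


def monthly_cost(hourly_rate, instances=1):
    """Approximate monthly cost for a number of always-on instances."""
    return hourly_rate * instances * HOURS_PER_MONTH


if __name__ == "__main__":
    ep1_hourly = 0.17  # GBP per hour for one EP1 instance, from the text above
    print(f"baseline: £{monthly_cost(ep1_hourly):.2f} per month")
    print(f"burst:    £{monthly_cost(ep1_hourly, instances=2):.2f} per month")
```

Under that assumption the results land at roughly £124 and £248, in line with the rounded figures quoted above.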

Conclusions

I would argue this verifies yesterday’s results – with a freshly deployed Function App we have obtained similar results and by looking at its behavior over time we can see how Azure Functions is adding resource to an EP1 plan, giving us similar total resource to the EP2 plan and similar results.

Every workload is different, and I would always encourage measuring your own, but based on these results I would strongly suggest that if you’re using Premium Plans you dive into your workload and seek to understand whether they are a cost-effective use of your spend.

Migrating www.forcyclistsbycyclists.com to AWS from Azure (Part 1)

November 22, 2020 by James

If you follow me on Twitter you might have seen that as a side project I run a cycling performance analytics website called Performance For Cyclists – this is currently running on Azure.

It’s built in F# and deploys to Azure largely like this (it’s moved on a little since I drew this but not massively):

It runs fine but if you’ve been following me recently you’ll know I’ve been looking at AWS and am becoming somewhat concerned that Microsoft are falling behind in a couple of key areas:

Support for .NET – AWS seem to always be a step ahead in terms of .NET running in serverless environments with official support for the latest runtimes rolling out quickly and the ability to deploy custom runtimes if you need. Cold starts are much better and they have the capability to run an ASP.NET Core application serverlessly with much less fuss.

I can also, already, run .NET on ARM on AWS which leads me to my second point (it’s almost as if I planned this)…
Lower compute costs – my recent tests demonstrated that I could achieve a 20% to 40% saving depending on the workload by making use of ARM on AWS. It seems clear that AWS are going to broaden out ARM yet further and I can imagine them using that to put some distance between Azure and AWS pricing.

I’ve poked around this as best I can with the channels available to me but can’t get any engagement so my current assumption is Microsoft aren’t listening (to me or more broadly), know but have no response, or know but aren’t yet ready to reveal a response.

(just want to be clear about something – I don’t have an intrinsic interest in ARM, it’s the outcomes and any coupled economic opportunities that I am interested in)

I’m also just plain and simple curious. I’ve dabbled with AWS, mostly when clients were using it when I freelanced, but never really gone at it with anything of significant size.

I’m going to have to figure my way through things a bit, and doubtless iterate, but at the moment I’m figuring it’s going to end up looking something like this:

Leaving Azure Maps there isn’t a mistake – I’m not sure what service on AWS offers the functionality I need, happy to hear suggestions on Twitter!

I may go through this process and decide I’m going to stick with Azure but worst case is that I learn something! Either way I’ll blog about what I learn. I’ve already got the API up and running in ECS backed by Fargate and being built and deployed through GitHub Actions and so I’ll write about that in my next post.

Compute “Bang for Buck” on Azure and AWS – 20% to 40% advantage on AWS

November 17, 2020 by James

As I normally post from a developer perspective I thought it might be worth starting off with some additional context for this post. If you follow me on Twitter you might know that about 14 months ago I moved into a CTO role at a rapidly growing business – we’re making ever increasing use of the cloud both by migrating workloads and the introduction of new workloads. Operational cost is a significant factor in my budget. To me the cloud can be summarised as “cloud = economics + capabilities” and so if I have a similar set of capabilities (or at least capabilities that map to my needs) then reduction in compute costs has the potential to drive the choice of vendor and unlock budget I can use to grow faster.

In the last few posts I’ve been exploring the performance of ARM processors in the cloud but ultimately what matters to me is not a processor architecture but the economics it brings – how much am I paying for a given level of performance and set of characteristics.

It struck me there were some interesting differences across ARM, x86, Azure and AWS and I’ve expanded my testing and attempted here to present these findings in (hopefully) useful terms.

All tests have been run on CentOS Linux (or the AWS derivative) using the .NET 5 runtime with Apache acting as a reverse proxy to Kestrel. I’ve followed the same setup process on every VM and then run performance tests directly against their public IP using loader.io all within the USA.

I’ve run two workloads:

Generate a Mandelbrot – this is computationally heavy with no asynchronous yield points.
A test that simulates handing off asynchronously to remote resources. I’ve included a small degree of randomness in this.

At the bottom of the post is a table containing the full set of tests I’ve run on the many different VM types available. I’m going to focus on some of the more interesting scenarios here.

Computational Workload

2 Core Tests

For these tests I picked what on AWS is a mid range machine and on Azure the entry level D series machine:

AWS (ARM): t4g.large – a 2 core VM with 8GiB of RAM and costing $0.06720 per hour
AWS (x86): t3.large – a 2 core VM with 8GiB of RAM and costing $0.08320 per hour
Azure (x86): D2s v4 – a 2 core VM with 8GiB of RAM and costing $0.11100 per hour

On these machines I then ran the workloads with different numbers of clients per second and measured their response times and the failure rate (failure being categorised as a response of > 10 seconds):

Both Intel VMs generated too many errors at the 25 client per second rate and the load tester aborted.

It’s clear from these results that the ARM VM running on AWS has a significant bang for buck advantage – it’s more performant than the Intel machines and is 20% cheaper than the AWS Intel machine and 40% cheaper than the Azure machine.
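The percentages quoted here come straight from the hourly prices listed above. A small sketch of the calculation (the helper name is illustrative):

```python
def percent_cheaper(price, reference):
    """How much cheaper `price` is than `reference`, as a percentage."""
    return (1 - price / reference) * 100


PRICES = {  # USD per hour, from the list above
    "t4g.large (ARM)": 0.06720,
    "t3.large (x86)": 0.08320,
    "D2s v4 (x86)": 0.11100,
}

if __name__ == "__main__":
    arm = PRICES["t4g.large (ARM)"]
    for name in ("t3.large (x86)", "D2s v4 (x86)"):
        print(f"ARM is {percent_cheaper(arm, PRICES[name]):.1f}% cheaper than {name}")
```

This gives roughly 19% and 39%, which round to the 20% and 40% used in the text.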

Interestingly the Intel machine on AWS lags behind the Intel machine on Azure, particularly when stressed. It is however around 25% cheaper and it feels as if performance between the Intel machines is largely on the same economic path (the AWS machine is slightly ahead if you normalise the numbers).

4 Core Tests

I wanted to understand what a greater number of cores would do for performance – in theory it should let me scale past the 20 client per second level of the smaller instances. Having concluded that ARM represented the best value for money for this workload on AWS I didn’t do an x86 test on AWS. I used:

AWS: t4g.xlarge (ARM) – a 4 core VM with 16GiB of RAM and costing $0.13440 per hour
Azure: D4s_v4 – a 4 core VM with 16GiB of RAM and costing $0.22200 per hour

I then ran the workloads with different numbers of clients per second and measured their response times and the failure rate (failure being categorised as a response of > 10 seconds):

The Azure instance failed the 55 client per second rate – it had so many responses above 10 seconds in duration that the load test tool aborted the test.

It’s clear from these graphs that the ARM VM running on AWS outperforms Azure both in terms of response time and massively in terms of bang for buck – it’s around 40% cheaper than the corresponding Azure VM.

Starter Workloads

One of the nice things about AWS and Azure is they offer very cheap VMs. The Azure VMs are burstable (and there is some complexity here with banked credits) which makes them hard to measure but as we saw in a previous post the ARM machines perform very well at this level.

The three machines used are:

AWS (ARM): t4g.micro, 2 core, 1GiB of RAM costing $0.00840 per hour
Azure (x86): B1S, 1 core, 1GiB of RAM costing $0.00690 per hour
AWS (x86): t3.micro, 2 core, 1GiB of RAM costing $0.01040 per hour

It’s an easy victory for ARM on AWS here – it’s performant, cheap and predictable. The B1S instance on Azure couldn’t handle 15 or 20 clients per second at all but may be worth consideration if its bursting system works for you.

Simulated Async Workload

2 Core Tests

For these tests I used the same configurations as in the computational workload.

There is less to separate the processors and vendors with a less computationally intensive workload. Interestingly the AWS machines have a less stable response time with more > 10 second response times but, in the case of the ARM chip, it does this while holding a lower average response time under load.

It’s worth noting that the ARM VM is doing this at around 60% of the cost of the Azure VM and so I would argue again represents the best bang for buck. The AWS x86 VM is around 25% cheaper than the Azure equivalent – if you can live with the extra “chop” that may still be worth it or you can use that saving to purchase a bigger tier unit.

4 Core Tests

For these tests I used the same virtual machines as for the computational workload:

There is little to separate the two VMs until they come under heavy load at which point we see mixed results – I would argue the ARM VM suffers more as it becomes much more spiky with no consistent benefit in average response time.

However in terms of bang for buck – this ARM VM is around 60% of the price of the Azure VM. There’s no contest. I could put two of these behind a load balancer for roughly 20% more than the cost of the single Azure VM.

Starter Workloads

For these tests I used the same virtual machines as for the computational workload:

It’s a pretty even game here until we hit the 100 client per second range at which point the AWS VMs begin to outperform the Azure VM, though at the 200 client per second range this comes at the expense of more long response times.

Conclusions

Given the results, at least with these workloads, it’s hard not to conclude that AWS currently offers significantly greater bang for buck than Azure for compute. Particularly with their use of ARM processors AWS seem to have taken a big leap ahead in terms of value for money for which, at the moment, Azure doesn’t look to have any response.

Perhaps tailoring Azure VMs to your specific workloads may get you more mileage.

I’ve tried to measure raw compute here in the simplest way I can – I’d stress that if you use more managed services you may see a different story (though ultimately it’s all running on the same infrastructure so my suspicion is not). And as always, particularly if you’re considering a switch of vendor, I’d recommend running and measuring representative workloads.

Full Results

Test	Vendor	Instance	Clients per second	Min (ms)	Max (ms)	Average (ms)	Successful Responses	Timeouts	% > 10 seconds	Price per hour (USD)
Mandelbrot	Azure	A2_V2 (x64)	2	917	927	934	60	0	0.0%	$0.10600
Mandelbrot	Azure	A2_V2 (x64)	5	1263	6649	3975	56	0	0.0%	$0.10600
Mandelbrot	Azure	A2_V2 (x64)	10	1205	10203	7985	34	21	38.2%	$0.10600
Mandelbrot	Azure	A2_V2 (x64)	15	ERROR RATE TOO HIGH					$0.10600
Mandelbrot	Azure	A2_V2 (x64)	20	ERROR RATE TOO HIGH					$0.10600
Async	Azure	A2_V2 (x64)	20	173	343	252	600	0	0.0%	$0.10600
Async	Azure	A2_V2 (x64)	50	196	504	274	1498	0	0.0%	$0.10600
Async	Azure	A2_V2 (x64)	100	239	4240	2484	1794	0	0.0%	$0.10600
Async	Azure	A2_V2 (x64)	200	423	8929	5475	1725	0	0.0%	$0.10600
Mandelbrot	Azure	B1S (x86)	2	670	2551	1171	57	0	0.0%	$0.00690
Mandelbrot	Azure	B1S (x86)	5	1612	5521	3252	72	0	0.0%	$0.00690
Mandelbrot	Azure	B1S (x86)	10	1259	10001	7115	72	2	2.7%	$0.00690
Mandelbrot	Azure	B1S (x86)	15	ERROR RATE TOO HIGH					$0.00690
Async	Azure	B1S (x64)	20	206	383	268	580	0	0.0%	$0.00690
Mandelbrot	Azure	B1S (x86)	20	ERROR RATE TOO HIGH					$0.00690
Async	Azure	B1S (x64)	50	209	436	278	1498	0	0.0%	$0.00690
Async	Azure	B1S (x64)	100	292	3151	1892	2252	0	0.0%	$0.00690
Async	Azure	B1S (x64)	200	482	7708	4474	2136	0	0.0%	$0.00690
Mandelbrot	Azure	D1 v2 (x64)	2	748	828	787	60	0	0.0%	$0.08780
Mandelbrot	Azure	D1 v2 (x64)	5	2858	4242	3646	70	0	0.0%	$0.08780
Mandelbrot	Azure	D1 v2 (x64)	10	1192	10001	7523	57	5	8.1%	$0.08780
Mandelbrot	Azure	D1 v2 (x64)	15	ERROR RATE TOO HIGH					$0.08780
Mandelbrot	Azure	D1 v2 (x64)	20	ERROR RATE TOO HIGH					$0.08780
Async	Azure	D1 v2 (x64)	20							$0.08780
Async	Azure	D1 v2 (x64)	50	168	407	244	1499	0	0.0%	$0.08780
Async	Azure	D1 v2 (x64)	100	241	3398	1986	2156	0	0.0%	$0.08780
Async	Azure	D1 v2 (x64)	200	407	9171	4927	1951	0	0.0%	$0.08780
Mandelbrot	Azure	D2as_v4	2	559	604	566	60	0	0.0%	$0.11100
Mandelbrot	Azure	D2as_v4	5	587	2606	1596	133	0	0.0%	$0.11100
Mandelbrot	Azure	D2as_v4	10	1305	5920	3541	134	0	0.0%	$0.11100
Mandelbrot	Azure	D2as_v4	15	1358	9607	5596	126	0	0.0%	$0.11100
Async	Azure	D2as_v4	20	200	305	239	600	0	0.0%	$0.11100
Mandelbrot	Azure	D2as_v4	20	638	12379	7435	104	33	24.1%	$0.11100
Mandelbrot	Azure	D2as_v4	25	1459	10293	8850	58	70	54.7%	$0.11100
Async	Azure	D2as_v4	50	200	312	238	1498	0	0.0%	$0.11100
Async	Azure	D2as_v4	100	202	347	247	3000	0	0.0%	$0.11100
Async	Azure	D2as_v4	200	295	4129	2053	4276	0	0.0%	$0.11100
Async	Azure	D2as_v4	300	329	11269	3190	4334	23	0.5%	$0.11100
Async	Azure	D2as_v4	400	338	17305	3978	4247	205	4.6%	$0.11100
Mandelbrot	AWS	t2.micro (x86)	2	675	1140	1010	58	0	0.0%	$0.01160
Mandelbrot	AWS	t2.micro (x86)	5	651	5324	3332	72	0	0.0%	$0.01160
Mandelbrot	AWS	t2.micro (x86)	10	1867	10193	6999	56	8	12.5%	$0.01160
Mandelbrot	AWS	t2.micro (x86)	15	1445	10203	9458	32	44	57.9%	$0.01160
Async	AWS	t2.micro (x64)	20	242	412	298	600	0	0.0%	$0.01160
Mandelbrot	AWS	t2.micro (x86)	20	1486	10206	8895	11	40	78.4%	$0.01160
Async	AWS	t2.micro (x64)	50	241	545	312	1497	0	0.0%	$0.01160
Async	AWS	t2.micro (x64)	100	244	9829	2260	1989	0	0.0%	$0.01160
Async	AWS	t2.micro (x64)	200	347	17375	3858	2118	252	10.6%	$0.01160
Mandelbrot	AWS	t3.micro (x86)	2	701	885	744	60	0	0.0%	$0.01040
Mandelbrot	AWS	t3.micro (x86)	5	878	3313	2069	108	0	0.0%	$0.01040
Mandelbrot	AWS	t3.micro (x86)	10	855	8037	4498	103	0	0.0%	$0.01040
Mandelbrot	AWS	t3.micro (x86)	15	973	10202	6930	84	9	9.7%	$0.01040
Async	AWS	t3.micro (x64)	20	233	402	279	600	0	0.0%	$0.01040
Mandelbrot	AWS	t3.micro (x86)	20	1030	10215	8495	74	35	32.1%	$0.01040
Async	AWS	t3.micro (x64)	50	235	4912	407	1498	0	0.0%	$0.01040
Async	AWS	t3.micro (x64)	100	235	545	292	2994	0	0.0%	$0.01040
Async	AWS	t3.micro (x64)	200	234	17376	2598	3089	260	7.8%	$0.01040
Mandelbrot	AWS	t4g.large (ARM)	2	632	779	654	60	0	0.0%	$0.06720
Mandelbrot	AWS	t4g.large (ARM)	5	698	2436	1753	137	0	0.0%	$0.06720
Mandelbrot	AWS	t4g.large (ARM)	10	1936	6284	3682	137	0	0.0%	$0.06720
Mandelbrot	AWS	t4g.large (ARM)	15	2120	9927	5624	133	0	0.0%	$0.06720
Mandelbrot	AWS	t4g.large (ARM)	20	865	10207	7472	100	31	23.7%	$0.06720
Mandelbrot	AWS	t4g.large (ARM)	25	757	10207	8432	56	80	58.8%	$0.06720
Async	AWS	t4g.large (ARM)	20	234	398	280	599	0	0.0%	$0.06720
Async	AWS	t4g.large (ARM)	50	229	395	275	1498	0	0.0%	$0.06720
Async	AWS	t4g.large (ARM)	100	236	426	287	2992	0	0.0%	$0.06720
Async	AWS	t4g.large (ARM)	200	316	17359	2080	4026	260	6.1%	$0.06720
Async	AWS	t4g.large (ARM)	300	241	17381	3322	3060	639	17.3%	$0.06720
Async	AWS	t4g.large (ARM)	400	349	13127	3346	4038	1088	21.2%	$0.06720
Mandelbrot	AWS	t4g.micro (ARM)	2	618	751	638	60	0	0.0%	$0.00840
Mandelbrot	AWS	t4g.micro (ARM)	5	765	2794	1709	132	0	0.0%	$0.00840
Mandelbrot	AWS	t4g.micro (ARM)	10	761	6958	3882	130	0	0.0%	$0.00840
Mandelbrot	AWS	t4g.micro (ARM)	15	759	10203	5704	127	1	0.8%	$0.00840
Async	AWS	t4g.micro (ARM)	20	236	371	275	600	0	0.0%	$0.00840
Mandelbrot	AWS	t4g.micro (ARM)	20	802	10207	7459	119	14	10.5%	$0.00840
Async	AWS	t4g.micro (ARM)	50	222	4178	373	1498	0	0.0%	$0.00840
Async	AWS	t4g.micro (ARM)	100	231	414	286	2994	0	0.0%	$0.00840
Async	AWS	t4g.micro (ARM)	200	310	17388	2028	3995	200	4.8%	$0.00840
Async	Azure	D4s_v4	20	167	239	200	600	0	0.0%	$0.22200
Async	Azure	D4s_v4	50	165	242	197	1499	0	0.0%	$0.22200
Async	Azure	D4s_v4	100	153	243	198	3000	0	0.0%	$0.22200
Async	Azure	D4s_v4	200	165	270	204	6000	0	0.0%	$0.22200
Async	Azure	D4s_v4	300	208	9962	1395	7900	0	0.0%	$0.22200
Async	Azure	D4s_v4	400	211	16283	2049	7695	114	1.5%	$0.22200
Mandelbrot	Azure	D4s_v4	2	304	334	313	60	0	0.0%	$0.22200
Mandelbrot	Azure	D4s_v4	5	415	675	500	150	0	0.0%	$0.22200
Mandelbrot	Azure	D4s_v4	10	488	3450	1670	2388	0	0.0%	$0.22200
Mandelbrot	Azure	D4s_v4	15	486	4371	3012	256	0	0.0%	$0.22200
Mandelbrot	Azure	D4s_v4	20	727	6572	4027	239	0	0.0%	$0.22200
Mandelbrot	Azure	D4s_v4	25	1453	8024	5127	235	0	0.0%	$0.22200
Mandelbrot	Azure	D4s_v4	30	886	9282	5988	238	0	0.0%	$0.22200
Mandelbrot	Azure	D4s_v4	35	613	10005	6850	196	19	8.8%	$0.22200
Mandelbrot	Azure	D4s_v4	40	1817	13352	7905	215	14	6.1%	$0.22200
Mandelbrot	Azure	D4s_v4	45	2412	10207	8639	204	41	16.7%	$0.22200
Mandelbrot	Azure	D4s_v4	50	747	10207	8953	80	158	66.4%	$0.22200
Mandelbrot	Azure	D4s_v4	55	ERROR RATE TOO HIGH					$0.22200
Mandelbrot	Azure	D2s v4	2	459	482	469	60	0	0.0%	$0.11100
Mandelbrot	Azure	D2s v4	5	883	3449	1764	123	0	0.0%	$0.11100
Mandelbrot	Azure	D2s v4	10	480	6747	4053	123	0	0.0%	$0.11100
Mandelbrot	Azure	D2s v4	15	483	10202	6286	118	1	0.8%	$0.11100
Mandelbrot	Azure	D2s v4	20	506	10206	7636	86	28	24.6%	$0.11100
Mandelbrot	Azure	D2s v4	25	ERROR RATE TOO HIGH					$0.11100
Async	Azure	D2s v4	20	168	270	206	580	0	0.0%	$0.11100
Async	Azure	D2s v4	50	164	266	205	1499	0	0.0%	$0.11100
Async	Azure	D2s v4	100	167	310	217	3000	0	0.0%	$0.11100
Async	Azure	D2s v4	200	259	3818	2233	3990	0	0.0%	$0.11100
Async	Azure	D2s v4	300	249	15603	3592	3808	12	0.3%	$0.11100
Async	Azure	D2s v4	400	330	16811	4341	3914	203	4.9%	$0.11100
Mandelbrot	AWS	t3.large (x86)	2	711	878	753	60	0	0.0%	$0.08320
Mandelbrot	AWS	t3.large (x86)	5	758	3150	2024	113	0	0.0%	$0.08320
Mandelbrot	AWS	t3.large (x86)	10	1023	7656	4393	115	0	0.0%	$0.08320
Mandelbrot	AWS	t3.large (x86)	15	2340	10202	6615	104	4	3.7%	$0.08320
Mandelbrot	AWS	t3.large (x86)	20	2453	10406	8479	47	54	53.5%	$0.08320
Mandelbrot	AWS	t3.large (x86)	25	ERROR RATE TOO HIGH					$0.08320
Async	AWS	t3.large (x86)	20	234	380	280	600	0	0.0%	$0.08320
Async	AWS	t3.large (x86)	50	230	386	276	1448	0	0.0%	$0.08320
Async	AWS	t3.large (x86)	100	235	424	287	2990	0	0.0%	$0.08320
Async	AWS	t3.large (x86)	200	237	17373	2808	3026	270	8.2%	$0.08320
Async	AWS	t3.large (x86)	300	230	17382	3358	3127	603	16.2%	$0.08320
Async	AWS	t3.large (x86)	400	551	13127	3451	3664	1088	22.9%	$0.08320
Async	AWS	t4g.xlarge (ARM)	20	231	384	274	599	0	0.0%	$0.13440
Async	AWS	t4g.xlarge (ARM)	50	221	380	273	1498	0	0.0%	$0.13440
Async	AWS	t4g.xlarge (ARM)	100	221	382	273	2995	0	0.0%	$0.13440
Async	AWS	t4g.xlarge (ARM)	200	222	395	276	6000	0	0.0%	$0.13440
Async	AWS	t4g.xlarge (ARM)	300	266	10209	706	8699	56	0.6%	$0.13440
Async	AWS	t4g.xlarge (ARM)	400	258	11106	1953	7793	1088	12.3%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	2	633	779	646	60	0	0.0%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	5	581	1115	737	150	0	0.0%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	10	613	4203	1540	264	0	0.0%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	15	1079	6666	2708	264	0	0.0%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	20	1178	6820	3723	272	0	0.0%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	25	751	8171	4425	264	0	0.0%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	30	677	10231	5555	265	6	2.2%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	35	687	10245	6356	239	25	9.5%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	40	895	10334	7353	229	22	8.8%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	45	1041	10207	8044	199	53	21.0%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	50	1236	10206	8624	173	76	30.5%	$0.13440
Mandelbrot	AWS	t4g.xlarge (ARM)	55	1193	10206	9179	105	161	60.5%	$0.13440
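
The “> 10 seconds” column in the table above appears to be the number of timeouts as a share of all completed requests (successful plus timed out). That reading is inferred from the figures rather than documented by the load tester. A sketch that reproduces a couple of rows under that assumption:

```python
def timeout_rate(successful, timeouts):
    """Percentage of requests that exceeded the 10 second threshold."""
    total = successful + timeouts
    if total == 0:
        return None  # no data, e.g. a run the load tester aborted
    return 100.0 * timeouts / total


if __name__ == "__main__":
    # (instance, clients per second, successful, timeouts) taken from the table
    rows = [
        ("A2_V2", 10, 34, 21),
        ("t4g.large", 20, 100, 31),
    ]
    for instance, clients, ok, timed_out in rows:
        print(f"{instance} @ {clients}/s: {timeout_rate(ok, timed_out):.1f}%")
```

Returning `None` for an aborted run avoids the divide-by-zero a spreadsheet would report for rows with no responses.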

.NET 5 – ARM vs x64 in the Cloud Part 2 – Azure

November 16, 2020 by James

Having conducted my ARM and x64 tests on AWS yesterday I was curious to see how Azure would fare – it doesn’t support ARM but ultimately that’s a mechanism for delivering value (performance and price) and not an end in and of itself. And so this evening I set about replicating the tests on Azure.

In the end I’ve massively limited my scope to two instance sizes:

A2 – this has 2 CPUs and 4GB of RAM (much more RAM than yesterday’s instances) and costs $0.120 per hour
B1S – a burstable VM that has 1 CPU and 1GB of RAM (so most similar to yesterday’s t2.micro) and costs $0.0124 per hour

Note – I’ve begun to conduct tests on D series too; preliminary findings are that the D1 is similar to the A2 in performance characteristics.

I was struggling to find Azure VMs with the same pricing as AWS and so had to start with a burstable VM to get something in the same kind of ballpark. Not ideal but they are the chips you are dealt on Azure! I started with the B1S which was still more expensive than the ARM VM. I created the VM, installed software, and ran the tests – the machine comes with 30 credits for bursting. However after running tests several times it was still performing consistently so these were either exhausted quickly, made little difference, or were used consistently.

I moved to the A2_V2 because, frankly, the performance was dreadful on my early tests with the B1S and I also wanted something that wouldn’t burst. I was also trying to match the spec of the AWS machines – 2 cores and 1Gb of RAM. I’ll attempt the same tests with a D series when I can.

Test setup was the same and all tests are run on VMs accessed directly on their public IP using Apache as a reverse proxy to Kestrel and our .NET application.

I’ve left the t2.micro instance out of this analysis.

Mandelbrot

With 2 clients per second we see the following response times:

We can see that the two Azure instances are already off to a bad start on this computationally heavy test.

At 10 clients per second we continue to see this reflected:

However at this point the two Azure instances begin to experience timeout failures (the threshold being set at 10 seconds in the load tester):

The A2_V2 instance is faring particularly badly, especially given it is more than 10x the cost of the AWS instances.

Unfortunately there is no meaningful comparison I can make under higher load as both Azure instances collapse when I push to 15 clients per second. For completeness’ sake here are the results on AWS at 20 clients per second (average response and total requests):

Simulated Async Workload

With our simulated async workload Azure fares better at low scale. Here are the results at 20 requests per second:

As we push the scale up things get interesting with different patterns across the two vendors. Here are the average response times at 200 clients per second:

At first glance AWS looks to be running away with things however both the t4g.micro and t3.micro suffer from performance degradation at the extremes – the max response time is 17 seconds for both while for the Azure instances it is around 9 seconds.

You can see this reflected in the success and total counts where the AWS instances see a number of timeout failures (> 10 seconds) while the Azure instances stay more consistent:

However the AWS instances have completed many more requests overall. I’ve not done a percentile breakdown (see comments yesterday) but it seems likely that at the edges AWS is fraying and degrading more severely than Azure leading to this pattern.

Conclusions

The different VMs clearly have different strengths and weaknesses, however in the computational test the Azure results are disappointing – the VMs are more expensive yet, at best, offer performance with different characteristics (more consistent when pushed but lower average performance – pick your poison) and at worst offer much lower performance and far less value for money. They seem to struggle with computational load and nosedive rapidly when pushed in that scenario.

Full Results

Test	Vendor	Instance	Clients per second	Min (ms)	Max (ms)	Average (ms)	Successful Responses	Timeouts
Mandelbrot	AWS	t4g.micro (ARM)	2	618	751	638	60	0
Mandelbrot	AWS	t4g.micro (ARM)	5	765	2794	1709	132	0
Mandelbrot	AWS	t4g.micro (ARM)	10	761	6958	3882	130	0
Mandelbrot	AWS	t4g.micro (ARM)	15	759	10203	5704	127	1
Mandelbrot	AWS	t4g.micro (ARM)	20	802	10207	7459	119	14
Mandelbrot	AWS	t3.micro (x64)	2	701	885	744	60	0
Mandelbrot	AWS	t3.micro (x64)	5	878	3313	2069	108	0
Mandelbrot	AWS	t3.micro (x64)	10	855	8037	4498	103	0
Mandelbrot	AWS	t3.micro (x64)	15	973	10202	6930	84	9
Mandelbrot	AWS	t3.micro (x64)	20	1030	10215	8495	74	35
Mandelbrot	AWS	t2.micro (x64)	2	675	1140	1010	58	0
Mandelbrot	AWS	t2.micro (x64)	5	651	5324	3332	72	0
Mandelbrot	AWS	t2.micro (x64)	10	1867	10193	6999	56	8
Mandelbrot	AWS	t2.micro (x64)	15	1445	10203	9458	32	44
Mandelbrot	AWS	t2.micro (x64)	20	1486	10206	8895	11	40
Mandelbrot	Azure	A2_V2 (x64)	2	917	927	934	60	0
Mandelbrot	Azure	A2_V2 (x64)	5	1263	6649	3975	56	0
Mandelbrot	Azure	A2_V2 (x64)	10	1205	10203	7985	34	21
Mandelbrot	Azure	A2_V2 (x64)	15	ERROR RATE TOO HIGH
Mandelbrot	Azure	A2_V2 (x64)	20	ERROR RATE TOO HIGH
Mandelbrot	Azure	B1S (x64)	2	670	2551	1171	57	0
Mandelbrot	Azure	B1S (x64)	5	1612	5521	3252	72	0
Mandelbrot	Azure	B1S (x64)	10	1259	10001	7115	72	2
Mandelbrot	Azure	B1S (x64)	15	ERROR RATE TOO HIGH
Mandelbrot	Azure	B1S (x64)	20	ERROR RATE TOO HIGH
Async	AWS	t4g.micro (ARM)	20	236	371	275	600	0
Async	AWS	t4g.micro (ARM)	50	222	4178	373	1498	0
Async	AWS	t4g.micro (ARM)	100	231	414	286	2994	0
Async	AWS	t4g.micro (ARM)	200	310	17388	2028	3995	200
Async	AWS	t3.micro (x64)	20	233	402	279	600	0
Async	AWS	t3.micro (x64)	50	235	4912	407	1498	0
Async	AWS	t3.micro (x64)	100	235	545	292	2994	0
Async	AWS	t3.micro (x64)	200	234	17376	2598	3089	260
Async	AWS	t2.micro (x64)	20	242	412	298	600	0
Async	AWS	t2.micro (x64)	50	241	545	312	1497	0
Async	AWS	t2.micro (x64)	100	244	9829	2260	1989	0
Async	AWS	t2.micro (x64)	200	347	17375	3858	2118	252
Async	Azure	A2_V2 (x64)	20	173	343	252	600	0
Async	Azure	A2_V2 (x64)	50	196	504	274	1498	0
Async	Azure	A2_V2 (x64)	100	239	4240	2484	1794	0
Async	Azure	A2_V2 (x64)	200	423	8929	5475	1725	0
Async	Azure	B1S (x64)	20	206	383	268	580	0
Async	Azure	B1S (x64)	50	209	436	278	1498	0
Async	Azure	B1S (x64)	100	292	3151	1892	2252	0
Async	Azure	B1S (x64)	200	482	7708	4474	2136	0

.NET 5 – ARM vs x64 in the Cloud

November 15, 2020 by James

With Microsoft and Apple both now beginning to use ARM chips in laptops, what was traditionally the domain of x86/x64 architecture, I found myself curious as to the ramifications of this move – particularly by Apple who are transitioning their entire lineup to ARM over the next 2 years.

While musing on the pain points of this I found myself wondering if Azure supported ARM processors (they don’t) and got pointed to AWS, who do. @thebeebs (an AWS developer advocate) mentioned that some customers had seen significant cost reductions by moving some workloads over to ARM and so I, inevitably, found myself curious as to how typical .NET workloads might run in comparison to x64 and set about some tests.

The Tests

I quickly rustled up a simple API containing two invocable workloads:

A computation heavy workload – I’m rendering a Mandelbrot and returning it as an image. This involves floating point maths.
A simulated await workload – often with APIs we hand off to some other system (e.g. a database) and then do a small amount of computation. I’ve simulated this with Task.Delay and a (very small) random factor to simulate the slight variations you will get with any network / remote service request and then around this I compute two tiny Mandelbrots and return a couple of numbers. It would be nice to come back at some point and use a more structured approach for the simulated remote latency.
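
For readers who want a feel for the two workloads without reading the F#, here is a rough Python sketch. It is not the code used in these tests: the iteration limit, grid size, delay, and jitter values are all illustrative assumptions, and the functions return iteration counts rather than an image.

```python
import asyncio
import random


def mandelbrot_iterations(c, max_iter=100):
    """Escape-time count for one point of the complex plane."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render_mandelbrot(width=60, height=30, max_iter=100):
    """CPU-bound workload: iteration counts for a width x height grid."""
    rows = []
    for y in range(height):
        im = -1.0 + 2.0 * y / (height - 1)
        rows.append([
            mandelbrot_iterations(complex(-2.0 + 3.0 * x / (width - 1), im), max_iter)
            for x in range(width)
        ])
    return rows


async def simulated_remote_call(base_delay=0.2, jitter=0.02):
    """Await-heavy workload: a fake remote call plus two tiny Mandelbrots."""
    # The sleep stands in for a database or remote service round trip.
    await asyncio.sleep(base_delay + random.uniform(0, jitter))
    first = render_mandelbrot(width=8, height=4, max_iter=20)
    second = render_mandelbrot(width=8, height=4, max_iter=30)
    return sum(map(sum, first)), sum(map(sum, second))


if __name__ == "__main__":
    grid = render_mandelbrot()
    print(f"rendered {len(grid)} rows")
    print("async result:", asyncio.run(simulated_remote_call()))
```

The first function never yields, so it ties up a core for its whole duration; the second spends most of its time waiting, which is why the two behave so differently under load.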

I’ve written this in F# (it’s not particularly “functional”) using Giraffe on top of ASP.NET Core just because that’s my go-to language these days. It’s running under the .NET 5 runtime.

The code for this is here. It’s not particularly elegant and I simply converted some old JavaScript code of mine into F# for the Mandelbrot. It does a job.

The Setup

Within AWS I created three EC2 Linux instances:

t4g.micro – ARM based, 2 vCPU, 1GB memory, $0.0084 per hour
t3.micro – x64 based, 2 vCPU, 1GB memory, $0.0104 per hour
t2.micro – x64 based, 1 vCPU, 1GB memory, $0.0116 per hour

It’s worth noting that my ARM instance is costing me 20% less than the t3.micro.

I’ve deliberately chosen very small instances in order to make it easier to stress them without having to sell a kidney to fund the load testing. We should be able to stress these instances quite quickly.

I then SSHed into each box, installed .NET 5 from the appropriate binaries and set up Apache as a reverse proxy. On the ARM machine I also had to install GCC and compile a version of libicui18n for .NET to work.

Next I used git clone to bring down the source and ran dotnet restore followed by dotnet run. At this point I had the same code working on each of my EC2 instances. Easy to verify as the root of the site shows a Mandelbrot:

This was all pretty easy to set up. You can also do it using a Cloud Formation sample that I was pointed at (again by @thebeebs).

I still think it’s worth remarking how much .NET has changed in the last few years – I’ve not touched Windows here and have the same source running on two different CPU architectures with no real effort on my part. Yes it’s “get through the door” stakes these days but it was hard to imagine this a few years back.

Benchmarks

My tests were fairly simple – I’ve used loader.io to maintain a steady state of a given number of clients per second and gathered up the response times and total execution counts along with the number of timeouts. I had the timeout threshold set at 10 seconds.
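
To make the measurements concrete, here is a sketch of how one run’s raw response times reduce to the figures reported below. The 10 second threshold is the one described above; the function and field names are illustrative and are not part of loader.io. Whether loader.io includes timed-out requests in its averages is not known here; this sketch includes them.

```python
TIMEOUT_MS = 10_000  # responses slower than this count as timeouts


def summarise(response_times_ms, timeout_ms=TIMEOUT_MS):
    """Reduce one run's response times (in ms) to min / max / average and counts."""
    if not response_times_ms:
        raise ValueError("no responses recorded")
    timeouts = sum(1 for t in response_times_ms if t > timeout_ms)
    return {
        "min": min(response_times_ms),
        "max": max(response_times_ms),
        "average": sum(response_times_ms) / len(response_times_ms),
        "successful": len(response_times_ms) - timeouts,
        "timeouts": timeouts,
    }


if __name__ == "__main__":
    # Made-up timings purely to show the shape of the output.
    print(summarise([640, 710, 655, 10_250]))
```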

Time allowing I will come back to this and run some percentile analysis – loader doesn’t support this and so I would need to do some additional work.
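
Given raw per-request timings, a percentile breakdown needs only a few lines. A sketch using Python’s standard library, assuming the response times have been exported as a list of milliseconds (the export step itself is not shown and would depend on the load testing tool):

```python
import statistics


def percentiles(response_times_ms, points=(50, 90, 95, 99)):
    """Return the requested percentiles of a list of response times."""
    # quantiles(n=100) returns the 99 cut points p1..p99; it needs at
    # least two data points.
    cuts = statistics.quantiles(response_times_ms, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}


if __name__ == "__main__":
    # Made-up timings: mostly fast responses with a slow tail.
    sample = [250] * 90 + [900] * 8 + [17_000] * 2
    for p, value in percentiles(sample).items():
        print(f"p{p}: {value:.0f} ms")
```

A breakdown like this would show whether a high maximum comes from a handful of outliers or from a broad slow tail, which the min / max / average figures cannot.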

I’ve run the test several times and averaged the results – though they were all in the same ballpark.

Mandelbrot

Firstly as a baseline let’s look at things running with just two clients per second:

With little going on we can see that the ARM instance already has a slight advantage – it’s consistently (min, max and average) around 100ms faster than the closest x64 based instance.

Unsurprisingly if we push things a little harder to 5 clients per second this becomes magnified:

We’re getting no errors or timeouts at this point and you can see the total throughput over the 30 second run below:

The ARM instance has completed around 22% more requests than the nearest x64 instance (132 against 108), with a roughly 17% improvement in average response time (1709ms against 2069ms) and at 80% of the cost.

And if we push this out to 20 clients per second (my largest scale test) the ARM instance looks better again:

It’s worth noting that at this point all three instances are generating timeouts in our load test suite but again the ARM instance wins out here – we get fewer timeouts and get through more overall requests:

You can see from this that our ARM instance is performing much better under this level of load. We can say that:

It has successfully completed 60% more requests than the nearest x64 instance
It has a roughly 12% improvement on average response time
And it is doing this at 80% of the cost of the x64 instance
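The three figures above can be checked against the 20 clients per second rows of the Full Results table. The sketch below is just that arithmetic; the helper names are made up for illustration, and the final "cost per successful request" ratio is a derived figure rather than one quoted in the post.

```python
def percent_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


def percent_faster(a_ms, b_ms):
    """Reduction in response time of a relative to b, as a percentage of b."""
    return (b_ms - a_ms) / b_ms * 100


# 20 clients per second, Mandelbrot test (values from the Full Results table).
arm = {"successful": 119, "average_ms": 7459}
t3 = {"successful": 74, "average_ms": 8495}
relative_cost = 0.8  # ARM instance price as a fraction of the t3.micro price

more_requests = percent_more(arm["successful"], t3["successful"])
faster = percent_faster(arm["average_ms"], t3["average_ms"])
# Cost per successful request on ARM relative to the t3.micro.
cost_per_request = relative_cost / (arm["successful"] / t3["successful"])

print(f"{more_requests:.1f}% more requests")
print(f"{faster:.1f}% lower average response time")
print(f"{cost_per_request:.2f}x the cost per successful request")
```

Put together, at this load level the ARM instance works out at roughly half the cost per successful request of the t3.micro.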

With our Mandelbrot test it’s clear that the ARM instance has a consistent advantage both in performance and cost.

Simulated Async Workload

Starting again with a low scale test (50 clients per second – this test spends significant time awaiting) we can see that our t2 x64 instance had an advantage of around 40ms:

However if we move up to 100 clients per second we can see the t2 instance essentially collapse while our t4g ARM instance and t3 x64 instance are essentially level pegging (286ms and 292ms respectively):

We get no timeouts at this point and our ARM and x64 instance level peg again on total requests:

However if we push on to a higher scale test (200 clients per second) we can see the ARM instance begin to pull ahead:

Conclusions

Going into this I really didn’t know what to expect but these fairly simple tests suggest there is an economic advantage to running under ARM in the cloud. At worst you will see comparable performance at a lower price point but for some workloads you may see a significant performance gain – again at a lower price point.

A 20% performance gain at 80% of the price is most certainly not to be sniffed at and for large workloads could quickly offset the cost of moving infrastructure to ARM.

Presumably the price savings are due to the power efficiency of the ARM chips. However what is hard to tell is how much of the pricing is “early adopter” to encourage people to move to CPUs that have a long term advantage to cloud vendors (even minor power efficiency gains over cloud scale data centers must total significant numbers on the bottom line) and how much of that will be sustained and passed on to users in the long term.

Doubtless we’ll land somewhere in the middle.

Question I have now is: where the heck is Azure in all this? Between Lambda and ARM on AWS it’s hard not to feel as if the portability advantages, both processor and OS, of .NET Core / 5 are being realised more effectively by Amazon than they are by Microsoft themselves. Strange times.

Full Results

			Response Times (ms)
Test	Instance	Clients per second	Min	Max	Average	Successful Responses	Timeouts
Mandelbrot	t4g.micro (ARM)	2	618	751	638	60	0
Mandelbrot	t4g.micro (ARM)	5	765	2794	1709	132	0
Mandelbrot	t4g.micro (ARM)	10	761	6958	3882	130	0
Mandelbrot	t4g.micro (ARM)	15	759	10203	5704	127	1
Mandelbrot	t4g.micro (ARM)	20	802	10207	7459	119	14
Mandelbrot	t3.micro (x64)	2	701	885	744	60	0
Mandelbrot	t3.micro (x64)	5	878	3313	2069	108	0
Mandelbrot	t3.micro (x64)	10	855	8037	4498	103	0
Mandelbrot	t3.micro (x64)	15	973	10202	6930	84	9
Mandelbrot	t3.micro (x64)	20	1030	10215	8495	74	35
Mandelbrot	t2.micro (x64)	2	675	1140	1010	58	0
Mandelbrot	t2.micro (x64)	5	651	5324	3332	72	0
Mandelbrot	t2.micro (x64)	10	1867	10193	6999	56	8
Mandelbrot	t2.micro (x64)	15	1445	10203	9458	32	44
Mandelbrot	t2.micro (x64)	20	1486	10206	8895	11	40
Async	t4g.micro (ARM)	20	236	371	275	600	0
Async	t4g.micro (ARM)	50	222	4178	373	1498	0
Async	t4g.micro (ARM)	100	231	414	286	2994	0
Async	t4g.micro (ARM)	200	310	17388	2028	3995	200
Async	t3.micro (x64)	20	233	402	279	600	0
Async	t3.micro (x64)	50	235	4912	407	1498	0
Async	t3.micro (x64)	100	235	545	292	2994	0
Async	t3.micro (x64)	200	234	17376	2598	3089	260
Async	t2.micro (x64)	20	242	412	298	600	0
Async	t2.micro (x64)	50	241	545	312	1497	0
Async	t2.micro (x64)	100	244	9829	2260	1989	0
Async	t2.micro (x64)	200	347	17375	3858	2118	252

Azure Functions – Scaling with a Dedicated App Service Plan

January 29, 2018 by James

Since I published this piece Microsoft have made significant improvements to HTTP scaling on Azure Functions. I’ve not yet had the opportunity to test performance on dedicated app service plans but please see this post for a revised comparison on the Consumption Plan.

After my last few posts on the scaling of Azure Functions I was intrigued to see if they would perform any better running on a dedicated App Service Plan. Hosting them in this way allows for the functions to take full advantage of App Service features but, to my mind, is no longer a serverless approach as rather than being billed based on usage you are essentially renting servers and are fully responsible for scaling.

I conducted a single test scenario: an immediate load of 400 concurrent users running for 5 minutes against the “stock” JavaScript function (no external dependencies, just returns a string) on 4 configurations:

Consumption Plan – billed based on usage – approximately $130 per month
(based on running constantly at the tested throughput that is around 648 million functions per month)
Dedicated App Service Plan with 1 x S1 server – $73.20 per month
Dedicated App Service Plan with 2 x S1 servers – $146.40 per month
Dedicated App Service Plan with 4 x S1 servers – $292.80 per month
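The Consumption Plan estimate in the list above can be reproduced with some simple arithmetic. The sketch below assumes a sustained 250 requests per second (the rate implied by 648 million executions over a 30 day month) and a per-execution price of $0.20 per million, and it ignores execution time (GB-s) charges and the monthly free grant. All of these are assumptions made for illustration rather than figures stated in the test.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # assuming a 30 day month

# Assumption: sustained throughput implied by 648 million executions a month.
requests_per_second = 250
# Assumption: per-execution price only; execution time (GB-s) charges and
# the monthly free grant are ignored.
price_per_million_executions = 0.20
# S1 monthly price from the list above.
s1_monthly = 73.20

executions_per_month = requests_per_second * SECONDS_PER_MONTH
consumption_monthly = (
    executions_per_month / 1_000_000 * price_per_million_executions
)

print(f"{executions_per_month:,} executions per month")
print(f"Consumption Plan: ${consumption_monthly:.2f} per month")
for servers in (1, 2, 4):
    print(f"{servers} x S1: ${servers * s1_monthly:.2f} per month")
```

Under those assumptions the Consumption Plan comes out at just under $130 a month, which sits between the one and two server S1 configurations on price.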

I also included AWS Lambda as a reference point.

The results were certainly interesting:

With immediately available resource all 3 App Service Plan configurations begin with response times slightly ahead of the Consumption Plan. But at around the 1 minute mark the Consumption Plan overtakes our single instance configuration, at 2 minutes it creeps ahead of the double instance configuration and, while the advantage is slight, at 3 minutes it begins to consistently outperform our 4 instance configuration. However AWS Lambda remains some way out in front.

From a throughput perspective the story is largely the same with the Consumption Plan taking time to scale up and address the demand but ultimately proving more capable than even the 4x S1 instance configuration and knocking on the door of AWS Lambda. What I did find particularly notable is the low impact of moving from 2 to 4 instances on throughput – the improvement in throughput is massively disappointing – for incurring twice the cost we are barely getting 50% more throughput. I have insufficient data to understand why this is happening but do have some tests in mind that, time allowing, I will run and see if I can provide further information.

At this kind of load (650 million requests per month) from a bang per buck point of view Azure Functions on the Consumption Plan come out strongly compared to App Service instances even if we don’t allow for quiet periods when Functions would incur less cost. If your scale profile falls within the capabilities of the service it’s worth considering though it’s worth remembering there isn’t really an SLA around Functions at the moment when running on the Consumption Plan (and to be fair the same applies to AWS Lambda).

If you don’t want to take advantage of any of the additional features that come with a dedicated App Service Plan then, although dedicated instances can be provisioned to avoid the slow ramp up of the Consumption Plan, they are expensive in comparison.

Azure Functions vs AWS Lambda vs Google Cloud Functions – JavaScript Scaling Face Off

January 20, 2018 by James

Since I published this piece Microsoft have made significant improvements to HTTP scaling on Azure Functions and the below is out of date. Please see this post for a revised comparison.

I had a lot of interesting conversations and feedback following my recent post on scaling a serverless .NET application with Azure Functions and AWS Lambda. A common request was to also include Google Cloud Functions and a common comment was that the runtimes were not the same: .NET Core on AWS Lambda and .NET 4.6 on Azure Functions. In regard to the latter point I certainly agree this is not ideal but continue to contend that as these are your options for .NET and are fully supported and stated as scalable serverless runtimes by each vendor it’s worth understanding and comparing these platforms as that is your choice as a .NET developer. I’m also fairly sure that although the different runtimes might make a difference to outright raw response time, and therefore throughput and the ultimate amount of resource required, the scaling issues with Azure had less to do with the runtime and more to do with the surrounding serverless implementation.

Do I think a .NET Core function in a well architected serverless host will outperform a .NET Framework based function in a well architected serverless host? Yes. Do I think .NET Framework is the root cause of the scaling issues on Azure? No. In my view AWS Lambda currently has a superior way of managing HTTP triggered functions when compared to Azure and Azure is hampered by a model based around App Service plans.

Taking all that on board and wanting to better evidence or refute my belief that the scaling issues are more host than framework related I’ve rewritten the test subject as a tiny Node / JavaScript application and retested the platforms on this runtime – Node is supported by all three platforms and all three platforms are currently running Node JS 6.x.

My primary test continues to be a mixed light workload of CPU and IO (load three blobs from the vendor’s storage offering and then compile and run a handlebars template), the kind of workload it’s fairly typical to find in an HTTP function / public facing API. However I’ve also run some tests against “stock” functions – the vendor samples that simply return strings. Finally I’ve also included some percentile based data which I obtained using Apache Benchmark and I’ve covered off cold start scenarios.

I’ve also managed to normalise the axes this time round for a clearer comparison and the code and data can all be found on GitHub:

https://github.com/JamesRandall/serverlessJsScalingComparison

(In the last week AWS have also added full support for .NET Core 2.0 on Lambda – expect some data on that soon)

Gradual Ramp Up

This test case starts with 1 user and adds 2 users per second up to a maximum of 500 concurrent users to demonstrate a slow and steady increase in load.

The AWS and Azure results for JavaScript are very similar to those seen for .NET with Azure again struggling with response times and never really competing with AWS when under load. Both AWS and Azure exhibit faster response times when using JavaScript than .NET.

Google Cloud Functions run fairly close to AWS Lambda but can’t quite match it for response time and fall behind on overall throughput, where they sit closer to Azure’s results. Given the difference in response time this would suggest Azure is processing more concurrent incoming requests than Google, allowing it to have a similar throughput after the dip Azure encounters at around the 2:30 mark – presumably Azure allocates more resource at that point. That dip deserves further attention and is something I will come back to in a future post.

Rapid Ramp Up

This test case starts with 10 users and adds 10 users every 2 seconds up to a maximum of 1000 concurrent users to demonstrate a more rapid increase in load and a higher peak concurrency.
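To get a feel for how long each ramp takes to reach its peak, the two load profiles (1 user plus 2 per second up to 500, and 10 users plus 10 every 2 seconds up to 1000) can be modelled as simple stepped functions. This is a sketch under the assumption that users are added in discrete steps exactly on each interval; the load test tooling may schedule them slightly differently, and the function names are hypothetical.

```python
def users_at(t_seconds, start, step, interval_seconds, maximum):
    """Concurrent users at t_seconds for a stepped ramp capped at maximum."""
    steps_taken = int(t_seconds // interval_seconds)
    return min(maximum, start + step * steps_taken)


def seconds_to_peak(start, step, interval_seconds, maximum):
    """First whole second at which the ramp reaches its maximum."""
    t = 0
    while users_at(t, start, step, interval_seconds, maximum) < maximum:
        t += 1
    return t


gradual = dict(start=1, step=2, interval_seconds=1, maximum=500)
rapid = dict(start=10, step=10, interval_seconds=2, maximum=1000)

print("gradual ramp peaks after", seconds_to_peak(**gradual), "seconds")
print("rapid ramp peaks after", seconds_to_peak(**rapid), "seconds")
print("users after one minute:", users_at(60, **gradual), "vs", users_at(60, **rapid))
```

On this model the rapid ramp reaches twice the peak concurrency in less time than the gradual ramp takes to reach its own peak, which is why it is the harsher test of how quickly a platform allocates resource.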

Again AWS handles the increase in load very smoothly maintaining a low response time throughout and is the clear leader.

Azure struggles to keep up with this rate of request increase. Response times hover around the 1.5 second mark throughout the growth stage and gradually decrease towards something acceptable over the next 3 minutes. Throughput continues to climb over the full duration of the test run matching and perhaps slightly exceeding Google by the end but still some way behind Amazon.

Google has two distinctly sharp drops in response time early on in the growth stage as the load increases before quickly stabilising with a response time around 140ms, and throughput levels off in line with the demand at the end of the growth phase.

I didn’t run this test with .NET, instead hitting the systems with an immediate 1000 users, but nevertheless the results are in line with that test particularly once the growth phase is over.

Immediate High Demand

This test case starts immediately with 400 concurrent users and stays at that level of load for 5 minutes demonstrating the response to a sudden spike in demand.

Both AWS and Google scale quickly to deal with the sudden demand both hitting a steady and low response time around the 1 minute mark but AWS is a clear leader in throughput – it is able to get through many more requests per second than Google due to its lower response time.
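The link between response time and throughput under a fixed number of concurrent users follows from Little's law: if each simulated user sends its next request as soon as the previous one returns, requests per second is roughly the number of users divided by the response time. The sketch below assumes that kind of closed-loop test with no think time, which is not stated in the test description, and the response times in it are illustrative rather than measured.

```python
def closed_loop_throughput(concurrent_users, response_time_seconds):
    """Approximate requests per second for a closed-loop test with no think time.

    Each simulated user issues its next request as soon as the previous one
    completes, so throughput is roughly users / response time (Little's law).
    """
    return concurrent_users / response_time_seconds


# Illustrative response times only; these are not measured values.
for label, response_time in (("fast platform", 0.1), ("slow platform", 0.2)):
    rps = closed_loop_throughput(400, response_time)
    print(f"{label}: ~{rps:.0f} requests per second")
```

In other words, with the user count held at 400, halving the response time roughly doubles the achievable request rate.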

Azure again brings up the rear – it takes nearly 2 minutes to reach a steady response time that is markedly higher than both Google and AWS. Throughput continues to increase to the end of the test where it eventually peaks slightly ahead of Google but still some way behind AWS. It then experiences a fall off which is difficult to explain from the data available.

Stock Functions

This test uses the stock “return a string” function provided by each platform (I’ve captured the code in GitHub for reference) with the immediate high demand scenario: 400 concurrent users for 5 minutes.

With the functions essentially doing no work and no IO the response times are, as you would expect, smaller across the board but the scaling patterns are essentially unchanged from the workload function under the same load. AWS and Google respond quickly while Azure ramps up more slowly over time.

Percentile Performance

I was unable to obtain this data from VSTS and so resorted to running Apache Benchmark. For this test I used settings of 100 concurrent requests for a total of 10000 requests, collected the raw data, and processed it in Excel. It should be noted that the network conditions were less predictable for these tests and I wasn’t always as geographically close to the cloud function as I was in other tests though repeated runs yielded similar patterns:
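For anyone wanting to repeat the percentile analysis without a spreadsheet, the calculation can be sketched as below. This uses the nearest-rank definition of a percentile, which is a substitution made for simplicity: Excel's percentile functions interpolate between samples, so the figures would differ slightly. The function names and the input (a plain list of response times in milliseconds) are hypothetical.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, for 0 < p <= 100."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def percentile_table(durations_ms, points=(50, 90, 95, 96, 97, 98, 99, 100)):
    """Map each percentile point to a response time in milliseconds."""
    ordered = sorted(durations_ms)
    return {p: percentile(ordered, p) for p in points}


if __name__ == "__main__":
    # Made-up samples: 100 requests taking 1..100 ms.
    print(percentile_table(range(1, 101)))
```

With 10000 requests per run each percentile point covers 100 samples, so the 100th percentile is simply the single slowest request.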

AWS maintains a pretty steady response time up to and including the 98th percentile but then shows marked dips in performance in the 99th and 100th percentiles with a worst case of around 8.5 seconds.

Google dips in performance after the 97th percentile with its 99th percentile roughly equivalent to AWS’s 100th percentile and its own 100th percentile being twice as slow.

Azure exhibits a significant dip in performance at the 96th percentile with a sudden jump in response time from a not great 2.5 seconds to 14.5 seconds – in AWS’s 100th percentile territory. Beyond the 96th percentile there is a fairly steady decrease in performance of around 2.5 seconds per percentile.

Cold Starts

All the vendors’ solutions go “cold” after a time leading to a delay when they start. To get a sense for this I left each vendor idle overnight and then had 1 user make repeat requests for 1 minute to illustrate the cold start time but also get a visual sense of request rate and variance in response time:

Again we have some quite striking results. AWS has the lowest cold start time of around 1.5 seconds, Google is next at 2.5 seconds and Azure again the worst performer at 9 seconds. All three systems then settle into a fairly consistent response time but it’s striking in these graphs how AWS Lambda’s significantly better performance translates into nearly 3x as many requests as Google and 10x more requests than Azure over the minute.

It’s worth noting that the cold start time for the stock functions is almost exactly the same as for my main test case – the startup is function related and not connected to storage IO.

Conclusions

AWS Lambda is the clear leader for HTTP triggered functions – on all the runtimes I’ve tried it has the lowest response times and, at least within the volumes tested, the best ability to deal with scale and the most consistent performance. Google Cloud Functions are not far behind and it will be interesting to see if they can close the gap with optimisation work over the coming year – if they can get their flat out response times reduced they will probably pull level with AWS. The results are similar enough in their characteristics that my suspicion is Google and AWS have similar underlying approaches.

Unfortunately, like with the .NET scenarios, Azure is poor at handling HTTP triggered functions with very similar patterns on show. The Azure issues are not framework based but due to how they are hosting functions and handling scale. Hopefully over the next few months we’ll see some improvements that make Azure a more viable host for HTTP serverless / API approaches when latency matters.

By all means use the above as a rough guide but ultimately whatever platform you choose I’d encourage you to build out the smallest representative vertical slice of functionality you can and test it.

Thanks for reading – hopefully this data is useful.

Azure Functions vs AWS Lambda – Scaling Face Off

January 6, 2018 by James

Since I published this piece Microsoft have made significant improvements to HTTP scaling on Azure Functions and the below is out of date. Please see this post for a revised comparison.

If you’ve been following my blog recently you’ll know I’ve been spending a lot of time with Azure Functions – Microsoft’s implementation of a serverless platform. The idea behind serverless appeals to me massively and seems like the natural next evolution of compute on the cloud with scaling and pricing being, so the premise goes, fully dynamic and consumption based.

The use of App Service Plans (more later) as a host mechanism for Azure Functions gave me some concern about how “serverless” Azure Functions might actually be and so to verify suitability for my use cases I’ve been running a range of different tests around response time and latency that culminated in the “real” application I described in my last blog post and some of the performance tests I ran along the way. I quickly learned that the hosting implementation is not particularly dynamic and so wanted to run comparable tests on AWS Lambda.

To do this I’ve ported the serverless blog over to AWS Lambda, S3 and DynamoDB (the, rather scruffy, code is in a branch on GitHub – I will tidy this up but the aim was to get the tests running) and then I’ve run a number of user volume scenarios against a single test case: loading the homepage. The operations involved in this are:

A GET request to a serverless HTTP endpoint that:
1. Loads 3 resources from storage (Blob Storage on Azure, S3 on AWS) in an asynchronous batch.
2. Combines them together using a Handlebars template
3. Returns the response as a string of type text/html.
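The shape of that request handler can be sketched in a few lines. To be clear, this is not the code under test (which is .NET and linked from GitHub): it is a language-neutral illustration in which `string.Template` stands in for Handlebars, the resource names are invented, and `load` is a hypothetical coroutine representing a read from Blob Storage or S3.

```python
import asyncio
from string import Template

# Stand-in for a Handlebars template; string.Template is used only to keep
# the sketch dependency free.
PAGE = Template("<html><body>$header$posts$footer</body></html>")


async def render_homepage(load):
    """Load three resources concurrently, merge them and return HTML.

    `load` is a hypothetical coroutine function that fetches one named
    resource from storage (Blob Storage or S3 in the real tests).
    """
    header, posts, footer = await asyncio.gather(
        load("header"), load("posts"), load("footer")
    )
    body = PAGE.substitute(header=header, posts=posts, footer=footer)
    return {"status": 200, "content_type": "text/html", "body": body}


async def fake_load(name):
    """In-memory replacement for a storage read, for illustration only."""
    await asyncio.sleep(0)
    return f"<div>{name}</div>"


if __name__ == "__main__":
    print(asyncio.run(render_homepage(fake_load)))
```

The point of batching the three reads is that the request waits for the slowest of them rather than the sum of all three, so storage latency contributes relatively little to the overall response time.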

On Azure I’m using .NET 4.6 on the v1 runtime while on AWS I’m using the same code running under .NET Core 1.0. It’s worth noting that latency on blob access remained minimal throughout all these tests (6ms on average across all loads) and when removing blob access from the tests it made little difference to the patterns.

Although the .NET 4.6 and Core runtimes are different (and, I accept, may exhibit different behaviours) these are the current general availability options for implementing serverless on the two platforms using .NET and both vendors claim full support for them. In Microsoft’s case some of the languages supported on the v1 Azure Functions runtime, the one tested here (v2 is in preview and has serious performance issues with .NET Core), are experimental and documented as having scale problems but C# (which runs under full framework .NET) is not one of them. Both vendors have .NET Core 2.0 support on the way and in preview but given the issues I’m waiting until they reach general availability before I compare them.

The results are, frankly, pretty damning when it comes to Azure Functions’ ability to scale dynamically and so let’s get into the data and then look at why.

A quick note on the graphs: I’ve pulled these from VSTS and it’s quite hard (or at least I don’t know how!) to equalise the scales, so please do look at the numbers carefully – the difference is quite startling.

Add 2 Users per Second

In this test scenario I’ve started with a single user and then added 2 users per second over a 5 minute run time up to a maximum of 500 users:

We can see from this test that AWS matches the growth in user load almost exactly: it has no issue dealing with the growing demand and page request times hover around the 100ms mark. Contrast this with Azure which always lags a little behind the demand, is spikier, and has a much higher response time hovering around the 700ms mark.

This is backed up by the average stats from the run:

It’s interesting to note just how many more requests AWS dealt with as a result of its better performance: 215271 as opposed to Azure’s 84419. Well over twice as many.

Constant Load of 400 Concurrent Users

This test hits the application with 400 concurrent users from a standing start and runs over a 10 minute period simulating a sudden spike or influx of traffic and looking at how quickly each serverless environment is able to deal with the load. Neither environment was completely cold as I’d been refreshing the view in the browser but neither had had any significant traffic for some time. The contrast is significant to say the least:

Let’s cover AWS first as it’s so simple: it quickly absorbs the load and hits a steady response time of around 80ms again in under a minute.

Azure, on the other hand, is more complex. Average response time doesn’t fall under a second until the test has been running for 7 minutes and it’s only around then that the system is able to get near the throughput AWS put out in a minute. Pretty disappointing and backed up by the overall stats for the run:

Again it’s striking just how much better the AWS stats are than the Azure figures.

Constant Load of 1000 Concurrent Users

Same scenario as the last test but this time 1000 users. Let’s get into the data:

Again we can see a similar pattern with Azure slow to scale up to meet the demand while with AWS it is business as usual in under a minute. Interestingly at this level of concurrency AWS also error’d heavily during the early scaling:

It should be noted that AWS specifically instructs you to implement retry and backoff handlers on the client, which in the load test I am not doing. Additionally at this point I am seeing throttle events in the logging for the AWS function – this is something I will look to come back to in the future. However it’s interesting to note the contrasting approaches of the two systems: Azure inflates its response time while AWS prefers to throw errors.
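For context, a client-side retry and backoff handler of the kind AWS recommends might look something like the sketch below. It was not part of the load test, and it is deliberately generic: the `Throttled` exception and `call_with_backoff` function are hypothetical names, and a real client would map them onto whatever error its HTTP library raises for a throttled response.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical error raised by a client when the service throttles it."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying throttled calls with exponential backoff.

    The delay doubles after each failed attempt, is capped at max_delay and
    has random jitter applied so that clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky():
        # Fails twice, then succeeds; simulates early throttling.
        calls["count"] += 1
        if calls["count"] < 3:
            raise Throttled()
        return "ok"

    print(call_with_backoff(flaky, sleep=lambda _: None))
```

A handler like this would turn many of the early errors into slower successful responses, which is worth bearing in mind when comparing the AWS error counts with Azure's inflated response times.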

The average stats for the run:

Azure Functions

I don’t think there’s much point dancing around the issue: the above numbers are disappointing. Azure is slow to scale its HTTP triggered functions and once we get beyond the 100 concurrent users point the response times are never great and the experience is generally uneven. For customer facing API / web serving where low latency and response time are critical to a smooth user experience this really rules it out as an option. And it’s not just the .NET 4.6 variant that is poor as can be seen from my previous posts where I stripped test cases down to the most basic scenarios and used a variety of frameworks. The best case for Azure scaling I’ve found is using a CSX approach to return a string but even that lags behind AWS doing real work as the test cases in this post do:

using System.Net;

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    log.Info("C# HTTP trigger function processed a request.");

    var response = req.CreateResponse();
    response.StatusCode = HttpStatusCode.OK;
    response.Content = new StringContent("<html><head><title>Blog</title></head><body>Hello world</body></html>", System.Text.Encoding.UTF8, "text/html");

    return response;
}

With 1000 concurrent users over 5 minutes:

And with the add 2 users per second scenario:

Even in this final case, and remember this Azure Function is only returning a string, we can see the response time creeping up as the user load increases and the total number of requests served is only 77514 to AWS’s 215271 over the same period with a much lower number of requests per second.

In an additional attempt to validate my conclusion that the Azure Function system is poor at scaling I pointed the AWS Lambda installation at Azure Blob Storage instead of S3. In this test other than the function entry point semantics the code running on AWS is now taking exactly the same branches as the Azure tests and using the same underlying storage mechanism, albeit with a hop across the Internet to access the storage. I ran this scenario using the 400 concurrent user scenario:

We can see from this that other than a slightly increased response time due to the storage being hosted in another data centre AWS continues to perform well and scales up almost immediately and response time remains steady and low. We can also see there is no issue with Azure Blob Storage – if there was an issue there we’d expect to see it impact these results.

These additional validation tests (an empty workload and AWS running against Blob Storage) pretty much isolate the issue to the Azure Function runtime.

And it’s a shame as the developer experience is great, there is solid documentation, and plenty of samples, and the development team on Twitter are ludicrously responsive – to the point that I feel bad saying what I need to say here. I will reach out to them for feedback.

Why is this the case? Well I’d suggest the root of the issue is how the system has been built on top of App Service Plans. It’s not all that, well, serverless and you still find yourself worrying about, well, servers.

On Azure an App Service Plan is essentially a collection of rented servers / reserved compute power of a given spec (CPU, memory) and capabilities. Microsoft have layered what they call a Consumption Plan over this for Azure Functions which provides for automatic scaling and consumption based pricing. Unfortunately if you track what is going on your Functions are running on a limited number of these servers which you can evidence by tracking the instance ID and by sharing state between your functions (to be clear: this is not good!).
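One way to evidence this, assuming each function invocation logs an identifier for the server it ran on (on App Service the `WEBSITE_INSTANCE_ID` environment variable is one such value), is to count the distinct instances seen over the course of a run. The record format and function names below are hypothetical and the sample data is made up.

```python
from collections import Counter


def instances_by_minute(records):
    """Count distinct instance IDs seen in each minute of a test run.

    records is an iterable of (seconds_since_start, instance_id) pairs,
    a hypothetical log format chosen for this sketch.
    """
    seen = {}
    for seconds, instance_id in records:
        seen.setdefault(int(seconds // 60), set()).add(instance_id)
    return {minute: len(ids) for minute, ids in sorted(seen.items())}


def requests_per_instance(records):
    """Total requests handled by each instance ID."""
    return Counter(instance_id for _, instance_id in records)


if __name__ == "__main__":
    # Made-up records: one instance at first, a second appears after a minute.
    sample = [(5, "a"), (30, "a"), (65, "a"), (70, "b"), (110, "b")]
    print(instances_by_minute(sample))
    print(requests_per_instance(sample))
```

If the number of distinct instances grows only slowly while demand climbs steeply, that is consistent with scaling happening at the server level rather than per function.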

Essentially the level of granularity for scaling your functions remains, as in a traditional hosting model, at the server level and as your system scales up instances are slowly being added – but this is throttled tightly presumably to prevent Microsoft’s costs from spiralling out of control.

Now because they run on App Service Plans you can switch hosting away from the Consumption plan onto a standard plan (which allows additional Azure features to be used) but this, to me, completely defeats the point of serverless. I’m paying for reserved compute again and managing server instance counts. I may as well not have bothered in the first place!

It’s hard to escape the feeling that Microsoft had to play catch up with AWS Lambda (it launched as a preview in late 2014 and went into general release in April 2015 whereas Azure Functions launched as a preview in March 2016) and built something they could market as serverless computing as quickly as they could by reusing existing compute and scaling systems on Azure.

Would I still use Azure Functions? Yes sure – in back end scenarios where latency isn’t all that important they’re a great fit. Anything that impacts user experience? No. Definitely not at this point.

It will be interesting to see if Microsoft revise the hosting model. I suspect if they do it’s some time off as currently they seem focused on the v2 runtime, which isn’t a hosting change (as far as I can see) but rather gives Functions the ability to support more languages and .NET Core.

AWS Lambda

I’ll preface this by saying I am absolutely not an AWS expert so it’s harder for me to speculate about the underlying architecture of Lambda however… the numbers don’t lie: AWS manages to respond to changes in demand very quickly and, until I started to hit throttle limits (which I would need to speak to AWS Support to have lifted), is very consistent in response times.

I’ve not tried any state sharing but I would expect it to fail: it looks like Amazon have containerised at the Function level, rather than the host server, and this is what allows them to operate as you’d expect a serverless environment to. Both scaling and billing can then be at the function level.

Would I use AWS Lambda? Yes. But as most of my development work is on Azure I’m really hoping Microsoft bridge the capability gap.

Wrap Up and Next Steps

If you’ve followed this far – thanks! I’m a big fan of the serverless model but the Azure implementation of serverless looks like something of a compromised offering at this point and I’d be cautious of recommending it without understanding in detail the usage requirements as you will quickly hit choppy water.

I am planning on repeating similar experiments with the queue processing I began some time ago and if I get any information from Microsoft around this topic will make any corrections as appropriate. This is one of those times I’d love to have got things wrong.
