Monday, January 6, 2014

Coping with Christmas – a guest post from our web hosts at Mythic Beasts

Liz: Here’s a guest post from Pete Stevens at Mythic Beasts, our brilliant hosts. Christmas this year saw…a LOT of traffic, with thousands and thousands of downloads of NOOBS and other images from our downloads page, alongside pageviews in the many hundreds of thousands on the rest of the website – this on a day when we were expecting you all to be ignoring the internet and socialising with your families. Here’s how Mythic made everything work seamlessly. Thanks Pete, Liam et al!

In our last guest post we explained how we’d expanded the Raspberry Pi hosting setup to cope with denial-of-service attacks. Since then we’ve done some more work to support NOOBS and to cope with a traffic load that’s a bit larger than the main Internet Exchange in Leeds, and a bit smaller than the one in Manchester.

The story starts in July with some updates to support NOOBS Lite.

NOOBS

Our existing mirror setup wasn’t really suitable for supporting NOOBS Lite. Liam Fraser has written an explanation of why, and, after a bit of discussion, we came up with an alternative. The new plan allowed the coordinated release of the software and all dependent files to every server, but involved some careful decision-making when it came to the serving process.

We came up with these possible configurations:

  1. Serve IPv6 and directly connected networks from Mythic Beasts, the rest from Velocix
  2. Serve everything from Mythic Beasts
  3. Serve everything from Velocix, failing all IPv6 traffic back to IPv4
  4. Serve everything from Cloudfront (and pay Amazon over $10,000 per month)

We were anxious to avoid option #4.

I gave Liam a quick specification to implement a redirector. Here it is in pseudo-code:

if (mode == Velocix) { redirect to Velocix; }
if (mode == Mythic)  { serve file; }
if (mode == Mixed) {
    if (ipv6) { serve file; }
    else if (in the list of networks directly connected to Mythic) { serve file; }
    else { redirect to Velocix; }
}
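For a concrete picture, here’s that specification sketched in Python. This is our illustration rather than Liam’s actual code – serve_file and redirect are stand-in helpers, and the Velocix URL is a placeholder:

import ipaddress

VELOCIX_BASE = "http://cdn.velocix.example/"  # placeholder, not the real CDN endpoint

def serve_file(path):
    # Stub: in production this hands the request to the local webserver.
    return ("serve", path)

def redirect(url):
    # Stub: in production this returns an HTTP 302 to the client.
    return ("redirect", url)

def handle_request(path, client_ip, mode, connected_networks):
    if mode == "Velocix":
        return redirect(VELOCIX_BASE + path)
    if mode == "Mythic":
        return serve_file(path)
    # mode == "Mixed": IPv6 clients and directly connected networks are
    # served from Mythic Beasts; everyone else is redirected to the CDN.
    addr = ipaddress.ip_address(client_ip)
    if addr.version == 6:
        return serve_file(path)
    if any(addr in net for net in connected_networks):
        return serve_file(path)
    return redirect(VELOCIX_BASE + path)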

I also gave him some code that would query our routers and build the list of directly connected networks, e.g.

86.128.0.0/10  British Telecom (4 million customers)
86.0.0.0/11    Virgin Media (2 million customers)
82.0.0.0/11    Virgin Media (2 million customers)
31.64.0.0/11   Everything Everywhere (2 million customers)
…and another 107,000 entries which would be too long to put in this blog post.
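To show how such a list might be consulted – our sketch using Python’s standard ipaddress module, not the actual tooling – here’s a simple lookup; with 107,000 prefixes a production version would want a radix tree rather than this linear scan:

import ipaddress

def load_networks(lines):
    # Each line looks like: "86.128.0.0/10 British Telecom (4 million customers)"
    return [ipaddress.ip_network(line.split()[0]) for line in lines]

def directly_connected(client_ip, networks):
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)

nets = load_networks(["86.128.0.0/10 British Telecom (4 million customers)"])
print(directly_connected("86.130.0.1", nets))  # True: inside the BT block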

Liam created a VM for the new downloads service and implemented it exactly as specified. We started testing this code – and then this happened:

There were two problems (aside from Liz's obviously unreasonable expectation that Liam had to finish everything before he went to university). [Liz interjects: Pete is joking. Liam is a superb multitasker.] Firstly, the code that worked out where to redirect to wasn't particularly fast, and was running on a little single-core VM. This caused a bottleneck. Secondly, we were using the service from Velocix designed for large files: we issued a redirect to Velocix, they issued a redirect to their closest cluster, and then the file was delivered. All well and good. But for the small files, the redirects consumed more bandwidth and latency than the CDN saved, so using the CDN slowed down the service. The result was that the NOOBS menu came up very slowly – painfully slowly if the mode was set to Velocix. Oops!

So we came up with a new plan and, simplified, it looked something like this:

Put into words: we installed a local webserver on each of the load balancers which had a full copy of all the downloads. We next wrote some Apache config dictating that if the file was less than 1MB in size we’d just serve it immediately (taking out the series of redirects and improving the performance). If it was larger than 1MB, we’d then consult our code to work out the best way of serving it. Lastly, we moved the redirector to the load balancers which gave an upgrade from a single-core VM to four dual-core servers, each with a 1Gbps upstream.
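As a sketch of that dispatch logic (again ours, not the production Apache config; serve_file and handle_request are the helpers from the earlier example):

import os

ONE_MB = 1024 * 1024  # the size threshold from the Apache config

def dispatch(request_path, docroot, client_ip, mode, connected_networks):
    local_path = os.path.join(docroot, request_path.lstrip("/"))
    if os.path.isfile(local_path) and os.path.getsize(local_path) < ONE_MB:
        # Small file: serve it straight from the local copy, skipping the
        # redirect round-trips that made the NOOBS menu slow.
        return serve_file(local_path)
    # Large file: let the redirector pick local serving or the CDN.
    return handle_request(request_path, client_ip, mode, connected_networks)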

(Obviously, at this point the comments are going to fill up with lectures on how stupid we are for using Apache instead of nginx, lighttpd, node.js, IIS, thttpd, $insert_webserver_of_choice etc. We’ve chosen to stick with Apache because it made writing the setup easy, and the bottleneck was the network card in each machine, not the CPU load from Apache.)

The next trick, once we had enough capacity out of the hosting cluster, was to make sure we had enough capacity out to the internet. Mythic Beasts upgraded their London Internet Exchange uplink to 2x10Gbps, and we sat back and waited.

On Christmas Eve the traffic levels started rising. Raspberry Pi went 50% over their previous record thanks to the new release of NOOBS v1.3.3, and, we assume, parents setting up Pis ready for the next day. The following morning, this happened (see if you can guess without looking at the legend which day is Christmas):

Raspberry Pi doubled the new record for traffic set 24 hours previously. They didn’t manage to become the busiest customer at Mythic Beasts – that record is still held by a music streaming network – but did exceed the internet exchange for Leeds (IX-Leeds) and nearly caught the main exchange in Manchester (IX-Manchester).

On Christmas Day we all sat back with dinner, a glass of something and the odd glance at the graphs and Twitter, happy that all the new Raspberry Pi users had a seamless experience and the last six months of work and upgrades had been worth the time we'd put in. [Liz interjects again: I also spent Christmas Day checking the graphs and Twitter occasionally. The open mouth that the data occasioned made for terrific Brussels sprouts target practice for everyone else.]

You can follow @Mythic_Beasts and @FraserLiam on Twitter if you find this sort of thing as interesting as we do. (And if you haven’t already worked out how to follow @Raspberry_Pi by now, you really should be doing.)
