As you’ve probably noticed, we’ve been heavily advertising our new cloud hosting platform recently. It’s a great step forward, but let me take the reins from the marketing department and focus on the technical side. It’s very easy to say “cloud this” and “cloud that” (indeed, everyone seems to!), but that doesn’t mean much without any technical backing. So, here it is – the Heart Internet Cloud Hosting technical explanation.
What does cloud even mean?
First, let’s start with the definition of cloud. There are lots of differing opinions on this, even within our own office and my own department… but after extensive discussion, we managed to boil it down to:
If the service is…
- On the Internet
- Distributed (and therefore highly available)
then we can consider it to be cloudy. Or in other words, a service which runs online and has no single point of failure. So let’s break each part down. We know that our services are on the Internet, so I’ll go straight on to attribute 2 – distribution.
A website is pretty much a combination of these things:
- Domain name
- Website files
- Database
- Something to serve it over the network (Internet)
You need a domain name so that there’s actually a way of getting to your website; the files of your website are your code, images, CSS, etc; the database is where your user logins or shopping basket items are stored; and then there’s the webserver, the actual hardware and software combo which makes it all happen.
How is it assembled?
We've abstracted out each part of the platform, and then made each part distributed, with automatic failover. Our platform now consists of:
- Multiple DNS servers to serve DNS to website visitors.
- Multiple load balancers to answer HTTP(S) requests (and other services).
- Multiple webservers in clusters of 4 to serve the HTTP requests from the load balancers.
- Our NAS cut up into pairs of redundant storage to serve the webservers.
- Pairs of database servers with replication and redundancy to serve the webservers.
Every part fails over automatically: if a load balancer fails, another will take its place; if a webserver fails, its partners will serve its requests; if a storage node fails, its partner will take over its workload; and if a database server falls over, its partner will answer its queries.
To achieve this, we’ve used a modest collection of software, all of it open source and running on Linux.
What does each part do?
The star of the show is the load balancing software. This is what runs on our front end servers, and all of the web traffic (and any external MySQL or SSH traffic) goes through them. They operate in pairs, with a failover/monitoring daemon making sure that its partner is OK. If for any reason it’s not (network problem, server crash, etc) it will take over its partner’s IP address and therefore take over its workload. In our tests, the resulting outage typically lasts less than 2 seconds.
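The post doesn’t name the daemon, but the takeover decision can be sketched as a simple debounce over repeated health checks – only fail over after several consecutive misses, so a single dropped ping doesn’t trigger it. The function name and threshold here are made up for illustration, not the actual software:

```python
FAIL_THRESHOLD = 3  # consecutive failed checks before the standby takes over


def should_fail_over(check_results, threshold=FAIL_THRESHOLD):
    """Given a sequence of health-check results (True = partner OK),
    decide whether the standby should claim the shared IP address.

    A single failed check is ignored; only `threshold` consecutive
    failures count, which filters out one-off network blips."""
    misses = 0
    for ok in check_results:
        if ok:
            misses = 0  # partner recovered, reset the counter
        else:
            misses += 1
            if misses >= threshold:
                return True
    return False
```

In a real deployment the positive branch would add the partner’s IP to a local interface and send a gratuitous ARP so switches learn the new owner – that part is deliberately left out of the sketch.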
Behind our load balancers are our webservers; these are the servers which serve the HTTP requests for your site, running any scripts or serving up static css, images or other content. These servers work in fours, and each server will answer requests for any website assigned to that cluster. If a webserver goes down (software crash, hardware failure, etc) then the load balancing software is smart enough to notice, and sends requests to the other healthy servers. This typically happens in about 1 second.
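As a rough sketch of what “smart enough to notice” means, a health-aware round-robin over a four-server cluster might look like the following. The class and server names are invented for illustration; this is not Heart Internet’s actual balancing code:

```python
from itertools import cycle


class Cluster:
    """Round-robin request dispatch across a webserver cluster,
    skipping any server currently marked unhealthy."""

    def __init__(self, servers):
        self.healthy = dict.fromkeys(servers, True)
        self._ring = cycle(servers)

    def mark(self, server, ok):
        """Record the result of a health check for one server."""
        self.healthy[server] = ok

    def next_server(self):
        """Return the next healthy server in rotation."""
        for _ in range(len(self.healthy)):
            server = next(self._ring)
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy webservers in cluster")
```

When a server is marked unhealthy it is simply skipped in rotation, so the remaining three absorb its requests – matching the roughly one-second recovery described above.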
Behind our webservers are our storage nodes. These also work in partnership, in a similar way to the load balancers. The inactive partner constantly monitors its partner's health, and if for any reason it becomes unhealthy, it will take over its partner's IP address and therefore workload. Quick-to-recover storage is very important (slow storage causes bottlenecks right up the rest of the stack), and failover takes about 5 seconds.
Also behind our webservers are our database servers. These are pairs of MySQL servers on very powerful boxes indeed, with block-level replication (essentially network RAID) in a master/slave scenario. Again, our own software is used for the failover (on a similar principle to our storage nodes), with an outage of a little over 3 seconds for a successful failover.
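The promotion logic for a master/slave pair can be sketched roughly like this – route queries to the master while it is alive, and promote the replicated slave when it is not. The node names and dictionary layout are assumptions for illustration, not the actual failover software:

```python
def choose_primary(nodes):
    """Given a master/slave pair, return the node that should serve
    queries. `nodes` maps a name to {'role': ..., 'alive': ...}.

    If the master is down, the slave (which holds a block-level
    replica of the data) is promoted in place."""
    master = next(n for n, s in nodes.items() if s["role"] == "master")
    if nodes[master]["alive"]:
        return master
    slave = next(n for n, s in nodes.items() if s["role"] == "slave")
    nodes[slave]["role"] = "master"  # promote the replica
    return slave
```

Because replication happens at the block level, the promoted slave already has an up-to-date copy of the data, which is what keeps the outage to a few seconds rather than a restore-from-backup.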
That essentially covers our platform's structure: an abstraction of each basic element needed to serve a dynamic website, with redundancy and failover added to each element.
What other technologies are used?
We have various gigabit and ten-gigabit Ethernet networks connecting our storage, backend and frontend servers. We have our own mini-cloud of virtualised machines which scan mail in and out of the cloud (for viruses and spam), and we have additional fallback servers which can be attached and detached to/from any cluster to meet spikes in demand (e.g. if a site is advertised on Dragon’s Den or comes under attack). And finally, sitting in front of it all is a set of redundant screening servers which protect the entire public network from DoS attacks, cracking attempts and other unwanted nasties.
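Screening against DoS traffic often boils down to per-source rate limiting. A minimal token-bucket sketch of the idea follows – the class, rates and parameters are invented for illustration and are not Heart Internet’s actual screening rules:

```python
import time


class TokenBucket:
    """Tiny token-bucket limiter of the kind a screening server might
    apply per source IP: each request costs one token, tokens refill
    at a steady rate, and a burst allowance absorbs short spikes."""

    def __init__(self, rate, burst):
        self.rate = rate            # tokens refilled per second
        self.burst = burst          # maximum tokens (burst allowance)
        self.tokens = burst         # start with a full bucket
        self.last = time.monotonic()

    def allow(self):
        """Return True if this request is within the allowed rate."""
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A flood from one source drains its bucket almost immediately and gets dropped, while ordinary visitors never notice the limiter.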
So, there you have it – the Heart Internet Cloud Hosting platform explained. Any questions, please ask in the comments below!