As much as we’d love to be perfect, we’re the first to admit that we’re not. Like any other technology company, we sometimes have things go wrong; it’s inevitable. We always strive to give the best possible service whatever the situation, and we never stop working to improve our services and platforms. Here’s an insight into what happens during downtime, why it happens, and what to do if you’re affected.
What happens at Heart Internet during downtime?
1. Something unexpected happens.
2. Our sysadmins immediately start investigating the problem.
3. In many cases, the issue only takes a few minutes to resolve.
4. If the issue takes longer to troubleshoot, updates are added to the status page whenever there’s anything new to report.
5. Once the problem is resolved, the status page is updated to reflect that.
So, what causes downtime?
A million and one factors can cause downtime, and in many cases the cause is external. An ISP such as BT or Virgin Media may be having problems with its own network, and sometimes this means that only sites hosted with a particular web host are affected, purely because of where those sites are located. You may still be able to reach Twitter, Google and so on but not your website, and understandably assume that your site is down at our end. In fact, someone on a different, unaffected ISP may have no problems at all. That’s why, if there’s no issue reported on our status page, we’ll ask you to try a different internet connection if possible (such as mobile data). In these cases we do what we can and stay in close contact with the company in question, but there’s often frustratingly little we can do until the third party resolves the issue.
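If you’re comfortable running a small script, the “is it my connection or my site?” check described above can be roughly automated. The Python sketch below is an illustration only, not a Heart Internet tool: `check_url` and `diagnose` are hypothetical helper names, it assumes your site is served over HTTP(S), and Google is used purely as an example of a reliable reference site. A server error (5xx) is treated as “down”, while a 4xx such as 404 still counts as reachable, because the server did answer.

```python
import urllib.error
import urllib.request


def check_url(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers without a server error, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server replied with an error status. A 4xx (e.g. 404) still
        # means it is reachable; a 5xx means it is failing, so count it as down.
        return err.code < 500
    except ValueError:
        # Malformed URL (e.g. missing "https://").
        return False
    except OSError:
        # DNS failure, refused connection, timeout and other network errors
        # (urllib.error.URLError is a subclass of OSError).
        return False


def diagnose(site_ok: bool, reference_ok: bool) -> str:
    """Turn two reachability checks into a suggested next step (illustrative)."""
    if site_ok:
        return "Your site is responding from this connection."
    if not reference_ok:
        return "Nothing is loading: check your own internet connection first."
    return (
        "Your site isn't responding but other sites are: check the status page, "
        "then try a different connection such as mobile data."
    )
```

For example, call `diagnose(check_url("https://www.example.com"), check_url("https://www.google.com"))` with your own address in place of the placeholder example.com. Whatever the script says, the status page remains the place to check for known issues.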
In other cases, there may be hardware issues. Just as your computer can slow down and need restarting, servers need to be rebooted from time to time; that’s just the nature of technology. In these situations, websites will be down for just a few minutes whilst the server goes offline and comes back up. If you see a message on the status page saying that your server has been rebooted, you can reassure your customers or visitors that your website will be back very shortly. In rarer circumstances, hardware may need to be replaced. We carry this out as quickly and efficiently as possible at the most appropriate time (between 2am and 4am, where we can) to minimise downtime.
Occasionally, a customer may have a rogue script or website on our shared hosting platform that causes unexpected issues for other users on the same server. In these instances we turn that website off and notify its owner so that other sites aren’t affected any further; if your site is one of those affected, it’s unlikely to be down for more than a few minutes.
Sometimes a server behaves strangely for no apparent reason. This can cause short, intermittent spells of downtime until the problem is resolved, and because the behaviour is unexpected and unpredictable, the root cause is often nearly impossible to pin down quickly. During this time our team does everything it can to investigate and isolate the problem.
Other situations can be a lot more complex. With continual updates across our platforms and systems, a change occasionally has unexpected consequences despite our rigorous testing. Almost all of our systems, platforms and networks are custom-built and constantly being refined and developed, so there’s no ‘If a) happens, use b) to fix it’ guide. This is where a lot of web hosts fall apart, because of a culture of blame and finger-pointing. One of the reasons we have a lot less downtime than other hosts, and resolve it much faster, is that our teams work really well together to fix problems without blame or grudge-holding (that, and a very solid foundation: our systems and platforms are extremely well built by some of the industry’s best). Rather than trying to prove who’s to blame, the sysadmins get on with fixing the problem at hand, which is no mean feat under so much pressure from so many people.
Those are just a few of the situations that lead to downtime; in reality there are countless potential issues with varying degrees of complexity. Considering how many things can go wrong (as with any web hosting platform anywhere in the world), our uptime is generally extremely high, but customers, understandably, count the minutes their site is down rather than the minutes it’s up. It’s not the easiest industry to work in by any stretch of the imagination.
Imagine having a problem and not being able to Google it or ask anyone for the answer. Now imagine that problem could affect hundreds of thousands of people, running on millions of pounds’ worth of hardware across an extremely complex network subject to endless internal and external factors. On top of that, add intense pressure from staff and customers to investigate and resolve the issue as quickly as humanly possible. That’s what our sysadmins may face at any hour of any day (the team works 24/7, every day of the year), on top of their normal workload.
If something out of the ordinary occurs, we can get thousands of messages from customers in a single day across support tickets, social media channels and our Sales line. This pressure filters through the company to the Sysadmin team, who bear the brunt of it whilst dealing with the issue at hand. It’s not an easy job, and it takes a specific kind of person: highly intelligent, genuinely keen to get the problem resolved, suited to high-pressure environments, and not likely to become a serial killer.
We get a lot of questions about the cause of downtime, even after the problem is resolved. Generally we don’t give many details. That’s a conscious decision, mostly for security reasons, but also because a lot of situations are insanely complex and difficult to explain in a way that makes any kind of sense!
How do I figure out if downtime is affecting me, and what do I do if I think my website is down?
Your first port of call should always be our system status page, where any known issues are listed. Occasionally you may have to wait a few seconds for the initial update to be posted, because our sysadmins are troubleshooting the problem at the same time as updating the status page. If you’ve waited a minute or two and there’s nothing there that matches your web server or situation, it’s best to raise a ticket with our support team, as the problem is likely to be specific to you.
If there is an issue listed on the status page, it will mention which server or servers are affected. To find out which server your website is on, log in to your web hosting control panel and look for ‘Web Server’ under ‘Account Info’ in the sidebar.
If your website is affected, keep checking the status page for updates; it’s simply a case of waiting for our team to resolve the issue. If there are no new updates, there’s nothing new to report. The issue hasn’t been forgotten about; it just means lots of people are focused on resolving it first and updating the status page second.
There’s no need to raise a ticket with our support team if the problem is reported on our status page; it’s kept updated with the latest details straight from the people working on the issue. The same goes for social media messages: we can only repeat what the status page says and assure you that we’re doing absolutely everything we can. We hate downtime just as much as you do, if not more. We hate feeling that we’re letting you down, we hate unhappy customers, and we hate unavailable websites. We do whatever it takes to resolve issues as quickly as possible, and we genuinely care more than anyone that your website is online and that you and your customers are happy.
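The steps in this section boil down to a short decision process. As an illustration only, the Python sketch below writes it out. It doesn’t query the real status page (no API for it is assumed here), so you supply the facts yourself: your server name from ‘Web Server’ under ‘Account Info’, the server names mentioned on the status page, and how many minutes you’ve waited. The function name `next_step`, the example server names `web123` and `web456`, and the two-minute threshold (standing in for “a minute or two”) are all hypothetical.

```python
from typing import Iterable


def next_step(my_server: str, servers_on_status_page: Iterable[str],
              minutes_waited: float) -> str:
    """Suggest what to do when your website seems to be down (illustrative only)."""
    # Normalise names so "Web123 " and "web123" are treated as the same server.
    listed = {name.strip().lower() for name in servers_on_status_page}
    if my_server.strip().lower() in listed:
        # Known issue: keep an eye on the status page; no ticket needed.
        return "wait for status page updates"
    if minutes_waited < 2:
        # The first update can take a moment while sysadmins troubleshoot.
        return "check the status page again shortly"
    # Nothing listed for your server after a minute or two: likely specific to you.
    return "raise a support ticket"
```

For example, `next_step("web123", ["web123", "web456"], 0)` returns "wait for status page updates", because your server is already listed as affected.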
We hope this provides some clarity and insight, and we’d like to take this opportunity to thank you for your support, patience and understanding during these inevitable situations – it honestly means everything to us.