What happens during downtime, and what causes it? - Heart Internet Blog - Focusing on all aspects of the web

As much as we’d love to be perfect, we’re the first to admit that we’re not. Just like any other technology-based company, it’s inevitable that sometimes things go wrong. We always strive to give the best possible service no matter what the situation, and we never stop working to improve our services and platforms. Here’s an insight into what happens during downtime, why it happens, and what to do if you’re affected.


What happens at Heart Internet during downtime?

1. Something unexpected happens.

2. Our sysadmins immediately start investigating the problem.

3. In many cases, the issue only takes a few minutes to resolve.

4. If the issue takes longer to troubleshoot, updates are added to the status page when/if there's anything new to report.

5. Once the problem is resolved, the status page is updated to reflect that.


So, what causes downtime?

A million and one factors can cause downtime, and in many cases it’s an external factor. An ISP such as BT or Virgin Media may be having problems with their own network, and in some cases, this can mean that only sites hosted by a particular web host are affected (purely based on location). This means you may be able to access Twitter, Google, etc., but not your website, and assume it’s a problem with your site being down at our end. In fact, someone using a different, unaffected ISP may have no problems at all, which is why if we don’t have the issue reported on our status page, we will ask you to try a different internet connection if possible (such as a mobile one). In these cases we do what we can and ensure full communication with the company in question, but there’s often frustratingly little we can do until the third party resolves the issue.
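If you're comfortable running a script, the "try a different connection" check described above can be sketched in a few lines of Python. This is only an illustration, not a Heart Internet tool: the function names are made up for this example and `https://example.com/` is a placeholder for your own site. Run it once on your usual connection and once on a different one (such as a mobile hotspot) and compare the results.

```python
import urllib.error
import urllib.request


def describe_result(status=None, error=None):
    """Turn the outcome of one request into a plain-English hint."""
    if error is not None:
        return "unreachable from this connection"
    if status is not None and 200 <= status < 400:
        return "reachable"
    return "responding, but with an error"


def check_site(url, timeout=10):
    """Request the URL once and describe what happened (needs network access)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return describe_result(status=response.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status code.
        return describe_result(status=exc.code)
    except (urllib.error.URLError, TimeoutError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        return describe_result(error=exc)
```

Calling `check_site("https://example.com/")` with your own address returns one of the three hints. If the site is unreachable on one connection but reachable on another, the problem is more likely to lie with a network in between than with the site itself, though a single check can't prove that.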

In other cases, there may be hardware issues. Just as your computer can run slowly and need restarting, servers need to be rebooted from time to time – that’s just the nature of technology. In these situations, websites will be down for just a few minutes whilst the server is taken offline and then comes back up. If you see a message on the status page saying that a server you’re on has been rebooted, you can reassure your customers or visitors that your website will be back very shortly. In rarer circumstances, hardware may need to be replaced. This is carried out as quickly and efficiently as possible at the most appropriate time (if we can, we’ll do it between 2 and 4am) to minimise downtime.

Occasionally, a customer may have a rogue script or website on our shared hosting platform which may cause unexpected issues for other users on that server. In these instances we turn the website off and notify the customer to avoid other sites being affected further, and your website is unlikely to be down for more than a few minutes.

Sometimes a server behaves in a strange way for no apparent reason. This can lead to short spates of downtime over a period until it’s resolved, and it is often nearly impossible to get to the root cause quickly because the behaviour is unexpected and unpredictable. During this time our team does everything it can to investigate and isolate the problem.

Other situations can be a lot more complex. With continual updates across our platforms and systems, occasionally actions have unexpected consequences despite our rigorous testing protocol. Almost all of our systems, platforms and networks are custom-built and are constantly being refined and developed, so there is no ‘If a) happens, use b) to fix it’ how-to guide. This is where a lot of web hosts fall apart, because there’s a culture of blame and finger-pointing. One of the reasons we have a lot less downtime than other hosts and resolve it much faster is that our teams work really well together at fixing problems without blame or grudge-holding (that and the fact that we have a very solid foundation – our systems and platforms are extremely well-built by some of the industry’s best). Rather than trying to prove who’s to blame, the sysadmins get on with fixing the problem at hand – no mean feat under so much pressure from so many people.

Those are just a few of the situations that lead to downtime; in reality there is an almost endless number of potential issues with varying degrees of complexity. Considering the number of things that can potentially go wrong (as with any web hosting platform anywhere in the world), our uptime is generally extremely high; even so, customers count the minutes their site is down, rather than the minutes their site is up. It’s not the easiest industry to work in by any stretch of the imagination.
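To put minutes of downtime into perspective, here is a small worked example in Python. The figures are invented purely for illustration (they are not Heart Internet's actual uptime statistics), and the function name is just a label for this sketch.

```python
def uptime_percentage(minutes_down, days=30):
    """Share of a period the site was up, as a percentage."""
    total_minutes = days * 24 * 60
    return 100 * (total_minutes - minutes_down) / total_minutes


# Hypothetical month: a single 10-minute outage in 30 days.
print(f"{uptime_percentage(10):.3f}%")  # prints 99.977%
```

In this made-up case, a ten-minute outage is set against 43,190 minutes in the same month when the site was up.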

Imagine having a problem and not being able to Google it or ask anyone for the answer. Now imagine your problem potentially affects hundreds of thousands of people on millions of pounds’ worth of hardware on an extremely complex network affected by endless internal and external factors. On top of that, add intense internal and external pressure from staff and customers to investigate the issue and resolve it as quickly as humanly possible. That’s what our sysadmins potentially face every hour of every day (the team as a whole works 24/7 every day of the year), in addition to their day-to-day workload.

We can get thousands of messages from customers across support tickets, social media channels, and through our Sales line in any one day if something out of the ordinary occurs. This pressure filters through the company and down to Sysadmin, who bear the brunt of it whilst dealing with the issue at hand. It’s not an easy job, and requires a specific kind of person – highly intelligent, genuinely keen to get the problem resolved, suited to high-pressure environments, and not likely to become a serial killer.

We get a lot of questions asking about the cause of downtime, even after the problem is resolved. Generally we don’t give many details, and that’s a conscious decision, mostly for security reasons, but also because a lot of situations are insanely complex and difficult to communicate in a way that makes any kind of sense…!


How do I figure out if downtime is affecting me, and what do I do if I think my website is down?

Your first port of call should always be our system status page, which is where any issues will be listed. Occasionally you may have to wait a few seconds for the initial update to be posted, because our sysadmins are troubleshooting whilst updating the system status. If you’ve waited a minute or two and there’s nothing on there which fits your web server or situation, it’s best to raise a ticket with our support team as it’s likely to be a problem specific to you.

If there is an issue listed on the status page, it will mention which server or servers are affected. To find out which server your website is on, log in to your web hosting control panel and look for ‘Web Server’ under ‘Account Info’ in the sidebar.

If your website is affected, keep checking the status page for any updates – it’s simply a case of waiting for the issue to be resolved by our team. If there are no updates, there’s simply nothing new to report – it hasn’t been forgotten about, it just means there are lots of people working to resolve the issue first, and thinking about the status page second.

There’s no need to raise a ticket with our support team if the problem is reported on our status page; it is kept updated with all the latest details straight from the people working on the issue. The same goes for social media messages; we can only repeat what the status page says and assure you that we’re doing absolutely everything we can. We hate downtime just as much as you do, if not more. We hate feeling that we’re letting you down, we hate unhappy customers, and we hate unavailable websites. We do whatever it takes to resolve issues as quickly as possible, and we genuinely care more than anyone that your website is online and you/your customers are happy.

We hope this provides some clarity and insight, and we’d like to take this opportunity to thank you for your support, patience and understanding during these inevitable situations – it honestly means everything to us.

Comments

  • Jon

    16/10/2012

    Brilliant insight into how you operate. I knew about the problem yesterday within minutes of it occurring and was proactive in contacting site owners who have mission-critical websites. Everyone completely understood and it didn’t cause any long-term issues. It was resolved quickly and actually made me even more confident in selling your services.

     
  • Richard

    16/10/2012

    Evasive and generic with no mention of yesterday’s downtime or any specifics. This is how outage reports should be done: https://status.heroku.com/incidents/151

     
  • 16/10/2012

    Thanks Jon! Glad to hear it 🙂

     
  • 16/10/2012

    Hi Richard,

    Thanks for your suggestion. It’s deliberately non-specific so we can direct people to it at any point in the future as an insight into how we operate.

    Jenni

     
  • Andrew

    16/10/2012

    Thanks for the info, Jenni. I think the fact that so many were frustrated by yesterday’s outages is testament to the excellent record for uptime that Heart boasts. Keep up the great work!

     
  • 16/10/2012

    Thanks for your support Andrew 🙂

     
  • keith stoddart

    16/10/2012

    Nice one Jenni and well done to Heart. Basically, these things happen and Heart handled it admirably. Blaming no one and just getting on with the job of fixing it. Fortunately, I hadn’t any complaints from my clients so maybe they missed it. However if I had any complaints I would have been happy to tell them exactly that these things happen (the nature of the beast!) and it is being fixed as we speak.

    Well done again to Heart and keep up the good work.

     
  • 16/10/2012

    Thanks Keith, we realised that we just tend to say ‘We’re fixing it’ and expect people to know what we’re going through!

     
  • Steve

    16/10/2012

    Thanks for the article, good read, although I’m slightly confused as to why the DC image in the article appears to be that of Fasthosts (you can see the logo on the servers!). I didn’t think you were associated with/used Fasthosts?

    Thanks

     
  • 16/10/2012

    Hi Steve,

    It was just a generic stock image that quite a few hosts and news sites use – no logos, just blue lights. We’ve replaced it anyway to avoid any confusion 🙂

    Thanks,

    Jenni

     
  • Paul Debnam

    16/10/2012

    Hi

    Very good article. I appreciate the sysadmins have a hell of a task, but they do a really good job. The up-to-date information on the status page is invaluable, allowing me to instantly reply to my clients who are having issues and update them as the issue progresses.

    keep up the excellent work!

    Paul

     
  • 16/10/2012

    Thanks Paul, I agree – they are so behind-the-scenes that a lot of people don’t realise they exist, and yet we wouldn’t have anything without their hard work!

     
  • Dave

    16/10/2012

    I started reading that outage report, but then I thought… why bother?

    If my site is down, all I need to know is that it’s being fixed. I don’t deal with any of the back-end systems so I don’t need to know what went wrong.

     
  • Artisan Internet

    16/10/2012

    Thanks Jenni, it’s always interesting to get some insight into how things work at Heart Towers, and I love the sysadmin/uptime cartoon!

    We sent a mailshot out to our hosting clients as soon as we knew about the problem, thereby keeping the number of emails from them down to two, saving ourselves a lot of wasted work time. Is it possible to get some kind of email feed from the status page?

    Server boxes are only computers, after all, and we all know what they’re like!

     
  • 17/10/2012

    I’ve been looking for an excuse to use that cartoon for ages! 🙂

    We’re currently looking at ways to improve the status page, so I’ll definitely add your suggestion to the list for our decision-makers.

    Thanks for your comment!

     
  • Ian

    17/10/2012

    Problems occur for any business in any niche around the world, but it’s about how you deal with them.

    We have over 300 sites with you guys, and EVERY ticket I have EVER placed has been dealt with within 15 minutes, most quicker, unbelievably.

    Your tech guys go above and beyond the call of duty, often helping with things that most other hosts would not even respond to.

    Your tech guys are your company and believe me, they are the best in the industry.

     
  • 17/10/2012

    Thanks Ian, no one cares more than our guys do so it’s fantastic to see that comes across. It’s definitely a challenging job at all levels and appreciation is always very welcome – I’ve passed your kind words on 🙂

     
  • Alan

    18/10/2012

    As a company that now maintains its websites in-house we have previously experienced the frustrations of a 3rd party provider telling us the problem is likely at our end when we can’t make changes to the websites.

    Since dealing direct with Heart Internet, we have saved a lot of time by quickly identifying any issues via their System Status link (I am not saying there are issues on a regular basis but when there are it is easy to keep repeating the same task, checking ftp details etc).

    If there is a problem with the web servers we can see it has been identified and know it will be dealt with as quickly as possible. We are then able to let everyone know so if a customer calls they can be informed of what’s happening plus we can go and get on with something else and check system status later.

    The suggestion above from Artisan Internet seems a good idea – an opt-in email would give a heads up. It’s sometimes good to have an answer as soon as the question is asked!

    Yes it was inconvenient the other day, and yes it was frustrating too but hey, it happens!

    I would just like to add though, that when I have had to contact Technical Support, they are first class – fast response and fast solutions.

    Well done all…

     
  • 18/10/2012

    Thanks for taking the time to write up your thoughts Alan – it’s great to get an insight into how people use our services.

    Thanks for your support 🙂

     
  • Davide

    22/10/2012

    I think that the simple solution to stop tickets and help us would be to send an email to Resellers and Clients….So that we know before our clients start ringing and saying “my site down”

    Many Thanks

    Davide

     
  • 22/10/2012

    Hi Davide,

    Thanks for your suggestion. It’s something we can consider, but most of the time it’s an issue that only lasts a few minutes, and it’s usually difficult (if not impossible) to tell how long it will take to resolve.

    Jenni

     
  • Davide

    22/10/2012

    Just a quick question…..

    My control panel does not look anything like the Image you have up there? Any reason….I do not have “Account info”

    Many Thanks

    Davide

     
  • 22/10/2012

    Hi Davide,

    Everyone who has eXtend can see that information; it is on the right-hand side of the control panel. I’ve had a look at your control panel and it is there in the “Stats” section. Yours looks different because you are using a different design. We offer 3 choices, which you control here: https://customer.heartinternet.uk/manage/reseller-brand-n.cgi.

    Cheers

    Matt

     
  • Billy

    27/10/2012

    Great article. Being a technologist I understand things go wrong, there are blips and other issues, so no need to get flustered about it.

    In the almost three years I have been with HI the service has been flawless. As for support, I couldn’t give myself better service.

    All a far cry from the previous reseller host I used to use, U*2 (can you guess who it is yet??). To this day I am still getting 3 or 4 emails about outages, issues, faults etc etc etc.

    HI, keep up the good work, great service, great priced services and the innovative stuff you come up with.:-)

     
  • 29/10/2012

    Thank you so much Billy, that’s great to hear 😀

     
  • Paul Littlefield

    10/01/2015

    An excellent explanation, and one which I shall be leading people to myself!

     
