A message to our customers - Heart Internet Blog

On Wednesday 10th February, we suffered an interruption in power to our data centre facility during emergency works to fix a fault. This has caused disruption to customers’ services and our teams are working to resolve remaining issues.

We want to be open and transparent with our customers, so we would like to explain the causes of the fault and take this opportunity to reassure you of our commitment to you.

 

Our data centre

Our UK-based data centre facility is one of the most efficient and resilient data centres in Europe. We carry out tests on a regular basis and test our generators every weekend to make sure that everything is in order. Additionally, we have expert technicians staffing our data centre 24/7, every day of the year. We also have two UPS systems serving each data centre hall.

 

So, what happened?

  • On Wednesday, one of our data centre halls suffered a power loss which affected the facility for less than 9 minutes.
  • Each data centre hall has two UPS (uninterruptible power supply) systems, which feed into an LTM (load transfer module). The LTM manages the feed of power from the UPS systems to the data centre hall, where your servers are housed.
  • This piece of hardware automatically switches the power between the two external supplies, should one fail. This is part of our redundancy commitment to you.
  • The LTM showed a fault on the primary power supply and was running on its backup. Our teams followed the guidelines and contacted the manufacturer, who sent an expert in this piece of hardware to our facility to investigate and fix the fault.
  • The fault code indicated that there was a problem with the voltage monitor. As a safety procedure, a fault of this kind automatically shuts down primary power, even though the power supply itself is likely fine.
  • The engineer on site assisted with the fitting of the replacement part. The procedure was then followed to turn off the already disabled switch and to change the part safely. Unfortunately, a safety mechanism in the device triggered incorrectly, which led to the data centre losing power.
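For readers who prefer to see the sequence as logic, the steps above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not the actual firmware or behaviour of the hardware: the class `LoadTransferModule`, its fields and its method names are all hypothetical, and the real device is far more complex.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Feed(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"


@dataclass
class LoadTransferModule:
    """Toy model of an LTM fed by two supplies (all names are hypothetical)."""

    primary_enabled: bool = True
    backup_enabled: bool = True
    safety_tripped: bool = False

    def report_voltage_monitor_fault(self) -> None:
        # Assumption: a voltage monitor fault disables the primary feed as a
        # precaution, even if the supply behind it is healthy.
        self.primary_enabled = False

    def trip_safety(self) -> None:
        # Assumption: the safety mechanism cuts all output when it triggers,
        # whether or not the trigger was justified.
        self.safety_tripped = True

    def active_feed(self) -> Optional[Feed]:
        """Return the feed powering the hall, or None if the hall has no power."""
        if self.safety_tripped:
            return None
        if self.primary_enabled:
            return Feed.PRIMARY
        if self.backup_enabled:
            return Feed.BACKUP
        return None


ltm = LoadTransferModule()
ltm.report_voltage_monitor_fault()   # primary disabled, hall runs on backup
feed_during_fault = ltm.active_feed()
ltm.trip_safety()                    # safety mechanism triggers during the repair
feed_after_trip = ltm.active_feed()  # None: the hall has lost power
```

In this sketch the hall keeps running on the backup feed after the voltage monitor fault, and only loses power once the safety mechanism trips, which mirrors the order of events described above.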

 

What was at risk?

The safety of our teams is always our primary concern. Our data centre uses the same power as 150 homes all watching TV, with the heating on and the kettle boiling all at once. While a top priority is to keep your websites and services online, our highest priority is the safety of our engineers. When working with this kind of equipment, we have to ensure that everybody is safe. While we are extremely frustrated that the device triggered incorrectly, a false positive is better than a risk to the life of our engineer. Finally, if the LTM is bypassed whilst it is worked on, there is a risk of a power surge that could irreversibly damage the contents of the data centre.

 

 

Will this happen again?

A fault of this nature is almost unheard of, and we will be working with the manufacturer of our data centre hardware to ensure that this cannot happen again. We will, of course, keep you updated on any changes following our reviews.

 

We would like to apologise to all customers who have been affected. We appreciate your patience and understand your frustrations. Our teams have been working around the clock to resolve all remaining issues, and will continue to do so until everything is restored to our normal high level of service. For regular updates from our support team, please visit: https://www.webhostingstatus.com/


Comments


  • Max C

    11/02/2016

    Hi HeartInternet,

    Thanks for the updates and transparency. This is the first major outage I’ve experienced with you ever, but we all know that things sometimes go wrong. It’s how we deal with them which makes the difference.

    Thanks 🙂

     
  • 11/02/2016

    Great update and frankly, I am still satisfied with you guys. First outage in my experience of being with you. Transparency is key. You have done that here.

     
  • 11/02/2016

    Many thanks for the update and the efforts of everyone at Heart for working overnight in difficult circumstances.

     
  • Malcolm Hollingsworth

    11/02/2016

    Why did you first announce a DDoS attack?
    Why has it taken so long to switch everything back on?
    Why have so many things you switched on – then failed and the process needed to start again?
    How come the information did not start flowing until it was demanded of you?
    I want less of a tech article excuse and more of a “we screwed up, this is how we will make it better” response.
    You said this sort of thing so rarely happens – yet failures are happening with an increasing frequency.
    I am asking these here as you never bothered to respond to any tweets.

     
  • Steve Green

    11/02/2016

    Probably more chance of winning the lottery than this happening again. My only concern was that after I spent 8 hours getting my VPSs back into working order, I then found my account suspended due to a payment not being processed last night by yourselves whilst your billing system was down. The same problem happened last June and nothing seems to have been done about it.

     
  • Gavin

    11/02/2016

    Hello Heart Internet.

    I called you this morning and explained the same as what I will publicly write here.

    I completely understand outages and issues. Sure, you have clearly had a problem here and, despite having a strong technical background, I fail to understand why it took so long to restore.

    My issue with you, as appears to be the case for most of your customers, has been your communication to us (or lack of).

    All we wanted were honest, timely communications from you so that we could update our customers. You have failed me, my customers and all your other resellers badly here.

    PLEASE. Learn from this and make a commitment to give us better updates. You yourselves promised 20 minute updates, even this did not happen.

    I suggest that you have a way of communicating with at least your resellers so that they, who have chosen you as their provider, can give their customers meaningful updates.

    As I said, I understand issues and outages, sometimes avoidable, others not, but there is no excuse for your dreadful lack of comms during this issue.

    I sincerely hope you learn from this.

    Thank you.

     
    • Michael Shillingford

      15/02/2016

      Hi Gavin,

      Thanks for posting, and thanks very much for the feedback. We’ll soon be opening a new status page which will enable us to offer a more feature rich look at Heart Internet services. I agree that we need to implement a more direct line of communication with resellers and that’s something I’ll be working on in the near future. As for our updates, these were provided as they came to light and we’ll be taking a look at how to improve the frequency and quality of our comms when we perform a post-mortem on the outage.

       
  • Alan

    12/02/2016

    KVHOST 87: please fix MySQL. It’s been 2 1/2 days; it came back today for a while and now it’s gone again.

     
  • Steven

    12/02/2016

    3 days with my clients website down – that is a disgrace!

     
    • Michael Shillingford

      15/02/2016

      Hi Steven, these sites should now be back online. If you’re still having issues, please contact support here: https://bit.ly/HIhelp

       
  • 13/02/2016

    Hi Craig, Thanks for the explanation. It goes without saying that this type of failure is thankfully not a common event, and the safety of your engineers should be your utmost priority. I would like to ask you one thing. Earlier on Wednesday, webhostingstatus.com was updated with a message about a DDoS attack, but there has been no mention of this later in the day or since. Can you confirm how services were affected before the power outage and if losing power helped to solve the DDoS attack?

    Many thanks

     
    • Michael Shillingford

      15/02/2016

      Hi Dean, I’m responding on Craig’s behalf here as he’s extremely busy working with our SysAdmin, Support and senior management teams to aid the remaining customers affected by the outage.

      Very shortly before the power outage, we came under a DDOS attack. This impacted some of our services but later turned out to be unrelated to the outage which, as you state, was caused by the incorrect triggering of a safety mechanism at our data centre. The two incidents are coincidental and unrelated. We’ll be performing a full post-mortem shortly and will make the results of that available to you.

       
  • Ron and the team

    13/02/2016

    Come on guys, I’m on your side but it’s hurting everyone now. 4th day in. I’ve just been reading the Facebook and trust pilot reviews. I don’t know if you can recover from this, but if you do, I’m there, albeit slightly worried about putting all my large clients on your servers.

     
    • Michael Shillingford

      15/02/2016

      Hi Ron, thank you for your support! We’ll do absolutely everything we can to justify it. Our primary focus is helping the last remaining customers experiencing problems to get back online. Once normal service resumes, we’ll perform a post-mortem and share that with our customers.

       
  • slendertoxtea

    13/02/2016

    we have been intermittently down since Wednesday and permanently down since midnight 13 02 2016.
    Whilst you have explained the problems in depth, that is no consolation to our business. You’re putting all our jobs at risk. Please urgently fix the problem so we may start trading again. We pay for a premium service and we are certainly not getting it!!!
    600 sites are also on this premium platform we pay for and some of them are up and working so why are we still down for nearly on 9 hours??????

     
    • Michael Shillingford

      15/02/2016

      Hello, these sites should be back online now. If you’re still experiencing trouble, please contact support here: https://bit.ly/HIhelp

       
  • 13/02/2016

    Thanks for your honesty chaps. Safety first…glad that you managed to sort out 95% of the problems very quickly. Keep us posted as always.

     
    • Michael Shillingford

      15/02/2016

      Thanks for commenting, and we’ll keep you posted.

       
  • 13/02/2016

    Google have already emailed me about a site on one of the servers that has been down for 4 days now, saying that they will drop the URLs from search results. Not happy at all.

     
    • Michael Shillingford

      15/02/2016

      Hi Craig, I’m sorry that this has happened. The majority of services are back online now – if you’re still having trouble take a look at the Service Outage FAQ or contact support here: https://bit.ly/HIhelp

       
  • James

    16/02/2016

    I would echo other comments here and say that, the technical issues aside, Heart have let themselves down with their communications, both publicly and via the ticketing system. We’re still awaiting a resolution to a fault with one of our VPSs and have, as a result, put the brakes on an order for 2 hybrid servers whilst we consider our options. We are disappointed and feel let down.

     
  • richard

    16/02/2016

    Although all this happened – your number one priority should have been informing your customers instead of trying to brush it under the carpet. A lot of us small companies who rely on our websites being live have suffered. We have lost over 7 customers due to the downtime – it would be nice if we had some compensation for this? Maybe a free couple of months?

    You are active on social media – but you left everyone hanging, asking questions. It will be interesting to see how this will affect Heart Internet. I’m guessing a lot of your customers will be going elsewhere from now on.

     
  • Graham Wilson

    17/02/2016

    It’s fine, don’t worry about it. I didn’t have a web site for forty years, so I’m well practised at it being offline 😉

    Glad there were no injuries to your people.

     
