GCP is incredibly bad at communicating when there are problems with their system...

timdorr · on June 2, 2019

AWS is often the same way. No one seems to be good at communicating outage details.

TallGuyShort · on June 2, 2019

I suspect there's a correlation between outages that are easy to detect and communicate and outages that automation can recover from so easily that you hardly notice.

obeattie · on June 2, 2019

I really don’t get this. There’s a huge number of complaints about poor communication from companies like Google and AWS during every outage. Yet they remain seemingly indifferent to how much customer trust they are losing, and the competitive edge the first one to get this right could gain.

sakisv · on June 2, 2019

I don't think they are losing any kind of customer trust.

Unless something is really fucked (like both GCP and AWS being down for us-east) incidents like these are not going to impact them at all.

The cost of either migrating to the other provider or, even worse, migrating to more traditional hosting companies is enormous and will require much more than "service was down for 2 hours in 2019". The contracts also cover cases like this and even if they don't, Google and Amazon can and will throw in some free treat as an apology.

On one hand I find this quite sad, but from a pragmatic point of view it makes sense.

atmosx · on June 2, 2019

If 20% of Google Cloud's customers leave after this outage because of poor communication, they'll prioritise accordingly and apply all those nice SRE theories to their infra. But this isn't happening, because <various reasons>, so... who cares?

obeattie · on June 3, 2019

I mean, I care. All else being equal I’m not sure why you wouldn’t want good communication to your customers.

manigandham · on June 3, 2019

How much cloud spend do you control? That's the reality of how decisions are made.

obeattie · on June 3, 2019

Many millions of dollars per year. I care about how my providers behave when they have issues, and I can't see why you think it's not at all relevant.

manigandham · on June 3, 2019

> "why you think it's not at all relevant"

Nobody said this.

> "I care about how my providers behave when they have issues"

We all do.

As the other commenters stated, the communication is poor because the clouds are still growing rapidly and there's not much reason to be better. We might also be underestimating just how much more a better service would cost and whether it's worth the revenue loss (if any). Are you really going to shift all of your spend overnight because of an outage? And where are you going to go?

The reality of these decisions is far more nuanced than it may seem and the current state of support is probably already optimized for revenue growth and customer retention.

jjeaff · on June 2, 2019

Their dashboard does show red on GCE and networking right now, for what it's worth. https://status.cloud.google.com/

hinkley · on June 2, 2019

Why aren’t these on separate systems? I never had the impression that Google cheaps out on things but this sounds exactly like the sort of shit that happens when people cheap out. Not even a canary system?

pm90 · on June 3, 2019

The idea that Google spends big on expensive systems is a huge lie.

Google started out using a Beowulf cluster that the founders wired themselves. From the very beginning, the goal of metrics collection was to optimize costs. While today it’s seen as the cash cow, the focus has always been on cheap components strung together, relying on algorithms and code for stability and making the least possible demands of underlying hardware.

To think that they won’t try to save money any time they can seems implausible.