Proverbial 'software bug' sent a spiral of bad configurations to other systems

Following one of the longer cross-service outages for Google in recent memory, the search and software giant published an apology and an explanation for today's events. According to the Official Google Blog, an internal system that sends configuration information to other systems hit a software bug and sent out incorrect configurations to several services.

It took only seven minutes, from 10:55am PT when the bug was first seen to 11:02am, for users to begin seeing massive outages in Gmail, Google+, Drive and other services. Roughly 12 minutes later, while engineers were still figuring out what was happening, the system that had sent out the bad information self-corrected and began to properly configure other systems. Google claims nearly all users' services were back up and running by 11:30, which seems consistent with the general consensus among users.

As you would expect, the post gives some detail on what's being done to prevent this from happening in the future. More checks are being put in place so that improper configurations, if they are generated by bugs, aren't so easily sent out to other systems. Additionally, Google plans to improve targeted searching for issues during service failures.
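
As a rough illustration of the kind of pre-push check described above, here is a minimal Python sketch. It is not Google's actual system: the `Config` fields, the `validate` rules and the `push_if_valid` fallback are all hypothetical, chosen only to show the idea of refusing to send out a configuration that fails basic sanity checks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Config:
    """Hypothetical generated configuration for one downstream service."""
    service: str
    backends: List[str] = field(default_factory=list)
    accept_user_requests: bool = True  # assumed field, for illustration only


def validate(config: Config) -> List[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    if not config.service:
        problems.append("missing service name")
    if not config.backends:
        problems.append("no backends listed")
    if not config.accept_user_requests:
        problems.append("config would make the service ignore user requests")
    return problems


def push_if_valid(config: Config, last_good: Config) -> Config:
    """Return the config that should go live: the new one if it passes
    validation, otherwise the last known-good one."""
    return config if not validate(config) else last_good


good = Config("mail", ["backend-1", "backend-2"])
bad = Config("mail", [], accept_user_requests=False)

# The bad config is rejected, so the known-good config stays live.
live = push_if_valid(bad, last_good=good)
print(live is good)
```

The point of the sketch is that the gate sits between the generator and everything downstream, so a buggy generator can produce a bad configuration without that configuration ever reaching live services.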

Needless to say, we don't expect to see outages like this any more often than we already do.

Source: Google Blog


Reader comments

Google explains reasons behind today's 30-minute service outage


The funny thing is that while Google services were down, Google's engineering reliability team was holding an AMA on Reddit XD

Posted via Android Central App from Nexus 7 2013

Had this been Verizon or some other corporation, their first instinct would have been to say nothing at all or to deny that it was happening or had happened.

It's very impressive to bring everything back up in 30 minutes. At my work, it takes several hours after an outage to get fully functional again.

Posted via Android Central App

At my work it takes IT 30 minutes from the time they are notified just to understand that there is an outage going on.

No, not impressive.

Read carefully. The system sending configuration SELF CORRECTED.

That's geek speak for "we have no clue what went wrong and we didn't do anything to fix it".

They are still trying to figure it out.

My bet is they were pwned while bragging on Reddit about how invulnerable their system is, but the hacker made his point and decided to disconnect and live another day.

Probably just Sergey using the Google servers to recreate our entire universe in Minecraft and running out of memory, slowing down every Google server.

Am I reading it wrong, or does the first sentence make it seem like this outage lasted a long time? Cause to me, that ain't long at all...

Posted via Android Central App

It's actually good that it wasn't that long.

Google just doesn't have long service outages (especially cross service.)

Posted via my thumbs and Google Keyboard.

Cross-service outages usually mean router problems.
Border Gateway Protocol updates gone horribly wrong or something, rather than actual server farms crashing.

I kept getting something I had never seen before: a Google page showing green bars and a small checklist telling me everything was OK. It was the first time I had seen it, and I had never even heard of it.

....and again, this is why cloud services are not the answer.

If you have a company of 500, 10 minutes of downtime is an eternity. Even for one person, if you need Google Drive for a document and it is not there, you have an issue.

I am glad we are heading into the land where 32 GB is the standard, not the exception

Really? 32GB, the standard? We'll see if that's the case when the *S5* comes out. But I agree, 16GB needs to be treated as more mid-range.

Posted using Android Central App on my Samsung Galaxy S4 T-Mobile

It's called an SD card, look into it. I'd rather pay less, get a 16GB model and expand via SD than pay a lot more than the price of a 32GB SD card to only get an extra 16GB.

Posted via Android Central App

That is more a case of OEMs charging too much for the extra 16GB. SD cards really should not be coming into play now, but they are a necessary evil for a lot of us still...

Not all devices have an SD card slot. My Nexus S lacks one (as does pretty much EVERY Nexus device)

Even in a company of 500 people, there's a good chance you can't afford to run your own servers and build (or pay to license) a software ecosystem that's as advanced as Google's, even if it's just specific to the business you're doing.

And even if you did, I can bet a good amount of money that your uptime for the year wouldn't be as high as Google's is for Gmail and Drive.

Just an FYI as well, you can get both offline Gmail and offline Drive extensions from Google so you can manage your email and documents without an internet connection — whether that's because Google's down or just your ISP.

I beg to differ

MS Small Business Server with 1 dedicated "IT" guy can do it with fewer outages than what has gone on in the last few months. I would be happy to take that bet.

Offline services are fine for Drive, you got me there, but incoming email? Not so much

Nice to see you're up at this time of night...

Posted using Android Central App on my Samsung Galaxy S4 T-Mobile

Someone has to run that MS server... Lol.

Just kidding, I agree with you. Companies with 5 employees run their own servers, it's not that big a deal anymore.

Granted, I find no issue with a 30 minute outage by Google because it rarely, rarely happens.

Posted via Android Central App

I agree but I would rather trust my guys than 'the cloud'.

Seems as though the outages are getting more and more frequent, even if they are short.

This space for SALE! BBM me #8675309

Yeah it kinda does and doesn't... Nothing gets by me anymore... Lol

This space for SALE! BBM me #8675309

You can either:
A) Pay Google pennies to run the service and never have to worry about DR, storage, hardware, software licenses, updates etc.
B) Pay for an expensive IT team to run expensive Microsoft software on an expensive virtualised environment with all the inconvenience that comes with Active Directory, Exchange etc.

You don't gain a fix time advantage when something breaks by managing your own servers/applications.

Random real-world scenario:
User notices that Outlook isn't working at 10:00, calls IT at 10:05, IT scratches collective head for 10 minutes, someone calls the Exchange SME who proceeds to scratch head for a further 10 minutes, Exchange SME finds the fault and has to restart the server, the business has to be notified and the system has to be tested before the business can again be notified that it's back up...

I work on an IT service desk and we have some super bright Windows/UNIX admins, but troubleshooting takes time and it's not always just a case of "turn it off and on again". We have more downtime from Microsoft software than we would have using Google's services.

I heard that Skynet was just running a test today. Now that Google bought the robot company and Nest, it's getting prepped to take over all the machines. ;)

Hate on what? Pfft!

This message was brought to you by the numbers 0 and 1

Dang, I missed it because I was asleep at my desk.

Posted from my "KNOX-FREE" 4.3 Sprint GS3 Maxx...!!!
(ZeroLemon 7000mah battery)

Ok, but you can't have my stapler...... I'm gonna burn this place down

Posted via Android Central best phone available: moto x

I work for a 147,000-person company with over 100 years of technology innovation, and our IT department is huge.

On Monday we experienced what IT described as a small outage, and the time from when they were first alerted to resolution was 2 hours.

Google probably services millions (maybe 10s of millions) and, I would imagine, has data centers 5x the size of my employer's, so 30 minutes doesn't seem all that bad.

Where I give them extra credit is for their transparency, explaining what went wrong and how they were taking steps to prevent such an issue in the future. Transparency is something Google has not been great about in the past, and maybe, just maybe, the takeaway from this is not the outage but the start of a more transparent Google.

One can only dream.

Posted via Android Central App

I think we can all agree that we all lost a little bit of innocence with Gmail being down for a short period of time.

Posted via Android Central App on BlackBerry Z30

It means that in the grand scheme of things, life goes on. I work for a bank, and I get broadcast emails once a week about some system or another being down. If a large bank with a presence in Canada, the US and overseas can still make a healthy profit, our lives will be largely unaffected in the long run.

Posted via Android Central App on BlackBerry Z30

Can't really be upset. How often do we have a problem like this with Google products?

Posted via Android Central App

Google explains reason for outage. "Oops, we broke the internet, (insert stuff that you don't understand). Fixed now" Good enough for me....