Even better IMO is this status page: https://mrshu.github.io/github-statuses/ "T...

montroser · 2026-03-31T19:46:13 1774986373

It has been pretty rough. Their own numbers report just a single `9` for Actions in Feb 2026 with 98% uptime. But that said -- I don't get the 90% number.

Anecdotally, it seems believable that Actions barfed 1 in 50 times (2%) in Feb. That's not very nice, but it wasn't 1 in 10 times (10%).
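For reference, the "nines" shorthand is just floor(-log10(1 - A)) for an availability A, which is why both 98% and 90% come out as a single nine. A minimal sketch, assuming availability is given as a fraction in [0, 1); `count_nines` is a hypothetical helper, not anything GitHub publishes.

```python
import math


def count_nines(availability: float) -> int:
    """Number of leading nines in an availability fraction (0.98 -> 1, 0.999 -> 3).

    The small epsilon absorbs float error: 1 - 0.99 is 0.010000000000000009,
    whose -log10 lands just under 2.
    """
    if not 0 <= availability < 1:
        raise ValueError("availability must be in [0, 1)")
    return math.floor(-math.log10(1 - availability) + 1e-9)


for a in (0.90, 0.98, 0.99, 0.999):
    # 90.0% -> 1, 98.0% -> 1, 99.0% -> 2, 99.9% -> 3
    print(f"{a:.1%} -> {count_nines(a)} nine(s)")
```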

verdverm · 2026-03-31T19:56:54 1774987014

It looks like the aggregate stats are more of a Venn diagram than an average: if 1 of N services is down, the aggregate is considered down. I don't think this is an accurate way to calculate uptime. It should be weighted, or in some way show partial outages. This belief is derived from the Google SRE book, in particular chapters 3 (Embracing Risk) and 4 (Service Level Objectives).

https://sre.google/sre-book/embracing-risk/

https://sre.google/sre-book/service-level-objectives/
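To make the difference concrete, here is a minimal sketch contrasting two ways of rolling per-service outages into one number: an unweighted average of per-service availability versus the "down if any service is down" union the aggregate appears to use. The service names and outage windows are made up, and outages are assumed to be known as minute-resolution intervals.

```python
# Minimal sketch: average vs. "any service down" aggregate availability.
# Outages are hypothetical half-open (start_minute, end_minute) intervals
# inside a 30-day window; none of these numbers come from GitHub.
WINDOW_MINUTES = 30 * 24 * 60  # 43,200 minutes


def merge_intervals(intervals):
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def availability(intervals, window=WINDOW_MINUTES):
    """Fraction of the window not covered by any of the given outages."""
    down = sum(end - start for start, end in merge_intervals(intervals))
    return 1 - down / window


def average_availability(outages, window=WINDOW_MINUTES):
    """Unweighted mean of each service's own availability."""
    return sum(availability(iv, window) for iv in outages.values()) / len(outages)


def aggregate_availability(outages, window=WINDOW_MINUTES):
    """Platform counts as down whenever at least one service is down."""
    every_outage = [iv for ivs in outages.values() for iv in ivs]
    return availability(every_outage, window)


outages = {
    "git": [(0, 432)],          # 1% of the window
    "actions": [(1000, 1864)],  # 2% of the window
    "pages": [(5000, 5432)],    # 1% of the window
}
print(f"average:   {average_availability(outages):.4f}")    # average:   0.9867
print(f"aggregate: {aggregate_availability(outages):.4f}")  # aggregate: 0.9600
```

Because these example outages never overlap, the aggregate simply adds their downtime; when outages coincide, the union counts the shared minutes only once.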

ablob · 2026-03-31T20:33:44 1774989224

If you're using all services, then any partial outage is essentially a full outage. Of course, you can massage the numbers to make them look nicer in the way you described, but the conservative approach is better for the customers. If you insist, one could create this metric for selected services only to "better reflect users".

That being said, even when looking at the split uptimes, you'd have to do a very skewed weighting to achieve a number with more than one 9.

verdverm · 2026-03-31T21:08:11 1774991291

> That being said, even when looking at the split uptimes, you'd have to do a very skewed weighting to achieve a number with more than one 9.

It's definitely bad no matter how you slice the pie.

If GH pages is not serving content, my work is not blocked. (I don't use GH pages for anything personally)

marcosdumay · 2026-03-31T20:34:42 1774989282

That's how you count uptime. Your system is not up if it keeps failing when the user does something.

The problem here is the specification of what the system is. It's a bit unfair to call GH a single service, but it's how Microsoft sells it.

tbossanova · 2026-04-01T04:32:37 1775017957

As a “customer”, I consider GitHub down if I can’t push, but not down if I can’t update my profile photo (literally did this today, sending out my GitHub to potential employers for the first time in a long time). This stuff is notoriously hard to define.

verdverm · 2026-03-31T21:03:28 1774991008

> That's how you count uptime.

It's not how I and many others calculate uptime. There is no uniformity, especially when you look at contracts.

bandrami · 2026-04-01T01:51:43 1775008303

Thinking back to when I was hosting, I think telling a customer "your web server was running fine it's just that the database was down" would not have been received well.

mort96 · 2026-03-31T20:27:56 1774988876

I mean I think it's useful. It answers the question, "what percentage of the time can I rely on every part of GitHub to work correctly?". The answer seems to be roughly 90% of the time.

verdverm · 2026-03-31T20:35:57 1774989357

I don't use half of the services, so the answer is not straightforward.

https://mrshu.github.io/github-statuses/

naniwaduni · 2026-03-31T20:33:29 1774989209

Nobody cares about every part of GitHub working correctly. I mean, ok, their SREs are supposed to, but tabling the question of whether that's true: if tomorrow they announced a distributed no-op service with 100% downtime, you should not have the intuition that the overall availability of the platform is now worse.

formerly_proven · 2026-03-31T20:47:44 1774990064

In a nutshell, why would the consumer (for the SLO) care about how the vendor sliced the solution into microservices?

verdverm · 2026-03-31T20:59:43 1774990783

It will depend on the contract.

When I was at IBM, they didn't meet their SLOs for Watson, and customers got a refund for that portion of their spend.

fontain · 2026-03-31T19:46:38 1774986398

An aggregate number like that doesn’t seem to be a reasonable measure. Should OpenAI models being unavailable in Copilot because OpenAI has an outage be considered GitHub “downtime”?

mort96 · 2026-03-31T20:29:55 1774988995

As long as they brand it as a part of GitHub by calling it "GitHub Copilot" and integrate it into the GitHub UI, I think it's fair game.

jasomill · 2026-04-01T15:56:04 1775058964

The third-party aspect is irrelevant, but while high downtime on any product looks bad for the company and the division, I consider GitHub Copilot an entirely separate product from GitHub, and GitHub Copilot downtime doesn't interfere with my use of GitHub repos or vice versa, so I'd consider its downtime separately.

GitHub Actions, on the other hand, is frequently used in the same workflows as the base GitHub product, so it's worth considering both separately and together, much like various Azure services, whereas I see no reason at all to consider an aggregate "Microsoft" downtime metric that includes GitHub, Azure, Office 365, Xbox Live, etc.

The most useful metric, actually, is "downtimes for the various collections of GitHub services I regularly use together", but that would obviously require effort to collect the data myself.
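Given per-service outage windows, that per-workflow metric is just the union of outages over the services you actually depend on. A minimal sketch, assuming minute-resolution outage intervals; the service names, the intervals, and the `my_workflow` set are all hypothetical.

```python
# Minimal sketch: availability of only the services one workflow depends on.
# Service names, outage intervals, and the workflow set are hypothetical.
WINDOW_MINUTES = 30 * 24 * 60


def union_downtime(intervals):
    """Total minutes covered by at least one interval (overlaps counted once)."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def workflow_availability(outages, services, window=WINDOW_MINUTES):
    """Availability as seen by someone who needs every service in `services`."""
    relevant = [iv for name in services for iv in outages.get(name, [])]
    return 1 - union_downtime(relevant) / window


outages = {
    "git": [(0, 432)],
    "actions": [(300, 1164)],  # overlaps the git outage for 132 minutes
    "copilot": [(10000, 12160)],
}
my_workflow = {"git", "actions"}  # hypothetical: Copilot not used
print(f"my workflow: {workflow_availability(outages, my_workflow):.4f}")  # 0.9731
print(f"everything:  {workflow_availability(outages, outages):.4f}")      # 0.9231
```

Passing the whole `outages` dict as `services` iterates over every key, which reproduces the "any service down" figure for comparison.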

mort96 · 2026-04-01T17:05:02 1775063102

My use of GitHub is like yours; I depend on Actions, but I couldn't give less of a damn about Copilot. However, Microsoft has tried to get people to adopt Copilot-heavy workflows, where Copilot plays an integral part in the pull request review process. If your process is as Microsoft pushes for -- wait for Copilot to comment, then review and resolve the stuff Copilot points out -- then Copilot being down means you can't really handle pull requests, at least not in accordance with your standard process. For people who embrace Copilot in the way Microsoft wants them to, a GitHub Copilot outage has a serious impact on their GitHub experience.

mememememememo · 2026-03-31T20:34:21 1774989261

What is Google's uptime (including every single little thing with Google in the name)?

mort96 · 2026-03-31T20:45:07 1774989907

I don't think that's a fair comparison. Google Maps, Google Calendar, Google Drive, Google Search, Google Chrome, Google Ads, etc. are all clearly completely different products which have very little to do with each other; they're just made by the same company called Google.

GitHub is a different situation. There's one "thing" users interact with, github.com, and it does a bunch of related things. Git operations, web hooks, the GitHub API (and thus their CLI tool), issues, pull requests, Actions; it's all part of the one product users think of as "GitHub", even if they happen to be implemented as different services which can fail separately.

EDIT: To illustrate the analogy: Google Code, Google Search and Google Drive are to Google what Microsoft GitHub, Microsoft Bing and Microsoft SharePoint are to Microsoft.

Kaliboy · 2026-03-31T21:00:20 1774990820

Completely agree. It actually makes it worse, as GitHub's secondary functions, so to speak, are things we implicitly rely on.

When I merge to master I expect a deploy to follow. This goes through git, webhooks and actions. Especially the latter two can fail silently if you haven't invested time in observation tools.

If Maps is down, I notice it and can immediately pivot. No such option with GitHub.

dogma1138 · 2026-03-31T21:38:23 1774993103

It depends. For example, I would consider Google Drive uptime part of Google Docs’ overall uptime: if I can’t access my stored documents, or can’t save a document I’ve been working on for the past 3 hours because Drive is down, I would be very pissed, and I wouldn’t care whether Drive or Docs is the problem underneath. I still can’t use Google Docs as a service at that point.

fwip · 2026-03-31T20:06:45 1774987605

I think reasonable people can disagree on this.

From the point of view of an individual developer, it may be "fraction of tasks affected by downtime" - which would lie between the average and the aggregate, as many tasks use multiple (but not all) features.

But if you take the point of view of a customer, it might not matter as much 'which' part is broken. To use a bad analogy, if my car is in the shop 10% of the time, it's not much comfort if each individual component is only broken 0.1% of the time.
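The "fraction of tasks affected by downtime" idea can be sketched by giving each kind of task a share of your work and the set of services it needs, then weighting each task's exposure by its share. This rests on loudly stated assumptions: tasks are spread evenly over time, a task fails if any service it needs is down, and every name, share, and outage window below is hypothetical.

```python
# Minimal sketch: "fraction of tasks affected by downtime".
# Assumptions (all hypothetical): tasks are spread evenly over time, and a task
# fails if any service it needs is down when it runs.
WINDOW_MINUTES = 30 * 24 * 60


def down_minutes(intervals):
    """Minutes covered by the union of half-open (start, end) intervals."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return len(covered)


def task_weighted_availability(outages, tasks, window=WINDOW_MINUTES):
    """Share of tasks that succeed, weighting each task type by its share."""
    if abs(sum(share for share, _ in tasks.values()) - 1) > 1e-9:
        raise ValueError("task shares must sum to 1")
    affected = 0.0
    for share, needed in tasks.values():
        intervals = [iv for name in needed for iv in outages.get(name, [])]
        affected += share * down_minutes(intervals) / window
    return 1 - affected


outages = {
    "git": [(0, 432)],          # 1% of the window
    "actions": [(1000, 1864)],  # 2%
    "pages": [(5000, 5432)],    # 1%
}
tasks = {  # task type: (share of all tasks, services it needs)
    "push": (0.5, {"git"}),
    "merge_and_deploy": (0.4, {"git", "actions"}),
    "publish_docs": (0.1, {"git", "pages"}),
}
print(f"{task_weighted_availability(outages, tasks):.4f}")  # 0.9810
```

For comparison, the same three outages give an unweighted per-service average of about 98.7% and an "any service down" figure of 96%, so in this made-up case the task-weighted number does land between the two.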

remus · 2026-03-31T20:33:14 1774989194

> But if you take the point of view of a customer, it might not matter as much 'which' part is broken. To use a bad analogy, if my car is in the shop 10% of the time, it's not much comfort if each individual component is only broken 0.1% of the time.

Not to go too far out of my way to defend GH's uptime, because it's obviously pretty patchy, but I think this is a bad analogy. Most customers won't have a hard dependency on every user-facing GH feature. Or, to put it another way, only a tiny fraction of users will actually have experienced something like the 90% uptime reported by the site. Most people in practice are probably experiencing something like 97-98%.

fwip · 2026-03-31T20:59:19 1774990759

Sorry, by 'customer' I meant to say something like a large corporate customer - you're buying the whole package, and across your org, you're likely to be a little affected by even minor outages of niche services.

But yeah, totally agree that at the individual level, the observed reliability is between 90% and 99%, and probably toward the upper end of that range.

wang_li · 2026-03-31T20:56:50 1774990610

A better analogy is if one bulb in the right rear brake light group is burnt out. Technically the car is broken. But realistically you will be able to do all the things you want to do unless the thing you want to do is measure that all the bulbs in your brake lights are working.

Dylan16807 · 2026-03-31T22:13:07 1774995187

That's an awful analogy because "realistically you will be able to do all the things you want to do". If a random GitHub service goes down there's a significant chance it breaks your workflow. It's not always but it's far from zero.

One bulb in the cluster going out is like a single server at GitHub going down, not a whole service.

mememememememo · 2026-03-31T20:35:36 1774989336

Or if your kettle is not working the house is considered not working?

Polizeiposaune · 2026-03-31T22:12:16 1774995136

I've been on a flight that was late leaving the gate because the coffeemaker wasn't working.

skipants · 2026-03-31T19:45:01 1774986301

These are two pages telling two different things, albeit with the same stats. The information is presented by OP in a way to show the results of the Microsoft acquisition.

goodmythical · 2026-04-01T01:37:21 1775007441

Holy shit, that's nearly five weeks of downtime.

Well, I mean, I guess that's fair really. How long has GitHub been around? Surely it's got five weeks of paid time off by now...
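The back-of-envelope conversion is just (1 - A) × period. A quick sketch, assuming the 90% figure applies across a whole 365-day year (the measurement window isn't stated in the thread); `downtime_days` is a hypothetical helper.

```python
# Back-of-envelope: turn an availability fraction into downtime over a period.
# Assumes a 365-day year by default; 0.90 is the figure quoted in this thread.


def downtime_days(availability: float, period_days: float = 365) -> float:
    """Days of downtime implied by `availability` over `period_days`."""
    return (1 - availability) * period_days


days = downtime_days(0.90)
print(f"{days:.1f} days, about {days / 7:.1f} weeks")  # 36.5 days, about 5.2 weeks
```

The same helper gives the Actions figure from the top of the thread: 98% over a 28-day February is `downtime_days(0.98, 28)`, about 0.56 days or roughly 13.4 hours.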