GitHub Actions down again today
githubstatus.com
618 points by cebert 5 hours ago
We’ve had GitHub actions for long enough, it’s time for GitHub consequences.
I think this can only happen if there are viable alternatives.
For instance, the UI at setups such as https://git.devuan.org/Daemonratte/gtk2-ng is quite ok-ish, in my opinion. Granted, it is mostly copy/paste from github but that still is about 1000000x better than sourceforge's interface - and gitlab's UI too (I just hate gitlab's UI, they seem to love complexity and a billion features only 0.000001% ever need; GitHub, with all its faults, is for the most part really simple - not everywhere, e. g. GitHub wiki setup sucks, but by and large I think it is simple overall).
If replacing github wholesale isn't viable, what does the story for replacing GitHub Actions look like currently? I don't remember the pre-Github-Actions days of everyone using CircleCI with a github integration in a negative light. I've noticed that since then a couple of CI providers have sprung up that differentiate themselves with faster build speeds, but I haven't really kept up with that market.
Github definitely has the better UI but if it weren't for network effects I'd be pushing to migrate to Gitlab pretty hard.
Gitlab’s UI is extremely terrible. It’s hard to even explain how bad it is.
>Copilot: Do you want me to implement consequences for you or babble on and on about what might entirely be a figment of your imagination (Github is up and you're on a 48 hour bender without sleep)
My action failed with "Unexpected error fetching GitHub release for tag refs/heads/master: HttpError: Sorry. Your account was suspended"
Which certainly made me shit myself, briefly.
It's an eye opener. Think about it - today, it was a mistake. But, what if it really happened? What if you really lost access to all your years of hard work? It's a wake up call. A blessing in disguise to store what matters to you the most locally, backed up offline. Never trust any single provider. Be it MS or Google or Apple. RAID is the way.
People should use something that keeps a local copy of their code and just copies it to Github and to other contributors with a sync process to push and pull changes. Some sort of 'distributed source control system' maybe. Then people would only need a 'hub' to connect to people, and it'd be easier to move somewhere else.
I like how tech seems to be all about stacking more and more turtles on top of each other.
Gosh, it's hard figuring out what changes Lorne made if only we had a system to merge those changes. Enter git
Gosh it's hard figuring out how to sync these changes across a network. Enter github
Gosh it's hard figuring out what packages Rachel had to make this work. Enter npm
Gosh it's hard figuring out how to get those packages working on my operating system. Enter docker
Gosh centralizing our distributed version control software system onto one website is getting really unreliable. Enter fossil(?????)
All that shit gets kind of annoying guys, if we go any further having one computer per school with a sign-up sheet is starting to sound pretty fucking attractive.
> Some sort of 'distributed source control system' maybe
The day it broke away and became centralized was when we had a PR + mandatory "Required actions" to merge to main.
What you just described is Fossil. It has an auto-sync feature that makes everything feel distributed.
Just set up a Kubernetes deployment and you’re set.
But as others mention, GitHub’s primary strength is collaboration. If you want decentralized, solve this by creating a decentralized collaboration tool on top of fossil and/or git.
For example, how to do pull requests and code reviews?
This gets tiresome. Github is a lot more than a host for Git repositories. If you want to suggest that people use something else, you need to suggest a replacement that has the features people use Github for.
I think you missed the joke, which is that the parent poster you're replying to is suggesting a 'solution' to the problem which evolved in complexity until he was just describing Github again.
Increasingly less and less so as they “upgrade” their offering and have more and more downtime.
yeah, #1, it is free private file storage, and #2, it's a download portal for free as in beer software replacing paid offerings. that's what it is for 99.99% of people.
being a host for git repositories has never been its core competency. neither has its groupware offering.
does it even serve OSS well? a very interesting criterion is, "Have mature or adopted end-user-facing OSS recently merged a large PR from an unallied contributor?" The answer is overwhelmingly no. This is why there is so much innovation in this space.
I recently got my GitHub account suspended for 4 months. When it was finally reinstated, their support just said it was a "mistake".
Proudly self-hosting Forgejo since then.
This happened to me as well—thankfully not my personal account that I use for work, but the organization associated with an open source project I worked on was suspended. It similarly took 2 months for GitHub to restore the organization.
> Our team is currently experiencing an unexpectedly high volume of tickets which has resulted in longer response times than we prefer. We acknowledge the long wait and apologize for the experience.
> Sometimes our abuse detecting systems highlight accounts that need to be manually reviewed. We've cleared the restrictions from your account…
Fully self-hosted IMO can be an overcorrection. The issue isn’t “relying on other people”—it’s relying on GitHub, when they’ve made it clear they don’t care about uptime and they don’t care about support turn-around-time.
RAID is not a backup.
They... Didn't describe RAID? More 3-2-1.
The last sentence in the comment is literally "RAID is the way".
I think they were intending to evoke the image of RAID rather than literally referring to a redundant array of inexpensive disks. You host your code on Github, Gitlab, and at home, then you survive a Github outage. It's a redundant array. Not sure it's inexpensive, though.
Well yes, my git repositories sit on my laptop, that's the entire point. If github banned my country because its president has a tis, I can push my entire commit history to another company. Same with anyone else who's working on it.
It would be a pain as I'd have to set up a few integrations again, but github is far lower down the risk scale than the vast majority of SAAS providers
Same. It's weird how I always find out that GitHub is down before GitHub does. Took 15 minutes before it appeared on githubstatus.com
All these monitoring rules are of the format "when 500 errors > baseline for x minutes". Otherwise you'd have monitoring alerts every second. So it is normal for users to already see errors before github officially counts it as an outage.
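To make the "errors > baseline for x minutes" shape concrete, here is a minimal sketch in Python. It is not how GitHub's alerting works; the class name, the one-sample-per-minute cadence, and the thresholds are all assumptions made for illustration.

```python
from collections import deque


class SustainedErrorAlert:
    """Fires only once the error rate has exceeded `baseline`
    for `window` consecutive samples (hypothetical rule shape)."""

    def __init__(self, baseline: float, window: int):
        self.baseline = baseline
        self.window = window
        # Keep only the most recent `window` breach/no-breach flags.
        self.recent = deque(maxlen=window)

    def observe(self, error_rate: float) -> bool:
        self.recent.append(error_rate > self.baseline)
        return len(self.recent) == self.window and all(self.recent)


def minutes_until_alert(rates, baseline, window):
    """Return the 1-based minute at which the alert first fires, or None.
    Assumes one error-rate sample per minute."""
    alert = SustainedErrorAlert(baseline, window)
    for minute, rate in enumerate(rates, start=1):
        if alert.observe(rate):
            return minute
    return None
```

With a 5-minute window, an outage that starts in minute 3 cannot alert before minute 7, which is the gap users notice. A single one-minute spike never alerts at all, which is the point of the rule.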
> All these monitoring rules are of the format "when 500 errors > baseline for x minutes". Otherwise you'd have monitoring alerts every second. So it is normal for users to already see errors before github officially counts it as an outage.
Is it true that official service status pages are updated automatically?
> Is it true that official service status pages are updated automatically?
Depends. Typically no because there’s an art to crafting the actual message around impact… but sometimes yes it is automated
You'd expect them to be monitoring more than just the HTTP response codes from user requests for precisely this reason.
If the first they hear of an outage is when user requests start to fail, then that's a failure in their monitoring as well.
But effective monitoring is harder than people assume.
> If the first they hear of an outage is when user requests start to fail, then that's a failure in their monitoring as well.
Isn't that what monitoring actually is? The issue seems to be in their testing, not monitoring.
No, monitoring for HTTP response codes is a subset of observability and not one that generally gives you the best insights into which subsystems are misbehaving nor why.
There are synthetic tests, where you can generate API request calls or even simulate an entire user journey. These allow you to control the user agent and the payloads, and thus you know any errors that come back are actual errors. These are triggered by the observability platform (think like running a cron-job) and thus you're not tied to user activity to see when problems arise.
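A rough sketch of what a single synthetic check could look like in Python, using only the standard library. The function names, the user-agent string, and the "status 200 plus expected substring" success criterion are assumptions for illustration; a real observability platform would schedule this and record timings too.

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple


def http_fetch(url: str, timeout: float = 5.0) -> Tuple[int, bytes]:
    """Issue a GET with a probe-specific user agent; return (status, body)."""
    req = urllib.request.Request(url, headers={"User-Agent": "synthetic-probe/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code and a body.
        return err.code, err.read()


def probe(url: str, expected_substring: bytes,
          fetch: Callable[[str], Tuple[int, bytes]] = http_fetch) -> bool:
    """True if the endpoint answered 200 with the payload we expect.
    Because we control the request, any failure here is a real failure."""
    try:
        status, body = fetch(url)
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
    return status == 200 and expected_substring in body
```

The `fetch` parameter is injectable so the check can be exercised without a network, for example `probe("https://example.invalid/health", b"ok", fetch=lambda u: (200, b"ok"))`.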
There are other metrics outside of HTTP response codes too. Think free RAM, CPU usage, disk space, etc. These are just some obvious ones, because these types of metrics are generally bespoke to the type of application you're monitoring. And with these types of monitors, you'd not just have an alert when things have failed, but ideally also alerts when an irregular trend shows that things are likely to fail. This latter type of monitor helps you get ahead of the problem before it becomes customer facing.
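As an illustration of alerting on a trend rather than on a failure, the sketch below fits a straight line to recent disk-usage samples and estimates how long until the disk is full. The function names, the evenly spaced samples, and the 24-hour horizon are assumptions; real systems often use more robust forecasting than a least-squares line.

```python
from typing import Optional, Sequence


def hours_until_full(used_pct: Sequence[float],
                     interval_hours: float = 1.0) -> Optional[float]:
    """Least-squares linear fit over evenly spaced usage samples (percent).
    Returns estimated hours from the last sample until 100%, or None if
    usage is flat or falling, or there are too few samples."""
    n = len(used_pct)
    if n < 2:
        return None
    xs = [i * interval_hours for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(used_pct) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_pct)) / denom
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Time at which the fitted line reaches 100%, relative to the last sample.
    return (100.0 - intercept) / slope - xs[-1]


def should_warn(used_pct: Sequence[float], horizon_hours: float = 24.0,
                interval_hours: float = 1.0) -> bool:
    """Warn when the projected time-to-full is within the horizon."""
    eta = hours_until_full(used_pct, interval_hours)
    return eta is not None and eta <= horizon_hours
```

For hourly samples of 70, 72, 74, 76, 78 percent, the fitted growth is 2 points per hour, so the estimate is 11 hours from the last sample and the warning fires well before anything has actually failed.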
Then you have more traditional stuff like logs. This will also be bespoke to the application. But you'd expect errors in logs to get surfaced quickly. Assuming Github have good hygiene in what's being logged.
Tie that up with APMs, RUM, and other goodies like that and you'll have diagnostics to investigate issues when they appear.
(this is just a super high level view of observability too)
Even a synthetic probe needs a few failures to trigger an alert.
You should not alert on cpu, ram, etc
> Even a synthetic probe needs a few failures to trigger an alert.
It doesn't "need" that. That’s just how most people set it up because it’s an easy, sane default that allows for network jitter without inexperienced engineers having to think about different conditions triggering different types of responses.
If you’re measuring internal APIs from an observability solution that has nodes already inside your network enclave, then there is a strong argument for alerting early.
> You should not alert on cpu, ram, etc
That’s not true as an absolute statement. As a generalisation, it heavily depends on the system you’re monitoring and how it behaves under pressure.
But in any case, I wasn’t suggesting CPU alerts were the end goal. I said:
> these types of metrics are generally bespoke to the type of application you're monitoring.
Ie you’ll use metrics but those metrics will be highly specific.
The CPU examples were an illustration as to what a “metric” is (it might seem obvious but not everyone is an expert) but the point was HTTP response codes aren't the only types of metrics one should be capturing and watching.
Ah, yes, I misunderstood. And I have seen cases where a direct CPU alert makes sense, but 99 times out of 100 times I see it, it's nothing but trouble. Worse, I tend to see the cpu alert when there are no end to end synthetic alerts, 500 alerts, queue depth alerts, etc.
If your requests are fast and cheap, you can probe frequently relative to your goals, but often that's not really possible (think long SQL queries, or scheduling a container/pod). There you need several datapoints, or possibly fewer augmented with other signals.
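One way to read "fewer datapoints augmented with other signals", sketched in Python. The function name and the default thresholds are made up for illustration, and the corroborating signal could be anything independent of the probe, such as a queue-depth alarm.

```python
from typing import Sequence


def should_alert(probe_results: Sequence[bool],
                 corroborating_signal: bool,
                 failures_alone: int = 3,
                 failures_with_signal: int = 1) -> bool:
    """`probe_results` is oldest-first, True meaning the probe succeeded.
    Require a long failure streak on its own, but a shorter one when an
    independent signal already points at trouble."""
    # Count consecutive failures at the end of the history.
    streak = 0
    for ok in reversed(probe_results):
        if ok:
            break
        streak += 1
    needed = failures_with_signal if corroborating_signal else failures_alone
    return streak >= needed
```

With slow, expensive probes this trades a small risk of false positives for a much shorter time to detection: `should_alert([True, False], corroborating_signal=True)` fires after a single failed probe, while the same history without the second signal does not.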