GitHub Actions down again today

618 points by cebert 5 hours ago

We’ve had GitHub actions for long enough, it’s time for GitHub consequences.

thesurlydev - 3 minutes ago

"A little less conversation, a little more action please"
shevy-java - 35 minutes ago

I think this can only happen if there are viable alternatives.
For instance, the UI at setups such as https://git.devuan.org/Daemonratte/gtk2-ng is quite ok-ish, in my opinion. Granted, it is mostly copy/paste from github but that still is about 1000000x better than sourceforge's interface - and gitlab's UI too (I just hate gitlab's UI, they seem to love complexity and a billion features only 0.000001% ever need; GitHub, with all its faults, is for the most part really simple - not everywhere, e. g. GitHub wiki setup sucks, but by and large I think it is simple overall).
- wongarsu - 9 minutes ago
  
  If replacing github wholesale isn't viable, what does the story for replacing GitHub Actions look like currently? I don't remember the pre-Github-Actions days of everyone using CircleCI with a github integration in a negative light. I've noticed that since then a couple of CI providers have sprung up that differentiate themselves with faster build speeds, but I haven't really kept up with that market.
- joshuanapoli - 8 minutes ago
  
  I've been thinking of reverting back to Circle CI.
- danudey - 31 minutes ago
  
  Github definitely has the better UI but if it weren't for network effects I'd be pushing to migrate to Gitlab pretty hard.
  - Bnjoroge - 26 minutes ago
    
    Gitlab’s UI is extremely terrible. It’s hard to even explain how bad it is.
pnvdr - 2 hours ago

i would like to see consequences for "secure sleep" XD.
cyanydeez - 2 hours ago

>Copilot: Do you want me to implement consequences for you or babble on and on about what might entirely be a figment of your imagination (Github is up and you're on a 48 hour bender without sleep)

My action failed with "Unexpected error fetching GitHub release for tag refs/heads/master: HttpError: Sorry. Your account was suspended"

Which certainly made me shit myself, briefly.

neya - 2 hours ago

It's an eye opener. Think about it - today, it was a mistake. But, what if it really happened? What if you really lost access to all your years of hard work? It's a wake up call. A blessing in disguise to store what matters to you the most locally, backed up offline. Never trust any single provider. Be it MS or Google or Apple. RAID is the way.
- onion2k - an hour ago
  
  People should use something that keeps a local copy of their code and just copies it to Github and to other contributors with a sync process to push and pull changes. Some sort of 'distributed source control system' maybe. Then people would only need a 'hub' to connect to people, and it'd be easier to move somewhere else.
  - marricks - 3 minutes ago
    
    I like how tech seems to be all about stacking more and more turtles on top of each other.
    Gosh, it's hard figuring out what changes Lorne made, if only we had a system to merge those changes. Enter git
    Gosh, it's hard figuring out how to sync these changes across a network. Enter github
    Gosh it's hard figuring out what packages Rachel had to make this work. Enter npm
    Gosh it's hard figuring out how to get those packages working on my operating system. Enter docker
    Gosh centralizing our distributed version control software system onto one website is getting really unreliable. Enter fossil(?????)
    All that shit gets kind of annoying guys, if we go any further having one computer per school with a sign up sheet is starting to sound pretty fucking attractive.
  - gopalv - 22 minutes ago
    
    > Some sort of 'distributed source control system' maybe
    The day it broke away and became centralized was when we had a PR + mandatory "Required actions" to merge to main.
  - fusishch - 43 minutes ago
    
    What you just described is Fossil. It has an auto-sync feature that makes everything feel distributed.
    Just set up a Kubernetes deployment and you’re set.
    But as others mention, GitHub’s primary strength is collaboration. If you want decentralized, solve this by creating a decentralized collaboration tool on top of fossil and/or git.
    For example, how to do pull requests and code reviews?
    
    40four - 4 minutes ago
    
    What they just described is Git :) pretty sure it was a joke
  - coldpie - an hour ago
    
    This gets tiresome. Github is a lot more than a host for Git repositories. If you want to suggest that people use something else, you need to suggest a replacement that has the features people use Github for.
    
    danudey - 10 minutes ago
    
    I think you missed the joke, which is that the parent poster you're replying to is suggesting a 'solution' to the problem which evolved in complexity until he was just describing Github again.
    
    ornornor - an hour ago
    
    Increasingly less and less so as they “upgrade” their offering and have more and more downtime.
    
    doctorpangloss - 34 minutes ago
    
    yeah, #1, it is free private file storage, and #2, it's a download portal for free as in beer software replacing paid offerings. that's what it is for 99.99% of people.
    being a host for git repositories has never been its core competency. neither has its groupware offering.
    does it even serve OSS well? a very interesting criterion is, "Have mature or adopted end-user-facing OSS recently merged a large PR from an unallied contributor?" The answer is overwhelmingly no. This is why there is so much innovation in this space.
- mpaco - an hour ago
  
  I recently got my GitHub account suspended for 4 months. When it was finally reinstated, their support just said it was a "mistake".
  Proudly self-hosting Forgejo since then.
  - MatthiasPortzel - 18 minutes ago
    
    This happened to me as well—thankfully not my personal account that I use for work, but the organization associated with an open source project I worked on was suspended. It similarly took 2 months for GitHub to restore the organization.
    > Our team is currently experiencing an unexpectedly high volume of tickets which has resulted in longer response times than we prefer. We acknowledge the long wait and apologize for the experience.
    > Sometimes our abuse detecting systems highlight accounts that need to be manually reviewed. We've cleared the restrictions from your account…
    Fully self-hosted IMO can be an overcorrection. The issue isn’t “relying on other people”—it’s relying on GitHub, when they’ve made it clear they don’t care about uptime and they don’t care about support turn-around-time.
- corvad - an hour ago
  
  RAID is not a backup.
  - PokemonNoGo - an hour ago
    
    They... Didn't describe RAID? More 3-2-1.
    
    filleduchaos - an hour ago
    
    The last sentence in the comment is literally "RAID is the way".
    
    jrockway - 20 minutes ago
    
    I think they were intending to evoke the image of RAID rather than literally referring to a redundant array of inexpensive disks. You host your code on Github, Gitlab, and at home, then you survive a Github outage. It's a redundant array. Not sure it's inexpensive, though.
- iso1631 - 26 minutes ago
  
  Well yes, my git repositories sit on my laptop, that's the entire point. If github banned my country because its president has a tis, I can push my entire commit history to another company. Same with anyone else who's working on it.
  It would be a pain as I'd have to set up a few integrations again, but github is far lower down the risk scale than the vast majority of SAAS providers
grim_io - 4 hours ago

A brownout redefined.
- DonHopkins - 2 hours ago
  
  ShitHub
  https://www.youtube.com/watch?v=LGeOee7x5lY
- lachieh - 2 hours ago
  
  Good thing I'm wearing my brown pants today.
drcongo - 4 hours ago

Same. It's weird how I always find out that GitHub is down before GitHub does. Took 15 minutes before it appeared on githubstatus.com
- jaapz - 4 hours ago
  
  All these monitoring rules are of the format "when 500 errors > baseline for x minutes". Otherwise you'd have monitoring alerts every second. So it is normal for users to already see errors before github officially counts it as an outage.
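A minimal sketch of the kind of rule described above, assuming a stream of per-minute 5xx counts. The class name, the baseline of 50 errors per minute, and the 5-minute hold are all hypothetical values for illustration, not anything GitHub is known to use.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SustainedThresholdRule:
    """Fire only once the error rate has stayed above baseline for hold_seconds."""

    baseline: float      # hypothetical: tolerated 5xx responses per minute
    hold_seconds: float  # hypothetical: the "x minutes" from the rule, in seconds
    _breach_started: Optional[float] = field(default=None, init=False)

    def observe(self, timestamp: float, errors_per_minute: float) -> bool:
        # Any sample at or below baseline resets the timer.
        if errors_per_minute <= self.baseline:
            self._breach_started = None
            return False
        # Remember when the current breach began.
        if self._breach_started is None:
            self._breach_started = timestamp
        # Alert only when the breach has lasted long enough.
        return timestamp - self._breach_started >= self.hold_seconds


rule = SustainedThresholdRule(baseline=50, hold_seconds=300)

# (seconds since start, 5xx per minute): a short blip, then a real outage at t=240.
samples = [
    (0, 20), (60, 400), (120, 450), (180, 30),
    (240, 500), (300, 520), (360, 510), (420, 530), (480, 540), (540, 560),
]
fired = [t for t, rate in samples if rule.observe(t, rate)]
print(fired)
```

In this made-up trace the blip at t=60 never alerts, and the outage that starts at t=240 only alerts at t=540, so users see errors for five minutes before the rule fires.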
  - logifail - 2 hours ago
    
    > All these monitoring rules are of the format "when 500 errors > baseline for x minutes". Otherwise you'd have monitoring alerts every second. So it is normal for users to already see errors before github officially counts it as an outage.
    Is it true that official service status pages are updated automatically?
    
    baby_souffle - 2 hours ago
    
    > Is it true that official service status pages are updated automatically?
    Depends. Typically no because there’s an art to crafting the actual message around impact… but sometimes yes it is automated
  - hnlmorg - 3 hours ago
    
    You'd expect them to be monitoring more than just the HTTP response codes from user requests for precisely this reason.
    If the first they hear of an outage is when user requests start to fail, then that's a failure in their monitoring as well.
    But effective monitoring is harder than people assume.
    
    dncornholio - 2 hours ago
    
    > If the first they hear of an outage is when user requests start to fail, then that's a failure in their monitoring as well.
    Isn't that what monitoring actually is? The issue seems to be in their testing, not monitoring.
    
    hnlmorg - 2 hours ago
    
    No, monitoring for HTTP response code is a subset of observability and not one that generally gives you the best insights into which subsystems are misbehaving nor why.
    There are synthetic tests, where you can generate API request calls or even simulate an entire user journey. These allow you to control the user agent, the payloads, and thus you know any errors that come back are actual errors. These are triggered by the observability platform (think like running a cron-job) and thus you're not tied to user activity to see when problems arise.
    There are other metrics outside of HTTP response codes too. Think like free RAM, CPU usage, disk space, etc. This is just naming some obvious ones because these types of metrics are generally bespoke to the type of application you're monitoring. And with these types of monitors, you'd not just have an alert when things have failed, but ideally have alerts when an irregular trend is showing that things are likely to fail too. This latter type of monitor helps you get ahead of the problem before it becomes customer facing.
    Then you have more traditional stuff like logs. This will also be bespoke to the application. But you'd expect errors in logs to get surfaced quickly. Assuming Github have good hygiene in what's being logged.
    Tie that up with APMs, RUM, and other goodies like that and you'll have diagnostics to investigate issues when they appear.
    (this is just a super high level view of observability too)
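One way to picture the "alert on an irregular trend" idea from the comment above: fit a straight line through recent samples of a metric and estimate how long until it crosses a limit. This is only a sketch under assumptions; the function name, the disk-usage numbers, and the 24-hour warning window are hypothetical, and real systems often use more robust forecasting than a least-squares line.

```python
from typing import Optional, Sequence, Tuple


def seconds_until_limit(
    samples: Sequence[Tuple[float, float]], limit: float
) -> Optional[float]:
    """Extrapolate a least-squares line through (timestamp, value) samples.

    Returns the seconds from the last sample until the line reaches `limit`,
    or None when there is too little data or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    t_at_limit = (limit - intercept) / slope
    return max(0.0, t_at_limit - samples[-1][0])


# Hypothetical disk usage in percent, sampled hourly: growing 2 points per hour.
disk_usage = [(0, 70.0), (3600, 72.0), (7200, 74.0), (10800, 76.0)]
eta = seconds_until_limit(disk_usage, limit=100.0)

# Warn while there is still time to act (24 hours is an assumed window).
should_warn = eta is not None and eta < 24 * 3600
print(eta, should_warn)
```

Here nothing has failed yet, but the trend suggests the disk fills in roughly twelve hours, which is the "get ahead of the problem" case rather than a failure alert.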
    
    lokar - 2 hours ago
    
    Even a synthetic probe needs a few failures to trigger an alert.
    You should not alert on cpu, ram, etc
    
    hnlmorg - 2 hours ago
    
    > Even a synthetic probe needs a few failures to trigger an alert.
    It doesn't "need" that. That's just how most people set it up because it’s an easy sane default that allows for network jitter without inexperienced engineers thinking about different conditions triggering different types of responses.
    If you’re measuring internal APIs from an observability solution that has nodes already inside your network enclave, then there is a strong argument for alerting early.
    > You should not alert on cpu, ram, etc
    That’s not true to say as an absolute statement. As a generalisation, it heavily depends on the system you're monitoring and how it behaves under pressure.
    But in any case, I wasn’t suggesting CPU alerts were the end goal. I said:
    > these types of metrics are generally bespoke to the type of application you're monitoring.
    Ie you’ll use metrics but those metrics will be highly specific.
    The CPU examples were an illustration as to what a “metric” is (it might seem obvious but not everyone is an expert) but the point was HTTP response codes aren't the only types of metrics one should be capturing and watching.
    
    lokar - an hour ago
    
    Ah, yes, I misunderstood. And I have seen cases where a direct CPU alert makes sense, but 99 times out of 100 times I see it, it's nothing but trouble. Worse, I tend to see the cpu alert when there are no end to end synthetic alerts, 500 alerts, queue depth alerts, etc.
    If your requests are fast and cheap, you can probe frequently relative to your goals, but often that's not really possible (think, long SQL queries, or scheduling a container/pod). There you need several datapoints, or possibly fewer augmented with other signals.
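A rough sketch of the "several datapoints, or fewer augmented with other signals" policy described above. Everything named here is hypothetical: the thresholds of 3 and 1, and the idea of counting corroborating signals (for example a 500-rate or queue-depth alert already firing) as a single integer.

```python
def should_alert(
    consecutive_failures: int,
    corroborating_signals: int,
    failures_required: int = 3,                   # assumed default for a lone probe
    failures_required_if_corroborated: int = 1,   # assumed when other signals agree
) -> bool:
    """Decide whether a slow, expensive synthetic probe should page someone."""
    if corroborating_signals > 0:
        needed = failures_required_if_corroborated
    else:
        needed = failures_required
    return consecutive_failures >= needed


def count_trailing_failures(results: list) -> int:
    """Count failed probes (False) at the end of a list of probe outcomes."""
    count = 0
    for ok in reversed(results):
        if ok:
            break
        count += 1
    return count


# Probe history, oldest first: True means the probe succeeded.
history = [True, True, False]
failures = count_trailing_failures(history)

alone = should_alert(failures, corroborating_signals=0)
corroborated = should_alert(failures, corroborating_signals=1)
print(failures, alone, corroborated)
```

With a single failed probe, the lone-probe case waits for more datapoints, while the same failure backed by another signal alerts immediately.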