Modern CI is too complex and misdirected (2021)

192 points by thundergolfer a day ago

I remember a Rich Hickey talk where he described Datomic, his database. He said "the problem with a database is that it's over there." By modeling data with immutable "facts" (a la Prolog), much of the database logic can be moved closer to the application. In his case, with Clojure's data structures.

Maybe the the problem with CI is that it's over there. As soon as it stops being something that I could set up and run quickly on my laptop over and over, the frog is already boiled.

The comparison to build systems is apt. I can and occasionally do build the database that I work on locally on my laptop without any remote caching. It takes a very long time, but not too long, and it doesn't fail with the error "people who maintain this system haven't tried this."

The CI system, forget it.

Part of the problem, maybe the whole problem, is that we could get it all working and portable and optimized for non-blessed environments, but it still will only be expected to work over there, and so the frog keeps boiling.

I bet it's not an easy problem to solve. Today's grand unified solution might be tomorrow's legacy tar pit. But that's just software.

reactordev - 11 hours ago
The rule for CI/CD and DevOps in general is boil your entire build process down to one line:
```
    ./build.sh
```
If you want to ship containers somewhere, do it in your build script where you check to see if you’re running in “CI”. No fancy pants workflow yamls to vendor lock yourself into whatever CI platform you’re using today, or tomorrow. Just checkout, build w/ params, point your coverage checker at it.
This is also the same for onboarding new hires. They should be able to checkout, and build, no issues or caveats, setup for local environment. This ensures they are ready to PR by end of the day.
(Fmr Director of DevOps for a Fortune 500)
- maratc - 5 hours ago
  
  Yeah, that's a good rule. Except, do you want to build Debug or Release? Or maybe RelWithDebugInfo? And do you want that with sanitizers maybe? And what the sanitizers' options should be? Do you want to compile your tests too, if you want to run them later on a different machine? And what about that dependency that takes two hours to compile, maybe you just want to reuse the previous compilation of it? And if so, where to take that from? Etc. etc.
  Before long, you need another script that will output the train of options to your `build.sh`.
  (If Fortune 500 companies can do a one-line build with zero parameters, I suspect I'd be very bored there.)
  - reactordev - 4 hours ago
    
    Of course we had parameters but we never ship debug builds. Treat everything like production.
    If you want to debug, docker compose or add logs and metrics to seek what you find.
- pxc - 11 hours ago
  
  You still inevitably need a bunch of CI platform-specific bullshit for determining "is this a pull request? which branch am I running on?", etc. Depending on what you're trying to do and what tools you're working with, you may need such logic both in an accursed YAML DSL and in your build script.
  And if you want your CI jobs to do things like report cute little statuses, integrate with your source forge's static analysis results viewer, or block PRs, you have to integrate with the forge at a deeper level.
  There aren't good tools today for translating between the environment variables or other things that various CI platforms expose, managing secrets (if you use CI to deploy things) that are exposed in platform-specific ways, etc.
  If all you're doing with CI is spitting out some binaries, sure, I guess. But if you actually ask developers what they want out of CI, it's typically more than that.
  - michaelmior - 11 hours ago
    
    A lot of CI platforms (such as GitHub) spit out a lot of environment variables automatically that can help you with the logic in your build script. If they don't, they should give you a way to set them. One approach is to keep the majority of the logic in your build script and just use the platform-specific stuff to configure the environment for the build script.
    Of course, as you mention, if you want to do things like comment on PRs or report detailed status information, you have to dig deeper.
    
    pxc - 10 hours ago
    
    Yes, and real portability for working with the environment variables is doable but there's nothing out there that provides it for you afaik. You just have to read a lot carefully.
    My team offers integrations of static analysis tools and inventorying tools (SBOM generation + CVE scanning) to other teams at my organization, primarily for appsec purposes. Our organization's departments have a high degree of autonomy, and tooling varies a lot. We have code hosted in GitLab, GitHub, Azure DevOps, and in distant corners my team has not yet worked with, elsewhere. Teams we've worked with run their CI in GitLab, GitHub, Azure DevOps, AWS CodeBuild, and Jenkins. Actual runners teams use may be SaaS-provided by the CI platform, or self-hosted on AWS or Azure. In addition to running in CI, we provide the same tools locally, for use on macOS as well as Linux via WSL.
    The tools my team uses for these scans are common open-source tools, and we distribute them via Nix (and sometimes Docker). That saves us a lot of headaches. But every team has their own workflow preferences and UI needs, and we have to meet them on the platforms they already use. For now we manage it ourselves, and it's not too terrible. But if there were something that actually abstracted away boring but occasionally messy differences like which environment variables mean in different CI systems, that would be really valuable for us. (The same goes for even comment bots and PR management tools. GitHub and GitLab are popular, but Azure DevOps is deservedly marginal, so even general-purpose tools rarely support both Azure DevOps and other forges.)
    If your concern is that one day, a few years from now, you'll need to migrate from one forge to another, maybe you can say "my bash script handles all the real build logic" and get away with writing off all the things it doesn't cover. Maybe you spend a few days or even a few weeks rewriting some platform-specific logic when that time comes and forget about it. But when you're actually contending with many such systems at once, you end up wishing for sane abstractions or crafting them yourself.
    
    merb - 10 hours ago
    
    how can you build your containers in parallel?
    over multiple machines? I'm not sure that a sh script can do that with github
    
    pxc - 10 hours ago
    
    If you build them with Nix, you can. Just call `nix build` with a trailing `&` a bunch of times.
    But it's kind of cheating, because the Nix daemon actually handles per-machine scheduling and cross-machine orchestration for you.
    Just set up some self-hosted runners with Nix and an appropriately configured remote builders configuration to get started.
    If you really want to, you can graduate after that to a Kubernetes cluster where Nix is available on the nodes. Pass the Nix daemon socket through to your rootless containers, and you'll get caching in the Nix store for free even with your ephemeral containers. But you probably don't need all that anyway. Just buy or rent a big build server. Nix will use as many cores as you have by default. It will be a long time before you can't easily buy or rent a build server big enough.
  - oftenwrong - 2 hours ago
    
    There are some very basic tools that can help with portability, such as https://github.com/milesj/rust-cicd-env , but I agree that there is a lot of proprietary, vendor-specific, valuable functionality available in the average "CI" system that you cannot make effective use of with this approach. Still, it's the approach I generally favor for a number of reasons.
- mrbombastic - 10 hours ago
  
  How do you get caching of build steps with this approach? Or do you just not?
  - arianvanp - 10 hours ago
    
    Use a modern hermetic build system with remote caching or remote execution. Nix, Bazel, buck, pants. Many options
    
    pwnna - 10 hours ago
    
    This is like fighting complexity with even more complexity. Nix and bazel are definitely not close to actually achieving hermetic build at scale. And when they break the complexity increases exponentially to fix.
    
    pxc - an hour ago
    
    What's not hermetic with Nix? Are you talking about running with the sandbox disabled, or and macOS quirks? It's pretty damn hard to accidentally depend on the underlying system in an unexpected way with Nix.
  - fireflash38 - 10 hours ago
    
    Even just makefiles have 'caching', provided you set dependencies and output correctly.
    A good makefile is really nice to use. Not nice to read or trace unfortunately though.
- XorNot - 11 hours ago
  
  The other rule is that script should run as a user. Solely on that working directory.
  There are too many scripts like that which start, ask for sudo and then it's off to implementing someones "great idea" about your systems network interfaces.
  - reactordev - 6 hours ago
    
    sudo should not be required to build software.
    If there’s something you require that requires sudo, it’s a pre-build environment setup on your machine. On the host. Or wherever. It’s not part of the build. If you need credentials, get them from secrets or environment variables.
- HPsquared - 11 hours ago
  
  Sounds like the Lotus philosophy, "simplify and add lightness".
- marsven_422 - 11 hours ago
  
  [dead]
DrBazza - 15 hours ago
Your build should be this:
```
    build.bash <debug|release>
```
and that's it (and that can even trigger a container build).
I've spent far too much time debugging CI builds that work differently to a local build, and it's always because of extra nonsense added to the CI server somehow. I've yet to find a build in my industry that doesn't yield to this 'pattern'.
Your environment setup should work equally on a local machine or a CI/CD server, or your devops teams has identically set it up on bare metal using Ansible or something.
- nrclark - 14 hours ago
  
  Agreed with this sentiment, but with one minor modification: use a Makefile instead. Recipes are still chunks of shell, and they don’t need to produce or consume any files if you want to keep it all task-based. You get tab-completion, parallelism, a DAG, and the ability to start anywhere on the task graph that you want.
  It’s possible to do all of this with a pure shell script, but then you’re probably reimplementing some or all of the list above.
  - gchamonlive - 13 hours ago
    
    Just be aware of the "Makefile effect"[1] which can easily devolve into the Makefile also being "over there", far from the application, just because it's actually a patchwork of copy-paste targets stitched together.
    [1] https://news.ycombinator.com/item?id=42663231
  - DrBazza - 12 hours ago
    
    > use a Makefile instead
    I was making a general comment that your build should be a single 'command'. Personally, I don't care what the command is, only that it should be a) one command, and b) 100% runnable on a dev box or a server. If you use make, you'll soon end up writing... shell scripts, so just use a shell script.
    In an ideal world your topmost command would be a build tool:
    ./gradlew build bazel build //... make debug cmake --workflow --preset
    Unfortunately, the second you do that ^^^, someone edits your CI/CD to add a step before the build starts. It's what people do :(
    All the cruft that ends up *in CI config*, should be under version control, and inside your single command, so you can debug locally.
    
    chubot - 11 hours ago
    
    That's exactly why the "main" should be shell, not make (see my sibling reply). So when someone needs to add that step, it becomes:
    #!/bin/sh step-I-added-to-shell-rather-than-CI-yaml make debug # or cmake, bazel
    This is better so you can run the whole thing locally, and on different CI providers
    In general, a CI is not a DAG, and not completely parallel -- but it often contains DAGs
  - chubot - 12 hours ago
    
    Make is not a general purpose parallel DAG engine. It works well enough for small C projects and similar, but for problems of even medium complexity, it falls down HARD
    Many years ago, I wrote 3 makefiles from scratch as an exploration of this (and I still use them). I described the issues here: https://lobste.rs/s/yd7mzj/developing_our_position_on_ai#c_s...
    ---
    The better style is in a sibling reply -- invoke Make from shell, WHEN you have a problem that fits Make.
    That is, the "main" should be shell, not Make. (And it's easy to write a dispatcher to different shell functions, with "$@", sometimes called a "task file" )
    In general, a project's CI does not fit entirely into Make. For example, the CI for https://oils.pub/ is 4K lines of shell, and minimal YAML (portable to Github Actions and sourcehut).
    https://oils.pub/release/latest/pub/metrics.wwz/line-counts/...
    It invokes Make in a couple places, but I plan to get rid of all the Make in favor of Python/Ninja.
    
    DrBazza - 11 hours ago
    
    Portability to other CI/CDs systems is an understated reason to use a single build command.
  - dgfitz - 13 hours ago
    
    You invoke CMake/qmake/configure/whatever from the bash script.
    I hate committing makefiles directly if it can be helped.
    You can still call make in the script after generating the makefile, and even pass the make target as an argument to the bash script if you want. That being said, if you’re passing more than 2-3 arguments to the build.sh you’re probably doing it wrong.
    
    nrclark - 13 hours ago
    
    Yes to calling CMake/etc. No to checking in generated Makefiles. But for your top-level “thing that calls CMake”, try writing a Makefile instead of a shell script. You’ll be surprised at how powerful it is. Make is a dark horse.
    
    dgfitz - 11 hours ago
    
    I wouldn't be surprised at all, make is great!
    My contention is that a build script should ideally be:
    sha-bang
    clone && cd $cloned_folder
    ${generate_makefile_with_tool}
    make $1
    Anything much longer than that can (and usually will) quickly spiral out of control.
    Make is great. Unless you're code-golfing, your makefile will be longer than a few lines and a bunch of well-intentioned-gremlins will pop in and bugger the whole thing up. Just seen it too many times.
    Edit: in the jenkins case, in a jenkins build shell the clone happens outside build.sh:
    (in jenkins shell):
    clone && cd clone ./build.sh $(0-1 args)
    (inside build.sh): $(generate_makefile_with_tool) make $1
- mikepurvis - 12 hours ago
  
  There are various things that can be a reasonable candidate for the "top level" build entrypoint, including Nix, bazel, docker bake, and probably more I'm not thinking of. They all have an entrypoint that doesn't have a ton of flags or nonsense, and operate in a pretty self contained environment that they set up and manage themselves.
  Overall I'm not a fan of wrapping things; if there are flags or options on the top-level build tool, I'd rather my devs explore those and get used to what they are and can do, rather than being reliant on a project-specific script or make target to just magically do the thing.
  Anyway, other than calling the build tool, CI config can have other steps in it, but it should be mostly consumed with CI-specific add-ons, like auth (OIDC handshake), capturing logs, uploading artifacts, sending a slack notification, whatever it is.
  - DrBazza - 11 hours ago
    
    Fortunately most CI/CD systems expose an environment variable during the build so you can detect most of those situations and still write a script that runs locally on a developer box.
    Our wrapping is 'minimal', in that you can still run
    bazel build //...
    or
    cmake ...
    and get the same build artefacts as running:
    build.bash release
    My current company is fanatical about read-only for just about every system we have (a bit like Nix, I suppose), and that includes CI/CD. Once the build is defined to run debug or release, rights are removed so the only thing you can edit are the build scripts you have under your control in your repo. This works extremely well for us.
    
    mikepurvis - 8 hours ago
    
    Interestingly despite being pretty hard-nosed about a lot of things, Nix does not insist on a read-only source directory at build time— the source is pulled into a read-only store path, but from there it is copied into the build sandbox, not bind-mounted.
    I expect this is largely a concession to the reality that most autotools projects still expect an in-source build, not to mention Python wanting to spray pyc files and build/dist directories all over the place.
- kqr - 11 hours ago
  
  I tried to drive this approach at a previous job but nobody else on the team cared so I ended up always having to mirror all the latest build changes into my bash script.
  The reason it didn't catch on? Everyone else was running local builds in a proprietary IDE, so to them the local build was never the same anyway.
- germandiago - 11 hours ago
  
  I always use, no matter what I am using underneath, a bootstrap script, a configure script and a build step.
  That keeps the cli interface easy, expectable and guessable.
KronisLV - 17 hours ago

> Part of the problem, maybe the whole problem, is that we could get it all working and portable and optimized for non-blessed environments, but it still will only be expected to work over there, and so the frog keeps boiling.
Build the software inside of containers (or VMs, I guess): a fresh environment for every build, any caches or previous build artefacts explicitly mounted.
Then, have something like this, so those builds can also be done locally: https://docs.drone.io/quickstart/cli/
Then you can stack as many turtles as you need - such as having build scripts that get executed as a part of your container build, having Maven or whatever else you need inside of there.
It can be surprisingly sane: your CI server doing the equivalent of "docker build -t my_image ..." and then doing something with it, whereas during build time there's just a build.sh script inside.
- kqr - 12 hours ago
  
  This sounds a lot like "use Nix".
  - justinrubek - 12 hours ago
    
    Unfortunately, that's the last thing a lot of people want to hear, despite it saving a whole lot of heartache.
    
    zhengyi13 - 11 hours ago
    
    I mean, sure (also bazel I think), but I feel like that's because the learning curve for these tools to a first approximation looks a bit like the infamous EvE Online learning curve[0].
    [0]: https://imgur.com/gallery/eve-online-learning-curve-jj16ThL
dapperdrake - 16 hours ago

Transactions and a single consistent source of truth with stuff like observability and temporal ordering is centralized and therefore "over there" for almost every place you could be in.
As long as communications have bounded speed (speed of light or whatever else) there will be event horizons.
The point of a database is to track changes and therefore time centrally. Not because we want to, but because everything else has failed miserably. Even conflicting CRDT change merges and git merges can get really hairy really quickly.
People reinvent databases about every 10 years. Hardware gets faster. Just enjoy the show.
- MathMonkeyMan - 11 hours ago
  
  I haven't used Datomic, but you're right that the part that requires over there is "single consistent source of truth." There's only ever a single node that is sequencing all writes. Perhaps as a result of this, it provides strong [verified ACID guarantees][1].
  What I got from Hickey's talk is that he wanted to design a system that resisted the urge to encode everything in a stored procedure and run it on the database server.
  [1]: https://jepsen.io/analyses/datomic-pro-1.0.7075
MortyWaves - 16 hours ago

It’s why I’ve started making CI simply a script that I can run locally or on GitHub Actions etc.
Then the CI just becomes a bit of yaml that runs my script.
- maccard - 15 hours ago
  
  How does that script handle pushing to ghcr, or pulling an artifact from a previous stage for testing?
  In my experience these are the bits that fail all the time, and are the most important parts of CI once you go beyond it taking 20/30 seconds to build.
  A clean build in an ephemeral VM of my project would take about 6 hours on a 16 core machine with 64GB RAM.
  - thechao - 15 hours ago
    
    Sheesh. I've got a multimillion line modern C++ protect that consists of a large number of dylibs and a few hundred delivered apps. A completely cache-free build is an only few minutes. Incremental and clean (cached) builds are seconds, or hundreds of milliseconds.
    It sounds like you've got hundreds of millions of lines of code! (Maybe a billion!?) How do you manage that?
    
    maccard - 15 hours ago
    
    It’s a few million lines of c++ combined with content pipelines. Shader compilation is expensive and the tooling is horrible.
    Our cached builds on CI are 20 minutes from submit to running on steam which is ok. We also build with MSVC so none of the normal ccache stuff works for us, which is super frustrating
    
    thechao - 14 hours ago
    
    Fuck. I write shader compilers.
    
    maccard - 12 hours ago
    
    Eh, you write them I (ab)use them.
    
    bluGill - 10 hours ago
    
    I have 15 million lines of C++, and builds are several hours. We split into multi-repo (for other reasons) and that helps because compiling is memory bandwidth limited - on the CI system by we can split the different repos to different CI nodes.
  - MortyWaves - 14 hours ago
    
    To be honest I haven’t really thought about it and it’s definitely something it can’t do, you’d probably need to call their APIs or something.
    I am fortunate in that the only things I want to reuse is package manager caches.
- maratc - 5 hours ago
  
  You must be very lucky to be in a position where you know what needs to be done before the run begins. Not everyone is in that position.
  At my place, we have ~400 wall hours of testing, and my run begins by figuring out what tests should be running and what can be skipped. This depends on many factors, and the calculation of the plan already involves talking to many external systems. Once we have figured out a plan for the tests, we can understand the plan for the build. Only then we can build, and test afterwards. I haven't been able to express all of that in "a bit of yaml" so far.
- j4coh - 15 hours ago
  
  Are you not worried about parallelisation in your case? Or have you solved that in another way (one big beefy build machine maybe?)
  - MortyWaves - 14 hours ago
    
    Honestly not really… sure it might not be as fast but the ability to know I can debug it and build it exactly the same way locally is worth the performance hit. It probably helps I don’t write C++, so builds are not a multi day event!
layer8 - 11 hours ago

Yes, the build system should be independent from the platform that hosts it. Having GitHub or GitLab execute your build is fine, but you should as easily be able to execute it locally on your own infrastructure. The definition of the build or integration should be independent from that, and the software that ingests and executes such definitions shouldn’t be a proprietary SaaS.
AtlasBarfed - 17 hours ago

I want my build system to be totally declarative
Oh the DSL doesn't support what I need it to do.
Can I just have some templating or a little bit of places to put in custom scripts?
Congratulations! You now have a turing complete system. And yes, per the article that means you can cryptocurrency mine.
Ansible terraform Maven Gradle.
Unfortunate fact is that these IT domains (builds and CI) are at a junction of two famous very slippery slopes.
1) configuration
2) workflows
These two slippery slopes are famous for their demos of how clean and simple they are and how easy it is to do. Anything you need it to do.
In the demo.
And sure it might stay like that for a little bit.
But inevitably.... Script soup
- lelanthran - 17 hours ago
  
  Alternative take: CI is the successful monetization of Make-as-a-Service.

mettamage - 17 hours ago

IMO development is too complex and misdirected in general since we cargo cult FAANG.

Need AWS, Azure or GCP deployment? Ever thought about putting it on bare metal yourself? If not, why not? Because it's not best practice? Nonsense. The answer with these things is: it depends, and if your app has not that many users, you can get away with it, especially if it's a B2B or internal app.

It's also too US centric. The idea of scalability applies less to most other countries.

taminka - 16 hours ago

many ppl also underestimate how capable modern hardware is: for ~10usd you could handle like a million concurrent connections with a redis cluster on a handful of VPSs...
- zahlman - 15 hours ago
  
  Relevant: Program Your Own Computer in Python (https://www.youtube.com/watch?v=ucWdfZoxsYo) from this year's PyCon, emphasizing how much you can accomplish with local execution and how much overhead can be involved in doing it remotely.
- merb - 10 hours ago
  
  many ppl also understimate how complex it is to satisfy uptime requirements, how to scale out local infrastructure when storage > 10/50/100tb (yeah a single disk can handle that, but what about bit rot, raid stuff, etc) is involved.
  it gets worse when you need more servers because your ocr process of course needs cpu x so on a beefiy machine you can handle maybe 50 high page documents. but how do you talk to other machines, etc.
  also humans costs way more money than cloud stuff. I the cloud stuff can be managed in like 1 day per month you dont need a real person, if you have real hardware that day is not enough and you soon need a dedicated person, keeping everything up-to-date, etc.
  - rcxdude - 10 hours ago
    
    >also humans costs way more money than cloud stuff. I the cloud stuff can be managed in like 1 day per month you dont need a real person, if you have real hardware that day is not enough and you soon need a dedicated person, keeping everything up-to-date, etc.
    In my experience, I have observed the opposite: companies with on-site infrastructure have been able to manage it in the spare time of a relatively small team (especially since hardware is pretty powerful and reliable nowadays), while those with cloud infrastruture have a large team focused on just maintaining the system, because cloud pushes you into far more complex setups.
    
    merb - 9 hours ago
    
    most of the time the "far more complex setup" is mostly easier than the reimplementation of kubernetes with ansible.
  - - 8 hours ago
    
    [deleted]
- reactordev - 16 hours ago
  
  One beelink in a closet runs our entire OP’s cluster.
- dapperdrake - 16 hours ago
  
  This
franga2000 - 16 hours ago

Requirements are complex too. Even if you don't need to scale at all, you likely do need zero-downtime deployment, easy rollbacks, server fault tolerance, service isolation... If you put your apps into containers and throw them onto Kubernetes, you get a lot of that "for free" and in a well-known and well-tested way. Hand-rolling even one of those things, let alone all of them together, would take far too much effort.
- mettamage - 14 hours ago
  
  > you likely do need zero-downtime deployment
  I know SaaS businesses that don't as they operate in a single country, within a single timezone and the availability needs to be during business days and business hours.
  > easy rollbacks
  Yea, I haven't seen exceptions at all on this. So yea.
  > server fault tolerance
  That really depends. Many B2B or internal apps are fine with a few hours, or even a day, of downtime.
  > service isolation
  Many companies just have one app and if it's a monolith, then perhaps not.
  > Hand-rolling even one of those things
  Wow, I see what you're trying to say and I agree. But it really comes across as "if you don't use something like Kubernetes you need to handroll these things yourself." And that's definitely not true. But yea, I don't think that's what you meant to say.
  Again, it depends
  - kqr - 12 hours ago
    
    > I know SaaS businesses that don't as they operate in a single country, within a single timezone and the availability needs to be during business days and business hours.
    This is a bad road to go down. Management will understand the implication that it's okay to reduce reliability requirements because "we'll just do the dangerous things on the weekends!"
    After some time, developers are scheduled every other weekend and when something breaks during daytime, it's not going to be a smooth process to get it up again, because the process has always been exercised with 48 hours to spare.
    Then at some point it's "Can we deploy the new version this weekend?" "No, our $important_customer have their yearly reporting next week, and then we have that important sales demo, so we'll hold off another month on the deployment." You get further and further away from continuous integration.
  - franga2000 - 14 hours ago
    
    I'm definitely curious about alternatives for getting these features without k8s. Frankly, I don't like it, but I use it because it's the easiest way I've found to get all of these features. Every deployment I've seen that didn't use containers and something like k8s either didn't have a lot of these features, implemented them with a bespoke pile of shell scripts, or a mix of both.
    For context, I work in exactly that kind of "everyone in one time zone" situation and none of our customers would be losing thousands by the minute if something went down for a few hours or even a day. But I still like all the benefits of a "modern devops" approach because they don't really cost much at all and it means if I screw something up, I don't have to spend too much time unscrewing it. It took a bit more time to set up compared to a basic debian server, but then again, I was only learning it at the time and I've seen friends spin up fully production-grade Kubernetes clusters in minutes. The compute costs are also negligible in the grand scheme of things.
    
    stonemetal12 - 12 hours ago
    
    >I use it because it's the easiest way I've found to get all of these features. Every deployment I've seen that didn't use containers and something like k8s either didn't have a lot of these features, implemented them with a bespoke pile of shell scripts, or a mix of both.
    Features aren't pokemon you don't have to catch them all.
    Back when stackoverflow was cool and they talked about their infrastructure, they were running the whole site at 5 9s on 10-20 boxes. For a setup like that k8s would have A) required more hardware B) a complete rewrite of their system to k8sify it C) delivered no additional value.
    k8s does good things if you have multiple datacenters worth of hardware to manage, for everyone else it adds overhead for features you don't really need.
    
    franga2000 - 12 hours ago
    
    A) Not much more. The per-node overhead is relatively small and it's not unlikely that they could have made some efficiency gains by having a homogenous cluster that saved them some nodes to offset that.
    B) Why on earth would you need to do that? K8s is, at its core, just a thing that runs containers. Take your existing app, stick it in a container and write a little yaml explaining which other containers it connects to. It can do many other things, but just...don't use them?
    C) The value is in not having to develop orchestration in house. They already had it so yea, I wouldn't say "throw it out and go to k8s", but if you're starting from scratch and considering between "write and maintain a bunch of bespoke deployment scripts" and "just spin up Talos, write a few yaml files and call it a day" I think the latter is quite compelling.
- s_Hogg - 15 hours ago
  
  Holy shit you don't get anything for _free_ as a result of adopting Kubernetes dude. The cost is in fact quite high in many cases - you adopt Kubernetes and all of the associated idiosyncrasies, which can be a lot more than what you left behind.
  - franga2000 - 15 hours ago
    
    For free as in "don't have to do anything to make those features, they're included".
    What costs are you talking about? Packaging your app in a container is already quite common so if you already do that all you need to do is replace your existing yaml with a slightly different yaml.
    If you don't do that already, it's not really that difficult. Just copy-paste your your install script or rewrite your Ansible playbooks into a Dockerfile. Enjoy the free security boost as well.
    What are the other costs? Maintaining something like Talos is actually less work than a normal Linux distro. You already hopefully have a git repo and CI for testing and QA, so adding a "build and push a container" step is a simple one-time change. What am I missing here?
- dapperdrake - 16 hours ago
  
  Unix filesystem inodes and file descriptors stick around until they are closed, even if the inode has been unlinked from a directory. The latter is usually called "deleting the file".
  All the stuff Erlang does.
  Static linking and chroot.
  The problems and the concepts and solutions have been around for a long time.
  Piles and piles of untold complexity, missing injectivity on data in the name of (leaky) abstractions and cargo-culting have been with us on the human side if things for even longer.
  And as always: technical and social problems may not always benefit from the same solutions.
  - franga2000 - 16 hours ago
    
    Ok so let's say you statically link your entire project. There are many reasons you shouldn't or couldn't, but let's say you do. How do you deploy it to the server? Rsync, sure. How do you run it? Let's say a service manager like systemd. Can you start a new instance while the old one is running? Not really, you'll need to add some bash script glue. Then you need a loadbalancer to poll the readiness of the new one and shift the load. What if the new instance doesn't work right? You need to watch for that, presumably with another bash script, stop it and keep the old one as "primary". Also, you'll need to write some selinux rules to make it so if someone exploits one service, they can't pivot to others.
    Congrats, you've just rewritten half of kubernetes in bash. This isn't reducing complexity, it's NIH syndrome. You've recreated it, but in a way that nobody else can understand or maintain.
    
    dapperdrake - 12 hours ago
    
    Now I see how it could have been confusing to read.
    Cannot edit anymore so amending here:
    Static liking and chroot (not as The One True Solution (TM)) but as basically Docker without Linux network namespaces.
    Linux/Docker actually wound up improving things here. And they got to spend all the money on convincing the people that like advertisements.
    And static linking mainly only becomes relevant (and then irrelevant again) in C because if boundaries between compilation units. SQLite throws all of this out. They call it an amalgamation (which also sounds better than a "unity build").
    The tools are there. They are just overused. Look at enterprise Hello World in Java for a good laugh.
    ————
    If your data lives in a database on another end if a unix or TCP socket, then I still don't see "NIH". The new binary self-tests and the old binary waits for a shutdown command record and drains its connections.
    Kernels and databases clock in at over 5M lines of code. NIH seems like missing the point there.
    And most services neither need nor have nine nines of uptime. That is usually too expensive. And always bespoke. Must be tailored to the available hardware.
    Code is less portable than people believe.
    Ten #ifdef directives and you are often dead on arrival.
- 9 hours ago

[deleted]

sluongng - 15 hours ago

The most concerning part about modern CI to me is how most of it is running on GitHub Actions, and how GitHub itself has been deprioritizing GitHub Actions maintenance and improvements over AI features.

Seriously, take a look at their pinned repo: https://github.com/actions/starter-workflows

> Thank you for your interest in this GitHub repo, however, right now we are not taking contributions.

> We continue to focus our resources on strategic areas that help our customers be successful while making developers' lives easier. While GitHub Actions remains a key part of this vision, we are allocating resources towards other areas of Actions and are not taking contributions to this repository at this time.

wink - 13 hours ago

The last time the company I worked for was hosting code on Github, Actions did not exist yet and for personal stuff copying some 3 liners was fine, I'd hardly call that "using".
"Github Actions might be over, so not worth engaging" was not on my bingo card.
captn3m0 - 14 hours ago

They are instead focusing on Agentic Workflows which used natural language instead of YAML.
https://github.com/githubnext/gh-aw
- kstrauser - 12 hours ago
  
  Know what I love in a good build system? Nondeterminism! Who needs coffee when you can get your thrills from stochastic processes. Why settle for just non-repeatable builds when you can have non-repeatable build failures!
  - joelfried - 11 hours ago
    
    Would a smart AI accept such foolishness? I doubt it. It'll still use something deterministic under the hood - it'll just have a conversational abstraction layer for talking to the Product person writing up requirements.
    We used to have to be able to communicate with other humans to build something. It seems to me that's what they're trying to take out of the loop by doing the things that humans do: talk to other humans and give them what they're asking for.
    I too am not a fan of the dystopias we're ending up in.
    
    bluGill - 10 hours ago
    
    Would it, or would it rewrite / refactor the logic every time. I'd expect the logic to remain as it for months, but then change suddenly without warning when the AI is upgraded.
    
    kstrauser - 10 hours ago
    
    “Just make it generate YAML and cache that until the prompt changes!”
    Orrrrr… just keep that YAML as the sole configuration input in the first place. Use AI to write it if you wish, but then leave it alone.
  - dehrmann - 10 hours ago
    
    What I'm hearing is we need to invent LLM-based compilers.
    
    alienbaby - 10 hours ago
    
    It's just translation right? Llm's are pretty good at that..
    
    kstrauser - 10 hours ago
    
    Time to launch LLMLLVM.
- woodruffw - 12 hours ago
  
  I personally find this pretty concerning: GitHub Actions already has a complex and opaque security model, and adding LLMs into the mix seems like a perfect way to keep up the recent streak of major compromises driven by vulnerable workflows and actions.
  I would hope that this comes with major changes to GHA’s permissions system, but I’m not holding my breath for that.

pointlessone - 13 hours ago

I can’t say I like OP’s vision. My main objection is that this vision is terminally online. I want to be able to run the whole build locally (for when my internet is down, or I’m on a plane, or on a remote island in a cave, etc.). The local build and CI should only differ in that local build is triggered manually and results are reported in the terminal (or IDE) and CI build is triggered by a push and reported on the PR (or other web page, or API endpoint, etc. ). It should be the same but for the entry and exit. Tasks, queues, DAGs, etc. it’s all nice but ultimately are implementation details. Even make has DAGs, tasks, and parallel execution. Unless the build can run locally it’s as if there’s no build. Differences between local build and CI, be it because of environment, tasks setup, caching, whatever makes CI painful. It’s precisely because you have a build system for local builds and a separate CI setup that the world contains 10% more misery than it should.

So basically either the whole CI pipeline is just a single command invoking my build system or the CI pipeline can be ran locally. Any other arrangement is self-inflicted suffering.

carom - 38 minutes ago

I think you have perfectly described OP's vision. A unification of build systems and CI pipelines.
bluGill - 10 hours ago

I want my CI system to track build numbers. When I build locally I don't care about build numbers 99% of the time. There are a number of other things my CI does that I should be able to do locally, but realistically I don't care and so I want to do something different.

sambuccid - 16 hours ago

I'm not sure why no one mentioned it yet, but the CI tool of sourcehut (https://man.sr.ht/builds.sr.ht/) simplifies all of this. It just spins a linux distro of your choice, and executes a very bare bone yml that essentially contains a lot of shell commands, so it's also easy to replicate locally.

There are 12 yml keywords in total that cover everything.

Other cool things are the ability to ssh in a build if it failed(for debugging), and to run a one-time build with a custom yml without committing it(for testing).

I believe it can checkout any repository, not just one in sourcehut that triggers a build, and that has also a GraphQL API

chubot - 12 hours ago

BTW if you follow the philosophy of using bash as your CI so it runs locally (mentioned by several people in this thread), then you can use the same CI logic on sourcehut and Github Actions.
Both of them provide VMs where you can run anything, and bash is of course there on every image.
We do that for https://oils.pub/
sourcehut yaml: https://github.com/oils-for-unix/oils/tree/master/.builds
github yaml: https://github.com/oils-for-unix/oils/tree/master/.github/wo...
They both call the same shell. The differences are:
* We use Github's API to merge on green; right now we don't have the same for sourcehut (since Github is the primary repo)
* Github Actions provides way more resources. They are kind of "locking projects in" by being free.
This post on NixOS gives a hint of htat
https://blog.erethon.com/blog/2025/07/31/how-nixos-is-built/
The monthly cost for all the actions in July of 2025 came out to a bit over 14500 USD which GitHub covers in its entirety.
So I think many projects are gradually sucked in to Github because it is indeed quite generous (including us, which annoys me -- we run more tasks on Github than sourcehut, even though in theory we could run all on sourcehut)
---
BUT I think it is a good idea to gradually consolidate your logic into shell, so you can move off Github in the future. Open source projects tend to last longer than cloud services.
This already happened to us -- we started using Travis CI in 2018 or so, and by 2021, it was acquired and the free tier was removed
maratc - 14 hours ago

Everything I've seen that's based on yaml makes easy things trivial, and hard things impossible.
This caused me to default back to Jenkins several times already, now I'm in a position to never wander off to another yaml-based tool.
mrweasel - 10 hours ago

Sourcehuts build.sr.ht is the best CI system I've used. I really want to give it a go at work as a replacement for our existing Jenkins solution, and I don't even thing that Jenkins is that bad.
Previously I've argued that CI/CD systems need two things, the ability to run bash and secrets management. Today I'd add: The ability to spin up an isolated environment for running the bash script.
duped - 14 hours ago

A big reason people use actions is because they need to run things on MacOS and Windows.

k3vinw - 17 hours ago

This speaks to me. Lately, I’ve encountered more and more anti patterns where the project’s build system was bucked in favor of using something else. Like having a maven project and instead of following the declarative convention defining profiles and goals, everything was a hodge podge of shell scripts that only the Jenkins pipeline knew how to stitch together. Or a more recent case where the offending project had essential build functionality embedded in a Jenkins pipeline so you have to reverse engineer what it’s doing just so you can execute the build steps from your local machine. A particularly heinous predicament as the project depends on the execution of the pipeline to provide basic feedback.

Putting too much responsibility in the ci environment makes life as a developer (or anyone responsible for maintaining the ci process) more difficult. It’s far more superior to have a consistent use of the build system that can be executed the same way on your local machine as it is in your ci environment. I suppose this is the mess you find yourself in when you have other teams building your pipelines for you?

tacker2000 - 19 hours ago

These online / paid CI systems are a dime a dozen and who knows what will happen to them in the future…

Im still rocking my good old jenkins machine, which to be fair took me a long time to set up, but has been rock solid ever since and will never cost me much and will never be shut down.

But i can definitely see the appeal of github actions, etc…

dgfitz - 17 hours ago

+1 for Jenkins.
At $dayjob they recently set up git runners. The effort I’m currently working on has the OS dictated to us, long story don’t ask. The OS is centos 7.
The runners do not support this. There is an effort to move to Ubuntu 22.04. The runners also don’t support this.
I’m setting up a Jenkins instance.
thrown-0825 - 19 hours ago

until you have to debug a GH action, especially if it only runs on main or is one of the handful of tasks that are only picked up when committed to main.
god help you, and don’t even bother with the local emulators / mocks.
- myaccountonhn - 18 hours ago
  
  Sourcehut builds is so much better. You can actually ssh into the machine and debug it directly.
  - whstl - 17 hours ago
    
    There is a community action for doing so in Github too, but god knows if it's secure or works as well as Sourcehut.
    https://github.com/marketplace/actions/debugging-with-ssh
- zokier - 15 hours ago
  
  But debugging Jenkins jobs is absolute pain too, in varying ways depending on how the job was defined (clicking through the ui, henerated by something, groovy, pipelines, etc).
  - tacker2000 - 14 hours ago
    
    Yea, thats really a pain and could be improved.
    Are there any Jenkins Gurus out there who can give some tips?
- OtherShrezzing - 16 hours ago
  
  What are the good local emulators for gh actions? The #1 reason we don’t use them is because the development loop is appallingly slow.
  - homebrewer - 15 hours ago
    
    nektos/act was considered good enough to be adopted as the CI solution for Gitea and Forgejo. The latter uses it for all their development, seems to work out fine for them.
    I've never been a fan of GitHub Actions (too locked-in/proprietary for my taste), so no idea if it lives up to expectations.
  - thrown-0825 - 15 hours ago
    
    none of them are good ime, stopped using actions for the same reason
- bubblyworld - 18 hours ago
  
  I've had a great experience using `act` to debug github actions containers. I guess your mileage, as usual, will vary depending on what you are doing in CI.
  - thrown-0825 - 18 hours ago
    
    i tried act a couple years ago and ran into a lot of issues when running actions that have external dependencies

ThierryAbalea - 8 hours ago

I agree with the author that CI and build systems are really trying to solve the same core problem: efficient execution of a dependency graph. And I share the view that modern CI stacks often lack the solid foundations that tools like Bazel, Gradle, or Nx bring to build systems.

Where I differ a bit is on the "two DAGs" criticism. In practice the granularity isn’t the same: the build system encodes how to compile and test, while the CI level is more about orchestration, cloning the repo, invoking the build system, publishing artifacts. That separation is useful, though we do lose the benefits of a single unified DAG for efficiency and troubleshooting.

The bigger pain points I hear from developers are less about abstractions and more about day-to-day experience: slow performance, flakiness, lack of visibility, and painful troubleshooting. For example, GitHub Actions doesn’t let you test or debug pipelines locally, you have to push every change to the remote. The hosted runners are also underpowered, and while self-hosting sounds attractive, it quickly becomes a time sink to manage reliably at scale.

This frustration is what led me to start working on Shipfox.io. Not a new CI platform, but an attempt to fix these issues on top of GitHub Actions. We’re focused on faster runners and better visibility, aggregating CI logs, test logs, CPU and memory profiles to make failures and performance problems easier to debug.

donatj - 15 hours ago

Drone was absolutely perfect back when it was Free Software. Literally "run these commands in this docker container on these events" and basically nothing more. We ran the last fully open source version much longer than we probably should have.

When they went commercial, GitHub Actions became the obvious choice, but it's just married to so much weirdness and unpredictability.

Whole thing with Drone opened my eyes at least, I'll never sign a CLA again

jopsen - 13 hours ago

It's never just running commands in a container.
Don't get me wrong, it's a fantastic primitive.
But eventually you need to conditionally run some tests (to save compute).
For some benchmarks you might have limited hardware, so you need to coalesce jobs, and only run every 5 or 10 commits. You might want to keep the hardware hot, but also the queue small. So ideally you want to coalesce dynamically.
You also want result reporting, comparisons to previous results. Oh, and since you're coalescing some jobs and doing others conditionally you'll need ways to manually trigger skipped jobs later, maybe bisect too.
It's when you need to economize your compute that CI can get really complex. Especially, if you have fragile benchmark that or flaky tests.
Yes, in theory you can enforce a culture that removes flaky tests, but doing so often requires tooling support -- statistics, etc.
mrweasel - 10 hours ago

You and I have very different workflows I think. Drone was probably least intuitive system I've ever used. The idea seems nice, until you learn that Drone pretty much can't do anything useful out of the box. Want to move an artefact between steps, to bad, can't do that (at least you couldn't when we tried it out).
We ended up wrapping everything in a Docker container and back to just running a bash script. Drone had to be used because the architects that be, had decided that Drone was the answer to some question that no one apparently asked.
- donatj - 7 hours ago
  
  > Drone pretty much can't do anything useful out of the box
  That's the ideal. It's not doing anything you didn't explicitly tell it to.
  > We ended up wrapping everything in a Docker container and back to just running a bash script.
  That's literally what drone is for
homebrewer - 15 hours ago

It lives on as Woodpecker, the fork of the last truly free version. As simple as it gets, no CLAs required to contribute.
- donatj - 12 hours ago
  
  I saw that. We've sadly got a corporate mandate right now to move everyone to GitHub Actions

bob1029 - 17 hours ago

I've been able to effectively skip the entire CI/CD conversation by preferring modern .NET and SQLite.

I recently spent a day trying to get a GH Actions build going but got frustrated and just wrote my own console app to do it. Polling git, tracking a commit hash and running dotnet build is not rocket science. Putting this agent on the actual deployment target skips about 3 boss fights.

arunix - 10 hours ago

Is there something about .NET that makes this easier?
- bob1029 - 9 hours ago
  
  Self-contained deployments help a lot.

jph - 19 hours ago

You're 100% right IMHO about the convergence of powerful CI pipelines and full build systems. I'm very curious what you'll think if you try Dagger, which is my tool of choice for programming the convergence of CI and build systems. (Not affiliated, just a happy customer)

https://dagger.io/

karel-3d - 16 hours ago

I absolutely don't understand what it does from the website. (And there is way too much focus on "agents" on the front page for my tastes, but I guess it's 2025)
edit: all the docs are about "agents"; I don't want AI agents, is this for me at all?
ajb - 18 hours ago

So, it sounded interesting but they have bet too hard on the "developer marketing" playbook of "just give the minimum amount of explanation to get people to try the stuff".
For example, there is a quick start, so I skip that and click on "core concepts". That just redirects to quick start. There's no obvious reference or background theory.
If I was going to trust something like this I want to know the underlying theory and what guarantees it is trying to make. For example, what is included in a cache key, so that I know which changes will cause a new invocation and which ones will not.
- levlaz - 10 hours ago
  
  Hello, Dagger employee here.
  Thanks so much for taking a look and sharing your feedback! We've heard this feedback in the past and are working on a big docs change that should make this whole experience a lot better for folks that are new to dagger.
  https://devel.docs.dagger.io/getting-started/concepts
  This should land in the coming weeks.

0xbadcafebee - 14 hours ago

Having two different programs that are almost the same except for one or two differences, is actually better than trying to combine them.

Why do you even have a "build system"? Why not just a shell script that runs 'cc -o foo foo.c' ? Because there are more complicated things you want to do, and it would be annoying to write out a long shell script to do them all. So you have a program ('build system') that does the complicated things for you. That program then needs a config file so you can tell the program what to do.

But you want to run that 'build system' remotely when someone does a git-push. That requires a daemon on a hosted server, authentication/authorization, a git server that triggers the job when it receives a push, it needs to store secrets and pass them to the job, it needs to run it all in a container for reliability, it needs to run the job multiple times at once for parallelism, it needs to cache to speed up the jobs, it needs to store artifacts and let you browse the results or be notified of them. So you take all that complexity, put it in its own little system ('CI system'). And you make a config file so you can tell the 'CI system' how to do all that.

Could you shove both separate sets of complex features into one tool? Sure you can. But it would make it harder to develop and maintain them, change them, replace them. Much simpler to use individual smaller components to compose a larger system, than to try to build one big, complex, perfect, all-in-one-system.

Don't believe me? There's a reason most living creatures aren't 6-foot-tall amoebas. We're systems-on-systems-on-systems-on-systems (many of which have similar features) and it works pretty well. Our biggest problem is often that our individual parts aren't composeable/replaceable enough.

zokier - 15 hours ago

I agree on build systems and CI being closely related, and could (in an ideal world) benefit from far tighter integration. But..

> So here's a thought experiment: if I define a build system in Bazel and then define a server-side Git push hook so the remote server triggers Bazel to build, run tests, and post the results somewhere, is that a CI system? I think it is! A crude one. But I think that qualifies as a CI system.

Yes the composition of hooks, build, and result posting can be thought as a CI system. But then the author goes on to say

> Because build systems are more generic than CI systems (I think a sufficiently advanced build system can do a superset of the things that a sufficiently complex CI system can do)

Which is ignoring the thing that makes CI useful, the continuous part of continuous integration. Build systems are explicitly invoked to do something, CI systems continuosly observe events and trigger actions.

In the conclusion section author mentions this for their idealized system:

> Throw a polished web UI for platform interaction, result reporting, etc on top.

I believe that platform integrations, result management, etc should be pretty central for CI system, and not a side-note that is just thrown on top.

eisbaw - 16 hours ago

Local-first, CI-second.

CI being a framework, is easy to be locked into -- preventing local-first dev.

I find justfiles can help unify commands, making it easier to prevent accruement of logic in CI.

qwertytyyuu - 18 hours ago

Wait a CI isn't supposed to be a build system that also runs tests?

athrowaway3z - 17 hours ago

But you see - it's efficient if we add _our_ configuration layer with custom syntax to spawn a test-container-spawner with the right control port so that it can orchestrate the spawning of the environment and log the result to production-test-telemetry, and we NEED to have a dns-retry & dns-timeout parameter so our test-dns resolver has time to run its warm-up procedure.
And I want it all as a SaaS!
myrmidon - 16 hours ago

In my view, the CI system is supposed to run builds and tests in a standardized/reproducible environment (and to store logs/build artifacts).
This is useful because you get a single source of truth for "does that commit break the build" and eliminate implicit dependencies that might make builds work on one machine but not another.
But specifying dependencies between your build targets and/or sourcefiles, is turning that runner into a bad, incomplete reimplementation of make, which is what this post is complaining about AFAICT.
GuB-42 - 16 hours ago

A CI system is more like a scheduler.
To make things simple: make is a build system, running make in a cron task is CI.
There is nothing special about tests, it is just a step in the build process that you may or may not have.
SideburnsOfDoom - 11 hours ago

> a CI isn't supposed to be a build system?
No. "Continuous Integration" is the practice of frequently merging changes to main. In this sense, "integration" means to take my changes and combine them with other recent changes.
A build and test system like those described in this article is a way to make CI safe and fast. It's not CI itself, it's just the enabling automation: the pre-merge checks and the post-merge artefact creation.
s_ting765 - 15 hours ago

A CI is really just a "serverless" application.

nlawalker - 10 hours ago

>CI offerings like GitHub Actions and GitLab Pipelines are more products than platforms because they tightly couple an opinionated configuration mechanism (YAML files) and web UI (and corresponding APIs) on top of a theoretically generic remote execute as a service offering. For me to consider these offerings as platforms, they need to grow the ability to schedule arbitrary compute via an API, without being constrained by the YAML officially supported out of the box.

I wish the author gave more concrete examples about what kinds of workflows they want to dynamically construct and remotely execute (and why a separate step of registering the workflow up front with the service before running it is such a dealbreaker), and what a sufficiently generic and unopinionated definition schema for workflows and tasks would look like as opposed to what a service like GitHub Actions defines.

Generally, registering a workflow with the service (putting it in your repo, in the case of GHA) makes sense because you're running the same workflows over and over. In terms of task definitions, GHA is workflows -> jobs -> tasks -> actions, where jobs are tied to runners and can have dependencies defined between them. If you want to use those primitives to do something generic like run some scripts, you can do that in a very bare-bones way. When I look at the Taskcluster task definition they linked, I see pretty much the same thing.

jcranmer - 10 hours ago

> I wish the author gave more concrete examples about what kinds of workflows they want to dynamically construct and remotely execute (and why a separate step of registering the workflow up front with the service before running it is such a dealbreaker), and what a sufficiently generic and unopinionated definition schema for workflows and tasks would look like as opposed to what a service like GitHub Actions defines.
Something that comes up for me a lot at my work: running custom slices of the test suite. The full test suite probably takes CPU-days to run, and if I'm only interested in the results of something that takes 5 CPU-minutes to run, then I shouldn't have to run all the tests.
- nlawalker - 10 hours ago
  
  I suppose it's just a matter of perspective - I see that as a case for parameterization of a common test-run workflow, not for a one-off definition.
- ghthor - 7 hours ago
  
  We do this at work, it’s started off as a simple build graph that used git content hashes and some simple logic to link things together. The result being that for any given pair of commits you can calculate what changed so you can only run those tests/builds etc.
  We’ve paired this with buildkite which allows uploading pipeline steps at any point during the run, so our CI pipeline is one step, that generates the rest of the pipeline and uploads that.
  I’m working on open sourcing this meta-build tool as I think it is niche that has no current implementation and it is not our core business.
  It can build a dependency graph across many systems (terraform, go, python, nix) by parsing from those systems what they depend on. Smashes them all together, so you can have a terraform module that depends on a go binary that embeds some python; and if you change any of it then each parts can have tasks that are run (go test/build, tf plan, pytest, and etc)

bluGill - 10 hours ago

I disagree. CI and build systems have different responsibilities and so should be different systems. Both are extremely complex because they have to deal with the complex real world.

Many people have the idea they can make things simpler. Which is really easy because the basic problems are not that hard. Them someone needs "just one more small feature" which seems easy enough and it is - but the combination of everyone's small feature is complex.

Both systems end up having full programming languages because someone really needs that complexity for something weird - likely someone in your project. However don't abuse that power. 99% of what you need from both should be done in a declarative style that lets the system work and is simple. Just because you can do CI in the build system, or the build system's job with the CI system doesn't mean you should. Make sure you separate them.

You CI system should be a small set of entry points. "./do everything" should be your default. But maybe you need a "build", then "test part-a" and "test part-b" as separate. However those are all entry points that your CI system calls to your build system and they are things you can do locally. Can do locally doesn't mean you do - most of the time locally you should be an incremental build. Nothing should be allowed past CI without doing a full build from scratch just to make sure that works (this isn't saying your CI shouldn't do incremental builds for speed - just that it needs to do full rebuilds as well, and if full rebuild breaks you stop everyone until the full rebuild is fixed).

keybored - 9 hours ago

> Many people have the idea they can make things simpler. Which is really easy because the basic problems are not that hard. Them someone needs "just one more small feature" which seems easy enough and it is - but the combination of everyone's small feature is complex.
This is becoming the standard refrain for all software.
- bluGill - 9 hours ago
  
  One other difference: CI and build systems are treated as "side projects" and so none of these efforts have thought into the full system. Generally most software has architects, and often it is a 2nd system that after a lot of effort has finally been made to work.

solatic - 8 hours ago

OP's argument hinges too much on thinking that GitLab pipelines etc. only do CI.

The purpose of Continuous Integration is to produce the One Canonical Latest Build for a given system. Well... no surprise that there's a ton of overlap between these systems and Bazel etc. "build systems".

But GitLab pipelines etc. are also Continuous Deployment systems. You don't always need fancy ArgoCD pull-based deployments or, their precursor, Chef/Puppet were also pull-based deployments for VMs. You can just have GitLab run a deployment script that calls kubectl apply, or Capistrano, or scp and ssh systemctl restart, or whatever deploys the software for you. That's not something that makes sense as part of your build system.

germandiago - 11 hours ago

I am running buildbot with a customized matrix style buildbot for years for my side projects.

This is because yes, it is very complex. I have tried Jenkins before and Gitlab CI.

Something that most build tools and CIs should learn from Meson build system is that sometimes it is better to just keep it simple than adding features on top. If you need them, script them in some way but keep configuration as data-driven (and I mean purely data-driven, not half a language).

My build system is literally: a build matrix, where you can specify filters of what to keep or skip. This gets all combined.

A series of steps with a name that can be executed or not depending on a filter. Nothing else. Every step calls the build system or whatever.

After that it sends mail reports and integrates with Gerrit to send builds and Gerrit csn also csll it.

No fsncy plugins or the like. Just this small toml file I have and run normal scripts or command lines without 300 layers on top. There are already enough things that can break so that one keeps adding opaque layers on top. Just use the tools we all know: ssh, bash, Python etc.

Everyone knows how to call that. If a step is too complex, just make a script.

teknopaul - 15 hours ago

I wrote Linci to tackle this issue a few years back

Https://linci.tp23.org

Ci is too complicated and are basically about locking. But what you (should) do is run cli commands on dedicated boxes in remote locations.

In Linci every thing done remote is the same locally. Just pick a box for the job.

There is almost no code, and what there is could be rewritten is any language if you prefer. Storage is git/VCs + filesystem.

Filesystem are kit fashionable because they are a problem for the big boys but not for you or I. File system storage makes thing easy and hackable.

That is unix bread and butter. Microsoft need a ci in yaml. Linux does not.

Been using it for a while an a small scale and it's never made me want anything else.

Scripting bash Remoting ssh Auth pam Notification irc/II (Or mail stomp etc) Scheduling crond Webhooks not needed if repo is on the same container use bash for most hooks, and nodejs server that calls cli for github

Each and every plug-in is a bash script and some env variables.

Read other similar setups hacked up with make. But I don't like the env vars handling and syntax of make. Bash is great if what you do is simple, and as the original article points out so clearly, if your ci is complicated you should probably rethink it.

teknopaul - 15 hours ago

Oh and debugging builds is a charm: Ssh in to the remote box, and run the same commands the tool is running, as the same user in a bash shell(the same language) .
CI debugging at my day job is literally impossible. Read logs, try the whole flow again from the beginning.
With Linci, I can fix any stage in the flow, if I want to, or check-in and run again if I an 99% sure it will work.

lukaslalinsky - 17 hours ago

Any universal build system is complex. You can either make the system simple and delegate the complexity to the user, like the early tools, e.g. buildbot. Or you can hide the complexity to the best of your ability, like GitHub actions. Or you expose all the complexity, like jenkins. I'm personally happy for the complexity being hidden and can deal with a few leaky abstractions if I need something non standard.

jcelerier - 10 hours ago

> GitLab Pipelines is a lot better. GitLab Pipelines supports features like parent-child pipelines (dependencies between different pipelines), multi-project pipelines (dependencies between different projects/repos), and dynamic child pipelines (generate YAML files in pipeline job that defines a new pipeline). (I don't believe GitHub Actions supports any of these features.)

I believe github actions does all of this? I use the first two features

Angostura - 17 hours ago

'Continuous Integration' in case anyone is wondering. Not spelled out anywhere in the article.

augustk - 10 hours ago

Just tech bros being tech bros.
https://blog.mitchjlee.com/2020/your-writing-style-is-costly

IshKebab - 18 hours ago

Yeah I think this is totally true. The trouble is there are loads of build systems and loads of platforms that want to provide CI with different features and capabilities. It's difficult to connect them.

One workaround that I have briefly played with but haven't tried in anger: Gitlab lets you dynamically create its `.gitlab-ci.yaml` file: https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#d...

So you can have your build system construct its DAG and then convert that into a `.gitlab-ci.yaml` to run the actual commands (which may be on different platforms, machines, etc.). Haven't tried it though.

Thom2000 - 17 hours ago

I've used dynamic pipelines. They work quite well, with two caveats: now your build process is two step and slower. And there are implementation bugs on Gitlab's side: https://gitlab.com/groups/gitlab-org/-/epics/8205
FWIW Github also allows creating CI definitions dynamically.
dakiol - 17 hours ago

If there’s something worse than a gitlab-ci.yaml file that is a dynamically-generated gitlab-ci.yaml file.
- IshKebab - 9 hours ago
  
  Why? By dynamically generating it you can skip a lot of the nonsense (any kind of conditional rules, optional jobs, etc.).
  - dakiol - 9 hours ago
    
    They are suddenly 10x more difficult to change to suit your needs.

Flaaaaanders - 14 hours ago

The article resonates a lot with me. I've been seeing the transition from Jenkins to Azure DevOps / GitHub Actions (same thing more or less) in the company I'm working at and came to very similar conclusions. The single big Jenkins machine shared by 10+ teams mixing UI configuration from 20 plugins with build systems and custom scripts wasn't great, so it was the right decision to move away from it. However, neither great is the current workflow of write->commit->wait->fail->write... while figuring out the correct YAML syntax of some third party GitHub Action that is required to do something very basic like finding files in a nested folder by pattern.

Take a look at Prefect - https://www.prefect.io/ - as far as I can see, it ticks a lot of the boxes that the author mentions (if you can live with the fact that the API is a Python SDK; albeit a very good one that gives you all the scripting power of Python). Don't be scared away by the buzzwords on the landing page, browsing the extensive documentation is totally worthwhile to learn about all of the features Prefect offers. Execution can either happen on their paid cloud offering or self-hosted on your own physical or cloud premises at no extra cost. The Python SDK is open source.

Disclaimer: I am not affiliated with Prefect in any way.

aa-jv - 18 hours ago

I have built many CI/build-servers over the decades for various projects, and after using pretty much everything else out there, I've simply reverted, time and again - and, very productively - to using Plain Old Bash Scripts.

(Of course, this is only possible because I can build software in a bash shell. Basically: if you're using bash already, you don't need a foreign CI service - you just need to replace yourself with a bash script.)

I've got one for updating repo's and dealing with issues, I've got one for setting up resources and assets required prior to builds, I've got one for doing the build - then another one for packaging, another for signing and notarization, and finally one more for delivering the signed, packaged, built software to the right places for testing purposes, as well as running automated tests, reporting issues, logging the results, and informing the right folks through the PM system.

And this all integrates with our project management software (some projects use Jira, some use Redmine), since CLI interfaces to the PM systems are easily attainable and set up. If a dev wants to ignore one stage in the build pipeline, they can - all of this can be wrapped up very nicely into a Makefile/CMakeLists.txt rig, or even just a 'build-dev.sh vs. build-prod.sh' mentality.

And the build server will always run the build/integration workflow according to the modules, and we can always be sure we'll have the latest and greatest builds available to us whenever a dev goes on vacation or whatever.

And all this with cross-platform, multiple-architecture targets - the same bash scripts, incidentally, run on Linux, MacOS and Windows, and all produce the same artefacts for the relevant platform: MacOS=.pkg, Windows=.exe, Linux=.deb(.tar)

Its a truly wonderful thing to onboard a developer, and they don't need a Jenkins login or to set up Github accounts to monitor actions, and so on. They just use the same build scripts, which are a key part of the repo already, and then they can just push to the repo when they're ready and let the build servers spit out the product on a network share for distribution within the group.

This works with both Debug and Release configs, and each dev can have their own configuration (by modifying the bash scripts, or rather the env.sh module..) and build target settings - even if they use an IDE for their front-end to development. (Edit: /bin/hostname is your friend, devs. Use it to identify yourself properly!)

Of course, this all lives on well-maintained and secure hardware - not the cloud, although theoretically it could be moved to the cloud, there's just no need for it.

I'm convinced that the CI industry is mostly snake-oil being sold to technically incompetent managers. Of course, I feel that way about a lot of software services these days - but really, to do CI properly you have to have some tooling and methodology that just doesn't seem to be being taught any more, these days. Proper tooling seems to have been replaced with the ideal of 'just pay someone else to solve the problem and leave management alone'.

But, with adequate methods, you can probably build your own CI system and be very productive with it, without much fuss - and I say this with a view on a wide vista of different stacks in mind. The key thing is to force yourself to have a 'developer workstation + build server' mentality from the very beginning - and NEVER let yourself ship software from your dev machine.

(EDIT: call me a grey-beard, but get off my lawn: if you're shipping your code off to someone else [github actions, grrr...] to build artefacts for your end users, you probably haven't read Ken Thompsons' "Reflections On Trusting Trust" deeply or seriously enough. Pin it to your forehead until you do!)

esafak - 11 hours ago

The code-based CI platform dagger.io used to support CUE lang but dropped it due to lack of interest. Combining that with something like bazel, all in CUE or Skylark sounds interesting, but bazel and dagger are both pretty complex on their own. Their merger would be too much.

benterix - 17 hours ago

The author has a point about CI being a build system and I saw it used and abused in various ways (like the CI containing only one big Makefile with the justification that we can easily migrate from one CI system to another).

However, with time, you can have a very good feel of these CI systems, their strong and weak points, and basically learn how to use them in the simplest way possible in a given situation. Many problems I saw IRL are just a result of an overly complex design.

donperignon - 18 hours ago

2025 and Jenkins still the way to go

maccard - 15 hours ago

Disagree - using the one built into your hosting platform is the way to go, and I’d that doesn’t work for whatever reason, teamcity is better in every way
ohdeargodno - 17 hours ago

The fact that maintaining any Jenkins instance makes you want to shoot yourself and yet it's the least worst option is an indictment of the whole CI universe.
I have never seen a system with documentation as awful as Jenkins, with plugins as broken as Jenkins, with behaviors as broken as Jenkins. Groovy is a cancer, and the pipelines are half assed, unfinished and incompatible with most things.
- bigstrat2003 - 9 hours ago
  
  I have zero problems maintaining Jenkins, and have done so at a couple of different jobs in the past. Minimize how many plugins you use and it works great. We use just a handful: configuration as code, credential storage, kubernetes agent support, pipelines, and job DSL (plus their dependencies of course). Everything is easy to manage because it's just config files in a repo, and things just work for us (with only very rare exceptions).
  It would probably be more constructive if you elaborated what your issues specifically were. For example, what have you found pipelines to be incompatible with? I've literally never seen anything they don't work with, so I can't really agree with your assessment without specifics. Similarly, I have zero problem with Groovy. If it's just not to your taste then fine, taste is subjective, but I can't see any substantive reason to call it "a cancer".
- zokier - 15 hours ago
  
  This is pretty much my experience too. Working with jenkins is always complete pain, but at the same time I can't identify any really solid alternatives either. So far sourcehut builds is looking the most promising, but I haven't had chance to use it seriously. While it's nominally part of the rest of sourcehut ecosystem, I believe it could be run with minor tweaks also standalone if needed
- maratc - 16 hours ago
  
  "Jenkins is the worst form of CI except for all those other forms that have been tried" - Winston Churchill, probably
- mike_hearn - 11 hours ago
  
  Least worst compared to what? You think TeamCity is worse?
- - 15 hours ago
  
  [deleted]

GnarfGnarf - 17 hours ago

CI = Continuous Integration

e1gen-v - 15 hours ago

I’ve been using Pulumi automation in our CI and it’s been really nice. There’s definitely a learning curve with the asynchronous Outputs but it’s really nice for building docker containers and separating pieces of my infra that may have different deployment needs.

forrestthewoods - 18 hours ago

> But if your configuration files devolve into DSL, just use a real programming language already.

This times a million.

Use a real programming language with a debugger. YAML is awful and Starlark isn’t much better.

CGamesPlay - 18 hours ago

> Use a real programming language with a debugger. YAML is awful and Starlark isn’t much better.
I was with you until you said "Starlark". Starlark is a million times better than YAML in my experience; why do you think it isn't?
thrown-0825 - 18 hours ago

bonus points when you start embedding code in your yamlified dsl.

jFriedensreich - 15 hours ago

If complex ci becomes indistinguishable from build systems, simple ci becomes indistinguishable from workflow engines. in an ideal world you would not need an ci product at all. the problem is there is neither a great build system nor workflow engine.

j4coh - 18 hours ago

Since the article came out in 2021 did anyone ever build the product of his dreams described in the conclusion?

iberator - 15 hours ago

That's why God created Jenkins. My favourite application ever

joaonmatos - 14 hours ago

Sometimes I feel we Amazonians are in a parallel world when it comes to building and deploying.

palmfacehn - 14 hours ago

How much of this is a result of poorly thought out build systems, which require layer after layer of duct tape? How much is related to chasing "cloud everything" narratives and vendor specific pipelines? Even with the sanest tooling, some individuals will manage to create unhygenic slop. How much of the remainder is a futile effort to defend against these bad actors?

m-s-y - 15 hours ago

Not a single definition of CI in the posting at all.

A tale as old as time I suppose…

positron26 - 18 hours ago

Fiefdoms. Old as programming. Always be on the lookout for people who want to be essential rather than useful.

jillesvangurp - 11 hours ago

Keeping it simple is always a good idea. I've been pretty happy with gh actions lately. I've seen everything from hudson/jenkins, travis ci, git lab, etc. Most of that stuff is fine if you keep it simple. Building your software should be simple if you do it manually. If it is, it's easy to automate with CI.

The same goes for other tools: build tools (ant, maven, gradle, npm, etc.); Configuration systems (puppet, ansible, salt, etc.); Infrastructure provisioning (cloudformation, terraform, etc.); other containerization and packaging tools (packer, docker, etc.).

Stick to what they are good at. Don't overload them with crap outside the scope of what they do (boiling oceans, lots of conditional logic, etc.). And consider whether you need them at all. Write scripts for all the rest. My default is a simple bash script. Replacing a 2 line script with 100+ lines of yaml is a clear sign that something is wrong with what you are doing.

A consideration lately is not just automated builds but having agentic coding tools be able to work with your software. I just spent an afternoon nudging codex along to vibe code me a new little library. Mostly it's nailing it and I'm iterating with it on features, tests, documentation etc. It of course needs to be able to run tests to validate what it's doing. And it needs to be able to figure out how. The more complicated that is, the less likely it is to be useful.

CI and agentic coding have similar needs: simplicity and uniformity. If you have that, everything gets easier.

Anything custom and wonky needs to isolated and removed from the critical path. Or removed completely. Devops work is drudgery that needs to be minimized and automated. If it becomes most of what you do, you're doing it wrong. If an agentic coding system can figure out how to build and run your stuff, getting it to setup CI and deployment scripts is not that much of a leap in complexity.

After a few decades with this stuff, I have a low threshold for devops bullshit. I've seen that go sideways and escalate into months long projects to do god knows what a few times too often. Life is too short to deal with that endlessly. The point of automating stuff is so you can move on and do more valuable things. If automating it takes up all your time, something is very wrong.

akoboldfrying - 15 hours ago

You can roll your own barebones DAG engine in any language that has promises/futures and the ability to wait for multiple promises to resolve (like JS's Promise.all()):

    For each task t in topological order: 
      Promise.all(all in-edges to t).then(t)

Want to run tasks on remote machines? Simply waves hands make a task that runs ssh.

mike_hearn - 17 hours ago

I've investigated this idea in the past. It's an obvious one but still good to have an article about it, and I'd not heard of Taskcluster so that's cool.

My conclusion was that this is near 100% a design taste and business model problem. That is, to make progress here will require a Steve Jobs of build systems. There's no technical breakthroughs required but a lot of stuff has to gel together in a way that really makes people fall in love with it. Nothing else can break through the inertia of existing practice.

Here are some of the technical problems. They're all solvable.

• Unifying local/remote execution is hard. Local execution is super fast. The bandwidth, latency and CPU speed issues are real. Users have a machine on their desk that compared to a cloud offers vastly higher bandwidth, lower latency to storage, lower latency to input devices and if they're Mac users, the fastest single-threaded performance on the market by far. It's dedicated hardware with no other users and offers totally consistent execution times. RCE can easily slow down a build instead of speeding it up and simulation is tough due to constantly varying conditions.

• As Gregory observes, you can't just do RCE as a service. CI is expected to run tasks devs aren't trusted to do, which means there has to be a way to prove that a set of tasks executed in a certain way even if the local tool driving the remote execution is untrusted, along with a way to prove that to others. As Gregory explores the problem he ends up concluding there's no way to get rid of CI and the best you can do is reduce the overlap a bit, which is hardly a compelling enough value prop. I think you can get rid of conventional CI entirely with a cleverly designed build system, but it's not easy.

• In some big ecosystems like JS/Python there aren't really build systems, just a pile of ad-hoc scripts that run linters, unit tests and Docker builds. Such devs are often happy with existing CI because the task DAG just isn't complex enough to be worth automating to begin with.

• In others like Java the ecosystem depends heavily on a constellation of build system plugins, which yields huge levels of lock-in.

• A build system task can traditionally do anything. Making tasks safe to execute remotely is therefore quite hard. Tasks may depend on platform specific tooling that doesn't exist on Linux, or that only exists on Linux. Installed programs don't helpfully offer their dependency graphs up to you, and containerizing everything is slow/resource intensive (also doesn't help for non-Linux stuff). Bazel has a sandbox that makes it easier to iterate on mapping out dependency graphs, but Bazel comes from Blaze which was designed for a Linux-only world inside Google, not the real world where many devs run on Windows or macOS, and kernel sandboxing is a mess everywhere. Plus a sandbox doesn't solve the problem, only offers better errors as you try to solve it. LLMs might do a good job here.

But the business model problems are much harder to solve. Developers don't buy tools only SaaS, but they also want to be able to do development fully locally. Because throwing a CI system up on top of a cloud is so easy it's a competitive space and the possible margins involved just don't seem that big. Plus, there is no way to market to devs that has a reasonable cost. They block ads, don't take sales calls, and some just hate the idea of running proprietary software locally on principle (none hate it in the cloud), so the only thing that works is making clients open source, then trying to saturate the open source space with free credits in the hope of gaining attention for a SaaS. But giving compute away for free comes at staggering cost that can eat all your margins. The whole dev tools market has this problem far worse than other markets do, so why would you write software for devs at all? If you want to sell software to artists or accountants it's much easier.

eptcyka - 14 hours ago

Ideally, CI would just invoke the build system. With nix, this os trivial.

csomar - 15 hours ago

I am working on this problem and while I agree with the author, there is room for improvement for the current status quo:

> So going beyond the section title: CI systems aren't too complex: they shouldn't need to exist. Your CI functionality should be an extension of the build system.

True. In the sense that if you are running a test/build, you probably want to start local first (dockerize) and then run that container remotely. However, the need for CI stems from the fact that you need certain variables (ie: you might want to run this, when commit that or pull request this or that, etc.) In a sense, a CI system goes beyond the state of your code to the state of your repo and stuff connected to your repo (ie: slack)

> There is a GitHub Actions API that allows you to interact with the service. But the critical feature it doesn't let me do is define ad-hoc units of work: the actual remote execute as a service. Rather, the only way to define units of work is via workflow YAML files checked into your repository. That's so constraining!

I agree. Which is why most people will try to use the container or build system to do these complex tasks.

> Taskcluster's model and capabilities are vastly beyond anything in GitHub Actions or GitLab Pipelines today. There's a lot of great ideas worth copying.

You still need to run these tasks as containers. So, say if you want to compare two variables, that's a lot of compute for a relatively simple task. Which is why the status quo has settled with GitHub Actions.

> it should offer something like YAML configuration files like CI systems do today. That's fine: many (most?) users will stick to using the simplified YAML interface.

It should offer a basic programming/interpreted language like JavaScript.

This is an area where WebAssembly can be useful. At its core, WASM is a unit of execution. It is small, universal, cheap and has a very fast startup time compared to a full OS container. You can also run arbitrarily complex code in WASM while ensuring isolation.

My idea here is that CI becomes a collection of executable tasks that the CI architect can orchestrate while the build/test systems remain a simple build/test command that run on a traditional container.

> Take Mozilla's Taskcluster and its best-in-class specialized remote execute as a service platform.

That would be a mistake, in my opinion. There is a reason Taskcluster has failed to get any traction. Most people are not interested in engineering their CI but in getting tasks executed on certain conditions. Most companies don't have people/teams dedicated for this and it is something developers do alongside their build/test process.

> Will this dream become a reality any time soon? Probably not. But I can dream. And maybe I'll have convinced a reader to pursue it.

I am :) I do agree with your previous statement that it is a hard market to crack.

dochtman - 18 hours ago

(2021)

SideburnsOfDoom - 16 hours ago

The issue that I see is that "Continuous integration" is the practice of frequently merging to main.

Continuous: do it often, daily or more often

Integration: merging changes to main

He's talking about build tools, which are a _support system_ for actual CI, but are not a substitute for it. These systems allow you to Continuously integrate, quickly and safely. But they aren't the thing itself. Using them without frequent merges to main is common, but isn't CI. It's branch maintenance.

Yes, semantic drift is a thing, but you won't get the actual benefits of the actual practice if you do something else.

If you want to talk "misdirected CI", start there.

oldpersonintx2 - 18 hours ago

[dead]