Prek: A better, faster, drop-in pre-commit replacement, engineered in Rust
github.com | 165 points by fortuitous-frog 7 hours ago
BTW. Pre-commit hooks are the wrong way to go about this stuff.
I'm advocating for JJ to build a proper daemon that runs "checks" per change in the background. So you don't run pre-commit checks when committing; they just happen in the background, and by the time you get to sharing your changes, everything has been verified for you for each change/commit, effortlessly, without you wasting time or needing to do anything special.
I have something a bit like that implemented in SelfCI (a minimalistic local-first Unix-philosophy-abiding CI) https://app.radicle.xyz/nodes/radicle.dpc.pw/rad%3Az2tDzYbAX... and it replaced my use of pre-commit hooks entirely. And users already told me that it does feel like commit hooks done right.
Just because the hooks have the label "pre-commit" doesn't mean you have to run them before committing :).
I, too, want checks per change in jj -- but (in part because I need to work with people who are still using git) I need to still be able to use the same checks even if I'm not running them at the same point in the commit cycle.
So I have an alias, `jj pre-commit`, that I run when I want to validate my commits. And another, `jj pre-commit-branch`, that runs on a well-defined set of commits relative to @. They do use `pre-commit` internally, so I'm staying compatible with git users' use of the `pre-commit` tool.
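For illustration, a `jj` alias like that might be declared in jj's config via `jj util exec` (this is a hypothetical sketch, not the commenter's actual alias, which is presumably more involved):

```toml
# hypothetical sketch in jj's config.toml; the real `jj pre-commit`
# alias described above surely does more than this
[aliases]
pre-commit = ["util", "exec", "--", "pre-commit", "run", "--all-files"]
```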
What I can't yet do is run the checks in the background or store the check status in jj's data store. I do store the tree-ish of passing checks though, so it's really quick to re-run.
Looks very interesting, I fully agree that running CI locally is viable.
But what I didn't pick up from a quick scan of the README is the best pattern for integrating with git. Do you expect users to run (a script calling) selfci manually, or is it hooked up to git or similar? When do the merge hooks come into play? Do you ask selfci to merge?
Yep, I think a watcher is better suited [0] to trigger on file changes.
I personally can't stand my git commit command to be slow or to fail.
[0]: such as https://github.com/watchexec/watchexec
To myself: sometimes I think the background process should be committing for me automatically each time a new working set exists, and I should only rebase and squash before pushing.
That’s reversing the flow of control, but might be workable!
jj already pretty much does that with the oplog. A consistent way of making new snapshots in the background would be nice though. (Currently you have to run a jj command — any jj command — to capture the working directory.)
You can configure watchman to do it. `fsmonitor.watchman.register-snapshot-trigger = true`
I don't recommend it, though, at least not on large repositories. Too much opportunity to collide with command-line jj write operations.
I like this approach. Something related I've been tinkering with are "protected bookmarks" - you declare what bookmarks (main, etc) are protected in your config.toml and the normal `jj bookmark` commands that change the bookmark pointer will fail, unless you pass a flag. So in your local "CI" script you can do `jj bookmark set main -r@ --allow-protected` iff the tests/lints pass. Pairs well with workspaces and something that runs a local CI (like a watcher/other automated process).
I haven't yet submitted it to upstream for design discussion, but I pushed up my branch[1]. You can also declare a revset that the target revision must match, for extra belts and suspenders (eg., '~conflicts()')
[1] https://github.com/paulsmith/jj/tree/protected-bookmarks
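As a purely hypothetical sketch of what such a declaration could look like (the key names here are invented for illustration; consult the linked branch for the actual design and syntax):

```toml
# invented key names -- see the protected-bookmarks branch for real syntax
[bookmarks]
protected = ["main", "release"]
# extra belt-and-suspenders: target revision must match this revset
protected-revset = "~conflicts()"
```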
Cool! That would pair well with SelfCI's MQ daemon, preventing accidentally forgetting about merging in stuff without running the local CI.
That's a great idea, and I was just thinking about how it would pair with self hosted CI of some type.
Basically what I would want is to write a commit (because I want to commit early and often), then run the lint (and tests) in a sandboxed environment. If they pass, great. If they fail and HEAD has moved ahead of the failing commit, create a "FIXME" branch off the failure. Back on main (or whatever branch head was pointed at), if tests start passing, you probably never need to revisit the failure.
I want to know about local test failures before I push to remote with full CI.
automatic branching and workflow stuff is optional. the core idea is great.
> automatic branching and workflow stuff is optional. the core idea is great.
I'm not sure if I fully understood. But SelfCI's Merge-Queue (mq) daemon has a built-in hook system, so it's possible to do custom stuff at certain points. So probably you should be able to implement it already, or it might require couple of minor tweaks (should be easy to do on SelfCI side after some discussion).
Being visible is useful; this is probably better suited for an IDE than a hook or a daemon.
That looks really cool! I've been looking for a more thought-out approach to hooks on JJ, I'll dig into this. Do you have any other higher level architecture/overview documentation other than what is in that repo? It has a sense of "you should already know what this does" from the documentation as is.
Also, how do you like Radicle?
> Do you have any other higher level architecture/overview documentation other than what is in that repo?
SelfCI is _very_ minimal by design. There isn't really all that much to document other than what is described in the README.
> Also, how do you like Radicle?
I enjoy that it's p2p, and it works for me in this respect. Personally I disagree with its attempt to duplicate the features of a GitHub-like forge, instead of the original collaboration model of the Linux kernel that git was built for. I think it should try to replicate something more like SourceHut: mailing-list threads, communication that includes patches, etc. But I haven't really _collaborated_ much using Radicle yet; I just push and pull stuff from it, and it works for that just fine.
I have also been working on an alternative written in Rust, but in my version the hooks are WASI programs. They run on a virtual filesystem backed by the Git repo. That means a) there are no security issues (they have no network access, and no file access outside the repo), b) you can run them in parallel, c) you can choose whether to apply fixes or not without needing explicit support from the plugin, and most importantly d) they work reliably.
I'm sure this is more reliable than pre-commit, but you still have hooks building Python wheels and whatnot, which fail annoyingly often.
The VFS stuff is not quite finished yet though (it's really complicated). If anyone wants to help me with that it would be welcome!
the second the hooks modify the code they've broken your sandbox
I think wasi is a cool way to handle this problem. I don't think security is a reason though.
> the second the hooks modify the code they've broken your sandbox
Changes to code would obviously need to be reviewed before they are committed. That's still much better than with pre-commit, where e.g. to do simple things like banning tabs you pretty much give some guy you don't know full access to your machine. Even worse - almost everyone that uses pre-commit also uses tags instead of commit hashes so the hook can be modified retroactively.
One interesting attack would be for a hook to modify e.g. `.vscode/settings.json`... I should probably make the default config exclude those files. Is that what you meant? Even without that it's a lot more secure than pre-commit.
I wouldn't want hooks modifying the code. They should be only approve/reject. Ideally landlock rules would give them only ro access to repo dir
I think it was a massive mistake to build on the pre-commit plugin base. pre-commit is probably the most popular tool for pre-commit hooks but the platform is bad. My main critique is that it mixes tool installation with linting—when you will undoubtedly want to use linters _outside_ of hooks. The interface isn't built with parallelism in mind, it's sort of bolted on but not really something I think could work well in practice. It also uses a bunch of rando open source repos which is a supply chain nightmare even with pinning.
pre-commit considered harmful if you ask me. prek seems to largely be an improvement but I think it's improving on an already awful platform so you should not use it.
I know I am working on a competing tool, but I don't share the same criticism for lefthook or husky. I think those are fine and in some ways (like simplicity) better than hk.
I think really they just need to implement some kind of plug-in or extension framework. Extensions are just not first class citizens but they really should be.
There should be a .gitextensions in the repo that the repo owners maintain, just like .gitignore and .gitattributes etc. Everything can still be opt-in for every user, but at least all git clients would be able to know about, pull down, and install extensions per user discretion.
It seems pretty basic in this day and age but it's still a gaping hole. You still need to manually call LFS install for goodness sake.
I use http://hk.jdx.dev/, which is based on https://pkl-lang.org/ and Rust, as it integrates with http://mise.jdx.dev/.
Is prek much better?
Love mise, didn't know about hk. Will check this out but don't think $WORK (or me) needs more than lefthook at the moment, which we're quite happy with. Wonder if there are comparisons/example projects that showcases the unique value propositions.
Correct me if I'm wrong, but lefthook doesn't run its hooks exclusively on the staged changes, IIRC. pre-commit, and prek by extension, stash the unstaged changes using git and run the hooks only on the staged files. Last I used it, lefthook ran on every file regardless of git status. This annoyed me because I'd have a few stray files that were not ready to be checked in or tracked that would trigger failures in lefthook. At the time this also made some hooks run slower, since they would run on every single file, but I think most linters have become significantly faster now.
Please look at the example that is literally on the front page of the lefthook website: https://lefthook.dev/
Ah ok the home page actually reminded me what the actual issue was. It can pass the list of staged files to the command but since it doesn't actually stash anything, it's not compatible with commands that don't accept a list of files. golangci-lint for example doesn't accept a list of files like this and will run on every single file in the repo. I don't know if this behaviour has changed in lefthook or golangci-lint now.
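For tools that do accept file arguments, lefthook can scope a command to staged files via its `{staged_files}` template. A `lefthook.yml` sketch (the hook and glob here are illustrative, using a formatter check rather than golangci-lint, which as noted doesn't take a file list):

```yaml
pre-commit:
  commands:
    fmt-check:
      glob: "*.go"
      run: gofmt -l {staged_files}
```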
in hk you can not only have a mix of staged/unstaged files but it even deals with staged/unstaged HUNKS in the same file (best it can at least)
prek is compatible with pre-commit so any hooks that can be used for pre-commit can be used with prek including the repo config file. Depending on if you're interested in buying into the existing pre-commit ecosystem, which is pretty extensive, then prek is a really good alternative
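For example, an existing `.pre-commit-config.yaml` like the one below should run unchanged under prek (the `rev` is pinned here purely for illustration):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```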
My big problem with pre-commit is that it doesn't have any way for you to have your own commit hooks that run in addition to the hooks that are part of the repo, and the author of it is hostile to any suggestion of supporting that. Heaven forbid that I want to run something on commit that other developers who work on the repo don't want to.
The author of pre-commit is known to be pretty hostile :p You should make an issue for prek though!
I am a big fan of prek and have converted a couple of projects over from pre-commit
The main advantage for me is that prek has support for monorepo/workspaces, while staying compatible with existing pre-commit hooks.
So you can have additional .pre-commit-config.yaml files in each workspace under the root, and prek will find and run them all when you commit. The results are collated nicely. Just works.
Having the default hooks reimplemented in Rust is a minor bonus (3rd-party hooks won't be any faster), and using uv as the package manager speeds up hook updates for Python hooks.
Really enjoying using prek.
Dedicated a whole chapter to it in my latest book, Effective Testing.
The trend of fast core (with rust) and convenient wrapper is great while we are still writing code.
I struggle to see value with git hooks. They're an opt-in, easily opt-out way of calling shell scripts from my understanding--you can't force folks to run them, and they don't integrate/display nicely with CI/CD.
Why not just call a shell script directly? How would you use these with a CI/CD platform?
I tend to work the other way around - what is defined in CI steps gets added to pre-commit. Several tools have already existing configurations or you can use local mode. Sure, I can't force people to use it but it saves them time as CI would fail anyway.
This might be a me problem but I extensively manipulate the git history all the time which makes me loathe git hooks. A commit should take milliseconds, not a minute.
it’s not just you.
i regularly edit history of PRs for a variety of reasons and avoid pre-commit when possible.
put it all in CI thank you please — gimme a big red X on my pipeline publicly telling me i’ve forgotten to do something considered important.
You do seem to be doing it wrong. Extensive manipulation of the record and slow hooks are both undesirable.
I would reckon cleaning up your branch before opening a pull request is good practice. I also rebase a lot, as well as using git reset, and I use wip commits.
Slow hooks are also not a problem in projects I manage as I don't use them.
No, I would not and don't do that. It is better to leave the PR commits separate and atomic so reviewers can digest them more easily. You just squash on merge.
> Slow hooks are also not a problem in projects I manage as I don't use them.
You bypass the slow hooks you mentioned? Why even have hooks then?
> It is better to leave the PR commits separate and atomic so reviewers can digest them more easily.
So reviewers have to digest all of the twists and turns I took to get to the final result? Why oh why oh why?
Sure, if they've already seen some of it, then there should be an easy way for them to see the updates. (Either via separate commits or if you're fortunate enough to have a good review system, integrated interdiffs so you can choose what view to use.)
In a better world, it would be the code author's responsibility to construct a meaningful series of commits. Unless you do everything perfectly right the first time, that means updating commits or using fixup commits. This doesn't just benefit reviewers, it's also enormously valuable when something goes wrong and you can bisect it down to one small change rather than half a dozen not-even-compiling ones.
But then, you said "atomic", which suggests you're already trying to make clean commits. How do you do that without modifying past commits once you discover another piece that belongs with an earlier step?
> You just squash on merge.
I'd rather not. Or more specifically, optimal review granularity != optimal final granularity. Some things should be reviewed separately then squashed together (eg a refactoring + the change on top). Some things should stay separate (eg making a change to one scary area and then making it to another). And optimal authoring granularity can often be yet another thing.
But I'll admit, git + github tooling kind of forces a subpar workflow.
> No, I would not and don't do that. It is better to leave the PR commits separate and atomic so reviewers can digest them more easily. You just squash on merge.
Someone who regularly rewrites locally with thought also will not "just squash" on merge.
You seem to be doing it wrong.
I do leave PR commits separate. In my teams I don't set up pre-commit hooks altogether, unless others feel strongly otherwise. In projects where they are forced upon me I frequently --no-verify hooks if they are slow, as the linter runs on save and I run tests during development. CI failing unintentionally is usually not a problem for me.
You can obviously bypass them, but having precommit hooks to run scripts locally, to make sure certain checks pass, can save them from failing in your pipeline, which can save time and money.
From an org standpoint you can have them (mandate?) as part of the developer experience.
(Our team doesn't use them, but I can see the potential value)
I never understood this argument.
The checks in those pre-commit hooks would need to be very fast - otherwise they'd be too slow to run on every commit.
Then why would it save time and money if they only get run at the pipeline stage? That would only save substantial time if the pipeline is architected in a suboptimal way: those checks should get run immediately on push, and first in the pipeline, so they make the pipeline fail fast if they don't pass. Instant Slack notification on fail.
But the fastest feedback is obviously in the editor, where such checks like linting / auto-formatting belong, IMHO. There I can see what gets changed, and react to it.
Pre-commit hooks sit in such a weird place between where I author my code (editor) and the last line of defense (CI).
> Then why would it save time and money if they only get run at the pipeline stage? That would only save substantial time if the pipepline is architected in a suboptimal way: Those checks should get run immediately on push, and first in the pipeline so the make the pipeline fail fast if they don't pass. Instant Slack notification on fail.
That's still multiple minutes compared to an error thrown on push - i.e. long enough for the dev in question to create a PR, start another task, and then leave the PR open with CI failures for days afterwards.
> But the fastest feedback is obviously in the editor, where such checks like linting / auto-formatting belong, IMHO.
There is a substantial chunk of fast checks that can't be configured in <arbitrary editor> or that require a disproportionate time investment (e.g. you could write and maintain a Visual Studio extension vs. just adding a line to grep for pre-commit).
I think there's value in git hooks, but pre-commit is the wrong hook. This belongs in a hook that runs on attempted push, not on commit.
"pre-commit the tool" supports the pre-push hook (as well as the various other hooks).
There's a config option for that :) https://prek.j178.dev/configuration/#default_install_hook_ty...
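In `pre-commit` itself the equivalent knob is the top-level `default_install_hook_types` option in the config file, e.g.:

```yaml
# in .pre-commit-config.yaml: install as a pre-push hook by default
default_install_hook_types: [pre-push]
```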
formatting should definitely be in pre-commit though, otherwise you'll destroy diffs.
They're very commonly used in CI. There are dedicated GitHub actions for pre-commit and prek, but most commonly people just invoke something like `prek run --all-files` or `pre-commit run --all-files` in their typical lint CI jobs.
The prek documentation has a list of many large projects (such as CPython and FastAPI, to name a few) who use it; each link is a PR of how they integrated it into CI if you want to see more: https://prek.j178.dev/#who-is-using-prek
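A minimal lint job along those lines might look like this (the runner image, action version, and installation method are illustrative, not taken from any particular project's workflow):

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          pipx install prek
          prek run --all-files
```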
The value is in finding out something is going to fail locally before pushing it. Useful for agents and humans alike.
They integrate well with CI.
You run the same hooks in CI as locally so it's DRY and pushes people to use the hooks locally to get the early feedback instead of failing in CI.
Hooks without CI are less useful since they will be constantly broken.
Why wouldn't I just call the same shell script in CI and locally though? What's the benefit here? All I'm seeing is circular logic.
The point is enforcement. If there's a newcomer to developing your repo, you can ask them to install the hooks and from thereon everything they commit will be compatible with the processes in your CI. You don't need to manually run the scripts they'll run automatically as part of the commit or push or whatever process
pre-commit provides a convenient way to organize running a collection of shell scripts.
Besides running at commit time, pre-commit/prek can run all hooks on demand with `run`. So in CI/CD you can replace all discrete lint/format tool calls with one call to pre-commit/prek. E.g. https://github.com/python/cpython/blob/main/.github/workflow....
This just seems like calling a shell script with extra steps.
I have a shell utility similar to make that CI/CD calls for each step (like for step build, run make build) that abstracts stuff. I'd have Prek call this tool, I guess, but then I don't get what benefit there is here.
Am I alone in that I never have had an issue with performance with pre-commit? granted I don't work on projects the size of the Linux kernel, but I haven't had any complaints.
I've used pre-commit very sparingly, but it has happened, and I also have no idea why this project needs to exist. Why would pre-commit ever lead to performance problems? I get that the processes that are hooked in can be long-running, but pre-commit itself? Why would it take any time at all?
Never had a problem. It adds negligible time to each commit and I have several hooks in use. Running tests takes several orders of magnitude more time.
Can people give examples of how they use pre-commit hooks that _cannot_ be replaced by a combination of the following?
* CI (I understand pre-commit shifts errors left)
* in editor/IDE live error callouts for stuff like type checking, and auto-formatting for things like "linters".
Do you run tests? How do you know _which_ tests to run, and not just run every test CI would run, which could be slow?
It’s a question of feedback time and consistency: e.g. if you run Prettier/Ruff in CI, someone has to wait minutes rather than milliseconds and you either have to fix build failures or grant your CI system commit privileges and deal with merge conflicts. This also means more total CI runner usage while someone’s laptop probably has 10 idle cores.
If it’s on a pull/merge request, you’re wasting reviewer time.
If the hook is blocking secrets, you can’t un-push it with 100% certainty so you have to revoke credentials.
For tests, I tend to have the equivalent of “pytest tests/unit/“ since those are fast and a good sanity check, especially for things like refactoring.
I also run our pre-commit checks in CI for consistency so we’re never relying on someone’s local environment (web editors exist) and to keep everyone honest about their environment.
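A local hook along those lines, in standard pre-commit `repo: local` syntax (the id, name, and test path are illustrative):

```yaml
repos:
  - repo: local
    hooks:
      - id: unit-tests
        name: fast unit tests
        entry: pytest tests/unit/
        language: system
        pass_filenames: false
```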
> Can people give examples of how they use pre-commit hooks that _cannot_ be replaced by a combination of the following?
I can't, because the point of our pre-commit use isn't to run logic in hooks that can't be run otherwise.
e.g. We use pre-commit to enforce that our language's whitespace formatting has been applied. This has the same configuration in the IDE, but sometimes devs ignore IDE warnings or just open files in a text editor for a quick edit and don't see IDE warnings or w/e.
"Replaced by CI" isn't really meaningful in our context - pre-commit is just a tool that runs as part of CI - some things get done as pre-commit hooks because they're fast and it's a convenient place to put them. Devs are encouraged to also run pre-commit locally, but there's no enforcement of this.
> Do you run tests? How do you know _which_ tests to run, and not just run every test CI would run, which could be slow?
We have performance metrics for pre-commit hooks and pre-push hooks. I forget the exact numbers, but we want stuff to "feel" fast, so e.g. if you're rebasing something locally with a few dozen commits it should only take seconds. Pre-push hooks have a bit more latitude.
So if you are using multiple languages to have scripts that run off your pre-commit hook, this is like a package and language runtime management system for your pre-commit hook build system? Rather, I think this is a reimplementation of such a system in rust so it can be self contained and fast.
This is the kind of thing I see and I think to myself: is this solving a problem or is this solving a problem that the real problem created?
Why is your pre-commit so complicated that it needs all this? I wish I could say it could all be much simpler, but I’ve worked in big tech and the dynamics of large engineering workforces over time can make this sort of thing do more good than harm, but again I wonder if the real problem is very large engineering teams…
Another commenter is currently down voted for something similar, but I'll share my controversial take anyways: I hate pre-commit hooks.
I loathe UX flows where you get turned around. If I try to make a commit, it's because that is what I intend to do. I don't want to receive surprise errors. It's just more magic, more implicit behavior. Give me explicit tooling.
If you want to use pre-commit hooks, great! You do you. But don't force them on me, as so many projects do these days.
Client-side pre-commit hooks are there to help you in the same way that type checking (or a powerful compiler) is there to help you avoid bugs. In particular with git, you can skip the hooks when committing.
Now, if the server enforces checks on push, that's a project policy that should be respected.
The problem is that pre-commit hooks are much slower with a much higher false-positive rate than type checking.
Pre-commit checks should be opt-in with CI as the gate. It's useful to be able to commit code in a failing state.
I use exactly one such hook, and that's to add commit signoff because of a checklist-compliance item called DCO that fails all PRs unless they have the sign-off trailer added by `git commit -s`. I've long argued that we should be enforcing actual signed commits instead, but compliance has never been about doing the sensible thing.
It's as simple as a script with a cp command that I run after any clone of a repo that requires it; certainly doesn't require anything as elaborate as a hook manager.
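A minimal sketch of such a hook, installed by hand (this is an illustration, not the commenter's actual script; it appends the same trailer that `git commit -s` produces):

```shell
# Illustrative sketch: a commit-msg hook that appends the DCO
# Signed-off-by trailer when it's missing. Run from the repo root.
mkdir -p .git/hooks
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
msg_file="$1"
# only add the trailer if the message doesn't already have one
grep -q '^Signed-off-by:' "$msg_file" ||
  printf '\nSigned-off-by: %s <%s>\n' \
    "$(git config user.name)" "$(git config user.email)" >> "$msg_file"
EOF
chmod +x .git/hooks/commit-msg
```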
Anyone using it on very big projects who can vouch for the speed of things?
It doesn’t seem like this solves the main issues with pre-commit hooks. They are broken by design. Just to name 2, they run during rebase and aren’t compatible with commits that leave unstaged files in your tree.
> they … aren’t compatible with commits that leave unstaged files in your tree.
It's a little surprising that git doesn't pass pre-commit hooks any information, like a list of which files were changed in the soon-to-be-made commit. git does so for pre-push, where it writes to a hook's stdin some information about the refs and remotes involved in the push.
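For comparison, a pre-push hook receives one line per ref on stdin, in the form `<local ref> <local sha> <remote ref> <remote sha>` (the input below is simulated with printf; in a real hook, git supplies it):

```shell
# Simulate the four fields git writes to a pre-push hook's stdin,
# one line per ref being pushed, then read them as a hook would.
printf 'refs/heads/main 1a2b refs/heads/main 3c4d\n' |
while read -r local_ref local_sha remote_ref remote_sha; do
  echo "pushing $local_ref ($local_sha) -> $remote_ref"
done
```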
I wonder if many pre-commit hooks, like the kind which run formatters, would be better off as `clean` filters, which run on files when they are staged. The filter mechanism makes it easier to apply just to the files which were changed. In the git docs, they even use a formatter (`indent`) as an example.
https://git-scm.com/book/ms/v2/Customizing-Git-Git-Attribute...
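The pattern from that chapter looks roughly like this: a `.gitattributes` entry assigning the filter, plus a filter definition in git config (using `indent` as the formatter, as in the book's example):

```
# .gitattributes
*.c filter=indent

# git config (repo-local or global)
[filter "indent"]
    clean = indent
    smudge = cat
```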
Not just faster than pre-commit but totally compatible, and with more features.
What difference does it make that it's written in Rust? Why is that so much a selling that it made it into the title?
To entice people who are fluent in said language, or those who are looking for something compiled and performant. If I see a project written in (java|type)script, I know to avoid it.
I don't understand. The whole point of pre-commit is that it's a gateway to the operating system, and also creates an ecosystem of pre-integration continuous-integration scripts. Scripts that are not Rust.
This has been such a breath of fresh air. It was seamless to drop into my projects.
"...in Rust"
is enough to make me not even open the link! Everything right now seems to have an urgent need to be developed in Rust, like why???
Just like kubernetes, many companies followed the kubernetes hype even when it was not needed and added unnecessary complexity to a simple environment.
Now it is Rust time!!