Orchestrating AI code review at scale

blog.cloudflare.com

63 points by pramodbiligiri 3 days ago


suika - 6 minutes ago

As a solo dev, or rather nowadays mostly a decision maker / agent overseer, I've come to enjoy letting my agents develop against a Gerrit repository / workflow. The dev agent pushes a CL; the review agent picks it up (not just the diff, but the full repo), runs tests/reviews/review-subagents, and concludes by posting a review as well as a vote. This goes back and forth with new patch sets and replies to the threads. Eventually the CL gets a +2 or whatever, and I have the final call to manually submit it. It is way slower than just pushing through development with one agent doing everything yolo against a normal repository, but it seems to me that the additional time is well spent (no, I don't have fancy graphs or similar analysis to prove this, only my gut feeling after looking at recent development results).
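A minimal sketch of the gate described above, for illustration only. It does not call Gerrit; `Change`, `PatchSet`, and the method names are hypothetical, and it assumes a single review vote per patch set in the -2..+2 range, which is simpler than Gerrit's real per-reviewer labels.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatchSet:
    number: int
    vote: Optional[int] = None  # review agent's vote, -2..+2 (assumed single voter)
    comments: List[str] = field(default_factory=list)


@dataclass
class Change:
    subject: str
    patch_sets: List[PatchSet] = field(default_factory=list)
    submitted: bool = False

    def push(self) -> PatchSet:
        """Dev agent uploads a new patch set; it starts without a vote."""
        patch_set = PatchSet(number=len(self.patch_sets) + 1)
        self.patch_sets.append(patch_set)
        return patch_set

    def review(self, vote: int, comments: List[str]) -> None:
        """Review agent votes and comments on the latest patch set."""
        if not -2 <= vote <= 2:
            raise ValueError("vote must be between -2 and +2")
        latest = self.patch_sets[-1]
        latest.vote = vote
        latest.comments = list(comments)

    def submit(self, approved_by_human: bool) -> bool:
        """Submit only if the latest patch set has +2 AND the human agrees."""
        latest = self.patch_sets[-1] if self.patch_sets else None
        if latest is not None and latest.vote == 2 and approved_by_human:
            self.submitted = True
        return self.submitted
```

The point of the design is that the +2 belongs to the latest patch set only, and even then nothing lands without the human's explicit call.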

thih9 - 3 hours ago

> Today, when an engineer at Cloudflare opens a merge request, it gets an initial pass from a coordinated smörgåsbord of AI agents.

I’d prefer to have that happen as some sort of pre-commit hook, before a merge request is sent. The feedback loop might be a bit faster and the process might produce less noise this way.

plmpsu - 4 hours ago

I built a more naive version for our team using Copilot and GitHub Actions, and it works quite well (wish I had metrics too). The team loves it.

The ROI here is so high that I don't mind using the strongest model available for the actual code review. I don't trust Sonnet and such. Just let Opus or GPT 5.5 do the whole thing and pay a bit more for less complexity.

rzmmm - 4 hours ago

> The entire system also runs locally.

I think approaches like this don't need to run anywhere other than locally, maybe integrated as a pre-push hook. The system is nondeterministic, so it's at odds with the purpose of CI.
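For what a pre-push integration might look like, here is a rough sketch. Git feeds a pre-push hook one line per ref on stdin (`<local ref> <local oid> <remote ref> <remote oid>`), and a nonzero exit aborts the push. `review_range` is a hypothetical placeholder for whatever agent does the review; it is not part of the system described in the article.

```python
import sys
from typing import List


def is_zero_oid(oid: str) -> bool:
    """Git uses an all-zero object id for 'this ref does not exist'."""
    return set(oid) == {"0"}


def push_ranges(stdin_text: str) -> List[str]:
    """Turn pre-push stdin lines into commit ranges worth reviewing."""
    ranges = []
    for line in stdin_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # ignore anything that is not a well-formed ref line
        _local_ref, local_oid, _remote_ref, remote_oid = parts
        if is_zero_oid(local_oid):
            continue  # branch deletion: nothing to review
        if is_zero_oid(remote_oid):
            ranges.append(local_oid)  # new branch: everything reachable from it
        else:
            ranges.append(f"{remote_oid}..{local_oid}")  # only the new commits
    return ranges


def review_range(commit_range: str) -> List[str]:
    """Hypothetical stand-in for the AI review; returns blocking findings."""
    return []


def main(stdin_text: str) -> int:
    findings = [f for r in push_ranges(stdin_text) for f in review_range(r)]
    for finding in findings:
        print(finding, file=sys.stderr)
    return 1 if findings else 0  # nonzero exit makes git abort the push


# In .git/hooks/pre-push, the script would end with:
#     sys.exit(main(sys.stdin.read()))
```

Because the reviewer is nondeterministic, a hook like this is probably better treated as advisory (easy to bypass with `git push --no-verify`) than as a hard gate.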

bob1029 - 4 hours ago

> One of the operational headaches we didn’t predict was that large, advanced models like Claude Opus 4.7 or GPT-5.4 can sometimes spend quite a while thinking through a problem, and to our users this can make it look exactly like a hung job.

I had the same problem in my recursive agent harness. It would always come back, but it could sometimes take up to 10 minutes, depending on the problem. I fixed this by adding a required "purpose" argument to every tool and to every call/return event. As the recursive evaluation proceeds, every single thing that happens streams incremental purpose text to the user's browser (also using the magic of JSONL for this). The incremental progress events contain the purpose and a detail section (tool arg JSON) that the user can expand/collapse.
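A minimal sketch of the idea, not the commenter's actual harness: every tool call must carry a `purpose`, and one JSON object per line is written and flushed for each call and return. The names `emit`, `call_tool`, and `count_lines`, and the event field names, are assumptions made for illustration.

```python
import io
import json
import time
from typing import Any, Callable, Dict, TextIO


def emit(stream: TextIO, kind: str, purpose: str, detail: Dict[str, Any]) -> None:
    """Write one progress event as a single JSON line and flush right away."""
    event = {"ts": time.time(), "kind": kind, "purpose": purpose, "detail": detail}
    stream.write(json.dumps(event) + "\n")
    stream.flush()  # the reader sees progress immediately, not at the end


def call_tool(stream: TextIO, tool: Callable[..., Any], *, purpose: str, **args: Any) -> Any:
    """Run a tool, refusing calls that do not say why they are being made."""
    if not purpose.strip():
        raise ValueError("every tool call must state its purpose")
    # Arguments and results must be JSON-serializable in this sketch.
    emit(stream, "call", purpose, {"tool": tool.__name__, "args": args})
    result = tool(**args)
    emit(stream, "return", purpose, {"tool": tool.__name__, "result": result})
    return result


def count_lines(text: str) -> int:
    """Toy tool used only to exercise the event stream."""
    return len(text.splitlines())


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for the HTTP response stream
    call_tool(buffer, count_lines, purpose="Measure the size of the diff", text="a\nb\n")
    print(buffer.getvalue(), end="")
```

A UI can show `purpose` as the headline and keep `detail` collapsed until the user expands it, so a long-thinking job reads as progress rather than a hang.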

jmakov - 2 hours ago

Something can be found on every iteration. How many times do you iterate on, e.g., performance (use an optimized struct, oh, you can change the architecture, etc.)? At that point one can just have a while loop for the agents to make changes until no comments are left.
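Such a loop might look roughly like the sketch below. The `max_rounds` cap is my assumption, added because a reviewer that always finds something would otherwise never terminate; `toy_review` and `toy_revise` are hypothetical stand-ins for the review and dev agents.

```python
from typing import Callable, List, Tuple


def review_until_clean(
    code: str,
    review: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> Tuple[str, int, List[str]]:
    """Alternate review and revision until no comments remain or the cap is hit.

    Returns (final_code, rounds_used, remaining_comments).
    """
    comments = review(code)
    rounds = 0
    while comments and rounds < max_rounds:
        code = revise(code, comments)
        rounds += 1
        comments = review(code)
    return code, rounds, comments


def toy_review(code: str) -> List[str]:
    """Stand-in reviewer: flags every line that still contains TODO."""
    return [
        f"line {number}: unresolved TODO"
        for number, line in enumerate(code.splitlines(), start=1)
        if "TODO" in line
    ]


def toy_revise(code: str, comments: List[str]) -> str:
    """Stand-in dev agent: fixes only the first flagged line per round."""
    lines = code.splitlines()
    for index, line in enumerate(lines):
        if "TODO" in line:
            lines[index] = line.replace("TODO", "done")
            break
    return "\n".join(lines)
```

If the loop exits with comments still open, that leftover list is what a human would want to look at, instead of letting the agents keep going.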

etothet - 3 hours ago

What’s the over/under on when Cloudflare will acquire OpenCode (and keep it open source)?

faangguyindia - 3 hours ago

What's the best workflow for solo devs?
