Claude Code's new hidden feature: Swarms

329 points by AffableSpatula 13 hours ago

https://xcancel.com/NicerInPerson/status/2014989679796347375

https://github.com/mikekelly/claude-sneakpeek

Ok it might sound crazy but I actually got the best quality of code (completely ignoring that the cost is likely 10x more) by having a full “project team” using opencode with multiple sub agents which are all managed by a single Opus instance. I gave them the task to port a legacy Java server to C# .NET 10. 9 agents, 7-stage Kanban with isolated Git Worktrees.

Manager (Claude Opus 4.5): Global event loop that wakes up specific agents based on folder (Kanban) state.

Product Owner (Claude Opus 4.5): Strategy. Cuts scope creep

Scrum Master (Opus 4.5): Prioritizes backlog and assigns tickets to technical agents.

Architect (Sonnet 4.5): Design only. Writes specs/interfaces, never implementation.

Archaeologist (Grok-Free): Lazy-loaded. Only reads legacy Java decompilation when Architect hits a doc gap.

CAB (Opus 4.5): The Bouncer. Rejects features at Design phase (Gate 1) and Code phase (Gate 2).

Dev Pair (Sonnet 4.5 + Haiku 4.5): AD-TDD loop. Junior (Haiku) writes failing NUnit tests; Senior (Sonnet) fixes them.

Librarian (Gemini 2.5): Maintains "As-Built" docs and triggers sprint retrospectives.

You might ask yourself the question “isn’t this extremely unnecessary?” and the answer is most likely _yes_. But I never had this much fun watching AI agents at work (especially when CAB rejects implementations). This was an early version of the process that the AI agents are following (I didn’t update it since it was only for me anyway): https://imgur.com/a/rdEBU5I

alphazard - 2 hours ago

Every time I read something like this, it strikes me as an attempt to convince people that various people-management memes are still going to be relevant moving forward. Or even that they currently work when used on humans today. The reality is these roles don't even work in human organizations today. Classic "job_description == bottom_of_funnel_competency" fallacy.
If they make the LLMs more productive, it is probably explained by a less complicated phenomenon that has nothing to do with the names of the roles, or their descriptions. Adversarial techniques work well for ensuring quality, parallelism is obviously useful, important decisions should be made by stronger models, and using the weakest model for the job helps keep costs down.
- simondotau - 4 minutes ago
  
  I suppose it’s could end up being an LLM variant of Conway’s Law.
  “Organizations are constrained to produce designs which are copies of the communication structures of these organizations.”
  https://en.wikipedia.org/wiki/Conway%27s_law
- rlayton2 - 2 hours ago
  
  My understanding is that the main reason splitting up work is effective is context management.
  For instance, if an agent only has to be concerned with one task, its context can be massively reduced. Further, the next agent can just be told the outcome, it also has reduced context load, because it doesn't need to do the inner workings, just know what the result is.
  For instance, a security testing agent just needs to review code against a set of security rules, and then list the problems. The next agent then just gets a list of problems to fix, without needing a full history of working it out.
  - purplepatrick - an hour ago
    
    I’ve found that task isolation, rather than preserving your current session’s context budget, is where subagents shine.
    In other words, when I have a task that specifically should not have project context, then subagents are great. Claude will also summon these “swarms” for the same reason. For example, you can ask it to analyze a specific issue from multiple relevant POVs, and it will create multiple specialized agents.
    However, without fail, I’ve found that creating a subagent for a task that requires project context will result in worse outcomes than using “main CC”, because the sub simply doesn’t receive enough context.
  - fphhotchips - 2 hours ago
    
    Which, ultimately, is not such a big difference to the reason we split up work for humans, either. Human job specialization is just context management over the course of 30 years.
- ttoinou - 40 minutes ago
  
  Developers do want managers actually, to simplify their daily lives. Otherwise they would self manage themselves better and keep more of the share of revenues for them
_alex_ - 9 minutes ago

Interesting that your impl agents are not opus. I guess having the more rigorous spec pipeline helps scope it to something sonnet can knock out.
tehlike - 5 minutes ago

You probably implemented gastown.
AlexErrant - an hour ago

For those ignorant, CAB is Change-advisory board
https://en.wikipedia.org/wiki/Change-advisory_board
DanOpcode - 2 hours ago

Very cool! A couple of questions:
1. Are you using a Claude Code subscription? Or are you using the Claude API? I'm a bit scared to use the subscription in OpenCode due to Anthropic's ToS change.
2. How did you choose what models to use in the different agents? Do you believe or know they are better for certain tasks?
juanre - 5 hours ago

I have been using a simpler version of this pattern, with a coordinator and several more or less specialized agents (eg, backend, frontend, db expert). It really works, but I think that the key is the coordinator. It decreases my cognitive load, and generally manages to keep track of what everyone is doing.
kaspermarstal - 4 hours ago

Can you share technical details please? How is this implemented? Is it pure prompt-based, plugins, or do you have like script that repeatedly calls the agents? Where does the kanban live?
- mogili1 - 3 hours ago
  
  Not the OP, but this is how I manage my coding agent loops:
  I built a drag and drop UI tool that sets up a sequence of agent steps (Claude code or codex) and have created different workflows based on the task. I'll kick them off and monitor.
  Here's the tool I built for myself for this: https://github.com/smogili1/circuit
RestartKernel - 2 hours ago

What are the costs looking like to run this? I wonder whether you would be able to use this approach within a mixture-of-experts model trained end-to-end in ensemble. That might take out some guesswork insofar the roles go.
taspeotis - 3 hours ago

This sounds like BMAD?
https://github.com/bmad-code-org/BMAD-METHOD
ceroxylon - an hour ago

What are you building with the code you are generating?
ggoo - 6 hours ago

Is this satire?
- mafriese - 6 hours ago
  
  Nope it isn’t. I did it as a joke initially (I also had a version where every 2 stories there was a meeting and if a someone underperformed it would get fired). I think there are multiple reasons why it actually works so well:
  - I built a system where context (+ the current state + goal) is properly structured and coding agents only get the information they actually need and nothing more. You wouldn’t let your product manager develop your backend and I gave the backend dev only do the things it is supposed to and nothing more. If an agent crashes (or quota limits are reached), the agents can continue exactly where the other agents left off.
  - Agents are ”fighting against” each other to some extend? The Architect tries to design while the CAB tries to reject.
  - Granular control. I wouldn’t call “the manager” _a deterministic state machine that is calling probabilistic functions_ but that’s to some extent what it is? The manager has clearly defined tasks (like “if file is in 01_design —> Call Architect)
  Here’s one example of an agent log after a feature has been implemented from one of the older codebases: https://pastebin.com/7ySJL5Rg
  - ggoo - 6 hours ago
    
    Thanks for clarifying - I think some of the wording was throwing me off. What a wild time we are in!
  - stavros - 6 hours ago
    
    What OpenCode primitive did you use to implement this? I'd quite like a "senior" Opus agent that lays out a plan, a "junior" Sonnet that does the work, and a senior Opus reviewer to check that it agrees with the plan.
    
    mafriese - 6 hours ago
    
    You can define the tools that agents are allowed to use in the opencode.json (also works for MCP tools I think). Here’s my config: https://pastebin.com/PkaYAfsn
    The models can call each other if you reference them using @username.
    This is the .md file for the manager : https://pastebin.com/vcf5sVfz
    I hope that helped!
    
    stavros - 5 hours ago
    
    This is excellent, thank you. I came up with half of this while waiting for this reply, but the extra pointers about mentioning with @ and the {file} syntax really helps, thanks again!
  - overfeed - 5 hours ago
    
    > [...]coding agents only get the information they actually need and nothing more
    Extrapolating from this concept led me to a hot-take I haven't had time to blog about: Agentic AI will revive the popularity of microservices. Mostly due to the deleterious effect of context size on agent performance.
    
    throwup238 - 4 hours ago
    
    Why would they revive the popularity of microservices? They can just as well be used to enforce strict module boundaries within a modular monolith keeping the codebase coherent without splitting off microservices.
  - simultsop - 6 hours ago
    
    quite a storyteller
  - nobody_r_knows - 4 hours ago
    
    I'm confused when you say you have a manager, scrum master, archetech, all supposdely sharing the same memory, do each of those "employees" "know" what they are? And if so, based on what are their identities defined? Prompts? Or something more. Or am I just too dumb to understand / swimming against the current here. Either way, it sounds amazing!
    
    Jimmc414 - 4 hours ago
    
    Their roles are defined by prompts. Only memory are shared files and the conversation history that’s looped back to stateless API calls to an LLM.
- GoatInGrey - 6 hours ago
  
  It's not satire but I see where you're coming from.
  Applying distributed human team concepts to a porting task squeezes extra performance from LLMs much further up the diminishing returns curve. That matters because porting projects are actually well-suited for autonomous agents: existing code provides context, objective criteria catch more LLM-grade bugs than greenfield work, and established unit tests offer clear targets.
  I guess what I'm trying to say is that the setup seems absurd because it is. Though it also carries real utility for this specific use case. Apply the same approach to running a startup or writing a paid service from scratch and you'd get very different results.
  - vidarh - 4 hours ago
    
    I don't know about something this complex, but right this moment I have something similar running in Claude Code in another window, and it is very helpful even with a much simpler setup:
    If you have these agents do everything at the "top level" they lose track. The moment you introduce sub-agents, you can have the top level run in a tight loop of "tell agent X to do the next task; tell agent Y to review the work; repeat" or similar (add as many agents as makes sense), and it will take a long time to fill up the context. The agents get fresh context, and you get to manage explicitly what information is allowed to flow between them. It also tends to mean it is a lot easier to introduce quality gates - eg. your testing agent and your code review agent etc. will not decide they can skip testing because they "know" they implemented things correctly, because there is no memory of that in their context.
    Sometimes too much knowledge is a bad thing.
- SkyPuncher - 3 hours ago
  
  Doubt it. I use a similar setup from time to time.
  You need to have different skills at different times. This type of setup helps break those skills out.
- hereme888 - 6 hours ago
  
  why would it be? It's a creative setup.
  - ggoo - 6 hours ago
    
    I just actually can't tell, it reads like satire to me.
    
    blibble - 5 hours ago
    
    to me, it reads like mental illness
    
    mafriese - 5 hours ago
    
    maybe it's a mix of both :)
    
    PradeetPatel - 6 hours ago
    
    Why would it be satire? I thought that's a pretty stranded Agentic workflows.
    My current workplace follows a similar workflow. We have a repository full of agent.md files for different roles and associated personas.
    E.g. For project managers, you might have a feature focused one, a delivery driven one, and one that aims to minimise scope/technology creep.
    
    ionwake - 6 hours ago
    
    I mean no offence to anyone but whenever new tech progresses rapidly it usually catches most unaware, who tend to ridicule or feel the concepts are sourced from it.
    
    blibble - 5 hours ago
    
    yeah, nfts, metaverse, all great advances
    same people pushing this crap
    
    jitl - 4 hours ago
    
    ai is actually useful tho. idk about this level of abstraction but the more basic delegation to one little guy in the terminal gives me a lot of extra time
    
    ggregoryarms - 3 hours ago
    
    Maybe that's because you're not using your time well in the first place
    
    ionwake - 4 hours ago
    
    bro im using ai swarms, have you even tried them?
    
    blibble - 4 hours ago
    
    bro wanna buy some monkey jpegs?
    100% genuine