A case study in testing with 100+ Claude agents in parallel

imbue.com

55 points by thejash 2 days ago


npodbielski - 11 hours ago

If this is the future of software, in 20 years nobody will understand what the hell software actually does. And if nobody does, things will implode quickly.

Yokohiii - 12 hours ago

> Finally, remember that mngr runs your agent in a tmux session

what the hell?

dakolli - 14 hours ago

this is a pitch to sell an agent orchestration product and services.

khazhoux - 11 hours ago

Me: has to babysit every feature for hours in Claude Code, building a good plan but then still iterating many many times over things that need to be fixed and tweaked until the feature can be called done.

Bloggers: Here's how we use 3,000 parallel agents to write, test, and ship a new feature to production every 17 minutes in an 8M-LOC codebase (all agent-generated!).

... Am I doing something wrong, or are other people doing something wrong?

maxbeech - 14 hours ago

the thing that actually burns token budget at scale isn't the agent count itself—it's understanding the cost model of orchestrating them. 100 agents running in parallel is fine if they're short-lived queries. but once you start running them on a schedule (hourly checks, overnight batch work), the math changes fast.

each agent run against a real codebase probably spends 20-50k tokens just on context: repo structure, relevant files, recent changes. multiply that by 100 agents running every hour across 10-20 repos, and you're already hitting millions of tokens a day before any actual work happens. add in re-runs for failures or retries, and the cost curve gets steep quickly.
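a quick back-of-envelope to make that concrete. the numbers below are illustrative assumptions (midpoint of the 20-50k context figure, hourly runs), not measurements from the article:

```python
# Back-of-envelope token cost for scheduled agent runs.
# All numbers are illustrative assumptions, not measurements from the article.
def daily_context_tokens(agents: int, runs_per_day: int, tokens_per_run: int) -> int:
    """Tokens spent on context alone, before any actual work happens."""
    return agents * runs_per_day * tokens_per_run

# 100 agents, hourly runs, ~35k context tokens each (midpoint of 20-50k):
total = daily_context_tokens(agents=100, runs_per_day=24, tokens_per_run=35_000)
print(f"{total:,} tokens/day just on context")  # 84,000,000 tokens/day
```

that's 84M tokens a day on context alone, before retries or any real work.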

the harder problem is observability. with one agent you can read logs and understand what went wrong. with 100 agents you need aggregation, pattern detection, alerting on the common failure modes. if 3 agents fail silently but identically, was that a real issue or just rate limiting? if 40 agents all timeout at the same step, was it a dependency problem or infrastructure saturation? at scale you're debugging distributions, not individual runs.
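"debugging distributions" in practice means grouping failures by signature before looking at any single run. a minimal sketch, with hypothetical field names standing in for whatever your run records actually contain:

```python
from collections import Counter

# Hypothetical run records; the field names are assumptions for illustration.
failed_runs = [
    {"agent": f"a{i}", "step": "fetch_deps", "error": "timeout"} for i in range(40)
] + [
    {"agent": f"b{i}", "step": "lint", "error": "rate_limited"} for i in range(3)
]

# Group failures by (step, error) signature instead of reading logs one by one.
signatures = Counter((r["step"], r["error"]) for r in failed_runs)
for (step, error), n in signatures.most_common():
    print(f"{n:3d} agents failed at {step!r} with {error!r}")
# 40 identical timeouts at one step points at a shared dependency or
# saturation; 3 identical rate-limit errors is probably just throttling.
```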

also helps to be ruthless about concurrency. the async pattern isn't "run as many as possible at once"—it's "run exactly as many as the API and your budget can support without making the failure modes harder to diagnose." for claude api work that's usually smaller than people expect.
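the "exactly as many as you can support" pattern is just a semaphore around the API call. a minimal asyncio sketch (the concurrency limit of 8 and the sleep standing in for the API call are assumptions):

```python
import asyncio

MAX_CONCURRENT = 8  # tune to your API limits and budget; an assumption, not a recommendation

async def run_agent(task_id: int, sem: asyncio.Semaphore) -> str:
    async with sem:  # at most MAX_CONCURRENT agents in flight at once
        await asyncio.sleep(0.01)  # stand-in for the actual agent/API call
        return f"task {task_id} done"

async def main() -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    # Launch 100 tasks, but the semaphore caps how many run concurrently.
    return await asyncio.gather(*(run_agent(i, sem) for i in range(100)))

results = asyncio.run(main())
print(len(results))  # 100
```

the point is that all 100 tasks are queued up front, but failures stay diagnosable because only a bounded number are ever hitting the API at the same time.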

petcat - 15 hours ago

Curious how people and companies like this are approaching intellectual property now that the courts have ruled that basically no part of AI-generated content or code is copyrightable, and that it's therefore impossible to claim ownership of.

Are people just not going to open source anything anymore since licenses don't matter? Might as well just keep the code secret, right?