Launch HN: Sweep (YC S23) – A bot to create simple PRs in your codebase

github.com

198 points by williamzeng0 2 years ago


Hi HN! We’re William and Kevin, cofounders of Sweep (https://sweep.dev/). Sweep is an open-source AI-powered junior developer. You describe a feature or bugfix in a GitHub issue and Sweep writes a pull request with code. You can see some examples here: https://docs.sweep.dev/examples.

Kevin and I met while working at Roblox. We talked to our friends who were junior developers and noticed a lot of them doing grunt work. We wanted to let them focus on important work. Copilot is great, but we realized some tasks could be completely offloaded to an AI (e.g. adding a banner to your webpage https://github.com/sweepai/landing-page/issues/225).

Sweep does this with a code search engine. We use code chunking, ranking, and formatting tricks to represent your codebase in a token-efficient manner for LLMs. You might have seen our blog on code chunking here: https://news.ycombinator.com/item?id=36948403.
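To make the chunking idea concrete, here is a minimal sketch that splits a Python file into one chunk per top-level function or class. It uses Python's built-in `ast` module as a stand-in, so it is not Sweep's actual chunker, and the `chunk_top_level` name is hypothetical.

```python
import ast
from typing import List, Tuple


def chunk_top_level(source: str) -> List[Tuple[str, str]]:
    """Return (name, text) pairs, one per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is 1-indexed and end_lineno is inclusive (Python 3.8+).
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append((node.name, text))
    return chunks


source = (
    "import os\n"
    "\n"
    "def a():\n"
    "    return 1\n"
    "\n"
    "class B:\n"
    "    def m(self):\n"
    "        return 2\n"
)
chunks = chunk_top_level(source)
```

Each chunk keeps a whole definition together, so a snippet handed to the LLM is not cut off mid-function. Ranking the chunks against the issue text would be a separate step.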

We take these fetched code snippets and come up with a plan to write the PR. We found that having the LLM provide structured information using XML tags is very robust, as it’s easy for us to parse with regex, has good support for multi-line answers and is hard for the LLM to mess up.

This is because XML is common in the LLM’s training data (the internet / HTML), and the opening and closing tags rarely appear naturally in text and code, unlike the quotations, brackets, backticks and newlines used by JSON’s and markdown’s delimiters. Further, XML lets you skip the preamble (“This question has to do with xyz. Here is my answer:”) and handles multi-line answers like PR plans and code really well. For example, we ask the LLM for the new code in <new_code> tags and a final boolean answer by writing <answer>True</answer>.
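As a rough illustration of why this is easy to parse, here is a minimal sketch that pulls tagged sections out of an LLM response with a regex. The `extract_tag` helper and the sample response are assumptions made up for this example, not Sweep's actual code.

```python
import re
from typing import Optional


def extract_tag(response: str, tag: str) -> Optional[str]:
    """Return the text between <tag> and </tag>, or None if the tag is absent."""
    escaped = re.escape(tag)
    # DOTALL lets "." match newlines, so multi-line answers are captured whole.
    pattern = re.compile(rf"<{escaped}>(.*?)</{escaped}>", re.DOTALL)
    match = pattern.search(response)
    if match is None:
        return None
    return match.group(1).strip("\n")


# Hypothetical response; the preamble before the tags is simply ignored.
response = """This question has to do with xyz. Here is my answer:
<new_code>
def add(a, b):
    return a + b
</new_code>
<answer>True</answer>"""

new_code = extract_tag(response, "new_code")
answer = extract_tag(response, "answer") == "True"
```

The non-greedy `(.*?)` stops at the first closing tag, and quotes, brackets, or backticks inside the code do not need any escaping.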

We use this XML format to get the LLM to create a plan, generating a list of files to create and modify from the retrieved relevant files. We iterate through the file changes and edit/create the necessary files. Finally, we push the commits to GitHub and create the PR.
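The plan-then-edit loop above can be sketched as follows. The `<create>` and `<modify>` tag names, the `FileChange` type, and the stub writer are all hypothetical. In this sketch an in-memory dict stands in for the repository and a plain function stands in for the LLM call.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FileChange:
    action: str        # "create" or "modify"
    path: str
    instructions: str


# Hypothetical plan format: <create file="...">...</create> and <modify file="...">...</modify>
PLAN_PATTERN = re.compile(r'<(create|modify) file="([^"]+)">(.*?)</\1>', re.DOTALL)


def parse_plan(plan_xml: str) -> List[FileChange]:
    """Turn the plan into an ordered list of file changes."""
    return [
        FileChange(action, path, body.strip())
        for action, path, body in PLAN_PATTERN.findall(plan_xml)
    ]


def apply_plan(
    changes: List[FileChange],
    files: Dict[str, str],
    write_file: Callable[[FileChange, str], str],
) -> Dict[str, str]:
    """Create or edit each planned file; write_file stands in for the LLM."""
    for change in changes:
        old_contents = files.get(change.path, "") if change.action == "modify" else ""
        files[change.path] = write_file(change, old_contents)
    return files


def stub_writer(change: FileChange, old_contents: str) -> str:
    # Placeholder for the model: just records the instruction as a comment.
    return old_contents + f"# TODO: {change.instructions}\n"


plan = """<plan>
<create file="utils/branch.py">Add a validate_branch_name function.</create>
<modify file="app.py">
Call validate_branch_name before creating the branch.
</modify>
</plan>"""

changes = parse_plan(plan)
files = apply_plan(changes, {"app.py": "print('hi')\n"}, stub_writer)
```

Pushing the commits and opening the PR would happen after this loop and is left out of the sketch.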

We’ve been using Sweep to handle small issues in Sweep’s own repo (it recently passed 100 commits). We’ve become well acquainted with its limitations. For example, Sweep sometimes leaves functions unimplemented with just “# rest of code” since it runs on GPT-4, a model tuned for chatting. Other times, there are minor syntax errors or undefined variables. This is why we spend the other half of our time building self-recovery methods for Sweep to fix and test its PRs.

First, we invite the developer to review and add comments to Sweep’s pull request. This helps to a point, but Sweep’s code sometimes wouldn’t lint. This is table stakes. It’s frustrating to have to tell the bot to “add an import here” or “this variable is undefined”. To make this better, we used GitHub Actions, which automatically runs the flow of “check the code → tell Sweep → Sweep fixes the code → check the code again”. We like this flow because you might already have GitHub Actions, and it’s fully configurable. Check out this blog to learn more: https://docs.sweep.dev/blogs/giving-dev-tools.
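The check-and-fix flow can be sketched locally like this. It is only an illustration: Python's built-in `compile` stands in for the GitHub Actions checks (so it catches syntax errors but not undefined variables), and the `fix` callable is a hypothetical stand-in for Sweep.

```python
from typing import Callable, Optional, Tuple


def check_code(source: str) -> Optional[str]:
    """Return an error message if the code fails the check, else None."""
    try:
        compile(source, "<sweep-pr>", "exec")
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


def fix_until_clean(
    source: str,
    fix: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Check the code, pass any error to the fixer, and check again."""
    for _ in range(max_rounds):
        error = check_code(source)
        if error is None:
            return source, True
        source = fix(source, error)
    # Out of rounds: report whether the last attempt passes.
    return source, check_code(source) is None


def stub_fix(source: str, error: str) -> str:
    # Stand-in for the bot: repairs one known typo regardless of the message.
    return source.replace("def f(:", "def f():")


broken = "def f(:\n    return 1\n"
fixed, ok = fix_until_clean(broken, stub_fix)
```

Capping the number of rounds keeps the loop from running forever when the fixer cannot resolve an error.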

So far, Sweep isn’t that fast, can’t handle massive problems yet, and doesn’t write hundreds of lines of code. We’re excited to work towards that. In the meantime, a lot of our users have been able to get useful results. For example, a user reported that an app was not working correctly on Windows, and Sweep wrote the PR at https://github.com/sweepai/sweep/pull/368/files, replacing all occurrences of "/tmp" with "tempfile.gettempdir()". Other examples include adding a validation function for GitHub branch names (https://github.com/sweepai/sweep/pull/461) and adding dynamically generated initials in the testimonials on our landing page (https://github.com/wwzeng1/landing-page/issues/28). For more examples, check out https://docs.sweep.dev/examples.
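For readers unfamiliar with the "/tmp" fix mentioned above, the change looks roughly like this. The `cache_path` function is a made-up example and not code from the linked PR.

```python
import os
import tempfile


def cache_path(filename: str) -> str:
    """Build a path inside the platform's temporary directory."""
    # Before: return "/tmp/" + filename   (this directory does not exist on Windows)
    # After: ask the standard library where temporary files belong.
    return os.path.join(tempfile.gettempdir(), filename)


path = cache_path("repo_cache.json")
```

`tempfile.gettempdir()` picks the directory from the environment and platform defaults, so the same code works on Linux, macOS, and Windows.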

Our focus is on finding ways that an AI dev can actually help and not just be a novelty. I think of my daily capacity to write good code as a stamina bar. There’s a fixed cost to opening an IDE, finding the right lines of code, and making changes. If you’re working on a big feature and have to context switch, the cost is higher. I’ve been leaving the small changes to Sweep, and my stamina bar stays full for longer.

Our repo is at https://github.com/sweepai/sweep, there’s a demo video at https://www.youtube.com/watch?v=WBVna_ow8vo, and you can install Sweep here: https://github.com/apps/sweep-ai. We currently have a freemium model, with 5 GPT-4 PRs at the free tier, 120 GPT-4 PRs at the paid tier and unlimited at the enterprise tier.

We’re far from our vision of a full AI software engineer, but we’re excited to work on it with the community feedback :). Looking forward to hearing any of your thoughts!

mellosouls - 2 years ago

Sweep is an open-source AI-powered junior developer.

I think you should be less bullish on the "open source". As you kindly clarified for me the other day when I asked [1], only the client is open source. The back end is closed (that's Sweep's back end, not the LLM, which is obvious) and the product as a whole cannot be self-hosted by third parties.

That's fine (though not clear what the benefit is to developers who might want to contribute), but at the moment the impression being given is that this is an open source product.

Of course, if I have misunderstood I'll be happy to be corrected.

I wish you the best with it as this seems like a very cool product even if it's closed at core - but a lack of clarity now may undermine reception and goodwill later.

[1] https://news.ycombinator.com/item?id=36953720

gcanyon - 2 years ago

Your demo video https://www.youtube.com/watch?v=WBVna_ow8vo is ridiculously compelling. You need to make a better version of the video, and maybe a few more of them.

huijzer - 2 years ago

I think this makes sense. I've seen many situations in large software projects where some bug stays open for months or even years and is actually very easy to fix. In hindsight, a lot of value was missed because the bug lingered for no good reason. If there was some tool that could just run in the background and randomly pop up a PR from time to time, then that would be cool.

Good luck!

dottedmag - 2 years ago

CC-NC-SA is not an open-source license. Please do not use "open source" to describe your software in your marketing materials.

latortuga - 2 years ago

Interesting that your tagline is "spend less time writing, more time reviewing code" in the video. Developers already don't like reading the code, we even have a ubiquitous acronym for it. Writing is the fun part.

In my experience, junior developers become mid level developers by writing code, by practicing, by building small features, by doing grunt work. If they wanted to use an AI to do those tasks for them, I would tell them no - the whole point of having junior devs do simpler tasks is that's what level they're at. They don't get to the next level magically, it's by doing the work. If a high school football quarterback asked if he could skip practice and let his AI go to practice for him, I would wonder how he plans to get good at football.

I apologize that I don't have anything constructive to say here but you did ask for any of my thoughts.

jtmarmon - 2 years ago

Just merged my first simple PR with sweep. This is going to be so useful for the kind of things that would take 5 minutes to do but get procrastinated for weeks because you just can't find the time to context switch for it.

Congrats on the launch!

padolsey - 2 years ago

Love it!! The chunking stuff especially is really impressive. Hitting those token limits often is the annoying bit of working with LLMs.

A weird question: How do you feel about possibly ~wasted efforts of these techniques when gpt in a year or so is probably gonna be 100k+ in context length? I've felt this a bit. E.g. I really want to create a 'massive document' conversational agent but I'm doing around 90% of work just juggling and preempting token constraints with super heuristic indexing. I just feel it's all a bit.. wasted, in terms of effort. At some point the LLM apis (openai, claude, ..) will just accept massive zips of code and use them as entire prompts without need for these creative trickeries. Thoughts?

Oh! And have you tried out the function-calling APIs? I see you've found that XML is far more reliable as it's semantically enriched. I have found this to be the case as well, which is a shame because I really want the function-calling stuff to work equally well.

I'm loving stuff like this that starts to pseudo-expand the token limit.

guideamigo - 2 years ago

Wait for a deluge of these PR generators to increase the commit count on GitHub.

santiagobasulto - 2 years ago

This is a very cool project, and definitely something we'll see more and more often for the next few years.

Now, a bit off topic. I've seen a proliferation of "open source" startups that just use "open source" as a way of promotion. The repo feels more like a landing page than a README. And the Installation shows anything BUT install instructions.

marktangotango - 2 years ago

Finally! My VP of engineering keeps saying I'm not making enough github commits, this is the solution I've been waiting for!

This is sarcasm, but I did have a VP who tracked commit frequency for a while. And people heard about it if they weren't committing enough.

adr1an - 2 years ago

Great! I think you made a good choice by interfacing directly with PRs. I'd like to see if I'm able to get my code coverage to 100% with this bot.

jhales - 2 years ago

What is your data privacy policy?

applgo443 - 2 years ago

How do you approach the problem of what files to look into to fix a bug? Embeddings alone don't seem to cut it.

ryanSrich - 2 years ago

Sorry if I missed this, but do you plan on integrating with issues outside of github? For example, we use Linear, but it is connected to github to automatically pull PR information. It would be interesting to do basically the exact same thing as what you're doing, but do it with a Linear issue instead.

kfarr - 2 years ago

Excellent, this solves the #1 problem I've had with LLM development assist -- providing context of the existing application source when making new requests. Delivering the output via a PR is a nice touch. Already created 2 PRs. Still need a tiny bit of tweaking manually before I merge these, but definitely saved at least 30 mins. Here are the 2 PRs that it generated for others curious to see its capabilities: https://github.com/3DStreet/3dstreet/pull/324 https://github.com/3DStreet/3dstreet/pull/325

thatguymike - 2 years ago

Looks cool! I haven’t used it yet but this could be really helpful.

What’s your strategy around oss? Why wouldn’t I just clone and use your repo if it can talk directly to GPT4/Claude?

And, if I go through you, does Sweep retain my code in a DB or logs?

andrejcasey - 2 years ago

I've been building C++ applications professionally for several years now and can say that AI tools like this are just marketing hype. There are codebases with years of legacy cruft that most people have difficulty navigating; there's no chance that ChatGPT and derivatives like Sweep will solve real, nontrivial problems in them. I wish there was a solution for technical debt, but using AI to approximate solutions is a fundamentally flawed approach.

maccard - 2 years ago

This is awesome. Somewhat echoing the other feedback here, I'm a little concerned about the "self test" side of things, and would much rather sweep wrote tests. Given my experience with ChatGPT and copilot though, I feel reassured that it would write tests for me. If it wrote tests by default (or used a heuristic on when to write tests), I would find it impossible to not use this I think.

Great job.

elderlybanana - 2 years ago

Incredible work, this is the most exciting AI dev tool I've come across!

Do you have a strategy to supplement ChatGPT to handle post-2021 updates to languages and libraries? I tried it on a NextJS repo and it came up with something that looked like it would have been correct a few versions ago, but I had to make some manual changes. Certain fast-moving ecosystems might frequently have this issue.

juntao - 2 years ago

I'm wondering what will happen if we let ChatGPT review these PRs created by ChatGPT.

Yes, we made a small tool to help developers review their PRs. It seems like a great supplement for Sweep AI.

Build your own PR review bot in 3 minutes here: https://github.com/flows-network/github-pr-summary

pcthrowaway - 2 years ago

> Limitations of Sweep:

> - Using the latest APIs that have changed past 2022

Oh good, the frontend devs don't have to worry about AI taking their jobs yet.

applgo443 - 2 years ago

How is your experience with Modal?

And I'm curious to know more about your costs of deployment and running on Modal.

willsmith72 - 2 years ago

Is it possible to provide feedback to a PR? One of the best parts about these AIs is their ability to adapt based on feedback.

E.g. in the demo video, the code doesn't cover if splitName.length === 0. I would want to prompt it to cover that case as well

frays - 2 years ago

This looks fantastic. (Demo video was what sold me: https://www.youtube.com/watch?v=WBVna_ow8vo)

I will be keeping an eye on where this goes in the future.

kristiandupont - 2 years ago

I tried Sweep the other day and I got a weird error: [link redacted]-- "an issue has occurred around fetching the files." What files might that be?

In any case, congratulations on launching, your product looks really promising!

shrimpx - 2 years ago

Closely related to https://second.dev, also a YC company in the previous batch. Though Second is specializing its AI developers in doing code migrations.

chrsig - 2 years ago

So next up, we'll be hiring people to put the bugs into the code base, so AI companies like this can stay in business, right? No bugs means no AI to fix bugs.

deathmonger5000 - 2 years ago

This is super cool!

_thisdot - 2 years ago

What is the Privacy Policy?

gcanyon - 2 years ago

Meta-issue: the purple on black text in the examples page is hard to read. https://docs.sweep.dev/examples