Have AI resolve your merge/rebase conflicts

2025-01-26 by Josh Bleecher Snyder

TL;DR: merde.ai can resolve your conflicts for you. It's not perfect, but it can help. Try it!

Automation is at its best when doing jobs that are tedious, dangerous, or awful.

Handling git conflicts is awful. And tedious. So we set out to automate it.

Harder than it looks

This sounds like a perfect problem for an LLM. Text in, text out, plenty of context and structure.

We started with the obvious approach: Attempt the merge. (Or the rebase. I’ll just say merge from here on.) Then feed the conflicts to an LLM and ask it to resolve them.
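To make the obvious approach concrete, here is a minimal sketch of the "collect the conflicts" step. It is not merde’s code. It assumes git’s default conflict marker style (no diff3 base section), and the names Conflict and extract_conflicts are made up for illustration. Note how little the extracted hunks carry: just the two competing sides, with none of the surrounding code or history.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    ours: list[str]    # lines between "<<<<<<<" and "======="
    theirs: list[str]  # lines between "=======" and ">>>>>>>"


def extract_conflicts(text: str) -> list[Conflict]:
    """Pull conflict hunks out of a file that git left with default-style markers."""
    conflicts = []
    ours, theirs = [], []
    state = None  # None = outside a conflict; "ours" / "theirs" = inside one
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            state, ours, theirs = "ours", [], []
        elif line == "=======" and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>> ") and state == "theirs":
            conflicts.append(Conflict(ours, theirs))
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts


# A tiny hand-written example of a conflicted file.
sample = "\n".join([
    "def greet():",
    "<<<<<<< HEAD",
    '    print("hello")',
    "=======",
    '    print("hi there")',
    ">>>>>>> feature",
    "    return None",
])
conflicts = extract_conflicts(sample)
```

Each Conflict would then be pasted into a prompt, and the model asked to pick or blend the two sides.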

This…doesn’t work very well. On a benchmark we built ourselves from real-world conflicts, it was scoring in the single digits.

This is unsurprising. When I’m manually resolving a conflict, I don’t just look at the conflicted area. I go digging. So we taught the LLM to dig, through prompt engineering and plain ol’ engineering.

It took a while, but we’re pushing fifty-fifty on the benchmark now, with more tricks waiting in the wings. (If solving merge conflicts so others don’t have to sounds like heaps of fun, and you’re in the Bay Area, let’s talk.) And more importantly, when it doesn’t work, the conflict resolution mostly fails outright rather than producing a merge that is worse than what a human would produce.

Hiding in here is a lesson. LLMs are remarkably powerful, but they take real work to integrate. Getting good results from them requires engineering effort and investment.

An aside for folks who like geeking out about specific LLMs. At least for the way we’re approaching the problem, merge quality turns out to be heavily dependent on model quality. Claude and DeepSeek V3 are roughly tied (we use Claude), followed by 4o. The “slow thinker” models like o1 and R1 don’t provide any improvement. After 4o, there’s a big gap, and then the rest of the field.

When trying smaller models we ran into a familiar phenomenon: smaller models have quirks that greatly increase the effort it takes to use them. For example, Llama 3.3 70b was worse than Llama 3.1 70b because it has a tendency to add whitespace at the end of lines of code. This might be fine when generating code for an editor that might then strip that whitespace off; it’s less fine when you are attempting to diligently and faithfully merge code in which trailing whitespace might be meaningful. (Programming languages with multi-line string literals can have significant whitespace at the end of a line.) The effort involved in working around this model weakness is significant. A quick and dirty regex cleanup would superficially improve benchmark performance. But when that same regex is applied to a better model that doesn’t play fast and loose with whitespace, it will do nothing but hurt. To borrow a line from Bertrand Russell: benchmark hacking has all the benefits of theft over honest toil. Better not to play that game.
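Here is a small illustration of why the whitespace point matters. It is a sketch, not anything from our pipeline: the regex and the helper names (strip_trailing_whitespace, banner_of) are hypothetical, and the "merged file" is a one-assignment Python module invented for the example. Its multi-line string literal has two spaces at the end of the first line, and the cleanup silently changes the value of the string.

```python
import re


def strip_trailing_whitespace(source: str) -> str:
    # Hypothetical "quick and dirty" cleanup: remove spaces and tabs
    # at the end of every line.
    return re.sub(r"[ \t]+$", "", source, flags=re.MULTILINE)


def banner_of(source: str) -> str:
    # Execute the module source and return the value it assigns to BANNER.
    namespace = {}
    exec(source, namespace)
    return namespace["BANNER"]


# Source text of a module whose string literal ends its first line
# with two meaningful spaces.
original = 'BANNER = """Name:  \nValue"""\n'
cleaned = strip_trailing_whitespace(original)

before = banner_of(original)
after = banner_of(cleaned)
```

The two values differ only by those trailing spaces, which is exactly the kind of change that slips through review.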

Considered dangerous?

Git’s standard line is that humans should do conflict resolution whenever there is any doubt. Tools should not be clever, as that might hide potential issues. Put succinctly by the Zen of Python:

> In the face of ambiguity, refuse the temptation to guess.

This is a harm reduction strategy. (Also a CYA strategy.) AI-resolved conflicts will contain mistakes. Some of those mistakes will be bad. Have we just traded some tedious and awful for an extra helping of dangerous?

Any time you’re considering a policy, it’s worth having a clear-eyed view of the alternatives. The alternatives aren’t “use AI” vs “don’t use AI”. The alternatives are “use AI, which a human can review” vs “leave it all to the humans”.

In practice, many humans do a mediocre job of conflict resolution. It is tedious and awful! Furthermore, many humans faced with a merge conflict choose to bail instead. People manually re-create branches or abandon work. It would be less effort to review and test the autogenerated resolution.

And of course, using the tool is optional. All engineering is about trade-offs. Now that we have high-quality LLMs, and a way to apply them to the task, using them will sometimes be the best choice. (There are also the standard arguments about specialization of labor: We can fine-tune the git-LLM bus to a level that copy/paste into a browser window will never reach.)

But also, yes. Be responsible. Don’t blindly use what the tool spits out, just as you wouldn’t with AI codegen or with code copy/pasted from Stack Overflow.

Design considerations

Working in git often feels like catching a boomerang with your teeth. (And to be clear, I like git.)

We wanted trying merde.ai to be low stakes, so you could just play with it, without fear.

Unlike git merge and git rebase, which do things to your working directory, merde merge and merde rebase create new branches for you, without touching your working directory or current branch at all. If you like the result, you can switch to that branch, rename it something useful, and start using it. Your old branch will still be there. If you don’t like the result, delete the new branch, and it is as if nothing ever happened.

So if you are attempting a merge, and you hit a conflict, you don’t ask merde to take over. Instead, you bail on the merge, or finish it poorly, or do whatever you like. And then you ask merde to take a crack at it from scratch. And then you assess the results, modify them, use them, or throw them away. Let the boomerang fall where it may.
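For the curious, the "new branch, untouched working directory" idea can be approximated with stock git plumbing. This is a sketch of the general technique, not how merde is implemented. It assumes Git 2.38 or newer (for git merge-tree --write-tree), and the function names and the injectable run parameter are made up for illustration. It only handles the clean-merge case; a conflicted merge is where a resolver would have to step in.

```python
import subprocess


def git(repo, *args):
    """Run a git command inside `repo`; return (exit code, stdout)."""
    proc = subprocess.run(
        ["git", "-C", repo, *args], capture_output=True, text=True
    )
    return proc.returncode, proc.stdout


def merge_to_new_branch(repo, ours, theirs, new_branch, run=git):
    """Merge `theirs` into `ours` on a fresh branch, without a checkout.

    Returns the new commit id, or None if the merge did not come out clean.
    """
    # Compute the merged tree in the object database only. The index and
    # working directory are not touched.
    code, out = run(repo, "merge-tree", "--write-tree", ours, theirs)
    if code != 0:
        return None  # conflicts or an error: nothing is created

    tree = out.splitlines()[0]
    # Wrap the tree in a merge commit with both branches as parents.
    _, commit = run(
        repo, "commit-tree", tree, "-p", ours, "-p", theirs,
        "-m", f"Merge {theirs} into {ours}",
    )
    commit = commit.strip()
    # Point a new branch at the commit. The current branch stays put.
    run(repo, "branch", new_branch, commit)
    return commit
```

If the result is no good, deleting the new branch is the whole cleanup.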

Try it out

Head over to merde.ai to create an account, grab a token, and download the client. When you invoke it, it’ll do some local analysis, collate just enough of your repository for the server to do its work (roughly correlated to the size of the merge, but that’s a topic for another blog post), ship it off to the server, wait, and create a local branch with the results.

There’s also a far less powerful GitHub bot called @merde-bot. If you have an open public PR that has conflicts, write @merde-bot in a comment in the PR, and it’ll (probably) open a PR on your PR that resolves those conflicts. You can then review that PR and merge it as desired. This does make for a somewhat ungainly git history…but if you don’t care, and you just want the darn thing to merge, ask the bot. (And then be patient. It can take tens of minutes, depending on system load.)

Feedback

This is all pretty new. We’d love feedback, via a GitHub issue or over email. Tell us what works, tell us what doesn’t, tell us your hopes and dreams, tell us you want us to take your money.
