Ai on sbgrl.me

The end of tech-debt?

sylvain.bougerel@gmail.com (Sylvain Bougerel) — Sat, 16 May 2026 18:03:00 +0800

After Claude Code performed a refactor across our codebase in less than 24 hours, mostly autonomously, I started to wonder whether we could end our tech-debt, and put a price on doing so.

Setting the stage

The codebase I’m referring to is that of our main Next.js application. It’s not a small codebase by any measure: 200KLOC, 1500 files, TypeScript, Terraform and other languages.

Within this codebase, we use both API route handlers and server functions to fetch or mutate data. Each mechanism fits a specific purpose: an application should make its public API available via route handlers, while it should use server functions for internal mutations only.

In our application, however, we had leftover server functions used only to fetch data — without mutating it. This often resulted in the server function returning stale information, because it was doing something it was not designed for. We needed to migrate them to route handlers.

I had already fixed 4 of them, but I found 18 more that needed addressing.

This is not hard work, but not a dumb task either. Each case requires a new API route, a new contract, properly bounded inputs, validated outputs, updating how the client-side sends requests and handles responses, all while making sure it lints correctly and passes the ~2700 unit tests of our application — a perfect task for an agent.
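To make that checklist concrete, here is a minimal sketch of what one migrated read endpoint could look like. Everything in it is an assumption for illustration: the `BookingSummary` contract, the `limit` query parameter, the `MAX_PAGE_SIZE` bound and the `loadBookings` stand-in are hypothetical and not taken from our codebase. It only relies on the standard `Request`, `Response` and `URL` objects that Next.js route handlers receive and return.

```typescript
// Hypothetical contract for one migrated read endpoint; all names are illustrative.
interface BookingSummary {
  id: string;
  status: string;
}

const MAX_PAGE_SIZE = 100;

// Bounded input: accept only an integer in [1, MAX_PAGE_SIZE], defaulting to 20.
function parseLimit(url: URL): number | null {
  const raw = url.searchParams.get("limit") ?? "20";
  if (!/^\d+$/.test(raw)) return null;
  const limit = Number(raw);
  return limit >= 1 && limit <= MAX_PAGE_SIZE ? limit : null;
}

// Validated output: only values matching the contract leave the server.
function isBookingSummary(value: unknown): value is BookingSummary {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.id === "string" && typeof record.status === "string";
}

// Stand-in for the real data access layer.
async function loadBookings(limit: number): Promise<unknown[]> {
  return [{ id: "b-1", status: "confirmed" }].slice(0, limit);
}

function json(body: unknown, status: number): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json", "cache-control": "no-store" },
  });
}

// In the App Router this function would be exported from a route.ts file.
async function GET(request: Request): Promise<Response> {
  const limit = parseLimit(new URL(request.url));
  if (limit === null) return json({ error: "invalid limit" }, 400);

  const rows = await loadBookings(limit);
  if (!rows.every(isBookingSummary)) return json({ error: "contract violation" }, 500);

  return json(rows, 200);
}
```

The client side then swaps its server function call for a plain `fetch` against the new route and handles the 400 and 500 cases explicitly.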

Ralph to the rescue!

To make fixing the first 4 easier, I had already refined and tested a 150-line Claude skill. I knew it was time for a Ralph loop. So I set up the skill as the spec file, provided additional instructions around merge request (MR) submission and handling, and got it going in auto mode.

And it did. After 8 hours of work, it created 18 MRs for me to review, and continued to autonomously address the comments and fix any failures reported by the CI pipeline.

At the point where all 18 MRs were submitted and ready to merge, this task had cost roughly US$180. I estimate that Claude went roughly 5 to 10 times faster than I would have.

$180 is not a large amount of money for a specific, targeted refactor. We have dozens of those we can do to migrate and improve our codebase. Can we end tech-debt in our codebase?

Then reality caught up

After MR 6/18 merged into the main branch, our testing environment went down. It was not hard to notice: even a local test server would not run after rebasing. MR 6/18 contained a route that was not set up properly, and that could only be caught at runtime. It was an easy fix, and it reminded me that I had forgotten a step in my Ralph loop: I quickly instructed it to build comprehensive test cases for us to exhaustively verify the changes.

Testing is going to need improvement, but we knew this; we had already started to experiment with end-to-end tests to tackle this tech-debt. Here again, with great help from Claude, we managed to automate some end-to-end testing. However, in contrast to the current migration, none of the results so far have been satisfactory. This time, it seems our issues have more to do with the design of our application — and the solution isn’t yet clear.

The trouble with the process

But the testing environment going down was not actually the biggest friction — it was fixed quickly. The problem was that it took Claude 4 to 5 hours just to merge the remaining 12 MRs after MR 6. 8 hours of productive work, 4 hours of watching paint dry.

We use short-lived development branches against the origin/main trunk, and maintain a semi-linear history on merges (trunk-based development). This means that after each merge, the next merge request needs to be rebased, and the CI pipeline must run again. A rebase + CI cycle takes around 8 to 10 minutes today; and if the pipeline fails or a conflict occurs during rebase, additional CI cycles may need to run.
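A back-of-the-envelope sketch of that serialization, using only the figures above (12 remaining MRs, 8 to 10 minutes per rebase + CI cycle). The `retriesPerMr` parameter is my own assumption to model failed pipelines or rebase conflicts, and the model ignores colleagues merging in between, so treat the output as a rough lower bound rather than a measurement.

```typescript
// Wall-clock minutes to land a queue of MRs one after another, when every
// merge forces the next MR through a fresh rebase + CI cycle.
function serialMergeMinutes(
  mrCount: number,
  cycleMinutes: number,
  retriesPerMr = 0, // assumed average number of extra cycles per MR
): number {
  return mrCount * (1 + retriesPerMr) * cycleMinutes;
}

const remainingMrs = 12; // MRs still open after MR 6 merged

const bestCase = serialMergeMinutes(remainingMrs, 8); // 96 minutes
const worstCase = serialMergeMinutes(remainingMrs, 10); // 120 minutes
const withOneRetryEach = serialMergeMinutes(remainingMrs, 10, 1); // 240 minutes

console.log({ bestCase, worstCase, withOneRetryEach });
```

Even with no failures, the queue needs 1.6 to 2 hours of pure waiting. With about one extra cycle per MR the model reaches 4 hours, which is in the same range as what I observed.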

Every day, we race each other to merge our work. That day, we also raced the 18 MRs the agent had to merge. And our CI pipeline is not going to get faster — if anything it’s going to take more time once we add end-to-end tests.

So, can we end tech-debt?

In short, not all tech-debt scales equally well for agents, especially when the issue may have more to do with application design or the solution is not yet clear — as in our case with end-to-end testing.

And while we can certainly price some migration work, or even execute it concurrently with our feature work, it still needs to clear CI. Agents didn’t eliminate the implementation bottleneck — they moved it to the CI process.

I now find myself thinking about merge trains and other merging strategies to eliminate the CI bottleneck. I’m also wondering whether larger organisations are now experiencing the same pressure on their CI processes, and how this will shape the future of CI and trunk-based development.

Why I continue to review code

sylvain.bougerel@gmail.com (Sylvain Bougerel) — Fri, 01 May 2026 10:01:00 +0800

Given the increase in velocity from coding assistants such as Claude Code and Copilot, automated review had become a necessity. After we rolled out CodeRabbit on our project, almost everybody stopped reviewing code. It didn’t happen immediately, but gradually and organically.

As it turns out, CodeRabbit is a formidable reviewer: with only a bit of context in the commit messages and a well-written AGENTS.md, CodeRabbit will often be more thorough in reviews than most of my colleagues. And if that wasn’t enough, CodeRabbit is almost always available (98.3% of the time in the last 30 days, which is poor by typical SLA targets but much better than colleagues) and costs a fraction of what they do.

Eventually everybody stopped reviewing code… Save for me.

For one thing, coding assistants can hallucinate blunders that neither humans (regrettably) nor CodeRabbit will catch:

const NON_EDITABLE_BOOKING_STATUSES = new Set([
 'cancelled',
 'canceled',
 'declined',
 'withdrawn',
]);

I took that snippet from a merge request submitted yesterday. We do not have any declined or canceled statuses in our entire domain model; they simply don’t exist. I chose this example because it’s one of the simplest. The day before, a merge request contained a complete re-implementation of a 200+ line utility we already had. This happens all the time.
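As an aside, this particular class of blunder can sometimes be pushed onto the compiler. The sketch below assumes the domain statuses are available as a string-literal union; the list in `BOOKING_STATUSES` is hypothetical (the real one isn't shown here), and the approach only helps where such a type exists and is actually used. It doesn't catch a duplicated utility, nor does it replace reading the change.

```typescript
// Hypothetical domain model: the real list of booking statuses is not shown in this post.
const BOOKING_STATUSES = ["pending", "confirmed", "cancelled", "withdrawn"] as const;
type BookingStatus = (typeof BOOKING_STATUSES)[number];

// With an explicit element type, a literal outside the union is a compile error:
//   new Set<BookingStatus>(["cancelled", "declined"]);
//   -> Type '"declined"' is not assignable to type 'BookingStatus'.
const NON_EDITABLE_BOOKING_STATUSES: ReadonlySet<BookingStatus> = new Set<BookingStatus>([
  "cancelled",
  "withdrawn",
]);

function isEditable(status: BookingStatus): boolean {
  return !NON_EDITABLE_BOOKING_STATUSES.has(status);
}
```

An untyped `new Set([...])` of plain strings, as in the merge request, gives the compiler nothing to check against.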

But most importantly, if the phone rings because of an issue on the system, I need to get on the call and fix the issue. When my work has the potential to impact my personal life and my weekends with family, there’s no AI that I will trust enough. And every alert, whether it fires during working hours or not, still costs the whole team velocity when we scramble for unplanned work.

Ultimately, our customers don’t care who wrote the code: “We’re sorry, Claude wrote that line and CodeRabbit didn’t catch the problem” is not a good look. No matter how powerful our tools have become, responsibility is still mine. So I’ll keep on reviewing code.

AI exacerbated the divide between engineers

sylvain.bougerel@gmail.com (Sylvain Bougerel) — Sat, 25 Apr 2026 22:12:00 +0800

Any complex product requires deep context about the product’s domains, architecture, and implementation. And if you want to make effective use of AI to solve a problem, you need to inject the right subset of that entire context into the prompt, or that smart auto-complete will spit out costly nonsense.

Naturally, engineers with the best understanding of the context, the capacity to articulate the problem and the skills to implement it are even stronger with AI: they can provide better input to the model and quickly validate its output. The once derided 10x engineer is starting to become a reality for anyone who can use AI to implement a correct solution in a fraction of the time it would have taken them.

On the other end of the spectrum, weaker engineers are now compelled to keep up with the pace set by their stronger peers, only they can neither prompt the LLM effectively nor validate what it produces. So they just let the garbage out—knowingly or not.

If you feel like you’ve become a “man-in-the-middle”—a proxy between somebody else’s request and an LLM—you will be phased out. The great engineers of today learned the ropes at a time AI didn’t exist, when they could invest in their fundamentals. Give yourself the same gift: carve out time to use less AI, and start learning again so you can close the divide.

Claude Code more than doubled my productivity

sylvain.bougerel@gmail.com (Sylvain Bougerel) — Sat, 14 Mar 2026 15:28:00 +0800

I wrote twice as much code last quarter, and it was better. I use Claude Code daily via Steve Molitor’s claude-code.el and monet.el integration for Emacs. Once a review is ready, CodeRabbit handles it. My daily AI usage really kicked into high gear back in October 2025, when we got a large amount of Claude Code credit grants and I could use it as much as I wanted.

The graph below shows the number of lines of code changed (added and deleted) in my merge requests (MRs), summed monthly, on our main application repository: a Next.js application with 200KLOC, not something an agent can easily get into. I authored 773 merge requests on this project alone, which runs in production to serve our clients; I mention this to emphasize the size of the sample:

You can see a clear 2x increase when I started to use Claude Code daily around October 2025. Like most folks, I found Claude Code to be amazing when working as a pair programmer to write the code or review my changes in short feedback loops, to explore greenfield projects or to strengthen tests on brownfield projects.

Around that time, I felt the quality of my code improving too. Below is the number of normalized comments $N$ for each MR, that is, the number of comments $C$ divided by the sum of lines of code added $A$ and deleted $D$:

\begin{equation} N = \frac{C}{(A + D)} \end{equation}
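As a worked example of the formula, here is a minimal sketch. The `MergeRequestStats` shape is hypothetical, and returning `null` for an MR with no changed lines is my own convention for the sketch, not necessarily what the stats tool does.

```typescript
interface MergeRequestStats {
  comments: number; // C
  linesAdded: number; // A
  linesDeleted: number; // D
}

// N = C / (A + D); null when the MR changed no lines (assumed convention).
function normalizedComments(mr: MergeRequestStats): number | null {
  const changedLines = mr.linesAdded + mr.linesDeleted;
  return changedLines === 0 ? null : mr.comments / changedLines;
}

// 3 comments on an MR with 100 lines added and 50 deleted: 3 / 150 = 0.02
const example = normalizedComments({ comments: 3, linesAdded: 100, linesDeleted: 50 });
console.log(example);
```

A lower $N$ means fewer review comments per changed line.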

Back in April 2025, we rolled out CodeRabbit, which picked up over 70% of the reviewing work for everyone. Most team members do not perform code reviews anymore, except for myself; I’ll probably explain why in another post. Thus, on the changes I author, over 95% of the comments have come from CodeRabbit from that time onwards.

Normalized comments is an interesting metric since it reflects the quality of my work with Claude Code, prior to review by others and CodeRabbit. If the value decreased there, it could be that:

1. We got lazy about reviewing, but CodeRabbit is pretty much the only reviewer of my work;
2. We output a lot more code or the quality of the code increased;
3. The work we do is a lot easier, so there are just fewer issues with it.

AI certainly had an impact on point 2, given the top graph. You can also see a dip soon after October 2025 when I started to use Claude Code, but then a bump appears again in January 2026. That bump is due to the fact that I had to tackle a challenging issue that spanned several MRs. It’s quite visible when looking at the histogram of independent values:

There are other common proxies for code quality, such as unit tests. Below is the raw number of unit tests on the main project branch over time; it went up nearly 5x in the last 4 months:

Additionally, we have not had production rollbacks or outages in the last 4 months, consistent with the past year (we have only ever had to perform production rollbacks twice). So agent usage is not called into question in our team.

The stats in this post were extracted using a tool written by Claude Code.

Yodarabbit

sylvain.bougerel@gmail.com (Sylvain Bougerel) — Sun, 11 Jan 2026 13:06:00 +0800

CodeRabbit has this new feature that allows you to change its tone. I can’t wait to ask it to speak like Yoda.

# .coderabbit.yaml
tone_instructions: "Speak like Yoda, you must."