Everybody has a plan until they get punched in the mouth

It is rare for me to quote Mike Tyson, but he was onto something with this. Here is the thing with coding agents: you cannot tell a coding agent to do something in one sentence and expect something usable to fall out. I mean… there is a chance, depending on your prompt - if you tell it to copy something that exists millions of times on GitHub. But for anything even remotely more involved than that, things get trickier, especially if you are dealing with an existing system and not one-shotting a Tetris clone.

I have mostly one- or two-shotted things with my coding agent, but that is also me giving it very detailed and specific instructions, resulting in only a handful of lines of code changes. Something I can easily review without spending too much mental energy on it.

While working on LazerBunny's controller for my agents I asked myself a question: What if I want to join the cool kids for once and ignore everything I ever learned about engineering and irresponsibly yeet millions of lines of code into the world without any regard for quality, safety or functionality? Things get more complex.

Usually you talk to your agent and develop a plan for what to do. Meaning a multi-page specification, detailing not only the business reasons for the code to be written, but also implementation details like how to seed data, where to put things and what third-party services or APIs to use.

Back in my day we called this a product specification, something a project or product manager and potentially an architect provided. Damn kids and their little hallucination boxes. Get off my lawn!

Kidding aside for a second. Having my agent implement multiple things while I walk Triss seems like a pretty neat thing. And yet I will die on the hill that vibe coding is irresponsible from an engineering perspective. However, that does not mean we cannot use the techniques and technology in a responsible way.

Context

As we might know at this point, an LLM's quality and use of information is not spread evenly across the whole context; it emphasizes the beginning and the end. So a million-token context is amazing… except when key parts in the middle get lost, ignored or misrepresented. I think recent models have gotten noticeably better, but it is still observable. And this does not even factor in the loss of information when compressing the history.

My goal is pretty clear. I want a list of things my LLM is supposed to work through, but with the fewest shots possible and a brand-new session for each task. This matches the way I use LLMs (mostly in Q/A mode, not actively writing code) and has proven to result in the most satisfying probabilistic arrangements of words.

For the "plan" I am not using a single edit field; I want it to be a list of instructions. After each point in the list, a new session is started. All previous steps are summarized in an additional system prompt. Each step lives on a separate git branch, and the branches can be merged in reverse order of the list until hitting main.
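The per-step loop can be sketched in shell. Everything concrete here is an assumption about the implementation: `run_agent` is a stub standing in for whatever actually drives the session, the tasks are hard-coded, and the branch naming just follows the plan-name-plus-step pattern. The throwaway repo only exists so the sketch runs anywhere; the real thing would operate on an existing checkout.

```shell
# Stub for the real agent CLI: receives the summary of previous steps
# plus the current task and would run one fresh session.
run_agent() { printf 'system+summary: %s | task: %s\n' "$1" "$2"; }

# Throwaway repo so the sketch is self-contained.
cd "$(mktemp -d)" && git init -q
git config user.email me@example.com && git config user.name me
git commit -q --allow-empty -m init && git branch -M main

plan="charge-customer-stripe"
prev="main"
step=1
summary=""
for task in "implement stripe handler" "write unit tests"; do
  git checkout -q -b "$plan-$step" "$prev"    # each step branches off the previous one
  run_agent "$summary" "$task"                # brand-new session: summary + current task
  git commit -q --allow-empty -m "$plan step $step"
  summary="$summary; $task"                   # previous steps get summarized for the next session
  prev="$plan-$step"
  step=$((step + 1))
done
```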

A plan called "charge customer stripe" looks something like this (list items indicated by ---):

Implement a new http handler for a customer to enter their credit card information and tokenize it using Stripe. Put the handler in src/web/stripe.go. Stripe credentials are stored in the environment as stripe_key.

---

Write unit tests for src/web/stripe.go

---

Add a new function in src/customer/billing.go to charge a customer using a Stripe token. Call this function from src/web/stripe.go.

---

Send the customer an email that a transaction was successful when a Stripe charge succeeded in src/customer/billing.go. Use the email server configured in src/app.go, in App.email.

---

Write unit tests for src/customer/billing.go

---

Add a background job to src/jobs.go to periodically check if there are any Stripe charges that failed.

Far shorter and a lot less verbose than most of the plans I usually have to review. But each step on its own can be implemented as is, has clearly defined exit conditions and will not produce too much work to review.

Run LLM, run

The session for the last step will look something like this:

$ git checkout -b charge-customer-stripe-6
"system prompt"
"summary of previous steps"
"task as user message"

After all this I end up reviewing six branches, each with 50 to 100 lines of code, which hopefully all play nicely together when merged. And if that is not the case, it takes a bit of manual intervention to make things fit together - something I do not mind. I had the LLM take over writing a few isolated pieces of code that would have taken me a bit longer. But once I put these together I at least know where the code is, can do light refactors that would have burned an obscene amount of tokens, and can ensure there are no glaring issues with the code that will leak all credit card information (aka: diligently doing my job).
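Merging "in reverse order of the list until hitting main" can be sketched like this, assuming the step branches are stacked on each other as described above (generic `step-N` names here, and a throwaway repo so the example is self-contained):

```shell
# Throwaway repo with three stacked step branches.
cd "$(mktemp -d)" && git init -q
git config user.email me@example.com && git config user.name me
git commit -q --allow-empty -m init && git branch -M main
for n in 1 2 3; do
  git checkout -q -b "step-$n"              # each branch builds on the previous one
  git commit -q --allow-empty -m "step $n"
done

# Reverse order: fold each step branch into the one below it, ending on main.
git checkout -q step-2 && git merge -q step-3
git checkout -q step-1 && git merge -q step-2
git checkout -q main   && git merge -q step-1
```

Because each branch was cut from the previous one, every merge in this pass is a fast-forward unless a branch was manually touched in review.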

For planning mode, the window will be split in two parts: one to create the list of tasks, and one to chat with the LLM about the codebase, look up documentation and provide links to it for me to review, so I can spot check whether there are any problems in the thought process of what to implement.

Error state

Sometimes things go wrong. The LLM gets stuck. Hallucinates. Generates 500,000 tokens and still does not know how to place a button in an HTML file. Been there, seen this, screamed at Claude "How hard can it be?".

The one thing I found works really well with this approach is going back a branch, or manually fixing the branch and letting the rest of the task list execute. With one long markdown file as input, the LLM usually spends a good amount of time figuring out what to do next. With one-shot branches there is no figuring things out: the code is in a state in which it can do exactly what it was told.
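The manual-fix path can be sketched with plain git, again under assumptions: hypothetical `step-N` branches stacked as before, an empty commit standing in for the hand-written fix, and a rebase to carry that fix into the remaining steps before the rest of the list resumes.

```shell
# Throwaway repo with three stacked step branches.
cd "$(mktemp -d)" && git init -q
git config user.email me@example.com && git config user.name me
git commit -q --allow-empty -m init && git branch -M main
git checkout -q -b step-1 && git commit -q --allow-empty -m "step 1"
git checkout -q -b step-2 && git commit -q --allow-empty -m "step 2"
git checkout -q -b step-3 && git commit -q --allow-empty -m "step 3"

# Step 2 went sideways: fix it by hand on its branch...
git checkout -q step-2 && git commit -q --allow-empty -m "manual fix"

# ...then carry the fix into the later step before resuming the plan.
git checkout -q step-3 && git rebase -q step-2
```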

I am considering implementing a way to indicate a "pause" in the plan for me to manually do something. Most likely for code architecture, to ensure things are laid out in a way that makes sense and stays maintainable (instead of cramming everything into random files that have no reason to exist, or re-inventing functionality over and over again).

Progress

The LLM router is routing. I need to implement the priority queue, because it is annoying when I am talking to Endirillia and she does not answer right away because she is playing judge for a research task. I hope to get to it this week and press the publish button. The controller for the agents is coming along slowly; I was mostly playing with the UX to see what works best. The approach was pretty much decided on based on prior experience.

The avatar's body is done. Well, I am debating adding a bit of muscle definition here and there so the arms and legs do not look like cylinders. I am not sure these definitions would ever be seen, but they might be good practice for sculpting. Next up is the hair, which seems to be more work than her face if the tutorial length is any indicator. I am slowly getting pretty comfortable with the two tools I know how to use, so I am expecting a step like texture painting or UV mapping to completely throw me off and make me question my ability to do things that are explained to be simple.

posted on April 26, 2026, 8:30 p.m. in AI, lazerbunny

I am perpetually a little bit annoyed by the state of software - projects constantly changing, being abandoned or adding features that make no sense for my use case - so I started writing small tools for myself which I use on a daily basis. And it has not only been fun, but also useful. For the rest of the year I will focus on a project I have been thinking about for a few years: Building a useful, personal AI assistant.