
Engineering · Tooling · AI

Two years of AI-assisted coding: the parts that actually changed my workflow

The honest version. Where the model genuinely earns its keep, where it actively slows me down, and the loop I settled on after shipping InstaEscrow, SafeBoda features, and this entire portfolio with one of them in another window.

28 April 2026 · 9 min read

I've had Claude (and before that, GPT-4, Cursor, Copilot in roughly that order) open in a side window for nearly every line of non-trivial code I've written since 2024. That includes InstaEscrow (a solo-built payments product running real M-Pesa flows), features at SafeBoda touching live dispatch, and the site you're reading this on. The site itself was scaffolded, styled, tested, and deployed in one afternoon by a Claude agent operating with shell access on my server.

That's not a flex. It's the prerequisite for an honest opinion. Most of the takes I see online about “AI coding” are written by people who've either never shipped real code with one or who tried it briefly in 2023 and gave up. This essay is what I actually believe after two years of production use.

What it's genuinely transformed

1. Greenfield scaffolding

The first 70% of any new service (repo layout, test harness, CI config, basic domain model, CRUD endpoints, deploy scripts) used to take a day or two of careful copy-pasting from previous projects. Now it takes an hour. Not because the model writes magic code, but because the cognitive load of typing it all out is gone. I describe what I want, watch what comes out, point at the parts that need adjusting.

The trade is that you must actually read what comes out. I've seen the failure mode where engineers paste in a generated scaffold, run the tests, see green, and move on. Three weeks later they're puzzled why their auth middleware silently allows unauthenticated requests in development. Read every line.
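
To make that failure mode concrete, here's a minimal made-up sketch of the kind of plug a generated scaffold can hand you; the module name and config key are invented, not from any real project:

```elixir
defmodule MyAppWeb.Plugs.RequireAuth do
  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    if Application.get_env(:my_app, :env) == :prod do
      verify_bearer_token(conn)
    else
      # The line you only notice three weeks later: every non-prod request
      # sails through without a token.
      conn
    end
  end

  defp verify_bearer_token(conn) do
    case get_req_header(conn, "authorization") do
      ["Bearer " <> token] when byte_size(token) > 0 ->
        assign(conn, :bearer_token, token)

      _ ->
        conn |> send_resp(401, "unauthorized") |> halt()
    end
  end
end
```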

2. Mechanical refactors

Renaming a concept across forty files. Switching from one logging library to another. Migrating a Phoenix module from Ecto v2 syntax to v3. Converting an Express app to NestJS controllers. These are boring, error-prone, and take half a day done by hand. They take twenty minutes with a model that can read the codebase and apply the change consistently.

The unlock is feeding the model a clear “before / after” on one or two examples and letting it grind through the rest. Watch the diff. Run the tests. Done.
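
For the Ecto v2-to-v3 case, the worked example I'd paste in looks something like this; the schema itself is hypothetical, and only the type substitutions are the point:

```elixir
defmodule MyApp.Rides.Ride do
  use Ecto.Schema

  # Hand-edited "after"; the comments show the Ecto v2 "before" the model
  # is expected to find and replace across the rest of the codebase.
  schema "rides" do
    field :requested_at, :utc_datetime    # was: field :requested_at, Ecto.DateTime
    field :pickup_date, :date             # was: field :pickup_date, Ecto.Date
    field :fare_cents, :integer
    timestamps(type: :utc_datetime)       # was: timestamps()
  end
end
```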

3. Documentation and copy

Every README, OpenAPI spec, internal runbook, error message, commit message, PR description. The model produces these more reliably than I do, because I'm tired of writing docs and it isn't. It's not better than I am; it's more willing than I am.

4. Debugging unfamiliar errors

A weird Postgres error. An obscure Erlang stack trace. A Webpack config that mysteriously broke. Pasting these into a chat with relevant context and asking “what does this mean and how do I fix it” is faster than Stack Overflow and Google for any problem that's already in the model's training. Which is most problems, because most problems aren't novel.

5. The first draft of a test

Writing the failing test before the implementation is good discipline that I, like most engineers, slack on. Models do not slack on this. Asking for “the failing test that asserts this behavior” makes TDD nearly free. The test it writes is usually 80% of what I want; the last 20% I tighten by hand.
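
For a sense of what that prompt returns, here's a representative draft against a module I've invented for illustration; this is roughly the 80% version before I tighten the assertions by hand:

```elixir
defmodule MyApp.FeeCalculatorTest do
  use ExUnit.Case, async: true

  alias MyApp.FeeCalculator

  # Fails until FeeCalculator.fee/1 exists, which is the point.
  test "caps the platform fee at the configured maximum" do
    # 2.5% of 1_000_000 cents would be 25_000, but the cap wins
    assert FeeCalculator.fee(amount_cents: 1_000_000, rate_bps: 250, cap_cents: 10_000) ==
             10_000
  end

  test "rounds fractional cents down, in the customer's favour" do
    assert FeeCalculator.fee(amount_cents: 101, rate_bps: 250, cap_cents: 10_000) == 2
  end
end
```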

What it's genuinely not good at

1. Novel architectural decisions

“Should I use Redis Streams or Phoenix PubSub for cross-service eventing” (yes, I asked). The model gives you a balanced, well-organized answer that lists trade-offs. Useful as a checklist. Useless as a decision. Architecture is about the constraints in your system that the model doesn't know about: your team size, your operational maturity, your migration path, your political appetite for adding a new dependency. Models don't substitute for taste because they don't have your context.

I treat the model's architectural advice as a checklist I run my own thinking against, not a recommendation I follow.

2. Performance tuning the last 20%

Models are consistent at applying generic performance advice: add an index, use a connection pool, batch writes. They're bad at the actual production-tuning work that matters: profiling a real workload, finding the one query in N that takes 90% of the wall clock, deciding whether the query plan is wrong because of skewed statistics or because of a parameter sniffing problem. The debugging loop here is empirical, and the model can't see your production traces.
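
What the model can hand you is the mechanical starting point, not the judgment. A hedged sketch of where that loop begins, assuming pg_stat_statements is enabled (the column names here are the Postgres 13+ ones, and MyApp.Repo is a stand-in for whatever Ecto repo you run this against):

```elixir
# Ask Postgres which statements actually dominate wall clock, then go run
# EXPLAIN (ANALYZE, BUFFERS) on the worst offender against production-shaped data.
{:ok, result} =
  MyApp.Repo.query("""
  SELECT left(query, 80)                     AS query,
         calls,
         round(total_exec_time::numeric, 1)  AS total_ms,
         round(mean_exec_time::numeric, 2)   AS mean_ms
  FROM pg_stat_statements
  ORDER BY total_exec_time DESC
  LIMIT 5
  """)

IO.inspect(result.rows, label: "slowest statements")
```

The numbers that come back are only a starting point; deciding why the worst query is slow still takes the production traces the model can't see.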

3. Security-sensitive code

Authentication. Authorization. Cryptographic primitives. Webhook verification. Models will happily write you a JWT verification function that looks right and is subtly broken in a way that won't fail any test you'd normally write. I write this code by hand, run it past the model as a reviewer rather than an author, and add property-based tests that try to break my own assumptions.
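
For illustration, the shape of a webhook check I keep in my own hands; the header encoding, tolerance window, and signing scheme here are assumptions, not any real provider's spec:

```elixir
defmodule MyApp.WebhookSignature do
  @tolerance_seconds 300

  # HMAC over "timestamp.body", compared in constant time, with a replay
  # window: the properties a plausible-looking generated version tends to
  # drop (a plain ==, no timestamp check).
  def valid?(raw_body, signature_hex, timestamp, secret, now \\ System.system_time(:second)) do
    expected =
      :crypto.mac(:hmac, :sha256, secret, "#{timestamp}.#{raw_body}")
      |> Base.encode16(case: :lower)

    abs(now - timestamp) <= @tolerance_seconds and
      Plug.Crypto.secure_compare(expected, signature_hex)
  end
end
```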

4. Anything where the cost of subtly-wrong is high

Money handling. Schema migrations on hot tables. Cron schedules. Index builds on production databases. The model writes these plausibly. Plausibly wrong is worse than visibly wrong, because plausible passes review.
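
For the index case specifically, the version I write by hand looks something like this; the table and columns are made up, and the point is the concurrent build outside a transaction so the hot table keeps taking writes:

```elixir
defmodule MyApp.Repo.Migrations.AddTransactionsStatusIndex do
  use Ecto.Migration

  # Concurrent index builds can't run inside a transaction, so both the DDL
  # transaction and the migration lock are disabled for this migration.
  @disable_ddl_transaction true
  @disable_migration_lock true

  def change do
    create index(:transactions, [:status, :inserted_at], concurrently: true)
  end
end
```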

The workflow that fell out

After two years I've converged on a fairly stable loop. It looks like this:

  1. Brainstorm with the model. Describe the problem. Ask for two or three approaches. Read all of them. Pick one knowing why I rejected the others. The model is at its strongest here: broad knowledge, fast, no ego.
  2. Write the design doc myself. Once I know the shape, I write the spec. The model can edit drafts, but the original thinking is mine because I'm the one with the system context.
  3. Decompose into tasks. I sketch the file tree and the work units. Sometimes the model helps me find pieces I missed.
  4. Implement task-by-task with TDD. I write the failing test (or ask for one and edit it). The model writes the minimal implementation. I run tests. I read the diff. I commit. Repeat.
  5. Review my own work like it was a stranger's. Especially the security-sensitive parts and the parts where the cost of being wrong is high.
  6. Ship.

This isn't exotic. It's the same loop I was already using when pair-programming with humans, just with a partner that's fast, never tired, and incapable of saying “I don't know.” That last bit is the dangerous one. You have to supply the “I don't know” yourself, on its behalf, when the situation calls for it.

On agents

The frontier of useful AI in engineering right now is agents: models that can run their own loops, execute commands, read files, modify code, deploy. I use Claude Code daily for tasks like “upgrade this dependency, run the tests, fix what breaks, make a PR.” That kind of mechanical, bounded work is where agents shine.

Where they fall down is anything that needs judgment about ambiguous requirements. The agent will pick a direction; the direction will be plausible; the direction will sometimes be the wrong one because the agent didn't know which constraint mattered most. The fix is to spec rigorously before dispatching. Specs are now the highest-leverage thing I write. The clearer the spec, the better the agent does. The vaguer the spec, the more creative interpretation costs me.

This entire portfolio site, including the article you're reading, was built end-to-end by a Claude agent given a written spec, shell access to my server, and the autonomy to push, deploy, and verify. The spec was about 2000 words long. The build, deploy, and a Phase 1.5 follow-up took a single working session each. The thing that mattered wasn't the model. It was the spec.

The bigger shift

The thing nobody talks about: the model has changed which problems are worth doing at all. Pre-LLM, I would have shrugged at “build a portfolio with four interactive live demos backed by a real Phoenix service”: too much upkeep, too much yak-shaving for too little return. With agent execution, that calculus inverts. The cost-of-effort drops by an order of magnitude, and side projects that would have died at “maybe one day” ship in a weekend.

The skill that matters now is knowing which projects are worth bringing back from your dead-projects list. Which is a different skill from coding, and a much better one to have in 2026 than the one I was hired for in 2018.

The takeaway, if you're curious

Use the tools. Don't worship them. Read every diff. Trust the scaffolding, doubt the architecture, verify the security. Spec before you dispatch. Treat the agent as a fast intern with broad knowledge and no judgment. The intern still works under your signature.