The Uncomfortable Truth: AI Now Writes Better Production Code

Embracing the future
Lee Higgins
January 1, 2026

I didn’t want to arrive at this conclusion.

For most of 2025, even as our workflow changed and our output improved, I resisted saying it out loud. It felt irresponsible. Naive. Like falling for hype.

But by the end of the year, after months of watching real systems evolve under agentic workflows, the pattern became impossible to ignore.

The code we were shipping was better than what we used to ship.

Not faster. Not cheaper.

Better.

This wasn’t a single moment of revelation. It crept up slowly, and that’s probably why it took so long to accept.

This Isn’t About Intelligence

When people hear statements like this, the immediate reaction is usually to argue about intelligence.

Is the model really smarter than an experienced engineer?

That turns out to be the wrong question.

What mattered in practice wasn’t raw intelligence or creativity. It was consistency.

AI doesn’t get tired. It doesn’t lose focus late in the afternoon. It doesn’t rush a change because it wants to clear its task list before the weekend. It doesn’t forget why a decision was made three months ago.

Humans do.

Even very good ones.

What we started to see in the second half of 2025 was that, given the right constraints, agents applied steady quality pressure to a codebase in a way humans simply can't sustain over long periods of time.

The Long Time Horizon Problem

There’s a category of work that humans are particularly bad at: tasks with long time horizons.

These are projects where:

  • The payoff is months away
  • Decisions compound slowly
  • Quality emerges through iteration rather than brilliance

Traditionally, this is where teams struggle. Context gets lost. Standards drift. Short-term fixes accumulate.

This is exactly where agentic systems excel.

Once we stopped treating AI as something that needed constant hand-holding, and instead embedded it inside a workflow with memory, structure, and review, it became clear that these systems were far more reliable over time than we are.

The “Genius, But Childlike” Phase

Working with modern models is a strange experience.

On one hand, they have an absurd breadth of knowledge. I’ve given agents problems that would have taken me days or weeks to work through, only to see them solved correctly in a matter of hours.

On the other hand, they still make mistakes that feel almost naive.

The best way I’ve found to describe it is this: you’re mentoring a genius with the temperament of a child.

It needs direction. It needs boundaries. It needs feedback.

But once those are in place, it can operate with a level of persistence and discipline that no human team can maintain indefinitely.

Why Process Beat Talent

Early experiments were rough.

The code worked, but it was inconsistent and hard to maintain. Reviews were painful. Fixing issues sometimes took longer than writing the code ourselves.

The turning point wasn’t a better model. It was better process.

Guardrails changed everything.

Clear project context. Shared internal libraries. Explicit architectural constraints. Automated reviews. A strict rule that everything goes through Git and is reviewable.
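One of those architectural constraints can be made concrete as an automated check that runs in review. The sketch below is purely illustrative (the layer names, paths, and rule are assumptions, not from our codebase): it scans changed files and rejects imports that cross a forbidden layer boundary, the kind of rule an agent can be held to mechanically.

```python
import re
from pathlib import PurePosixPath

# Hypothetical layering rule: code under app/ui/ must not import app.db directly.
# In a real setup this would run in CI against the files touched by a change.
FORBIDDEN_IMPORT = re.compile(r"^\s*(from|import)\s+app\.db\b")
RESTRICTED_LAYER = "app/ui"

def check_file(path: str, text: str) -> list[str]:
    """Return violation messages for one file's contents."""
    violations = []
    if PurePosixPath(path).is_relative_to(RESTRICTED_LAYER):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if FORBIDDEN_IMPORT.match(line):
                violations.append(
                    f"{path}:{lineno}: ui layer must not import app.db directly"
                )
    return violations

def check_changeset(files: dict[str, str]) -> list[str]:
    """Run the layering check over a mapping of path -> file contents."""
    out: list[str] = []
    for path, text in sorted(files.items()):
        out.extend(check_file(path, text))
    return out
```

The point isn't this particular rule; it's that constraints the team used to enforce by memory and goodwill become a gate the agent can't argue with.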

Once those were in place, something interesting happened: the quality curve flipped.

Instead of agents introducing entropy into the system, they started removing it.

They caught things we missed. They enforced consistency we would have hand-waved. They refactored code we would have been too busy to revisit.

This Is Why Tools Matter

One thing that became obvious very quickly is that not all AI tools are interchangeable.

Benchmarks don’t tell the full story.

What mattered for us wasn’t how a model performed on isolated tasks, but how it behaved over weeks and months inside a real codebase.

For serious software engineering work, we found that Claude and the surrounding ecosystem were meaningfully ahead. Not because of flashy output, but because of persistence, context retention, and the ability to operate inside tight execution loops without degrading.

That difference compounds over time.

The Team Size Collapse

This shift had consequences we hadn’t fully anticipated.

At the start of 2025, we expected to grow the team.

We didn’t.

We simply didn’t need to.

With agentic workflows in place, two people were comfortably doing the work that would previously have required a team of six or seven.

This isn’t a theoretical future scenario. It’s already happening.

And it raises uncomfortable questions about how our industry is structured.

Humans Still Matter — Just Differently

None of this means humans are obsolete.

In fact, the human role became more important, not less.

But it shifted.

Humans are no longer primarily valued for how fast they can type or how much syntax they remember. They’re valued for judgment, taste, direction, and the ability to design systems that don’t collapse under their own complexity.

Agents do the work.

Humans decide what the work should be.

There’s No Going Back

Sometime late in 2025, it became clear that this wasn’t a temporary phase or a productivity hack.

It was a structural change.

Once you’ve seen what small teams can do with well-designed agentic workflows, going back to purely human-driven development feels like willful inefficiency.

The genie really is out of the bottle.

The question now isn’t whether this shift happens.

It’s who adapts early, who designs the guardrails, and who ends up trying to catch up later.

For us, this wasn’t a choice.

It was the only path that still made sense.