For most of 2024, we believed the same thing everyone else did:
AI can’t write quality code.
It was useful, sure. Autocomplete on steroids. A faster way to scaffold things, write tests, explore ideas. But production work? Real systems? Long‑lived codebases with real users and real consequences?
That still felt firmly human territory.
By the end of 2025, that belief was gone.
This post isn’t about models, benchmarks, or hype. It’s about how we quietly, and sometimes reluctantly, rebuilt the way we work so that agentic coding sits at the centre of our business. Not as a novelty, but as infrastructure.
The earliest mistake we made with AI was treating it like a better editor.
When results were disappointing, it was tempting to blame the models. In hindsight, the problem was us. We were dropping agents into tasks with no shared understanding, no continuity, and no memory of past decisions.
The breakthrough came when we stopped thinking in terms of tasks and started thinking in terms of projects with persistent context.
Today, every client we work with has a dedicated project that lives for the entire lifetime of the relationship. That project starts simple — a clear description of what the client does and what they specialise in — but it quickly grows into something much richer.
We treat it as a living knowledge base. Market research, competitor analysis, priorities, constraints, working practices, even the language the client prefers to use. Importantly, we share this back with clients early and often, so what the agent understands is aligned with what the client believes about themselves.
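To make that a little more concrete, here is the rough shape such a project tends to take. Whether it lives as an agent project with uploaded documents or as a folder of markdown files, the layout below is illustrative rather than prescriptive, and the client name is invented.

```
acme-client/
├── overview.md            # what the client does, who they serve, how they describe themselves
├── research/
│   ├── market.md          # market research gathered over time
│   └── competitors.md     # competitor analysis
├── decisions.md           # priorities, constraints, and the reasoning behind past choices
└── working-practices.md   # how the client works, and the language they prefer to use
```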
Once that foundation exists, every piece of work starts from there. The agent is never operating in a vacuum.
Another early lesson was that good output depends on depth, not speed.
Before we spec anything, we do deep research. We use Claude heavily, but we’ll often run parallel passes with other models as well, just to widen the lens. The goal isn’t to rush to solutions, but to build a strong, shared understanding of the problem space.
That research doesn’t get thrown away. It becomes part of the project’s permanent context.
Only once that picture is clear do we move into defining what we’re actually going to build.
When it’s time to define a piece of work, we deliberately slow things down.
We start with a high‑level overview of what the client wants to achieve, then gradually break that down into behaviour‑driven features and epics. We tend to capture this in Gherkin, not because it’s fashionable, but because it forces clarity. It describes intent in a way that both humans and agents can reason about.
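To give a flavour, a feature captured at this stage might look something like the snippet below. The scenario itself is invented; the point is the shape: plain language, observable behaviour, no implementation detail.

```gherkin
Feature: Overdue invoice reminders
  So that customers pay without manual chasing,
  the system sends reminders for overdue invoices.

  Scenario: An invoice passes its due date
    Given an invoice that is 7 days overdue
    And no reminder has been sent in the last 7 days
    When the daily reminder job runs
    Then a reminder email is sent to the customer's billing contact
    And the invoice is marked as "reminder sent"
```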
At this point, code still hasn’t been written.
Instead, we create a repository and put the plan inside it. Every project has a documents area and a planning area, and those plans evolve through pull requests just like code does. Decisions are explicit. Changes are reviewable. History is preserved.
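In practice a project repository ends up looking something like this. The directory names are illustrative; the important part is that documents and plans sit next to the code and move through the same pull-request flow.

```
project-repo/
├── docs/                  # documents area: research summaries, client context, decisions
├── planning/              # planning area: epics and behaviour-driven feature specs
│   └── features/
├── src/                   # code arrives only once the plan has been agreed
└── README.md
```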
This might sound heavy, but in practice it’s the opposite. Once the plan is clear, execution becomes dramatically easier.
Execution is where agentic coding really started to feel different.
In the early days, we ran multiple agents in parallel and coordinated them through Git. That worked, but it was clumsy. As tooling improved, we moved towards using sub‑agents within a single session, each focused on a specific responsibility.
Over time, patterns emerged. Certain tasks kept coming up again and again: reviewing code, enforcing conventions, applying architectural decisions. Instead of repeating ourselves, we formalised these into reusable skills.
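A skill, in this sense, is not much more than a named, reusable set of instructions the agent can be pointed at. The sketch below shows roughly what a code-review skill might contain; it assumes a Claude Code-style SKILL.md with a short frontmatter header, and the wording is simplified for illustration.

```markdown
---
name: code-review
description: Review a pull request against our conventions before asking for human review
---

When reviewing a pull request:

1. Check that new code uses the internal packages rather than re-implementing
   data access, dependency injection, or styling from scratch.
2. Flag any architectural decision that is not recorded in the planning area.
3. Summarise findings as review comments, grouped by file.
```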
Just as importantly, we learned that agents will happily solve the same problem in five different ways if you let them.
Our answer wasn’t stricter prompting. It was constraints.
We invested heavily in small, focused internal packages — things like data access, dependency injection, styling, and project structure. Agents are explicitly instructed to use these. The effect was immediate. Less duplication, fewer surprises, and a codebase that feels cohesive rather than improvised.
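What "explicitly instructed" means in practice is unglamorous: a conventions file in the repository that every agent session reads before touching code. The package names below are invented for illustration; the real ones are internal.

```markdown
## Conventions (excerpt)

- All data access goes through `internal-data-access`. Do not write ad-hoc queries or clients.
- Services are wired with `internal-di`. Do not hand-roll singletons or service locators.
- UI styling comes from `internal-ui`. Do not add one-off styles.
- New projects start from the standard project template. Do not invent a new layout.
```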
Most projects now consist of gluing well‑understood modules together instead of reinventing fundamentals.
One principle runs through all of this: if it matters, it lives in Git.
Specs, plans, documentation, code — everything is versioned. Agents open pull requests. Humans review them. The workflow is familiar, which makes it safe.
We often run agents on persistent remote machines, kicking off work before the end of the day and reviewing results in the morning. It’s asynchronous by default, but never opaque.
That balance turned out to be crucial.
Looking back, the reason this transition succeeded wasn't that AI suddenly became brilliant overnight.
It worked because we redesigned our workflow around a few core ideas:
Context matters more than prompts. Guardrails matter more than raw capability. And humans stay in the loop — not to type faster, but to guide, review, and decide.
By the time we reached the end of 2025, we realised something had fundamentally changed. Not just how we build software, but what a small team is capable of.
That realisation — and its implications — deserve a post of their own.
In Part 2, we’ll talk about the moment it became clear that AI wasn’t just keeping up with production work, but quietly surpassing what we were doing before.