How to Run an Agentic Digital Agency
A practical, opinionated guide to how we actually run projects using agentic coding.
This is not theory. This is the distilled version of the workflow we evolved throughout 2025 after building real production systems with agents at the core.
Use this as an operating manual, not a manifesto.
1. First Principle: Start With Context, Not Code
Agents are only as good as the context you give them.
Never start with a task. Always start with a project.
A project is a long‑lived container for understanding, decisions, and memory.
2. Project Starting Point (Per Client)
Every client starts with a Claude Project.
This is not optional.
We do not work in ad-hoc chats. We do not start in an editor. We start by creating a project in the Claude / Anthropic ecosystem.
A Claude Project is a long-lived memory container.
It persists:
- Context
- Decisions
- Research
- Language (the client's terminology and tone)
This project exists for the entire lifetime of the client relationship.
Initial Contents
The first iteration is intentionally high-level:
- What the client does
- What they specialise in
- Their market
- Their competitors
- Known constraints
Accuracy is less important than alignment.
Client Feedback Loop
We share early summaries back with the client and explicitly ask:
“Is this how you see yourselves?”
Corrections at this stage prevent months of misalignment later.
3. Context Building (Living Knowledge Base)
The client-specific Claude Project evolves continuously.
Over time it accumulates:
- Priorities
- Working practices
- Technical preferences
- Things to avoid
- Past decisions and why they were made
This becomes the entry point for every project.
When a new project starts, we do not re-explain the client.
We start from the client-specific Claude Project.
4. Research Phase (Before Any Specs)
We do deep research before defining solutions.
This phase is about widening the lens, not converging too early.
Tools
- Claude Projects for long-form reasoning, synthesis, and maintaining continuity
- ChatGPT Deep Research for broad, parallel exploration and cross-checking assumptions
We intentionally use both.
Claude excels at:
- Holding long-lived context
- Building coherent narratives
- Carrying decisions forward
ChatGPT Deep Research is particularly strong at:
- Rapidly surveying large problem spaces
- Exploring adjacent domains
- Surfacing alternatives and edge cases
Outputs
- Market research
- Competitor analysis
- Product positioning
- Technical feasibility notes
Research output is never throwaway.
It is summarised, curated, and fed back into the Claude Project so it becomes part of the permanent project context.
Only once this shared understanding exists do we move on to specs.
5. From Vision to Structure
Once the context is solid:
- Write a high‑level overview
- Break it into epics and features
- Capture behaviour, not implementation
Format
- BDD‑style specs
- Gherkin where possible
This forces clarity and creates a shared language for humans and agents.
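For example, a feature spec in this style might look like the following (the domain and steps are illustrative, not from a real client):

```gherkin
Feature: Order cancellation

  Scenario: Customer cancels before dispatch
    Given an order with status "confirmed"
    When the customer cancels the order
    Then the order status becomes "cancelled"
    And a refund is issued to the original payment method
```

Note that the spec captures observable behaviour only; nothing here dictates how the implementation must work.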
6. Git Is the Source of Truth
All execution happens in Git repositories.
Claude Projects hold thinking.
Git holds reality.
A key realisation for us in 2025 was that a repo is not just a place for code.
We use Git repositories and Claude Code as a universal work container for:
- Planning
- Specs
- Documentation
- Client deliverables
- Internal process docs
- Even presentations (as deployable Flutter sites/apps)
If it’s important enough to collaborate on, iterate on, or remember later, it belongs in a repo.
Why Claude Code (Even for Non-Code)
Claude Code’s superpower isn’t just code generation — it’s that it can:
- Explore a filesystem
- Read and write structured documents
- Keep work organised inside a repo
- Produce reviewable diffs
This makes it an extremely natural interface for planning and documentation.
Repository Structure
- /docs – background, context, decisions
- /planning – specs, epics, iterations
Plans evolve via pull requests just like code.
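A minimal sketch of the layout (file names are illustrative):

```
/docs
  context.md      # client background, market, constraints
  decisions.md    # what was decided, and why
/planning
  epics/          # high-level epics and features
  iterations/     # per-iteration specs
```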
Core CLIs
We operate almost entirely via CLIs:
- git – source of truth
- gh (GitHub CLI) – issues, PRs, reviews
Agents are instructed to use these tools directly.
If it can’t be done via CLI, it doesn’t scale.
7. Starting Execution
Once specs exist:
- Create a Git repository
- Add planning and documentation first
- Then begin implementation
We often run agents via:
- Claude Code in a terminal
- Persistent remote machines
- tmux for long‑running sessions
Agents work asynchronously.
Humans review via PRs.
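A typical long-running setup looks something like this (the session name is illustrative):

```
tmux new -s acme-api      # persistent session on the remote machine
claude                    # start Claude Code inside it
# detach with Ctrl-b d; the agent keeps working
tmux attach -t acme-api   # reattach later to review progress
```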
8. Sub‑Agents
Rather than one monolithic agent, we use sub‑agents.
Typical roles:
- Planner
- Implementer
- Reviewer
- Refactorer
- Researcher
These can run within a single session.
This reduces coordination overhead and keeps context tight.
9. Agent Skills
As patterns repeat, we formalise them into skills.
Skills are reusable behaviours, not prompts.
Examples
- Code review skill (via CodeRabbit CLI)
- Refactoring skill
- Architecture enforcement
- Preferred library usage
- Dependency injection patterns
Skills encode:
- What is acceptable
- What must be fixed
- How decisions are made
This dramatically improves consistency.
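As a sketch, a skill is a markdown file with frontmatter that Claude Code loads on demand; a review skill might look roughly like this (contents abbreviated, and the exact coderabbit flag is our assumption):

```markdown
---
name: code-review
description: Review the current branch with CodeRabbit and triage the findings
---

1. Run `coderabbit review --plain` against the working tree.
2. Classify each finding as must-fix, should-fix, or ignore (with a stated reason).
3. Fix every must-fix item, commit, and re-run until the review is clean.
```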
10. Guardrails via Internal Packages
Agents are non‑deterministic.
Left unconstrained, they will:
- Re‑implement the same thing differently
- Increase long‑term complexity
Our Solution
Build and maintain small internal packages.
Examples:
- Dependency injection
- Data repositories
- Styling / UI systems
- Project bootstrapping
Agents are explicitly instructed to use these packages.
Most projects become composition, not reinvention.
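As a shape sketch only (these package names and APIs are hypothetical stand-ins for internal ones), a new service then reads as composition:

```dart
// Hypothetical internal packages; shown only to illustrate the shape.
import 'package:agency_bootstrap/agency_bootstrap.dart';
import 'package:agency_data/agency_data.dart';

Future<void> main() async {
  // One blessed way to wire config, logging, and DI in every project.
  final app = await bootstrap(appName: 'acme-api');

  // One blessed repository pattern instead of per-project reinvention.
  app.register<OrderRepository>(() => PostgresOrderRepository(app.db));

  await app.serve();
}
```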
11. Unified Language — Pure Dart (No Bash)
We made a deliberate decision to standardise on one language: Dart.
Everything is Dart.
- Backend services
- Frontend apps (Flutter)
- Tooling and CLIs
- Deployment helpers
- Automation scripts
No Bash.
No JSX.
No CSS.
No HTML.
No mixed template languages.
No competing mental models.
Dart’s Tooling Is a Force Multiplier
Dart isn’t just a pleasant language — it has exceptionally strong tooling, and that matters enormously for agentic work.
Out of the box, Dart gives us:
- A world-class analyser that surfaces errors early and precisely
- A deterministic formatter that removes style debates entirely
- The ability to run Dart files directly like scripts during development
- The ability to AOT compile the same code into fast, static binaries for any target platform, with minimal dependencies
This creates a rare combination:
- Script-like ergonomics during development
- Production-grade performance and predictability in deployment
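Concretely, the same file serves both modes via standard Dart SDK commands (the file path is illustrative):

```
dart run tool/deploy.dart                         # script-like, during development
dart compile exe tool/deploy.dart -o bin/deploy   # AOT-compiled static binary for release
```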
Agents thrive in this environment because:
- Errors are explicit, not implicit
- Feedback loops are tight
- Formatting and structure are enforced automatically
Dart for Tooling Changes Everything
Using Dart for scripting and tooling is not just a preference — it’s a structural advantage.
Because tooling is written in the same language as the application:
- Enums can be shared
- Constants can be shared
- Configuration models can be shared
- Validation logic can be shared
There is a single source of truth.
Deployment tools, CLIs, background workers, and applications all agree on:
- Environment names
- Feature flags
- Service identifiers
- API versions
This removes an entire class of bugs caused by:
- Duplicated config
- Stringly-typed environments
- Drift between scripts and runtime code
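A small self-contained sketch of the idea (names are illustrative): one enum, imported by the deploy CLI and the running services alike, so an environment cannot exist in one and not the other:

```dart
// shared/lib/environment.dart: imported by apps, servers, and tooling.
enum Environment {
  dev('acme-dev'),
  staging('acme-staging'),
  prod('acme-prod');

  const Environment(this.projectId);

  /// The cloud project this environment deploys to.
  final String projectId;

  /// Parse a CLI argument; fail loudly instead of silently defaulting.
  static Environment parse(String name) => values.firstWhere(
        (e) => e.name == name,
        orElse: () => throw ArgumentError('Unknown environment: $name'),
      );
}
```

The deploy tool calls `Environment.parse`, the server reads the same enum at startup, and a typo like `stagging` becomes an immediate, loud error rather than silent drift.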
Why This Matters for Agentic Work
Agents perform best when:
- Logic is explicit
- Types are enforced
- Errors are surfaced early
- Behaviour is predictable
Dart gives us:
- A single syntax
- A single type system
- A single formatter
- A single analyser
- Deterministic execution
Benefits
- Dramatically reduced context switching
- Fewer environment-specific failures
- Better agent reliability
- Easier human review
- Shared code between apps, servers, and tooling
Agents produce better work when the solution space is constrained.
Humans do too.
12. Integrations — CLIs, Not MCP
We deliberately do not use MCP.
This is an explicit decision.
Why No MCP
MCP-style integrations attempt to solve agent tooling by pushing more data and affordances into the model context.
In practice, this:
- Bloats context
- Obscures intent
- Makes behaviour harder to reason about
- Increases non-determinism
Claude already runs a clear loop: it searches dynamically for the context it needs and digs deeper when stuck, drawing on resources like CLI documentation.
What We Use Instead
We rely on:
- Explicit CLI commands
- Deterministic inputs and outputs
- Agentic execution loops
Core tools:
- git
- gh (GitHub CLI)
- linear (Linear CLI)
- coderabbit (CodeRabbit CLI)
These tools:
- Are scriptable
- Produce predictable output
- Fail loudly
- Are easy for agents to reason about
The Agentic Loop
A typical loop looks like:
- Read state (issues, code, docs)
- Make a plan
- Execute via CLI
- Inspect results
- Iterate
Nothing is hidden.
Nothing is implicit.
Humans can inspect every step.
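In practice the loop is just ordinary commands (the issue number and branch name are illustrative):

```
gh issue view 142                # read state
git switch -c fix/142-timeout    # plan made, start executing
# ...edit code...
dart analyze && dart test        # inspect results
git commit -am "Fix request timeout handling"   # iterate until green
```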
Human-Compatible by Default
CLIs are not just agent-friendly — they are human-compatible.
That matters.
When agents use the same interfaces humans use:
- Humans can reproduce behaviour
- Humans can debug failures
- Humans can step in without translation layers
There is no hidden protocol.
There is no invisible abstraction.
What the agent does is exactly what a human would do.
MCP-style integrations, by contrast, are not human-friendly.
They:
- Hide execution details
- Make it difficult to reproduce behaviour manually
- Create a gap between human understanding and agent action
This gap is dangerous.
Agentic systems must be:
- Inspectable
- Reproducible
- Debuggable by humans
CLIs give us that for free.
13. Bug Fixing Workflow
Bug fixing is driven by issues, not conversations.
Flow
- Issue is created (GitHub preferred, Linear synced)
- Issue is labelled (P1, P2, etc.)
- Agent is instructed to fix the issue
- Agent uses gh to:
- Create branch
- Implement fix
- Open PR
- CodeRabbit runs automated review
- Review feedback is fed back to the agent
- Agent fixes issues and updates PR
- Human reviews and merges
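The agent's side of the PR and review steps reduces to a handful of commands (the PR number is illustrative, and the exact coderabbit flag is our assumption):

```
gh pr create --fill        # open the PR from the fix branch
coderabbit review --plain  # automated review (flag assumed from the CodeRabbit CLI)
gh pr view 87 --comments   # pull review feedback back into the agent's loop
```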
We deliberately avoid auto-merging.
Understanding the system matters.
14. Code Reviews
Every PR is reviewed.
Automated
- Local review tools
- AI‑assisted review comments
Human
- Architectural sanity
- Long‑term maintainability
Agents fix review feedback themselves when possible.
15. Foreground vs Background Work
We explicitly separate work into foreground and background modes. Foreground agents often leave waiting periods, and we fill those with background tasks.
Foreground work is:
- High priority
- Actively supervised
- Tight feedback loops
Background work is:
- Lower priority
- Long-running
- Reviewed at PR time
This allows parallel progress without overwhelming any single human.
16. Reducing Human Context Switching
One of the biggest productivity wins we saw in 2025 had nothing to do with raw speed.
It came from reducing human context switching.
We achieved this by making every project look fundamentally the same.
What “The Same” Means
Across all client projects:
- Same language (Dart)
- Same repo structure
- Same internal packages
- Same tooling
- Same workflows
- Same agent skills
Once you’ve seen one project, you’ve effectively seen them all.
Why This Matters
Context switching is the silent productivity killer.
When every project has different:
- Languages
- Frameworks
- Build systems
- Conventions
…humans burn enormous cognitive energy just re-orienting.
By enforcing uniformity:
- Humans spend less time remembering how things work
- Agents spend less time rediscovering patterns
- Reviews get faster
- Quality becomes easier to judge
The Compound Effect
This uniformity enables something that would otherwise be impossible:
- One primary focus task
- One or more secondary tasks progressing in parallel
- Sometimes across different projects on the same day
Because everything looks the same, switching costs collapse.
This is what makes an agentic agency sustainable.
17. The Meta‑Rule
If something keeps repeating:
- Encode it
- Package it
- Automate it
But only after you understand it.
18. Self & Company-Level Projects
Clients are not the only thing worth giving context to.
We maintain Claude Projects for ourselves.
Company Project
This project contains:
- Company structure
- Rates and pricing
- Working practices
- Contract templates
- Past decisions
- Legal language
This allows the agent to:
- Act as a rubber duck
- Help think through decisions
- Draft SOWs and contracts
- Prepare internal documents
- Maintain consistency across the business
This is not a chatbot.
It is a contextual operating system for the company.
19. What Humans Actually Do
Humans are not replaced.
Their role shifts.
Humans provide:
- Direction
- Judgment
- Taste
- Accountability
- Client communication
They also perform a critical operational role:
Humans Unblock Agents
Agents are extremely capable, but they can still get stuck.
Common reasons include:
- Ambiguous requirements
- Conflicting constraints
- Missing context
- Decisions that require taste or business judgment
When this happens, humans step in to:
- Clarify intent
- Make a decision
- Adjust constraints
- Add missing context
Once unblocked, agents continue execution autonomously.
This creates a healthy loop:
- Agents do the work
- Humans remove blockers
- Progress resumes
Humans are not micromanaging.
They are enabling flow.
The Core Rules
If you only remember one section, make it this.
This is the compressed version of how an agentic digital agency actually works.
- Start with Claude Projects, not tasks. Long‑lived context beats clever prompts.
- One project per client, for the lifetime of the relationship. Memory compounds.
- Research first, always. Use Claude for depth, ChatGPT Deep Research for breadth. Collapse everything back into the project.
- Git + Claude Code are inseparable. We never use Git without Claude Code, and never Claude Code without Git. Git provides versioned reality; Claude Code provides structured reasoning over that reality. Plans, docs, specs, presentations, internal ops, client deliverables — if it matters, it's expressed as files, versioned in a repo, and reviewed as diffs.
- Pure Dart, everywhere. No JSX. No CSS. No Bash. One syntax, one mental model.
- Tooling in Dart is a superpower. Shared enums, constants, config, and validation across apps and tools.
- CLIs over integrations. git, gh, linear, coderabbit — explicit beats magical.
- No MCP. Context bloat hurts determinism. Clear agentic loops win.
- Human‑compatible by default. If a human can't reproduce it, the system is broken.
- Guardrails > intelligence. Humans ultimately provide the intelligence: judgment, taste, intent. Agents provide relentless execution: consistency, persistence, follow-through. Internal packages, skills, and constraints create quality.
- Agents execute, humans unblock. Humans resolve ambiguity and judgment calls, then get out of the way.
- Make every project look the same. Uniformity collapses context switching and enables parallel work.
That’s the system.