How to Run an Agentic Digital Agency
A practical, opinionated guide to how we actually run projects using agentic coding.
This is not theory. This is the distilled version of the workflow we evolved throughout 2025 after building real production systems with agents at the core.
Use this as an operating manual, not a manifesto.
1. First Principle: Start With Context, Not Code
Agents are only as good as the context you give them.
Never start with a task. Always start with a project.
A project is a long‑lived container for understanding, decisions, and memory.
2. Project Starting Point (Per Client)
Every client starts with a Claude Project.
This is not optional.
We do not work in ad-hoc chats. We do not start in an editor. We start by creating a project in the Claude / Anthropic ecosystem.
A Claude Project is a long-lived memory container.
It persists:
- Context
- Decisions
- Research
- Language (the client's terminology and tone)
This project exists for the entire lifetime of the client relationship.
Initial Contents
The first iteration is intentionally high-level:
- What the client does
- What they specialise in
- Their market
- Their competitors
- Known constraints
Accuracy is less important than alignment.
Client Feedback Loop
We share early summaries back with the client and explicitly ask:
“Is this how you see yourselves?”
Corrections at this stage prevent months of misalignment later.
3. Context Building (Living Knowledge Base)
The client-specific Claude Project evolves continuously.
Over time it accumulates:
- Priorities
- Working practices
- Technical preferences
- Things to avoid
- Past decisions and why they were made
This becomes the entry point for every project.
When a new project starts, we do not re-explain the client.
We start from the client-specific Claude Project.
4. Research Phase (Before Any Specs)
We do deep research before defining solutions.
This phase is about widening the lens, not converging too early.
Tools
- Claude Projects for long-form reasoning, synthesis, and maintaining continuity
- ChatGPT Deep Research for broad, parallel exploration and cross-checking assumptions
We intentionally use both.
Claude excels at:
- Holding long-lived context
- Building coherent narratives
- Carrying decisions forward
ChatGPT Deep Research is particularly strong at:
- Rapidly surveying large problem spaces
- Exploring adjacent domains
- Surfacing alternatives and edge cases
Outputs
- Market research
- Competitor analysis
- Product positioning
- Technical feasibility notes
Research output is never throwaway.
It is summarised, curated, and fed back into the Claude Project so it becomes part of the permanent project context.
Only once this shared understanding exists do we move on to specs.
5. From Vision to Structure
Once the context is solid:
- Write a high‑level overview
- Break it into epics and features
- Capture behaviour, not implementation
Format
- BDD‑style specs
- Gherkin where possible
This forces clarity and creates a shared language for humans and agents.
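For example, a feature spec in this style might look like the following (the domain and steps are illustrative, not from a real client):

```gherkin
Feature: Order cancellation

  Scenario: Customer cancels before dispatch
    Given an order with status "confirmed"
    When the customer cancels the order
    Then the order status becomes "cancelled"
    And a refund is issued to the original payment method
```

Note that the spec captures observable behaviour only; nothing here dictates how the implementation must work.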
6. Git Is the Source of Truth
All execution happens in Git repositories.
Claude Projects hold thinking.
Git holds reality.
A key realisation for us in 2025 was that a repo is not just a place for code.
We use Git repositories and Claude Code as a universal work container for:
- Planning
- Specs
- Documentation
- Client deliverables
- Internal process docs
- Even presentations (as deployable Flutter sites/apps)
If it’s important enough to collaborate on, iterate on, or remember later, it belongs in a repo.
Why Claude Code (Even for Non-Code)
Claude Code’s superpower isn’t just code generation — it’s that it can:
- Explore a filesystem
- Read and write structured documents
- Keep work organised inside a repo
- Produce reviewable diffs
This makes it an extremely natural interface for planning and documentation.
Repository Structure
- /docs – background, context, decisions
- /planning – specs, epics, iterations
Plans evolve via pull requests just like code.
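A minimal sketch of the layout (file names are illustrative):

```
/docs
  context.md      # client background, market, constraints
  decisions.md    # what was decided, and why
/planning
  epics/          # high-level epics and features
  iterations/     # per-iteration specs
```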
Core CLIs
We operate almost entirely via CLIs:
- git – source of truth
- gh (GitHub CLI) – issues, PRs, reviews
Agents are instructed to use these tools directly.
If it can’t be done via CLI, it doesn’t scale.
7. Starting Execution
Once specs exist:
- Create a Git repository
- Add planning and documentation first
- Then begin implementation
We often run agents via:
- Claude Code in a terminal
- Persistent remote machines
- tmux for long‑running sessions
Agents work asynchronously.
Humans review via PRs.
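A typical long-running setup looks something like this (the session name is illustrative):

```
tmux new -s acme-api      # persistent session on the remote machine
claude                    # start Claude Code inside it
# detach with Ctrl-b d; the agent keeps working
tmux attach -t acme-api   # reattach later to review progress
```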
8. Sub‑Agents
Rather than one monolithic agent, we use sub‑agents.
Typical roles:
- Planner
- Implementer
- Reviewer
- Refactorer
- Researcher
These can run within a single session.
This reduces coordination overhead and keeps context tight.
9. Agent Skills
As patterns repeat, we formalise them into skills.
Skills are reusable behaviours, not prompts.
Examples
- Code review skill (via CodeRabbit CLI)
- Refactoring skill
- Architecture enforcement
- Preferred library usage
- Dependency injection patterns
Skills encode:
- What is acceptable
- What must be fixed
- How decisions are made
This dramatically improves consistency.
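As a sketch, a skill is a markdown file with frontmatter that Claude Code loads on demand; a review skill might look roughly like this (contents abbreviated, and the exact coderabbit flag is our assumption):

```markdown
---
name: code-review
description: Review the current branch with CodeRabbit and triage the findings
---

1. Run `coderabbit review --plain` against the working tree.
2. Classify each finding as must-fix, should-fix, or ignore (with a stated reason).
3. Fix every must-fix item, commit, and re-run until the review is clean.
```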
10. Guardrails via Internal Packages
Agents are non‑deterministic.
Left unconstrained, they will:
- Re‑implement the same thing differently
- Increase long‑term complexity
Our Solution
Build and maintain small internal packages.
Examples:
- Dependency injection
- Data repositories
- Styling / UI systems
- Project bootstrapping
Agents are explicitly instructed to use these packages.
Most projects become composition, not reinvention.
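As a shape sketch only (these package names and APIs are hypothetical stand-ins for internal ones), a new service then reads as composition:

```dart
// Hypothetical internal packages; shown only to illustrate the shape.
import 'package:agency_bootstrap/agency_bootstrap.dart';
import 'package:agency_data/agency_data.dart';

Future<void> main() async {
  // One blessed way to wire config, logging, and DI in every project.
  final app = await bootstrap(appName: 'acme-api');

  // One blessed repository pattern instead of per-project reinvention.
  app.register<OrderRepository>(() => PostgresOrderRepository(app.db));

  await app.serve();
}
```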
11. Unified Language — Pure Dart (No Bash)
We made a deliberate decision to standardise on one language: Dart.
Everything is Dart.
- Backend services
- Frontend apps (Flutter)
- Tooling and CLIs
- Deployment helpers
- Automation scripts
No Bash.
No JSX.
No CSS.
No HTML.
No mixed template languages.
No competing mental models.
Dart’s Tooling Is a Force Multiplier
Dart isn’t just a pleasant language — it has exceptionally strong tooling, and that matters enormously for agentic work.
Out of the box, Dart gives us:
- A world-class analyser that surfaces errors early and precisely
- A deterministic formatter that removes style debates entirely
- The ability to run Dart files directly like scripts during development
- The ability to AOT compile the same code into fast, static binaries for any target platform, with minimal dependencies
This creates a rare combination:
- Script-like ergonomics during development
- Production-grade performance and predictability in deployment
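Concretely, the same file serves both modes via standard Dart SDK commands (the file path is illustrative):

```
dart run tool/deploy.dart                         # script-like, during development
dart compile exe tool/deploy.dart -o bin/deploy   # AOT-compiled static binary for release
```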
Agents thrive in this environment because:
- Errors are explicit, not implicit
- Feedback loops are tight
- Formatting and structure are enforced automatically
Dart for Tooling Changes Everything
Using Dart for scripting and tooling is not just a preference — it’s a structural advantage.
Because tooling is written in the same language as the application:
- Enums can be shared
- Constants can be shared
- Configuration models can be shared
- Validation logic can be shared
There is a single source of truth.
Deployment tools, CLIs, background workers, and applications all agree on:
- Environment names
- Feature flags
- Service identifiers
- API versions
This removes an entire class of bugs caused by:
- Duplicated config
- Stringly-typed environments
- Drift between scripts and runtime code
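A small self-contained sketch of the idea (names are illustrative): one enum, imported by the deploy CLI and the running services alike, so an environment cannot exist in one and not the other:

```dart
// shared/lib/environment.dart: imported by apps, servers, and tooling.
enum Environment {
  dev('acme-dev'),
  staging('acme-staging'),
  prod('acme-prod');

  const Environment(this.projectId);

  /// The cloud project this environment deploys to.
  final String projectId;

  /// Parse a CLI argument; fail loudly instead of silently defaulting.
  static Environment parse(String name) => values.firstWhere(
        (e) => e.name == name,
        orElse: () => throw ArgumentError('Unknown environment: $name'),
      );
}
```

The deploy tool calls `Environment.parse`, the server reads the same enum at startup, and a typo like `stagging` becomes an immediate, loud error rather than silent drift.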
Why This Matters for Agentic Work
Agents perform best when:
- Logic is explicit
- Types are enforced
- Errors are surfaced early
- Behaviour is predictable
Dart gives us:
- A single syntax
- A single type system
- A single formatter
- A single analyser
- Deterministic execution
Benefits
- Dramatically reduced context switching
- Fewer environment-specific failures
- Better agent reliability
- Easier human review
- Shared code between apps, servers, and tooling
Agents produce better work when the solution space is constrained.
Humans do too.
12. Integrations — CLIs, Not MCP
We deliberately do not use MCP.
This is an explicit decision.
Why No MCP
MCP-style integrations attempt to solve agent tooling by pushing more data and affordances into the model context.
In practice, this:
- Bloats context
- Obscures intent
- Makes behaviour harder to reason about
- Increases non-determinism
Claude already runs a clear loop: it searches dynamically for the context it needs and digs deeper when stuck, drawing on resources like CLI documentation.
What We Use Instead
We rely on:
- Explicit CLI commands
- Deterministic inputs and outputs
- Agentic execution loops
Core tools:
- git
- gh (GitHub CLI)
- linear (Linear CLI)
- coderabbit (CodeRabbit CLI)
These tools:
- Are scriptable
- Produce predictable output
- Fail loudly
- Are easy for agents to reason about
The Agentic Loop
A typical loop looks like:
- Read state (issues, code, docs)
- Make a plan
- Execute via CLI
- Inspect results
- Iterate
Nothing is hidden.
Nothing is implicit.
Humans can inspect every step.
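In practice the loop is just ordinary commands (the issue number and branch name are illustrative):

```
gh issue view 142                # read state
git switch -c fix/142-timeout    # plan made, start executing
# ...edit code...
dart analyze && dart test        # inspect results
git commit -am "Fix request timeout handling"   # iterate until green
```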
Human-Compatible by Default
CLIs are not just agent-friendly — they are human-compatible.
That matters.
When agents use the same interfaces humans use:
- Humans can reproduce behaviour
- Humans can debug failures
- Humans can step in without translation layers
There is no hidden protocol.
There is no invisible abstraction.
What the agent does is exactly what a human would do.
MCP-style integrations, by contrast, are not human-friendly.
They:
- Hide execution details
- Make it difficult to reproduce behaviour manually
- Create a gap between human understanding and agent action
This gap is dangerous.
Agentic systems must be:
- Inspectable
- Reproducible
- Debuggable by humans
CLIs give us that for free.
13. Bug Fixing Workflow
Bug fixing is driven by issues, not conversations.
Flow
- Issue is created (GitHub preferred, Linear synced)
- Issue is labelled (P1, P2, etc.)
- Agent is instructed to fix the issue
- Agent uses gh to:
- Create branch
- Implement fix
- Open PR
- CodeRabbit runs automated review
- Review feedback is fed back to the agent
- Agent fixes issues and updates PR
- Human reviews and merges
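The agent's side of the PR and review steps reduces to a handful of commands (the PR number is illustrative, and the exact coderabbit flag is our assumption):

```
gh pr create --fill        # open the PR from the fix branch
coderabbit review --plain  # automated review (flag assumed from the CodeRabbit CLI)
gh pr view 87 --comments   # pull review feedback back into the agent's loop
```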
We deliberately avoid auto-merging.
Understanding the system matters.
14. Code Reviews
Every PR is reviewed.
Automated
- Local review tools
- AI‑assisted review comments
Human
- Architectural sanity
- Long‑term maintainability
Agents fix review feedback themselves when possible.
15. Foreground vs Background Work
We explicitly separate work into foreground and background modes. Foreground agents often leave waiting periods, and we fill those with background tasks.
Foreground work is:
- High priority
- Actively supervised
- Tight feedback loops
Background work is:
- Lower priority
- Long-running
- Reviewed at PR time
This allows parallel progress without overwhelming any single human.
16. Reducing Human Context Switching
One of the biggest productivity wins we saw in 2025 had nothing to do with raw speed.
It came from reducing human context switching.
We achieved this by making every project look fundamentally the same.
What “The Same” Means
Across all client projects:
- Same language (Dart)
- Same repo structure
- Same internal packages
- Same tooling
- Same workflows
- Same agent skills
Once you’ve seen one project, you’ve effectively seen them all.
Why This Matters
Context switching is the silent productivity killer.
When every project has different:
- Languages
- Frameworks
- Build systems
- Conventions
…humans burn enormous cognitive energy just re-orienting.
By enforcing uniformity:
- Humans spend less time remembering how things work
- Agents spend less time rediscovering patterns
- Reviews get faster
- Quality becomes easier to judge
The Compound Effect
This uniformity enables something that would otherwise be impossible:
- One primary focus task
- One or more secondary tasks progressing in parallel
- Sometimes across different projects on the same day
Because everything looks the same, switching costs collapse.
This is what makes an agentic agency sustainable.
17. The Meta‑Rule
If something keeps repeating:
- Encode it
- Package it
- Automate it
But only after you understand it.
18. Self & Company-Level Projects
Clients are not the only thing worth giving context to.
We maintain Claude Projects for ourselves.
Company Project
This project contains:
- Company structure
- Rates and pricing
- Working practices
- Contract templates
- Past decisions
- Legal language
This allows the agent to:
- Act as a rubber duck
- Help think through decisions
- Draft SOWs and contracts
- Prepare internal documents
- Maintain consistency across the business
This is not a chatbot.
It is a contextual operating system for the company.
19. What Humans Actually Do
Humans are not replaced.
Their role shifts.
Humans provide:
- Direction
- Judgment
- Taste
- Accountability
- Client communication
They also perform a critical operational role:
Humans Unblock Agents
Agents are extremely capable, but they can still get stuck.
Common reasons include:
- Ambiguous requirements
- Conflicting constraints
- Missing context
- Decisions that require taste or business judgment
When this happens, humans step in to:
- Clarify intent
- Make a decision
- Adjust constraints
- Add missing context
Once unblocked, agents continue execution autonomously.
This creates a healthy loop:
- Agents do the work
- Humans remove blockers
- Progress resumes
Humans are not micromanaging.
They are enabling flow.
The Core Rules
If you only remember one section, make it this.
This is the compressed version of how an agentic digital agency actually works.
- Start with Claude Projects, not tasks. Long‑lived context beats clever prompts.
- One project per client, for the lifetime of the relationship. Memory compounds.
- Research first, always. Use Claude for depth, ChatGPT Deep Research for breadth. Collapse everything back into the project.
- Git + Claude Code are inseparable. We never use Git without Claude Code, and never Claude Code without Git. Git provides versioned reality; Claude Code provides structured reasoning over that reality. Plans, docs, specs, presentations, internal ops, client deliverables — if it matters, it's expressed as files, versioned in a repo, and reviewed as diffs.
- Pure Dart, everywhere. No JSX. No CSS. No Bash. One syntax, one mental model.
- Tooling in Dart is a superpower. Shared enums, constants, config, and validation across apps and tools.
- CLIs over integrations. git, gh, linear, coderabbit — explicit beats magical.
- No MCP. Context bloat hurts determinism. Clear agentic loops win.
- Human‑compatible by default. If a human can't reproduce it, the system is broken.
- Guardrails > intelligence. Humans ultimately provide the intelligence: judgment, taste, intent. Agents provide relentless execution: consistency, persistence, follow-through. Internal packages, skills, and constraints create quality.
- Agents execute, humans unblock. Humans resolve ambiguity and judgment calls, then get out of the way.
- Make every project look the same. Uniformity collapses context switching and enables parallel work.
That’s the system.