We're two founders building Doozy, a todo list that completes itself. Our codebase is a 300k line Next.js monorepo. Neither of us came from traditional software engineering backgrounds. At any given moment, we have 3-6 AI coding agents running in parallel across git worktrees.
People keep asking how we actually work. Not the high-level "we use AI" handwave, but the specific, repeatable process. So here it is.
the pipeline
Every feature follows the same four commands: /discussion, /plan, /implement, /prepare-pr. We enforce this with custom Claude Code slash commands and specialized subagents. You can find all of them in our open-source repo: github.com/Dcouple-Inc/Pane.
The core insight after six months of doing this: the quality of your implementation is entirely downstream of the quality of your planning. Skip the discussion and planning phases and you end up in the death loop. More on that later.
/discussion
Before writing a single line of code, we have a conversation with the agent about what we're building. This isn't a prompt. It's an actual interactive, back-and-forth discussion.
The /discussion command is conversation only. It will never edit, create, or delete any source code. It spawns codebase-explorer and researcher subagents as needed to investigate the codebase, but its only output is talking to you and writing decisions to a shared context file.
Say we want to add audio capture to an input component. The discussion looks like:
- → "Are there any audio capture components in the codebase already?"
- → "Find all files related to audio capture and return a tree view. Each file should have a short summary explaining what it does."
- → "Walk me through how the current audio capture system works from the point the user presses record to the point the summary is generated. Include the files."
This is the rabbit hole loop. Each answer generates new questions. You keep going until you have clarity on what you do and don't understand. When the discussion reaches a natural conclusion, it writes a summary to .context/context.md with the key decisions, so other agents in other worktrees can reference the outcomes.
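The exact summary format isn't documented publicly; as a hypothetical sketch, the end of a discussion might append something like this to the shared context file:

```shell
# Hypothetical sketch: append a discussion summary to the shared
# context file so agents in other worktrees can pick up the decisions.
# The headings and wording here are illustrative, not Doozy's actual format.
mkdir -p .context
cat >> .context/context.md <<'EOF'
## Discussion: audio capture in the input component

Decisions:
- Reuse the existing recorder path rather than adding a new capture component.
- Summary generation stays server-side; the component only streams raw audio.

Open follow-ups: none (plans must have no open questions).
EOF
```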
The key principle: you need clarity to guide the agents effectively. If you don't understand the codebase, neither will they.
/plan
Once we have clarity from the discussion, we run /plan. This does three things:
- → Codebase analysis — searches for similar features and patterns, identifies files to reference, notes existing conventions
- → External research — library documentation, implementation examples, best practices (with specific URLs the agent can follow)
- → Plan generation — using a structured template that includes pseudocode, file references, error handling strategy, and a task list in execution order
The plan gets saved to ./tmp/ready-plans/. Then the command automatically enters an iterative review loop: it spawns a fresh plan-reviewer subagent that checks for gaps, simplification opportunities, correctness, and codebase consistency. The reviewer's recommendations get presented to us as questions. We decide what to incorporate, skip, or modify. Then a fresh reviewer evaluates the updated plan. This loops until we're satisfied.
A few rules that made our plans dramatically better:
- → No backwards compatibility. If something is being replaced, replace it completely. No shims, no fallbacks, no compatibility layers.
- → No aspirations, only instructions. "Implement OAuth" is aspirational. "Add a `useAuth` hook in `src/hooks/` that wraps the existing `authService` and exposes `login`, `logout`, and `isAuthenticated`" is an instruction.
- → Scoped correctly. If a plan is too big for one agent session, it's too big. Break it up.
- → No open questions. If you encounter something unresolved during planning, stop. Research or ask. Never write a plan with unresolved questions.
We score every plan on a 1-10 confidence scale for one-pass implementation success. If it's below an 8, we keep iterating.
/implement
The agent implements the plan. Not "build this feature." It implements a specific, already-reviewed plan document from ./tmp/ready-plans/.
/implement breaks the plan into parallelizable chunks and spawns implementer subagents for each chunk. The chunking respects dependencies: database/schema changes complete before API endpoints, backend APIs exist before frontend integration, types are defined before implementations that use them.
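That dependency ordering is just a topological sort. As a toy illustration (not the actual chunker, and with made-up chunk names), POSIX `tsort` computes exactly this given "must-precede" edges:

```shell
# Toy illustration: each line is an edge "A B" meaning chunk A must
# complete before chunk B. tsort prints the chunks in a valid
# execution order. Chunk names are hypothetical.
printf '%s\n' \
  'schema api-endpoints' \
  'types api-endpoints' \
  'api-endpoints frontend-integration' \
  'types frontend-integration' | tsort
```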
Each chunk runs through a quality loop: implement, then npm run typecheck, npm run lint, npm run format. Fix all issues before proceeding.
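Mechanically, the quality loop is "run the checks, stop only when everything passes." A minimal sketch, with the checks passed in as a command so the loop itself is testable with stubs:

```shell
# Sketch of the per-chunk quality gate. In the real loop the command is
# `npm run typecheck && npm run lint && npm run format`, and a "retry"
# means the agent fixes the reported issues first. The attempt cap is
# arbitrary.
quality_gate() {
  max_attempts=$1; shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      echo "clean on attempt $attempt"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "still failing after $max_attempts attempts" >&2
  return 1
}

# Example with the real checks:
# quality_gate 3 sh -c 'npm run typecheck && npm run lint && npm run format'
```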
After all chunks complete, /implement automatically spawns an implementation-reviewer subagent that runs typecheck and lint, checks every task in the plan was completed, and flags any gaps. Once everything passes, the plan gets moved from ready-plans/ to done-plans/.
/prepare-pr
This is where the code becomes a PR and the cross-model review happens. /prepare-pr does five things in sequence:
- → Commits changes grouped by done-plans. It reads each completed plan, matches changed files to the plan they belong to, and creates one logical commit per plan. No `git add .`, ever.
- → Rebases the current branch onto main, resolving trivial conflicts automatically and asking about ambiguous ones.
- → Builds both apps (`npx nx build @doozy/webapp` and `npx nx build @doozy/api`). If either fails, it fixes the errors and re-runs until both pass.
- → Runs the Codex review loop. Codex runs as a subagent inside Claude Code with `codex review --branch main`. It reviews the diff against main, finds issues, Claude Code fixes them, Codex reviews again. This loops until there are no bugs. Two different models reviewing each other catches things self-review misses.
- → Creates or updates the PR with a description built from the done-plans, then pushes with `--force-with-lease`.
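Stripped of the agent orchestration, the sequence reads as pseudocode (the real command is agent-driven; loop conditions are simplified here):

```
# /prepare-pr, condensed:
1. one commit per done-plan (match changed files to each plan; never `git add .`)
2. rebase the branch onto main (auto-resolve trivial conflicts, ask about the rest)
3. npx nx build @doozy/webapp && npx nx build @doozy/api   # fix and re-run until green
4. codex review --branch main                              # fix, re-review, loop until clean
5. create or update the PR from done-plans, push with --force-with-lease
```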
human review
After the PR is up, we do human review. Every PR gets evaluated against the same criteria:
The single responsibility principle: there should only be one way to do something.
- → Are we re-using things instead of re-creating them?
- → Is it easy to understand what's going on?
- → Is the PR scoped correctly? Is it focused on a single objective, or is there a bunch of different stuff happening that should have been split up?
- → Are there any anti-patterns? Async imports halfway down in the code, not using repositories or conventions already established in the codebase?
We also run refactor analysis commands (/refactor-check, /deep-refactor) that score the code against our actual codebase patterns: zero relative imports, authenticatedHandler in controllers, BaseService for DB services, TanStack Query for server state, thin pages with orchestration hooks, underscore-prefix locality for local code. The target is 9.8/10.
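The "zero relative imports" rule is easy to check mechanically. A toy version of that one check (the real `/refactor-check` scores many more patterns, and its implementation may look nothing like this):

```shell
# Toy check for the "zero relative imports" rule: flag any TypeScript
# import path that starts with ./ or ../ anywhere under the given
# directory. Illustrative only; the real command covers many patterns.
check_relative_imports() {
  if grep -rnE --include='*.ts' --include='*.tsx' "from ['\"]\.\.?/" "$1"; then
    return 1  # violations found (printed above)
  else
    return 0  # clean
  fi
}
```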
If the code needs work, it goes back to /discussion. Not back to /implement. The fix for bad code is almost never "try implementing again." It's "we didn't understand something well enough."
the death loop
The death loop: the agent attempts to implement a plan, the result doesn't work, and you spend the rest of the day trying to prompt it into working.
What causes it:
- → A plan that's too big. The agent runs out of context and starts taking shortcuts. You'd rather it stopped and left tasks unfinished, but instead it pushes through and cuts corners.
- → Not enough codebase research. The agent makes assumptions about how your code works that are wrong, then builds on those assumptions.
- → Vague instructions. "Add authentication" has a hundred possible implementations. The agent picks one. It's probably not the one you wanted.
The fix is always the same: go back to /discussion. Understand more. Plan more precisely. Scope more tightly.
the tooling
We manage all of this through Pane (open source on GitHub). Each feature gets its own worktree and its own Pane session. I can see at a glance which agents are working, which ones are blocked waiting for input, and which ones finished.
Each worktree gets a run button that auto-generates a dev server script on isolated ports, so I can have every branch hot reloading in its own browser tab simultaneously. Keyboard-driven, agent-agnostic, cross-platform.
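One way to get stable, collision-free ports per worktree (not necessarily how Pane actually does it) is to hash the branch name into a port range:

```shell
# Hypothetical sketch: derive a stable dev-server port from the branch
# name so every worktree gets its own port without coordination.
# Pane's actual scheme may differ. cksum gives a deterministic CRC.
branch_port() {
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $((3000 + crc % 1000))   # ports 3000-3999
}

# e.g. PORT=$(branch_port "$(git branch --show-current)") npm run dev
```

Same branch name, same port, every time; different branches almost always land on different ports.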
All of the Claude Code commands and agents referenced in this post are in the repo: github.com/Dcouple-Inc/Pane/.claude/commands
On a good day, I merge 6-8 PRs that I barely touched by hand. On a bad day, I spend hours fixing an agent that went sideways on a task I should have just done myself. The ratio is getting better.
what actually matters
The skill that matters isn't coding. It's:
- → Decomposing problems into agent-sized tasks
- → Writing clear enough specs that agents don't hallucinate
- → Knowing when to research more vs. when to just implement
- → Knowing when to just do it yourself
- → Keeping the codebase clean so agents can reason about it on the next feature
That last one compounds. If you let entropy creep in, agent performance degrades because the codebase becomes harder to reason about. Every sloppy merge makes the next feature harder for both you and the agents.
I'm not saying engineers are obsolete. I'm saying the bar for what you need to hire for has shifted. A notable Seattle startup recently laid off 35% of its engineers because the team was resistant to adopting AI. An ex-IPO'd founder I know just hired two of my friends in their early 20s who are extremely AI-native. They use Pane every day for their dev.
Aviel wrote something recently that captures this moment: "Trust is the high order bit now." LLMs have lowered the bar for technical execution so far that it can effectively be removed from the list of constraints. When small teams of average technical ability can build good products really fast, the differentiator isn't code. It's customer insight, access, and trust.
He also wrote a harder-edged follow-up about what this looks like on the ground in Seattle specifically. There are far fewer tech jobs than even a few years ago. When 80% of LLM skeptics on LinkedIn have "Open to Work" with "Software Architect" in their bio, it's more than a passing trend. His framing: if you work in tech in 2026, you're either at the beginning of your career or at the end of it.
That's the world we're building in. Two non-traditional founders with AI agents shipping a 300k line product. The technical pedigree filter is broken. What matters now is whether you can solve real problems, build trust, and ship relentlessly. The agents handle the rest.
Parsa is co-founder of Doozy and Pane. Pane is open source: github.com/Dcouple-Inc/Pane. All Claude Code commands referenced in this post: .claude/commands.