Three weeks ago I wrote about our AI-native development workflow. Four commands, custom Claude Code slash commands, subagents, the whole thing. It was already fast.
Then I walked a friend through the workflow on a call this week and realized, mid-screen-share, that half of what I wrote is already outdated. Not wrong, just superseded. The four commands are now three; six months ago there were more than ten. The workflow is collapsing faster than I can document it.
If you haven't read the original post, the core idea is: every feature follows a repeatable pipeline of discussion, planning, and implementation, enforced by custom slash commands in Claude Code. Agents do the building. We do the thinking. You can read the full breakdown here: Two Founders, 300k Lines, Zero Engineers.
This post covers what changed and, more importantly, why it keeps changing.
the consolidation
Six months ago, our pipeline looked like this:
- → /research-codebase - explore the codebase for relevant context
- → /research-internet - look up external docs, libraries, patterns
- → (manually consolidate both into a brief)
- → /research-questions - the agent probes you with clarifying questions
- → (answer questions in a loop)
- → /create-prp - generate a detailed implementation plan
- → /review-plan - a fresh agent reviews the plan and asks more questions
- → (answer review questions in a loop)
- → /implement-simple - for small changes
- → /implement - for complex changes
- → /review-all - review the implementation
- → (answer review questions in a loop)
- → /codex-review - run a separate Codex subagent for adversarial review
- → /prepare-pr - create the PR with grouped commits
That's 10+ commands with multiple human-in-the-loop steps between them.
Today it's:
- → /discussion
- → /create-plan
- → /implement
Three commands. Everything else got absorbed, automated, or deleted.
why it collapsed
Models got better. A few weeks ago we were on Claude 4.6 Opus and realized half the commands were redundant. Research, planning, review, PR creation — the agent could handle all of it inline instead of needing separate passes. Every time a model improves, commands merge.
the three commands today
For anyone who hasn't read the original post, or for anyone who has and wants the updated version:
/discussion
This is where you spend the most time, and intentionally so. Before any code gets written, you have an actual conversation with the agent about what you're building. Not a prompt. A back-and-forth discussion.
The agent explores the codebase, asks you clarifying questions, and you go back and forth until you both have a shared understanding of the problem and the approach. I call this phase "probing." The agent keeps asking until it genuinely understands, and you keep nudging it away from assumptions and toward the actual root causes.
Here's the thing I've gotten better at: I don't type anymore. I do almost all of my discussion input through voice notes. I just talk, the same way I'd talk to a cofounder if we were whiteboarding something. Unstructured, conversational, thinking out loud. The agents handle it fine. It's way faster than typing and it keeps me in a flow state instead of fighting with a keyboard.
When the discussion reaches a natural conclusion, the agent writes a summary to a shared context file so other agents in other worktrees can reference the decisions.
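Mechanically, that handoff can be as simple as appending a dated decision block to one file that every worktree can read. The path and format below are my assumptions for illustration, not Pane's actual schema:

```python
import tempfile
from pathlib import Path
from datetime import date

def record_decision(context_file: Path, feature: str, summary: str) -> None:
    """Append a discussion summary so agents in other worktrees can read it."""
    entry = f"\n## {feature} ({date.today().isoformat()})\n{summary}\n"
    with context_file.open("a") as f:
        f.write(entry)

# Hypothetical shared location; in practice this would live in the repo.
ctx = Path(tempfile.mkdtemp()) / "shared-context.md"
ctx.touch()
record_decision(ctx, "billing-retries", "Retry failed charges 3x with backoff.")
```

Append-only with a heading per decision keeps the file greppable, so any agent can pull just the decisions relevant to its feature.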
The key insight from the original post still holds: if you don't understand the problem clearly, neither will the agent. Except now I'd add: you also need to trust that the model will find most of what it needs, and your job is to nudge it past the parts it gets stuck on. Especially the difference between symptoms and root causes. If the agent gives you a weirdly specific answer that doesn't feel right, it's probably fixing a symptom. Push it to dig deeper.
/create-plan
Once the discussion is done, /create-plan generates a structured implementation plan with pseudocode, file references, error handling, and a task list in execution order. It also does external research and codebase analysis to ground the plan.
The big change from the original: the plan review loop is now built into this command. After generating the plan, it automatically spawns a reviewer subagent that checks for gaps, simplification opportunities, and codebase consistency. The reviewer's recommendations come back to you as questions. You answer, the plan updates, a fresh reviewer evaluates, and this loops until you're satisfied.
All the rules from the original post still apply: no open questions in the plan, scope it to one agent session, write instructions not aspirations, replace things completely instead of shimming.
/implement
The agent implements the plan by breaking it into parallelizable chunks and spawning subagents for each one. Dependencies are respected: schema before API, backend before frontend, types before implementations.
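That ordering is just a topological sort over the chunk dependency graph: everything with no unmet dependencies can run in parallel, then the next layer unlocks. The chunk names and edges below are hypothetical, not Pane's actual scheduler:

```python
from graphlib import TopologicalSorter

# Hypothetical implementation chunks mapped to their prerequisites
# (schema before API, backend before frontend, types before implementations).
chunks = {
    "schema": set(),
    "types": set(),
    "api": {"schema", "types"},
    "backend": {"api"},
    "frontend": {"backend", "types"},
}

ts = TopologicalSorter(chunks)
ts.prepare()

batches = []  # each batch can be handed to subagents in parallel
while ts.is_active():
    ready = sorted(ts.get_ready())
    batches.append(ready)
    ts.done(*ready)

print(batches)  # [['schema', 'types'], ['api'], ['backend'], ['frontend']]
```

The first batch is everything with no prerequisites, which is exactly where parallel subagents pay off.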
After all chunks complete, /implement now automatically runs the reviewer, handles build verification, and gets the PR ready. This is the part that used to be a separate /prepare-pr command. The only thing I still do manually is kick off the Codex adversarial review loop.
On the Codex thing: I run it in essentially an infinite loop. Let it go for as long as it needs to catch every edge case. Claude and Codex are surprisingly adversarial with each other. Completely different training data, completely different opinions about code quality. Claude Opus has gotten better at reviewing, but it's still too generous with its own code. Codex will straight up tell you "this is bad code" in a way Claude won't. Use both.
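The shape of that loop is simple: keep asking the second model for findings until it has none. A minimal sketch, where the reviewer callable is a stand-in for whatever shells out to Codex (stubbed here, not a real API):

```python
def adversarial_review(diff, reviewer, max_rounds=10):
    """Re-run the reviewer until it reports no findings (or we hit a cap).

    `reviewer` is any callable taking a diff and returning a list of
    findings; in the real workflow it would shell out to Codex.
    """
    history = []
    for round_no in range(1, max_rounds + 1):
        findings = reviewer(diff)
        history.append((round_no, findings))
        if not findings:
            return history  # clean pass: done
        # The real loop would fix each finding and produce a new diff here;
        # this sketch just records them and marks them fixed.
        diff = diff + f"\n# fixed: {', '.join(findings)}"
    return history

# Stub reviewer that finds fewer issues each round (hypothetical behavior).
issues = [["unchecked error", "magic number"], ["magic number"], []]
stub = lambda diff: issues.pop(0)
history = adversarial_review("diff v1", stub)
print(len(history))  # three rounds: two with findings, one clean
```

The `max_rounds` cap is the one concession to the "infinite loop" framing: in practice you let it run long, but you still want a circuit breaker.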
voice-first development
This is the biggest practical change that isn't reflected in the commands at all. I do everything through voice now. Not just discussion input, but the entire workflow of capturing problems and turning them into code.
It looks like this:
I notice a bug or think of a feature. I open Doozy and record a quick voice note describing it. Doozy creates a GitHub issue and tags it (like "bug-basher" for simple fixes). The bug-basher agent, running Claude Code in the cloud with my skills and slash commands, picks it up and takes a first pass. By the time I check in, there's a PR ready for me to pull and continue working on.
For bigger features, I just dump all my messy context into /discussion via voice. The way I described it to my friend: "that's literally how I talk to it. Very human." And it is. I'm not writing prompts. I'm thinking out loud.
The voice input matters more than it sounds. When you type, you naturally try to structure your thoughts. When you talk, you ramble, you go on tangents, you think of edge cases mid-sentence. The agents handle the rambling fine, and the tangents often contain exactly the context they need that you would have forgotten to type.
harness engineering and software factories
There's new vocabulary emerging for this way of working. OpenAI published a write-up on "software factories," and in the founder circles I'm in, people have started calling the other half of the job "harness engineering."
Software factory: the automated pipeline that takes a natural language description of a problem and produces a PR. The voice note to PR pipeline I described above is a software factory.
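Stripped to its shape, a software factory is a composition of stages from transcript to PR. The stage names below are mine, and each real stage would be an agent or service call (Doozy, GitHub, Claude Code); these stubs just thread the artifact through:

```python
# Each stage is a function from the previous stage's artifact to the next.
def transcribe(voice_note: str) -> str:
    return f"transcript of: {voice_note}"

def file_issue(transcript: str) -> dict:
    return {"title": transcript[:40], "label": "bug-basher"}

def agent_first_pass(issue: dict) -> dict:
    return {"pr": f"fix: {issue['title']}", "status": "ready-for-review"}

def factory(voice_note: str) -> dict:
    artifact = voice_note
    for stage in (transcribe, file_issue, agent_first_pass):
        artifact = stage(artifact)
    return artifact

pr = factory("run button 404s on new worktrees")
print(pr["status"])  # ready-for-review
```

The point of the framing is that every stage boundary is a place where a human step can be deleted once the models get good enough, which is exactly how ten commands became three.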
Harness engineering: the time you spend setting up and maintaining the scaffolding that makes the factory work. Your slash commands, your .claude file with skills, your monorepo structure, your worktree setup, your CI, your secrets management. You're not writing product code. You're building the harness that agents use to write product code.
I spend more than half my time on harness engineering, planning, and review. The rest is automated implementation. I haven't written a line of code myself in about six months, even for small things. Once Opus came out, manually writing code became genuinely slower than talking to the agent.
That's a weird thing to say and it still feels weird to say it, but it's true.
token maximizing
My friend said something on our call that stuck with me. He described his own paradigm shift as going from "token minimizing" to "token maximizing." He was on a $20/month Claude subscription and realized that being content with that plan meant he was under-optimizing his own time.
The math: I'm on the $200/month Claude Max plan. I did the back-of-napkin calculation and I'm using roughly $3,000 to $5,000 worth of API credits in Opus calls every month. Anthropic is subsidizing this heavily right now. The $200 plan gives you effectively infinite context, which means I can kick off a new agent for every small thought, every tangent, every "hmm I wonder if there's a deeper issue here" instinct without worrying about cost.
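The back-of-napkin math is worth making explicit. Taking the figures above at face value ($200/month flat versus an estimated $3,000-$5,000 of metered Opus usage):

```python
plan_cost = 200                                # Claude Max, USD per month
api_value_low, api_value_high = 3_000, 5_000   # estimated Opus API equivalent

subsidy_low = api_value_low / plan_cost        # 15x
subsidy_high = api_value_high / plan_cost      # 25x
print(f"paying 1/{subsidy_low:.0f} to 1/{subsidy_high:.0f} of the metered price")
```

That's a 15-25x gap between what the subscription costs and what the same usage would cost metered, which is the whole argument for token maximizing.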
This matters because the cost of NOT using the tokens is paid in your own time. Every time you think "this is too small for the agent, I'll just do it myself" you're token minimizing. Every time you think "let me just check with the agent, it'll take 30 seconds" you're token maximizing.
If you're on the $100 plan and haven't hit limits, you're probably fine. But if you're on the $20 plan and doing serious development work, you're leaving a lot on the table. The models right now are subsidized beyond reason. Take advantage of it while it lasts.
managing agents like humans
The original post talked about running 3-6 agents in parallel across git worktrees. That hasn't changed. What's changed is how I think about the job.
It's managing a team. But instead of humans, it's agents. You do the same things: you unblock them when they're stuck, you push back when their approach doesn't feel right, you give them more context when they're making assumptions, you prioritize which one needs your attention next.
The discussion phase is like hopping on a meeting with a cofounder and designing a feature together. You're collaborating. The planning phase is like writing a spec for an engineer. The implementation phase is like letting the engineer go heads-down while you context-switch to the next feature.
Your job, as the human in this loop, is to float between the agents. Switch to one, unblock it, switch to the next. In Pane, I use Ctrl+Up/Down to cycle between sessions. When one agent finishes and needs review, I check it, flag issues, and send it back. When another is waiting on a decision, I voice-note my answer and move on.
On a good day, this feels like flow state. On a bad day, you realize an agent went sideways three hours ago and you have to untangle it. But the ratio of good days to bad days keeps improving as the models get better and the skills get tighter.
what's changed in the infrastructure
A few Pane-specific updates since the last post:
- → Run scripts auto-generate on first use. Click the run button for a new worktree and Pane handles the dev server script with isolated ports. No setup.
- → Secrets management copies .env files across worktrees automatically. This used to be a massive pain point, since anything uncommitted (including your secrets) wouldn't carry to the new worktree.
- → Ctrl+Up/Down for cycling between agent sessions.
- → Coming soon: status notifications so you know when an agent is waiting for input. Right now I sometimes miss that it's my turn to respond because I'm heads-down in another session.
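Both worktree conveniences are easy to picture. A sketch of the two behaviors with hypothetical paths and defaults (this is not Pane's code): copy uncommitted .env files from the main checkout into a new worktree, and derive a stable, isolated dev-server port per worktree name so parallel servers don't collide.

```python
import hashlib
import shutil
from pathlib import Path

def copy_env_files(main_checkout: Path, worktree: Path) -> list[Path]:
    """Copy uncommitted .env* files so secrets carry into the new worktree."""
    copied = []
    for env in main_checkout.glob(".env*"):
        dest = worktree / env.name
        shutil.copy2(env, dest)  # copy2 preserves file metadata
        copied.append(dest)
    return copied

def isolated_port(worktree_name: str, base: int = 3000, span: int = 1000) -> int:
    """Derive a deterministic per-worktree port in [base, base + span)."""
    digest = hashlib.sha256(worktree_name.encode()).hexdigest()
    return base + int(digest, 16) % span

port = isolated_port("feature-billing")  # same name always maps to same port
```

Hashing the worktree name means no port registry to maintain: any process can recompute the port from the name alone.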
Also worth calling out a point the original post made in passing: monorepos are non-negotiable. Having all your logic in one place means every agent can reason about the full system. Every founder I know who ships with AI agents has a monorepo now. The companies that didn't consolidate are struggling because their agents can't see across service boundaries.
what's next
The commands will keep consolidating. I'd bet good money that in six months it's either two commands or one. The discussion and planning phases might merge once models can hold the full context of "understand the problem AND design the solution" in a single pass. The implementation and review might merge once adversarial cross-model review can happen inline.
Codex integration directly into Pane is on the roadmap so the adversarial loop doesn't have to be manual anymore.
And the meta thing that keeps getting more true: the workflow for improving the workflow is the workflow itself. I used Doozy to capture the voice note that became the GitHub issue that the agent fixed to improve how Pane manages worktrees. It's turtles all the way down.
All the Claude Code commands and slash commands are open source: github.com/Dcouple-Inc/Pane/.claude. If you haven't read the original post, start there for the detailed breakdown of each phase: Two Founders, 300k Lines, Zero Engineers.