In Part 1, I showed you what happened when AI agents ran at full speed: 12 projects, 10,000+ lines of code, five production deployments in a single day. In Part 2, I explained why that output compounds, why tomorrow will always be faster than today.
But there is something I left out of both of those stories. Something that matters more than the output numbers or the compounding math.
How do you actually control all of this?
When AI agents can execute 100 tasks per day, the bottleneck is no longer coding or deployment or testing. The bottleneck is decision-making. Approvals. Routing the right question to the right human at the right time, without creating a queue that grinds everything to a halt.
This is Part 3. The architecture. The orchestration overhaul that makes the acceleration sustainable.
The orchestration problem nobody warns you about
Here is what happens when you first unlock AI agent productivity. The agents work fast. Incredibly fast. They spin up sub-agents, complete tasks, request approvals, and move on to the next thing. Within hours, your biggest problem is not "how do I get more done" but "how do I keep up with the decisions these agents need from me."
The first version of our system routed everything through a single approval queue. Every blocker, every decision, every ambiguous question went to the same place: my inbox. By the end of that first productive day, I had become the bottleneck I was trying to eliminate.
The agents could do 100 things per day. But they could only do them if I answered 50 questions first. And half of those questions did not even need me. They needed a credentials owner, or a QA reviewer, or a technical architect. Routing everything to the CEO is the organizational equivalent of putting all your database queries through a single-threaded connection. It works until it does not, and then everything stops.
The single-queue failure mode
When you run autonomous agents without proper routing, you create a paradox. The agents are fast, but they block on human decisions. The human gets overwhelmed by the volume of decisions. The queue grows. The agents idle. You end up with expensive AI capacity sitting dormant while a single person triages a pile of mixed-priority approvals. The solution is not a faster human. It is a smarter routing system.
Multi-lane HITL routing: six lanes for six types of decisions
We rebuilt the entire approval architecture from scratch. Instead of one queue, we created six specialized lanes, each designed for a specific type of human decision. Every blocker, every approval request, every ambiguous question gets classified and routed to the lane where it will be resolved fastest.
The key insight is that most decisions do not need the CEO. An agent blocked on an API key needs the credentials owner, not the founder. An agent that finished a build and needs visual QA needs a reviewer, not the architect. By routing each decision to the person most qualified to resolve it, we eliminated the single-queue bottleneck entirely.
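The routing idea can be sketched in a few lines. The lane letters match the ones named elsewhere in this piece (A: CEO, B: Senior Architect, C: Credentials, D: Desktop/Browser, E: Final QA, F: HITL Manager), but the blocker-type strings and the mapping itself are illustrative assumptions, not our production schema:

```python
from enum import Enum

class Lane(Enum):
    CEO = "A"              # strategic decisions only
    ARCHITECT = "B"        # technical architecture calls
    CREDENTIALS = "C"      # API keys, environment access
    DESKTOP = "D"          # local browser / machine tasks
    QA = "E"               # pre-release visual and functional review
    HITL_MANAGER = "F"     # triage for anything ambiguous

# Hypothetical blocker-type-to-lane mapping for illustration
BLOCKER_TO_LANE = {
    "missing_credentials": Lane.CREDENTIALS,
    "architecture_decision": Lane.ARCHITECT,
    "visual_qa": Lane.QA,
    "browser_access": Lane.DESKTOP,
    "strategic_choice": Lane.CEO,
}

def route_blocker(blocker_type: str) -> Lane:
    # Anything unrecognized goes to the HITL Manager for triage,
    # never straight to the CEO queue
    return BLOCKER_TO_LANE.get(blocker_type, Lane.HITL_MANAGER)
```

The important design choice is the default: an unclassified blocker falls to the triage lane, not to the founder.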
Protecting the CEO queue
Lane A, the CEO lane, has the strictest entry criteria of any lane in the system. We call them the Mike protection rules, because their entire purpose is to prevent the founder from becoming the bottleneck again.
Six criteria must ALL pass before a task enters the CEO queue:
1. Clear owner check. Has it been verified that no other lane can handle this task? If a credentials owner or architect can resolve the blocker, it goes to them instead.
2. Decision framing. The request must include a clear statement of what decision is needed. No vague "what should I do?" questions. The agent must articulate the specific choice required.
3. Recommendation required. The agent must present its own recommendation before asking for a decision. The CEO should be evaluating a proposal, not generating one from scratch.
4. Bounded options. Maximum three options presented. No open-ended lists. No "here are seven approaches we could take." Three or fewer, clearly articulated, with tradeoffs stated.
5. Risk statement. Every CEO-lane request must include a clear statement of what happens if the decision is wrong or delayed. This lets the founder prioritize based on actual impact.
6. Urgency classification. Is this blocking active work right now, or can it wait until the next review cycle? Urgent items surface first. Non-urgent items batch into daily reviews.
If any of these six criteria fail, the task is automatically rerouted. It goes to Lane F, the HITL Manager, for triage and possible resolution without CEO involvement. Hard fail conditions, like requesting CEO approval for a task that clearly has a credentials or architecture owner, trigger immediate rerouting with no human intervention needed.
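As a minimal sketch of that gate, here is what an all-six-must-pass check could look like. The field names and the request schema are assumptions made up for this example, not the production implementation:

```python
from dataclasses import dataclass, field

@dataclass
class CeoRequest:
    # Illustrative schema, one field per Mike-protection rule
    has_no_other_owner: bool          # 1. no other lane can handle it
    decision_statement: str           # 2. what decision is needed
    recommendation: str               # 3. the agent's own proposal
    options: list = field(default_factory=list)  # 4. bounded options
    risk_statement: str = ""          # 5. cost of a wrong or delayed call
    urgency: str = "batch"            # 6. "urgent" or "batch"

def passes_ceo_gate(req: CeoRequest) -> bool:
    """All six criteria must pass; any failure means reroute to Lane F."""
    return all([
        req.has_no_other_owner,
        bool(req.decision_statement.strip()),
        bool(req.recommendation.strip()),
        0 < len(req.options) <= 3,        # three or fewer, never open-ended
        bool(req.risk_statement.strip()),
        req.urgency in ("urgent", "batch"),
    ])
```

A request with four options, or with no recommendation attached, fails the gate and never reaches the CEO queue.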
Confidence-based routing: when agents should just go
Not every decision needs a human at all. The second architectural layer is confidence-based routing, which determines whether a task even enters the HITL system in the first place.
Every agent decision gets a confidence score. The routing rules are simple:
At 90% confidence or above, the agent proceeds without asking anyone. It has enough context, enough precedent from prior decisions, and enough clarity on the task requirements to execute safely. This is where the vast majority of routine work lands. File updates, code deployments to staging, documentation generation, data processing. The agent knows what to do and just does it.
Between 70% and 89%, the agent proceeds but with guardrails. It might deploy to a staging environment but not production. It might draft a client email but save it to drafts instead of sending. It might implement a feature but flag it for review before merging. The work moves forward, but with a safety net.
Below 70%, the task hits the HITL system. The agent has identified genuine ambiguity, risk, or a lack of precedent, and it routes the decision to the appropriate human lane for resolution.
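The three bands reduce to a small dispatch function. The threshold values come straight from the rules above; the return labels are placeholders for whatever the orchestrator actually does with each band:

```python
def route_by_confidence(confidence: float) -> str:
    # >= 0.90: enough context and precedent to execute safely
    if confidence >= 0.90:
        return "proceed"
    # 0.70-0.89: move forward, but with a safety net
    # (staging instead of production, drafts instead of sends)
    if confidence >= 0.70:
        return "proceed_with_guardrails"
    # < 0.70: genuine ambiguity or risk; hand it to a human lane
    return "route_to_hitl"
```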
Why the thresholds matter
Without confidence-based routing, you get one of two failure modes. Either agents are too autonomous and make costly mistakes because nobody reviewed the risky decisions. Or agents are too dependent and nothing moves without human approval on every step. The confidence threshold creates a middle path: agents handle the routine, humans handle the ambiguous, and the system learns which decisions fall into which category over time.
The Asana dashboard architecture
Routing decisions is only half the problem. You also need visibility. When 20 agents are working across a dozen projects, you need to see the state of everything at a glance. Not buried in Slack threads or email chains. In a structured dashboard where every task has a status, an owner, and a clear next action.
We built four specialized Asana dashboard projects to serve as the operational backbone:
Dash: HITL
The multi-lane approval routing dashboard with 9 sections. Every task that enters the HITL system lands here, automatically sorted into the correct lane. Sections map to the six lanes plus triage, completed, and escalated states. Custom views let each lane owner see only their queue. The HITL Manager sees everything.
Dash: PRD
Product requirements tracking. Every new feature, every client request, every internal improvement starts as a PRD entry. The dashboard tracks requirements from initial capture through scoping, approval, and handoff to development. PRDs link directly to their corresponding technical specs in the TSD dashboard.
Dash: TSD
Technical specification tracking. When a PRD is approved, its technical implementation plan lives here. Architecture decisions, data models, API designs, deployment strategies. TSDs link back to their source PRDs and forward to the HITL dashboard when they encounter blockers that need human decisions.
Dash: Proposal
The sales pipeline from first contact to closed deal. Five stages: New, Draft, Review, Send, and the final resolution of Won, Lost, or Deferred. This dashboard tracks every client opportunity with the same rigor we apply to development tasks. Proposals link to PRDs when they convert to active projects.
Across the entire workspace, we created 22 custom fields: 9 enum fields and 13 text fields. The key fields that make the system work:
AI1 Stage: where each task sits in the agent pipeline.
Current State: 15 possible options, covering everything from "awaiting triage" to "deployed and verified".
Blocker Type: 10 categories of blockers, so we can analyze patterns.
Routed Lane: which human queue the task is in.
Confidence: the agent's self-assessed confidence score.
Risk: the potential impact of getting this decision wrong.
Pilot testing: 25 out of 25
Architecture means nothing without validation. Before we trusted this system with production workloads, we ran four pilot tests designed to exercise every lane, every routing rule, and every edge case we could imagine.
Pilot 1: Lead flow. A new sales lead enters the system. Does the Proposal dashboard capture it correctly? Does the PRD get created and linked? Does the TSD reference the right PRD? We traced the entire chain from first contact to technical specification, verifying every link and every custom field populated correctly. Passed.
Pilot 2: Client flow. A full development cycle from PRD creation through TSD drafting, hitting an architecture blocker that routes to the HITL system, resolution by the appropriate lane owner, QA review, and production release. This pilot tested the most complex path through the system, the one where a task crosses multiple dashboards and multiple human lanes before reaching deployment. Passed.
Pilot 3: Credentials blocker. An agent needs an API key to proceed. Does the system correctly identify this as a credentials issue? Does it route to Lane C instead of the CEO? Does the credentials owner get the notification with enough context to provide the key without asking follow-up questions? Passed.
Pilot 4: Browser blocker. An agent needs to interact with a website that requires local browser access. Does the system route to Lane D? After the desktop task is complete, does the work flow correctly to QA in Lane E, and then to release? This pilot tested multi-lane sequential routing, where a task needs to pass through two human lanes before completion. Passed.
Twenty-five individual validation checks across all four pilots. Every single one passed. The routing was correct, the custom fields populated accurately, the notifications reached the right people, and the dashboards reflected the true state of every task in real time.
The task accountability monitor
Routing decisions to the right people solves half the problem. The other half is making sure those people actually respond. When you have 174+ tasks assigned across a workspace, some of them will inevitably slip through the cracks. Not because people are negligent, but because the volume is high and priorities shift hourly.
We built a task accountability monitor that runs every hour. It scans every assigned task in the workspace and checks a single metric: has the assignee acknowledged the task within 24 hours of assignment?
If not, escalating notifications fire. At 24 hours, a gentle reminder. At 48 hours, a more urgent notification. At 72 hours, the task gets flagged for management review and the accountability dashboard highlights it in red.
The accountability dashboard itself sorts all tasks by response time. Managers can immediately see who is responsive, where work is stalling, and which tasks have been sitting unacknowledged the longest. It is a simple, transparent system that keeps work moving without requiring anyone to manually chase down assignees.
The best part? Zero token cost. The entire monitor is a pure script. No LLM involved. No AI processing. Just a scheduled job that queries Asana, checks timestamps, and sends notifications. Not everything needs to be AI-powered. Some problems are better solved with a well-written cron job.
The dynamic model router
When you run dozens of AI agents executing hundreds of tasks per day, token costs add up fast. But here is the thing: most tasks do not need a frontier model. A status update does not need the same model as a complex architecture decision. A file rename does not need the same capacity as a client-facing email draft.
The dynamic model router uses a 3-stage classification system to route each AI task to the cheapest model capable of handling it:
Stage 1: Override check. Some tasks have a manually specified model. Security-sensitive operations, client-facing communications, and complex reasoning tasks can be pinned to a specific model. If an override exists, it takes priority.
Stage 2: Heuristic classification. Rules-based routing that examines the task type, complexity markers, token requirements, and historical performance. Simple tasks like file operations, status updates, and template-based generation get routed to lightweight models. Complex tasks with multi-step reasoning get routed to more capable models.
Stage 3: Haiku classifier. For tasks that do not match any heuristic rule, a lightweight Haiku classifier makes the final routing decision. It reads the task description and context, assesses the required capability level, and selects the appropriate model. The classifier itself runs on the cheapest model, so the routing decision costs almost nothing.
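The three stages chain together as a simple cascade. Everything below is illustrative: the task-type strings, the model names, and the stubbed classifier are stand-ins, not our production configuration:

```python
def choose_model(task: dict) -> str:
    # Stage 1: explicit override wins (e.g. security-sensitive work
    # pinned to a specific model)
    if override := task.get("model_override"):
        return override
    # Stage 2: heuristic rules on task type
    simple = {"file_op", "status_update", "template_generation"}
    complex_ = {"architecture", "client_email", "multi_step_reasoning"}
    task_type = task.get("type")
    if task_type in simple:
        return "lightweight-model"
    if task_type in complex_:
        return "frontier-model"
    # Stage 3: no rule matched; ask a cheap classifier to decide
    return classify_with_cheap_model(task)

def classify_with_cheap_model(task: dict) -> str:
    # Stub for the lightweight classifier call described above;
    # in reality this would read the task description and context
    return "lightweight-model"
```

Because stages 1 and 2 resolve most tasks without any model call, and stage 3 itself runs on the cheapest model, the routing decision stays nearly free.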
The estimated savings are 60-70% on token costs. That is not a theoretical projection. It is based on analyzing our actual task distribution over the previous weeks and mapping each task type to the cheapest model that could have handled it with equivalent quality.
The learning loop
The final architectural piece is the one that makes everything else get better over time. We call it the learning loop, and it is the reason this system does not just maintain its speed but actually accelerates.
When an agent completes a task, it does not just mark it done. It captures what it learned. Which approach worked. What tools it used. What unexpected obstacles it encountered. That knowledge gets written back to the platform's memory system, where future agents can read it before starting similar work.
When QA finds an issue, the feedback does not just go to the developer who made the fix. It routes back through the system so the pattern that caused the issue gets documented. Next time an agent encounters a similar task, it knows what to watch for.
When a task fails or takes significantly longer than expected, the failed pattern gets documented with a clear description of what went wrong and why. This is the negative knowledge that is just as valuable as positive knowledge. Knowing what not to do saves as much time as knowing what to do.
When a task succeeds with an approach that was novel or more efficient than the existing procedure, that approach gets captured as a skill. The skills library grows organically with every successful execution. The platform literally gets smarter every day.
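The capture step itself can be as plain as appending one structured record per completed task to an append-only log that future agents read before starting similar work. The schema and storage format here are assumptions for illustration; the real memory system is not described in this detail:

```python
import json
import pathlib

def capture_lesson(task_id: str, outcome: str, approach: str,
                   notes: str, store: pathlib.Path) -> None:
    """Append one lesson record (JSON Lines) to the shared memory store.

    outcome is "success" or "failure": negative knowledge gets
    captured with the same structure as positive knowledge.
    """
    record = {
        "task": task_id,
        "outcome": outcome,
        "approach": approach,
        "notes": notes,
    }
    with store.open("a") as f:
        f.write(json.dumps(record) + "\n")
```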
The compound learning effect
This is where the architecture connects back to the compounding thesis from Part 2. Every automation you build frees up time. But every lesson the system captures makes the next automation faster to build, more reliable to operate, and less likely to need human intervention. The compounding is not just in time savings. It is in knowledge accumulation. The platform is simultaneously getting faster and smarter.
Why this matters for the whole series
Part 1 showed what was built. Twelve projects, five deployments, 10,000+ lines of code in a single day. It was an impressive list. But lists are not sustainable.
Part 2 showed why it compounds. Every automation removes a manual step from tomorrow. The curve bends upward. But compounding without control is just chaos moving faster.
Part 3 is the missing piece. The architecture that makes it sustainable.
Without HITL routing, autonomous agents hit decision bottlenecks. The CEO becomes a single-threaded connection that every query has to pass through, and the entire system stalls waiting for one person to clear a queue.
Without accountability dashboards, tasks fall through cracks. Work gets assigned and forgotten. Blockers sit unresolved for days because nobody noticed they were stuck.
Without confidence-based routing, you are forced into a binary choice. Either agents are too autonomous and they make costly mistakes because nobody reviewed the risky decisions. Or agents are too dependent and nothing moves without human approval on every trivial step. Both options are bad. The confidence threshold creates the middle path.
Without a learning loop, the platform runs at the same speed forever. You build the same automations for the same problems because the system does not remember what it already solved. With a learning loop, every day's work makes tomorrow's work faster.
This is what we built. Not a tool. Not a chatbot. Not a prompt library. An operating system for human-AI collaboration, where the architecture itself is designed to get out of the way and let both humans and agents do what they do best.
The bottom line for the entire series
One person. One platform. An architecture designed for speed and control. The output is not the point. The system that produces the output is the point. And that system gets better every single day.
If you read all three parts, you now understand something that most organizations will spend years figuring out: the real challenge of AI is not making agents smarter. It is building the orchestration layer that lets smart agents actually operate in a business context without overwhelming the humans they work with.
That is the architecture behind the acceleration. And it is just getting started.
This is part 3 of a 3-part series
Frequently asked questions
What is multi-lane HITL routing, and why does it matter?
Human-in-the-loop routing automatically directs AI agent decisions to the right human at the right time. When agents can execute 100 tasks per day, the bottleneck shifts from execution to decision-making. Without HITL routing, either agents wait for a single overwhelmed approver or they proceed without oversight and make costly mistakes. Multi-lane routing creates specialized approval channels so each decision reaches the person best qualified to make it, keeping agents moving at full speed.
How does confidence-based routing decide when a human gets involved?
Every agent decision receives a confidence score. Tasks at 90% or higher confidence proceed autonomously. Tasks between 70% and 89% proceed with guardrails like staging instead of production deployment. Tasks below 70% confidence are routed to the appropriate human lane for review. This ensures humans only see the decisions that genuinely need their judgment while routine work flows without interruption.
What are the six HITL lanes?
The six lanes are CEO/Founder for strategic decisions, Senior Architect for technical architecture, Credentials/Access for API keys and environment setup, Desktop/Browser for local machine tasks, Final QA for pre-release checks, and HITL Manager for triaging ambiguous blockers. The CEO lane has six mandatory gate criteria that all must pass before a task enters it, including decision framing, bounded options, and risk statements. Tasks that fail any criterion are automatically rerouted to the appropriate lane.
What dashboards keep the system visible?
Four specialized dashboard projects provide operational visibility: Dash HITL for multi-lane approval routing, Dash PRD for product requirements, Dash TSD for technical specifications, and Dash Proposal for the sales pipeline. Twenty-two custom fields track everything from AI1 Stage and Current State to Blocker Type, Routed Lane, Confidence, and Risk. These dashboards give both humans and agents a shared source of truth for every task in the system.
How does the dynamic model router cut token costs?
The model router uses a 3-stage system: override check for pinned models, heuristic classification for known task types, and a lightweight Haiku classifier for everything else. It routes each AI task to the cheapest model capable of handling it, estimated to save 60-70% on token costs. Simple tasks like file operations use lightweight models while complex reasoning tasks get routed to frontier models.
How does the task accountability monitor work?
A script runs every hour and scans all 174+ assigned tasks, checking whether each has been acknowledged within 24 hours. Escalating notifications fire at 24, 48, and 72 hours. An accountability dashboard sorts tasks by response time so managers can see where work is stalling. The entire system runs as a pure script with zero token cost, no LLM needed. It keeps work moving without anyone manually chasing assignees.