AI Token Optimization.

How AI actually gets billed — and how to stop overpaying. The full recap with Mike Schwarz, Av Utukuri, and Greg Asman — slides, real stories from the call, the 12-tactic playbook, and an interactive cost calculator. 90 minutes of "the economics behind the magic."

50×
model-to-model cost gap on the same task
30×
Opus 4.7 vs Haiku 4.5 customer-service spend
25×
tokens for an agent loop vs. one chatbot trip
98%
Mike's reduction on optimized agentic processes

A huge thank you to the experts who showed up.

Mike Schwarz
Founder · MyZone AI
Av Utukuri
Founder · Banto · Fluvio AI
Greg Asman
Founder · The Asman Group
Jeannette Tran
Co-Founder · AI Coach · MyZone AI

Special thanks to Tomas Perlaky for the human-in-the-loop code review insights during the session.

278 messages. 60 questions. 60 attendees keeping the chat absolutely on fire.

This workshop was a panel because of you. Some of the best insights of the day came from the chat room — questions, links shared, side-debates, and the running commentary that kept Mike, Av, and Greg honest. A massive thank you to everyone who showed up and contributed. Honorable mentions to our most active contributors:

27
Mike Andler
Most active — recipe for self-summarizing skills, the "Swiss Army Knife vs. pocket knife" MCP framing, threshold-alert JSON hooks
20
Prasanna Prabhu
Asked the deepest questions on model deprecation timelines, sub-agent hidden costs, and the classifier pattern
17
Jasthi
"/compact", meta-prompt for reusable skills, the "API vs MCP" decision rule
12
Raghu Warrier
CFO framing — "token economics is a COGS line item" — kept reframing the conversation back to business
11
Mike Bodkin
Asked about sub-agent architecture for cost-reduction at scale — the question that opened the supervisor-pattern discussion
11
PK
Live reactions and follow-ups that kept the energy up — and "Claude has been telling people to go to sleep"
8
Kapil Mehta
Practical questions on context clearing, MCP costs, and the manifest-file pattern between agents
8
Alex K
"The specialist outperforms the generalist" — the line that bridged Mike and Greg's specialized-agent thinking
6
Krupa Srinivas
PitCrew Labs — surfaced the triaging-layer pattern + multi-model code review workflow
6
Edward Elliott
Asked the prompt-caching question that opened the biggest tactical thread of the day
6
Lance Trebesch
The "what are the three layers of the company brain?" question that triggered Mike's full deep-dive
5
Bret Kempler
"/collapse" + the "local brain and skills" framing — and the original ask to turn on more cameras

And a thank you to everyone else who showed up — Katya Serassio, Satheesh Kumar, Nikhil Patel, Aneta (LeadFit), Jean-Pierre Beltran, Tamas Perlaky, Steven Gibson, Anfisa Y, Kumar T, Vandana Jain, Henry Ouzounyan, Harsimran Kapoor, Brennen McLean, Stephenie La Maina, Dino Vitale, Kevin, Don O'Hearn, Susana Zhao, Lu Margan, Ed, Gabriel, Andrej, Ray Nann, Prashant Ganorkar, Aswath Aramadaka, Karthik Murugesan, Charlana McKeithen, ChangeManagement Carolyn, Irwin Liu, Michelle Menard, Jill Rodrigues, Keb Oane, Ajit Deshpande, Jonhar, Victoria Samways, Greg Asman, Av Utukuri, Mike Bodkin, Mike Bangasser, Tracy Garrick & everyone else. The session is only as good as the room.

Let's keep the learning going together.

The chat was alive today (278 messages, 60 questions). Keep it going — jump into the WhatsApp community below. EO members use the AI + EO Global QR. Everyone else uses AI + Entrepreneurs.

— Mike, Av, Greg & Jeannette

Community slide — AI + EO Global and AI + Entrepreneurs WhatsApp group QR codes

WHO THIS RECAP IS FOR — builders, operators, and CFOs who already use AI agents (or are about to) and want to keep bills sane while the frontier gets 5× more expensive. The full session ran 90 minutes; this page covers every key moment.

90 minutes, in 9 beats.

Timestamps are from the recording.

Beat 01 · 00:00 – 07:30

Mike's intro deck

"Brace yourself — this is the nerdier side of AI." Tokens, the 50× gap, the 30× savings, the agent-loop multiplier, and why the next frontier is 5× more expensive.

Beat 02 · 07:30 – 14:00

The most expensive mistakes

Av's runaway debug loop. Mike's Amy/Slack story and the Alexa kill switch. Greg's "stop treating it like a chatbot" Lego analogy. Cap your loops, kill switch ready.

Beat 03 · 14:00 – 23:00

What you should actually spend

90% of MyZone's SMB clients stay under $200/mo on subsidized accounts. ROI > sticker price. Don't authorize Claude overages — upgrade the plan.

Beat 04 · 23:00 – 30:00

CLI vs API — June 15

Mike's deep dive: Anthropic's SDK access closes June 15. Failover to GPT‑5.5 Codex via CLI. The 10–30× difference between CLI and API.

Beat 05 · 30:00 – 45:00

Model routing & the cache

Greg's Haiku classifier, Av's "don't switch mid-session" caching rule, Mike's specialized-agent assembly line for client work.

Beat 06 · 45:00 – 60:00

The company brain

Karpathy's wiki-of-markdown idea, MyZone's 3-layer brain (DB, wiki, deterministic scraper), and why most teams aren't touching this yet.

Beat 07 · 60:00 – 75:00

One big agent vs. many small

Greg's "I know a guy" supervisor pattern, Av's lightweight manager AI, and the Memento analogy for agent memory files.

Beat 08 · 75:00 – 87:00

MCP vs API

Greg's "phones on your desk" analogy, Av's "panel of experts," and the simple rule: if you know the call, hand-roll the API.

Beat 09 · 87:00 – 90:00

Does any of this matter in 12 months?

Costs drop 10–20× per year — but Jevons paradox. The companies that get efficient now can afford the expensive models tomorrow.

"All of these slides were generated in 4 minutes using AI, by the way."

Mike's 6-slide opener, presented at 9:00 AM PT. Each caption is the spoken context, not just what's on the slide.

  1. How AI actually gets billed
  2. What's a token, really?
  3. The price of a million tokens
  4. 30x cost difference, same task
  5. Chatbot is one trip, agents loop 10-20 times
  6. The frontier just got 5x more expensive

Slide 1 · Token Economics 101
Mike: "Now that we've officially transitioned from the age of AI chatbots into the age of AI agents, agents consume a lot of tokens. If you do not have them well optimized, or if you just click one wrong setting on your Claude account, you can wake up to some surprise bills."

The most expensive token mistakes — straight from the call.

Jeannette's first question to the panel: "What is the most expensive token mistake you've seen a company make this past year?" Three answers, three lessons.

Av Utukuri · Founder · Banto · Fluvio AI
The $500 debug loop that wasn't a bug

"I told Claude Code, 'go ahead and figure this out,' and I left. I came back to find out it ran all sorts of debug scripts because it couldn't find the bug — it was user error. So it just kept going and going. 'I added the debug log, I'm now looking at the output, I launched the app, I took a screenshot…' Hundreds of thousands of tokens, because it didn't know it was dumb user error."

Cap your agent loops. Tell it: "you get 5 chances; after that, log the failure and stop." — Greg
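
A minimal sketch of Greg's loop cap, assuming each agent step can be wrapped in a function that returns True on success. The names `run_with_cap` and `attempt` are illustrative and not from the call; the default of 5 comes from Greg's quote.

```python
from typing import Callable

def run_with_cap(attempt: Callable[[int], bool], max_attempts: int = 5) -> dict:
    """Call `attempt` until it succeeds or the cap is hit, then log and stop."""
    log = []
    for n in range(1, max_attempts + 1):
        if attempt(n):
            return {"ok": True, "attempts": n, "log": log}
        log.append(f"attempt {n} failed")
    # Cap reached: record the failure and hand control back to a human.
    log.append(f"gave up after {max_attempts} attempts")
    return {"ok": False, "attempts": max_attempts, "log": log}

# Stand-in for an agent step that can never succeed (the "user error" case).
result = run_with_cap(lambda n: False)
```

The point of the design is that the stop condition lives outside the model: the loop ends after a fixed number of tries no matter what the agent believes about its progress.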
Mike Schwarz · Founder · MyZone AI
"Alexa, kill Amy." — the kill-switch story

"My OpenClaw agent got into a loop and Amy was posting to Slack over and over through the night. I was sleeping. I woke up to text messages, phone calls. Fortunately I was on CLI, not API — could have cost me thousands. I went out and bought Alexa smart plugs. Now my devices have kill switches: 'Alexa, kill Amy' — lights out."

Mobile kill switches on your agent servers are a 2026 need-to-have. Agents will go off the rails.
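
Mike's kill switch is a physical smart plug. A software analogue, sketched here under assumptions, is a check the agent runs before every step: stop if a sentinel file exists or if a token budget is spent. The class name, the `STOP` file, and the budget figure are all hypothetical.

```python
import tempfile
from pathlib import Path

class KillSwitch:
    """Stops an agent when a stop file appears or a token budget runs out."""

    def __init__(self, stop_file: Path, token_budget: int) -> None:
        self.stop_file = stop_file
        self.token_budget = token_budget
        self.tokens_used = 0

    def record(self, tokens: int) -> None:
        # Call this after every model response with the tokens it consumed.
        self.tokens_used += tokens

    def should_stop(self) -> bool:
        return self.stop_file.exists() or self.tokens_used >= self.token_budget

with tempfile.TemporaryDirectory() as tmp:
    switch = KillSwitch(Path(tmp) / "STOP", token_budget=1_000)
    switch.record(400)
    before = switch.should_stop()   # under budget, no stop file yet
    (Path(tmp) / "STOP").touch()    # a human (or a phone shortcut) creates the file
    after = switch.should_stop()
```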
Greg Asman · Founder · The Asman Group
The Lego stack — why "Max accounts" stop working

"Clients say 'we'll just add another Max subscription.' I look at how they're prompting and they're treating it like a chatbot. Every time you ask a question, the entire context gets reloaded. You start with a tiny Lego, then it's this big on the next question, then this big. So just learning to not treat it like a chatbot will save a lot of costs."

Bonus from Av: "Greg, you're saying don't say thanks after a 10-hour conversation with a massive context." Greg: "Don't say please. Definitely don't say thanks."
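
Greg's Lego stack can be put into rough arithmetic. The sketch below assumes every turn adds the same number of tokens and that the full history is resent each time; it ignores caching and output tokens, and the figures are illustrative.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every turn resends the whole history."""
    # Turn k carries k turns' worth of context.
    return sum(k * tokens_per_turn for k in range(1, turns + 1))

total = cumulative_input_tokens(turns=20, tokens_per_turn=500)
fresh = 20 * 500  # what the same 20 questions cost as separate short chats
```

Under these assumptions, 20 turns of 500 tokens bill 105,000 input tokens instead of 10,000, and a closing "thanks" on turn 21 resends roughly 10,500 more.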

The same workflow, on seven different models.

Mike: "Same customer-service conversation, two model choices: $1,500 vs $50 a month. Multiply by 10 use cases. Multiply by every workflow. This is how AI bills get out of hand, or stay sane."

Conversations / interactions per month: 10,000
Avg tokens per interaction (input + output): 5,000
Output share (Mike: usually 20% on chat, can spike to 80% on agentic): 40%
Primary model: Claude Opus 4.7
Estimated monthly spend: $1,500
Switch to Haiku 4.5: $50
You could save $1,450 / month, which is 96.7% of your bill.
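
The calculator's arithmetic can be sketched as follows. The per-token prices behind the $1,500 and $50 figures are not shown in this recap, so the prices in the example are placeholders, not real rates; only the savings calculation uses numbers from the page.

```python
def monthly_cost(interactions: int, tokens_per_interaction: int, output_share: float,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Monthly spend in dollars, given prices per one million tokens."""
    total_tokens = interactions * tokens_per_interaction
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

def savings(current: float, alternative: float) -> tuple:
    """Dollars saved and the percentage of the current bill that represents."""
    saved = current - alternative
    return saved, round(100 * saved / current, 1)

# Placeholder prices ($1 input, $2 output per 1M tokens), for illustration only.
example = monthly_cost(10_000, 5_000, 0.40, input_price_per_m=1.0, output_price_per_m=2.0)

# The page's own figures: $1,500 on the larger model vs $50 on the smaller one.
saved, percent = savings(1500, 50)
```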

The price of a million tokens — May 2026.

Mike on slide 3: "The model you pick is the biggest cost lever you have." $25 output vs $0.50 output — same task. Cached input is roughly 10× cheaper than fresh.

Model · Tier · Input / 1M · Output / 1M · Cached input

Mike: "The cost of tokens is dropping roughly 10 to 20× per year. We saw a 32× drop on Gemini in a year, and a 280× drop on GPT-4o over two years. Costs are rapidly dropping — but so is what you're trying to do with them."
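
As a quick consistency check on Mike's figures, a multi-year drop can be converted to a per-year factor. The sketch assumes prices fall at a constant geometric rate, which is a simplification.

```python
def annualized_factor(total_drop: float, years: float) -> float:
    """Per-year price-drop factor implied by a total drop over several years."""
    return total_drop ** (1 / years)

per_year = annualized_factor(280, 2)  # the 280x drop over two years
```

A 280× drop over two years works out to roughly 16.7× per year, which sits inside the 10 to 20× range Mike quotes.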

CLI vs API — and why this date matters.

From the 23:00–30:00 stretch of the workshop. Anthropic announced 5–7 days before the call that SDK/CLI access for Claude is being shut down on June 15, 2026. Here's what Mike said, in plain English.

What's actually changing

"If you have been accessing Claude through CLI — which you should have been if you're building on top of it — that door closes June 15. We've already built a failover to OpenAI's GPT-5.5 Codex model. Sam Altman said they would never tell you what you can and can't do with that account. We'll see how long he holds that promise."

The cost difference

"If you had to go API to pay for those tokens, instead of being $200 a month, you might be spending $2,000 to $4,000 per month on API. Big, big difference if you can go through CLI versus API."

Mike's plain-English definition

"CLI just means you connect a third-party tool to your account and use the same tokens you'd see in Claude Co-work. API is metered and costs 10–30× more — that's where the providers have their profit margin. The Max accounts are VC-subsidized for adoption, just like Uber was half the price of taxis when it first arrived."

"How do you decide which model handles which step?"

Jeannette's question at 29:50. Three different answers — Av tested empirically, Greg built a classifier, Mike runs side-by-side output diffs.

Av Utukuri: "Stop switching mid-session" (the cache argument)

"I read recently: do not change your model midstream — your cache hit will actually be higher if you stay. I was going from reasoning to mid to quick to Opus 4.7 right in the middle, and I stopped caching, I stopped switching. Empirically, it's been much more stable in the middle or on auto."

If you're in a coding session, pick a tier and stay there. Switching flushes the cache.
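
Why the cache hit rate matters can be shown with a blended price. This sketch takes the "roughly 10× cheaper" cached-input figure quoted earlier in the recap as its default; the fresh price is a placeholder, and any surcharge for writing to the cache is ignored.

```python
def blended_input_price(fresh_price_per_m: float, cache_hit_rate: float,
                        cache_discount: float = 10.0) -> float:
    """Average input price per 1M tokens for a given share of cached tokens."""
    cached_price = fresh_price_per_m / cache_discount
    return cache_hit_rate * cached_price + (1 - cache_hit_rate) * fresh_price_per_m

steady = blended_input_price(1.0, cache_hit_rate=0.9)    # staying on one model
flushed = blended_input_price(1.0, cache_hit_rate=0.0)   # cache lost after a switch
```

With a placeholder price of $1 per 1M tokens, a 90% hit rate brings the effective price to about $0.19, while a flushed cache pays the full $1.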
Greg Asman: The Haiku classifier ("Hot dog, not hot dog")

"One mistake: assuming Haiku is the worst model and Opus is the best. They're all the best or worst depending on your task. I have a Haiku-based classifier I bring into the chat — I pose a question, it comes back saying 'the best model for this is X' and fires up a sub-agent. Built into each agent file is which model to use."

Rule: deterministic task (pull from DB, simple answer) → small model. Generative content / heavy thinking → level up. Sonnet is "often fine."
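
Greg's classifier is itself a small model. The sketch below swaps in a plain keyword rule as a stand-in, just to show the routing shape; the hint lists and the tier labels `small`, `mid`, and `large` are hypothetical.

```python
# Hypothetical hint phrases; a real classifier would be a small model, per Greg.
DETERMINISTIC_HINTS = ("look up", "fetch", "pull", "status of", "how many")
HEAVY_HINTS = ("design", "architect", "strategy", "write a", "analyze")

def pick_tier(task: str) -> str:
    """Route a task to a model tier using Greg's rule of thumb."""
    text = task.lower()
    if any(hint in text for hint in DETERMINISTIC_HINTS):
        return "small"   # deterministic lookups and simple answers
    if any(hint in text for hint in HEAVY_HINTS):
        return "large"   # generative content or heavy thinking
    return "mid"         # the "often fine" default

lookup = pick_tier("Pull the order status from the DB")
planning = pick_tier("Design a pricing strategy for Q3")
default = pick_tier("Summarize this email")
```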
Mike Schwarz: Compare the outputs (the 10,000-line CRM test)

"Andre built a 10,000-line technical scoping doc for an open-source CRM. He ran the output on Sonnet, then Opus 4.7, then had an agent compare the code. They were identical. That's when he knew the plan nailed it. If you can afford it, test side-by-side, have AI compare the diffs. If there's no difference, work with the lower model."

Specialized agents, each with their own model assignment — passing condensed summaries between them. That's MyZone's pipeline.
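
Mike describes having an agent compare the two outputs. A cheaper first pass, sketched here with Python's standard `difflib`, is a plain text comparison; the function names and the 0.98 threshold are assumptions, not part of MyZone's pipeline.

```python
import difflib

def cheaper_model_is_enough(cheap_output: str, expensive_output: str,
                            threshold: float = 0.98) -> bool:
    """True when the two outputs are near-identical as text."""
    ratio = difflib.SequenceMatcher(None, cheap_output, expensive_output).ratio()
    return ratio >= threshold

def show_diff(cheap_output: str, expensive_output: str) -> list:
    """Unified diff lines between the two outputs, for a human to review."""
    return list(difflib.unified_diff(
        cheap_output.splitlines(), expensive_output.splitlines(),
        fromfile="cheap", tofile="expensive", lineterm=""))

same = cheaper_model_is_enough("def f():\n    return 1\n", "def f():\n    return 1\n")
different = cheaper_model_is_enough("def f():\n    return 1\n", "an unrelated answer")
```

Text similarity only catches surface differences; two outputs can differ in wording and still be equivalent, which is where the agent comparison Mike mentions would come in.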

"We stole an idea off Andrej Karpathy."

From the 41:00–52:00 stretch. Karpathy posted a memory-structure concept right before he was acquired by Anthropic. Mike turned it into MyZone's 3-layer brain — and it's one of the biggest token reducers most teams aren't touching yet.

The 3-layer brain

  1. Vector data store / knowledge graph: the canonical DB layer. Syncs with QuickBooks, Asana, CRM, Slack, email, Zoom transcripts.
  2. Wiki of flat MD files: Karpathy-style. The 500 most-relevant documents, interlinked, that agents can scan as cache.
  3. Deterministic web scrapers: pull data on a schedule, write it down locally. Uses bandwidth, not tokens.

Mike: "Probabilistic = LLM, results change every time you hit refresh. Deterministic = old-school if-this-then-that. This [deterministic] is cheap. This [probabilistic] is expensive. Almost nobody's touching the brain layer yet."
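
The third layer ("pull data on a schedule, write it down locally") can be sketched as a read-through file cache. This is an illustration under assumptions, not MyZone's implementation: `read_or_refresh` and `fake_scrape` are hypothetical names, and the scraper is faked so the example runs offline.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable

def read_or_refresh(path: Path, fetch: Callable[[], str], max_age_seconds: float) -> str:
    """Return the local copy if it is fresh enough; otherwise fetch once and save it."""
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return path.read_text(encoding="utf-8")
    content = fetch()                      # costs bandwidth, not tokens
    path.write_text(content, encoding="utf-8")
    return content

calls = []

def fake_scrape() -> str:
    # Stand-in for a real scraper; records how often it is invoked.
    calls.append(1)
    return "# Pricing page\nPlan A: 10 seats\n"

with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "pricing.md"
    first = read_or_refresh(page, fake_scrape, max_age_seconds=3600)
    second = read_or_refresh(page, fake_scrape, max_age_seconds=3600)
```

The second read comes from disk, so the scraper runs once per refresh window and agents read a flat file instead of asking a model to re-derive the same facts.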

Memory files: the Memento pattern

Mike's analogy: "Agents are like Memento. They boot up with no recollection of anything other than the memory files you give them. So every time I build an application, I now have an agent manager for that agent — with memory files for architecture, key decisions, the original PRD, naming conventions."

Av's tension: "I let context bloat on a coding session because there's a reason the UI is connected to 15 elements. If I collapse everything, it makes the same original mistakes — and I myself forgot how we fixed them two weeks ago."

Mike's fix: "Train your agents to write key decisions and bug fixes to those specific memory files. It's a form of compaction — extract all the key information into flat files so it becomes permanent memory."
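
A minimal sketch of the memory-file pattern, assuming one flat markdown file per topic. The file names and the helpers `remember` and `boot_context` are hypothetical; Mike names the topics (architecture, key decisions, PRD, naming conventions) but not a file layout.

```python
import tempfile
from pathlib import Path

# Hypothetical file names, one per topic Mike lists.
MEMORY_FILES = ("architecture.md", "decisions.md", "prd.md", "naming.md")

def remember(memory_dir: Path, file_name: str, note: str) -> None:
    """Append one key decision or bug fix to a memory file."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    with open(memory_dir / file_name, "a", encoding="utf-8") as handle:
        handle.write(f"- {note}\n")

def boot_context(memory_dir: Path) -> str:
    """Build the text an agent is given at start-up from whatever files exist."""
    parts = []
    for name in MEMORY_FILES:
        path = memory_dir / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n".join(parts)

with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "memory"
    remember(memory, "decisions.md", "Fixed double-submit bug by disabling the button on click")
    context = boot_context(memory)
```

This also speaks to Av's concern: the fix from two weeks ago survives a context reset because it was written down, not because the session was kept open.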

Karpathy's gist (referenced by Mike Andler in chat)
The wiki-of-markdown memory pattern that started this whole conversation.

"Small specialized teams beat big monolithic agents."

Jeannette: "Which wins on cost? Which wins on quality?" Two patterns from the panel — Greg's supervisor model and Av's lightweight manager.

Greg Asman
Greg's supervisor pattern
"I know a guy" routing

The supervisor isn't allowed to do work.

"My supervisor knows all the skills of my smaller agents. Everything goes through the supervisor — but the supervisor itself is not allowed to do any work. I don't want it hallucinating and going off track. Its whole job is: 'I know a guy.' Here's the writer agent. Here's the coder. Here's the reviewer."

"And I never just swarm. People get quantified — 'I've got 50 agents running.' I just want enough to do the job. I don't pay attention to how many — I want to know that the right agents are running."

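Greg's "I know a guy" supervisor can be sketched as a registry that only routes. The class and the two toy agents below are hypothetical stand-ins; in practice each agent would be a model call with its own assigned model.

```python
from typing import Callable, Dict

class Supervisor:
    """Knows which agent has which skill, and does no work itself."""

    def __init__(self) -> None:
        self._agents: Dict[str, Callable[[str], str]] = {}

    def register(self, skill: str, agent: Callable[[str], str]) -> None:
        self._agents[skill] = agent

    def handle(self, skill: str, task: str) -> str:
        agent = self._agents.get(skill)
        if agent is None:
            # No improvising: an unknown skill is escalated, not attempted.
            return f"no agent for '{skill}'; escalating to a human"
        return agent(task)

supervisor = Supervisor()
supervisor.register("write", lambda task: f"[writer] draft for: {task}")
supervisor.register("review", lambda task: f"[reviewer] notes on: {task}")

routed = supervisor.handle("write", "launch email")
unknown = supervisor.handle("deploy", "ship it")
```
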
Av Utukuri
Av's lightweight manager
Deterministic dispatcher

Manager AI is thin on purpose.

"My strategy is the opposite. My manager AI is very thin — it's not reasoning, it just dispatches the task. We have a manager AI that dynamically creates the voice-agent prompts in real time based on context. Thin, lightweight, deterministic — the large language models do the creative thinking, not the manager."

"You have to use a different model for work that has to be consistent. One model looks at the data and pulls the customer ID. Then you run code. Then you have the conversation."
Mike Schwarz

Mike: split the agent, not the team.

"Small specialized agent teams win in terms of performance most of the time. They tested Claude Mythos against a whole bunch of specialized smaller agent teams — same problems, same level output. I've seen that case study over and over again."

"My token optimization agent looked at my SEO manager agent and said: 90% of the time you're just running audits, 10% other things — I recommend you divide this into two specialized pieces. You've loaded 12 skills and 4 tools that aren't needed 90% of the time. Instead of one sales manager that creates proposals, decks, and emails — have 5 specialized sales agents with very specific things they can do. Way less tools, way less opening bloat."

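The split Mike's optimization agent recommended can be put into rough numbers. Only the 90/10 split comes from the quote; the token sizes for the base prompt and the extra skills and tools are invented for illustration.

```python
def preload_tokens(runs: int, audit_percent: int, base_tokens: int, extra_tokens: int) -> tuple:
    """Opening-context tokens for one big agent vs. an audit agent plus a full agent."""
    audit_runs = runs * audit_percent // 100
    other_runs = runs - audit_runs
    monolith = runs * (base_tokens + extra_tokens)       # extras loaded every run
    split = (audit_runs * base_tokens                    # lean audit agent
             + other_runs * (base_tokens + extra_tokens))  # full agent when needed
    return monolith, split

# Hypothetical sizes: 2,000 tokens of core prompt, 8,000 for the rarely used skills and tools.
monolith, split = preload_tokens(runs=1000, audit_percent=90,
                                 base_tokens=2000, extra_tokens=8000)
reduction_percent = 100 * (monolith - split) // monolith
```

With these made-up sizes the split cuts opening context by 72%; the real saving depends on how heavy the unused skills and tools actually are.
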
Two analogies. Same conclusion.

From the 73:00–87:00 stretch. MCP is the AI-friendly way of connecting systems — and it's both powerful and a source of hidden context bloat. Greg and Av landed on similar advice, two different ways.

Greg Asman
Greg's "phones on the desk"

"You're sitting at a desk. You have a telephone for each thing you want to call — one for Gmail, one for Slack, one for Supabase. They're taking up all your desk space, and now you only have this much room left for your computer. That's what MCPs do to your context."

"An MCP itself isn't bad. What people do is leave them all connected — so all of that definition is loading into context every time. Run /mcp list and see what's actually active. You'll find your Paramount Plus."

Pro tip: "Tell Claude to read the API documentation and build the skill for you. You don't even need to know what the endpoint is."
Av Utukuri
Av's "panel of experts"

"Think of MCPs like a panel of experts. You don't need the experts there — they're going to be charging you a lot more. The API call is something very specific: 'I just need to check your calendar, I just need to send a Gmail.' API is lightweight; MCP is bringing the entire Gmail developer to the table."

"I haven't used an MCP in a long time. I'm a hard-coding old-school guy — give me the API POST request, give me the schema, I want it to be deterministic."

Mike's add: "MCP is the Swiss Army Knife. API is the pocket knife. Use the pocket knife when you know what you need."
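
Greg's desk-space point can be sketched by estimating how much context the connected tool definitions occupy. Everything here is an assumption for illustration: the definitions are invented, and the four-characters-per-token estimate is a rough heuristic, not a real tokenizer.

```python
import json
from typing import List

# Invented definitions standing in for what connected servers load into context.
TOOL_DEFINITIONS = {
    "gmail": {"description": "Send, search, and label email.",
              "tools": ["send_email", "search_inbox", "list_labels"]},
    "slack": {"description": "Post and read channel messages.",
              "tools": ["post_message", "read_channel"]},
    "supabase": {"description": "Query and update database tables.",
                 "tools": ["select", "insert", "update"]},
}

def estimate_tokens(text: str) -> int:
    # Rough heuristic only: about four characters per token.
    return len(text) // 4

def context_overhead(connected: List[str]) -> int:
    """Estimated tokens spent on tool definitions before any work starts."""
    return sum(estimate_tokens(json.dumps(TOOL_DEFINITIONS[name])) for name in connected)

everything = context_overhead(["gmail", "slack", "supabase"])
only_needed = context_overhead(["gmail"])
```

The overhead is paid on every request, whether or not a tool is used, which is the argument for disconnecting what a task does not need or hand-rolling the one API call it does.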

12 tactics, pulled from the call.


The lines that landed.

Spoken quotes from the 90-minute panel — Mike, Av, Greg, Tomas, and the room.

The questions you asked — and what the panel said.

Pulled from the live chat, answered with what was actually said on the call.

The terms that kept coming up.

"Does any of this matter in 12 months?"

Jeannette's last question. Token prices dropped 80%. Is optimization solving itself?

Av — old-school elegance: "I started on a 6502 with 64K memory. Today people have 8TB PCs and still write bloated, slow code. If you structure things elegantly, when things get cheaper and faster, your stuff is only better. I'm still old school — if I can make it simpler, I'll make it simpler. It bugs me when things are not elegant."

Mike — the Jevons paradox: "Even though cost per token is dropping, our consumption is exploding. You'll have one marketing agency spending $20K/mo on tokens, another spending $50K to do the same thing. The companies that can afford to burn more tokens — because they're efficient — that's the competitive advantage."

Greg — utilization wins: "Costs are commoditizing, but the volume is going up. These companies are building huge infrastructures. They're not going to lower their price to nothing."

Mike's closing call: "When the Mythos models arrive and you're already prepared — you're sitting in a much better spot than the person spending 30 bucks a month on ChatGPT. That's not going to last for long."

Who was driving the session.

Bios as Mike introduced them at the top of the call.

Mike Schwarz
Founder · MyZone AI

Built the MyZone Ai1 agent platform. Runs workshops, advises SMBs on AI agent rollouts, and is currently failing over from Claude CLI to GPT‑5.5 Codex ahead of the June 15 cliff.

MyZone.ai ↗
Av Utukuri
Serial tech entrepreneur · 100+ patents

Founder of Banto — ShadowSense touchscreens used in F‑35 cockpits and CNN's Magic Wall. Building Fluvio AI for next-gen AI voice agents. EO Toronto.

FluvioAI.com ↗
Greg Asman
Founder · The Asman Group

30+ years across sports, entertainment, financial, travel, and B2B. Advises marketing leaders on AI, data, and MarTech strategy. EO Atlanta.

Jeannette Tran
Workshop Host · AI Coach · MyZone AI

Moderated the panel and pulled questions from the live chat. "I do not have a technical background, so I am excited to ask these questions."

Special mention: Tomas Perlaky — MyZone's human-in-the-loop code reviewer who audited Mike's agent-generated code on the call. "You want a friend with a boat. You want a friend who can audit your agents."

AI Security — with Rishi (EO Philadelphia)

Mike's closing: "I'd say security is probably even more important than token optimization. You need to nail both — but start with security." Rishi has been working with the NSA on AI security.
