
Two Agencies Built the Same AI System Without Ever Talking to Each Other

AgencyBoxx Team

We had a call recently with another HubSpot agency owner. Someone we had never met before, operating in a completely different market, serving a different client base, running a different-sized team.

Within 10 minutes, we realized we had independently built almost the exact same AI operations system.

Same agent types. Same use cases. Same architectural decisions. Same trust boundaries. Same "I don't let my agents send emails" policy. Even the same instinct to name their agents after fictional characters so the team could keep track of who does what.

This was not a coincidence. This was convergent evolution. And it tells you something important about where agency operations are heading.

The Lineup Was Nearly Identical

Before this call, we had spent months building out our agent roster inside OpenClaw. Nine purpose-built AI agents, each handling a specific operational function that was eating our time. We assumed our setup was unique because we had never seen anyone else in the HubSpot ecosystem doing this at our scale.

Then this other agency owner started walking through their agent list:

Time tracking enforcement. They had one. We had one. Both agents monitor the team's logged hours, flag gaps, and chase people down over Slack DM when entries are missing or descriptions are blank. Both of us had built this because project managers were spending 15+ minutes every single day manually checking who had logged time. Multiply that by 250 working days and you are burning 60+ hours a year on a task a script can do better.
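
That check is a good candidate for a deterministic script rather than a person. A minimal sketch of the idea, with an invented entry format — the field names, team list, and Slack handoff are illustrative, not tied to any real time-tracking API:

```python
from datetime import date

# Hypothetical roster and entry shape, for illustration only.
TEAM = ["alice", "bob", "carol"]

def find_gaps(entries: list[dict], day: date) -> dict[str, str]:
    """Return {member: reason} for everyone who needs a Slack nudge today."""
    logged = {e["member"]: e for e in entries if e["day"] == day}
    nudges = {}
    for member in TEAM:
        entry = logged.get(member)
        if entry is None:
            nudges[member] = "no time logged"
        elif not entry.get("description", "").strip():
            nudges[member] = "entry has a blank description"
    return nudges

entries = [
    {"member": "alice", "day": date(2024, 5, 6), "description": "Client onboarding"},
    {"member": "bob", "day": date(2024, 5, 6), "description": "  "},
]
gaps = find_gaps(entries, date(2024, 5, 6))
```

The agent's only non-deterministic job is phrasing the DM; the detection itself never needs a model.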

Email triage and personal assistant. They had one. We had one. Both systems classify incoming email, filter spam, detect newsletters, auto-label by category, and generate draft replies that go into a review queue. Neither system sends anything automatically. Both require a human to review and approve before anything goes out.
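
Most of that triage logic can stay rule-based, with the model reserved for drafting the reply. A toy sketch under that assumption — the email dict shape, rules, and domains are placeholders, not a real inbox API:

```python
def triage(email: dict) -> dict:
    """Classify an inbound email and decide what happens next.

    Rules are illustrative; a production system would have many more.
    """
    body = email.get("body", "").lower()
    sender = email.get("from", "").lower()
    if "unsubscribe" in body:
        label = "newsletter"
    elif sender.endswith("@client.example.com"):
        label = "client"
    else:
        label = "general"
    # Client messages get a draft reply staged for human review.
    # Nothing is ever sent automatically, regardless of label.
    return {"label": label, "needs_draft": label == "client", "auto_send": False}
```

Note that `auto_send` is hard-coded to `False` — the classifier decides what gets drafted, never what gets sent.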

Sales prospecting and contact enrichment. They had one. We had one. Both agents pull prospect companies from defined sources, enrich contacts through services like Hunter.io, validate email addresses, discover LinkedIn profiles, and organize everything for outreach. Our system had processed 7,300+ prospects and found 2,880+ contacts. Theirs was on a similar trajectory.

Project management and account oversight. They had one. We had one. Both monitor project activity, create task summaries from meeting transcripts, push follow-ups back into the project management system, and flag when things are falling behind.

File and asset management. They had one. We had one. Both organize client deliverables in Google Drive, ensure folder structures stay clean, and keep documentation properly filed.

Personal assistant and calendar management. Both of us had built agents to handle scheduling logistics and daily briefings.

QA and monitoring. Both of us had agents checking the output of other agents and systems, validating that work was done correctly before it reached a client.

We are talking about two completely independent implementations, built months apart, by people who had never exchanged a single message. And the overlap was not 50%. It was closer to 90%.

Why This Keeps Happening

Agencies look different on the surface. Different niches, different team sizes, different tech stacks, different client expectations. But underneath all of that, every agency that runs on HubSpot and serves multiple clients simultaneously hits the same set of operational walls:

Shared inboxes grow faster than humans can triage them. Whether you have 10 clients or 75, the email volume eventually outpaces the team's ability to stay on top of it. SLA clocks start ticking the moment a message lands, and the first sign of trouble is usually a client escalation, not an internal catch.

Time tracking is a daily battle. Every agency owner we have ever spoken to has the same complaint: the team forgets to log time, leaves descriptions blank, or lets timers run overnight. This creates billing gaps, inaccurate budgets, and awkward client conversations. And every day, someone on the team (usually the most expensive person) spends time chasing the rest of the team to fix it.

The founder is the bottleneck. The person who should be selling, strategizing, and building relationships is instead spending hours on email triage, meeting follow-ups, and operational oversight. This is true whether the agency has 5 people or 50.

Prospecting never happens because delivery always takes priority. Outbound sales requires sustained, repetitive effort: finding contacts, enriching data, validating emails, personalizing outreach. But when a client deadline is looming, prospecting is the first thing that gets dropped. Every time.

Institutional knowledge is trapped in people's heads. SOPs live in Notion pages nobody reads. HubSpot documentation is scattered across help articles. Onboarding a new hire means weeks of tribal knowledge transfer that could be handled by a searchable knowledge system.

These problems are not unique to any single agency. They are structural. They come with the business model. And when two experienced operators independently set out to solve them with AI agents, they arrive at nearly the same architecture because the problems are the same everywhere.

The Trust Boundary Was Identical Too

The most striking overlap was not the agent lineup. It was the trust model.

Both of us had independently arrived at the same hard rule: AI agents draft. Humans approve. Nothing goes out without a checkpoint.

When we asked the other agency owner about their approach to outbound communication, their response was immediate and blunt. No hesitation. Every agent creates Gmail drafts. Every draft gets reviewed by a human. No agent has permission to send anything to a client autonomously.

This was not a decision either of us made because we read it in a best practices guide. We both arrived at it because we understand what a single wrong email can do to a client relationship. When you are a white-label agency or you are managing someone else's HubSpot portal, a misrouted message or a hallucinated response is not an embarrassing mistake. It is a relationship-ending event.

The human-in-the-loop pattern is not a limitation of the technology. It is the entire point. The value of AI agents is not replacing human judgment. It is eliminating the 45 minutes of prep work that happens before a human makes a two-minute decision. Draft the email. Prepare the report. Stage the response. Then let a human hit send.
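
That checkpoint can be enforced in code rather than by convention. A minimal sketch of the pattern with invented class names — the point is that the send path structurally refuses anything a human has not approved:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    to: str
    body: str
    approved: bool = False

class Outbox:
    """Agents stage drafts; only a human approval unlocks sending."""

    def __init__(self) -> None:
        self.review_queue: list[Draft] = []
        self.sent: list[Draft] = []

    def stage(self, draft: Draft) -> None:
        # The only write an agent is allowed: add to the review queue.
        self.review_queue.append(draft)

    def approve(self, draft: Draft) -> None:
        # Called by a human reviewer, never wired to an agent.
        draft.approved = True

    def send(self, draft: Draft) -> None:
        if not draft.approved:
            raise PermissionError("no agent sends without a human checkpoint")
        self.review_queue.remove(draft)
        self.sent.append(draft)
```

If an agent ever calls `send` on an unapproved draft, the system fails loudly instead of emailing a client.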

Anyone telling agencies to remove humans from the loop entirely is selling something they have never had to stand behind with real client relationships on the line.

Both of Us Named Our Agents

This is a small detail, but it stuck with us.

Both agencies had independently decided to give their agents human-readable names instead of functional labels. We went with characters from the original 1982 Tron. They started with Transformers, then switched to human names when the roster grew beyond what they could remember.

It sounds trivial. It is not.

When you have 15 or 20 agents running and something breaks, "the time tracking agent has an error" is far less useful than a specific name that everyone on the team instantly recognizes. Naming agents creates a mental model. The team starts thinking of them as colleagues with defined responsibilities, not abstract processes. It makes the whole system easier to reason about, easier to troubleshoot, and easier to onboard new team members into.

If you are running more than three or four agents, name them. Pick a theme. Your team will thank you.
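
One lightweight way to make that mental model concrete is a named registry, so alerts, logs, and Slack messages all speak in the same names the team uses. The names and duties below are hypothetical:

```python
# Hypothetical agent registry: human-readable names mapped to responsibilities,
# so "Yori flagged an error" means something to everyone on the team.
AGENTS = {
    "Tron": "time tracking enforcement",
    "Yori": "email triage and draft replies",
    "Sark": "prospect enrichment",
    "Dumont": "QA on other agents' output",
}

def describe(name: str) -> str:
    """One line for alerts and onboarding docs."""
    return f"{name} handles {AGENTS[name]}"
```

Now every error message, dashboard row, and onboarding doc can pull from one source of truth instead of a functional label nobody remembers.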

The Architectural Split: Scripts vs. SOPs

The one area where we diverged was in how much decision making we give the AI.

Our approach leans heavily on scripted Python. When we build a new agent, we start in Claude Code and build out the core functionality as deterministic scripts. API calls, data formatting, routing logic, compliance checks: all scripted. Then we bring AI in surgically for the parts that genuinely require reasoning, natural language understanding, or content generation.
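
In practice that means the pipeline itself is plain Python and the model is injected as one function among many. A sketch under that assumption — the transcript format and the summarizer are invented, not our actual implementation:

```python
from typing import Callable

def process_transcript(transcript: str, summarize: Callable[[str], str]) -> dict:
    """Deterministic pipeline; the model is called only where reasoning is needed."""
    # Scripted: normalize, split, filter. Cheap, predictable, unit-testable.
    lines = [line.strip() for line in transcript.splitlines() if line.strip()]
    action_lines = [line for line in lines if line.lower().startswith("action:")]
    # Surgical AI call: only the natural-language summary needs a model.
    summary = summarize("\n".join(lines)) if lines else ""
    return {"actions": action_lines, "summary": summary}

# Any callable works here; in production it might wrap an LLM API.
fake_model = lambda text: text.split("\n")[0][:40]
result = process_transcript(
    "Kickoff notes\nAction: send SOW\nAction: book demo", fake_model
)
```

Because the model sits behind a plain function boundary, you can swap it, mock it in tests, and count exactly how many paid calls each run makes.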

The other agency took a different approach. Their agents operate primarily through AI-driven standard operating procedures. The agent receives an SOP that says "this is how we do X, this is what happens after a meeting ends, this is the next step." The AI interprets and executes against those instructions, making more autonomous decisions within the boundaries of the SOP.

Both approaches work. Both have tradeoffs.

The scripted approach costs almost nothing to run (we operate 20 agents across three OpenClaw instances for about $1/day in AI credits) because the agents are making fewer AI calls. The tradeoff is that scripted agents cannot easily learn from their own mistakes, because they are making fewer independent decisions.

The SOP approach gives agents more flexibility and allows them to self-assess (the other agency's agents write daily self-assessments documenting what worked and what failed). The tradeoff is higher token costs and a wider surface area for unexpected behavior.

There is no single right architecture. But if you are burning through AI credits and wondering why, check how much decision making you are outsourcing to the model versus scripting deterministically. We learned this lesson the hard way when our first pure AI approach burned through $150 in four hours.

What Convergent Evolution Tells You

When two experienced operators, working independently, arrive at nearly identical solutions to the same problems, that is signal. It means the problems are real, the solutions are proven, and the pattern is repeatable.

It also means the window is open. Right now.

The tools are available to everyone. OpenClaw is open source. Claude Code can handle most of the setup. A Mac Studio with decent specs can run the entire operation. The technical barriers have never been lower.

But 95% of agency owners are not going to sit down and build this. Not because they cannot. Because they are running agencies. They are on client calls, chasing invoices, hiring, and putting out fires. The tools are commodities. The implementation is the differentiator.

That is the gap AgencyBoxx was built to fill. We spent 200+ hours and 12+ months building, breaking, fixing, and refining a production AI operations system on 75+ real agency clients. We did the work so that the next agency does not have to start from zero.

But whether you build it yourself or bring in a system that is already battle tested, the takeaway from this call was clear: the agencies that figure this out first will operate at a fundamentally different level than the ones still doing everything manually.

The problems are universal. The solutions are converging. The only question is how long you wait to start.

AgencyBoxx is an AI operations platform built inside a real agency, not a lab. Book a Walkthrough to see the system running on live client data.