The Verification Bottleneck: Why AI’s Real Cost Is Human Attention
AI scales execution to near-zero cost, but the cost of verifying that output stays biologically bounded. The bottleneck isn't intelligence anymore; it's human verification bandwidth.
A new paper from MIT and Washington University frames the AI transition as two cost curves racing in opposite directions. The cost to automate falls exponentially. The cost to verify stays where it’s always been: bounded by human cognition.
The binding constraint on growth is no longer intelligence. It’s human verification bandwidth.
AI can generate a 50,000-line application in a day. A design document in minutes. A legal brief in seconds. Execution cost is approaching zero.
But someone still has to check whether the output is correct. Whether the code handles edge cases the model didn't think about. Whether the legal citations actually exist. Whether the billing logic accounts for the dedicated carrier program that nobody wrote down.
That checking happens at human speed. Reading speed. Context-building speed. Domain expertise speed. None of these scale with compute.
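The divergence is easy to see in a toy model. This is a sketch with made-up numbers, not figures from the paper: execution cost halves each year while the cost of a human check stays flat, so verification's share of total cost climbs toward 100%.

```python
# Toy model of the two cost curves (illustrative numbers only):
# automation cost halves each year; human verification cost stays flat.
automation_cost = 100.0   # hypothetical cost to execute a task, year 0
verification_cost = 20.0  # hypothetical cost for a human to check it

for year in range(6):
    share = verification_cost / (automation_cost + verification_cost)
    print(f"year {year}: execute ${automation_cost:6.2f}, "
          f"verify ${verification_cost:.2f}, "
          f"verification = {share:.0%} of total cost")
    automation_cost /= 2  # execution gets exponentially cheaper; checking doesn't
```

Within a few doublings, verification dominates the bill even though its absolute cost never changed.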
The Faros AI study across 10,000+ developers put numbers on it: teams using AI complete 21% more tasks and merge 98% more PRs. But PR review time goes up 91%. PRs are 154% larger. Bug rates climb 9% per developer. The work didn't disappear. It moved from writing to reviewing.
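A back-of-envelope on those Faros figures shows how the load compounds. Assuming review effort scales roughly with total lines under review (my assumption, not a finding from the study), 98% more PRs at 154% larger size means roughly five times the volume to read:

```python
# Back-of-envelope from the Faros AI figures quoted above.
# Assumption (mine): review effort scales with total lines under review.
pr_count_multiplier = 1.98  # 98% more PRs merged
pr_size_multiplier = 2.54   # PRs are 154% larger

review_volume_multiplier = pr_count_multiplier * pr_size_multiplier
print(f"Total lines under review: ~{review_volume_multiplier:.1f}x")  # ~5.0x
```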
Full delegation requires trust
The verification problem is really a trust problem wearing a technical hat. You trust a colleague to run a project because you've seen their judgment over the years. A doctor trusts a resident's diagnosis after watching hundreds of cases together. A manager trusts a direct report's analysis after months of calibration. That trust was expensive to build. It doesn't transfer to a model that hallucinates between 0.7% and 94% of the time, depending on who built it.
This framing came up in conversation with a colleague at work; thanks to Francisco Arceo for sharpening the thinking here.
The Stack Overflow 2025 survey makes this concrete: 84% of developers use AI tools, but only 33% trust the output. A 51-point gap between adoption and confidence. The more people use AI, the less they trust it. Experienced developers trust it least.
Meanwhile, the number of agents producing output grows with compute, while the number of humans available to verify stays fixed. Every new agent, every new workflow, every new automation draws down the same finite pool of human attention.
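A minimal sketch of that mismatch, with hypothetical numbers: output scales linearly with agent count, while review capacity is a fixed pool of human hours, so the backlog grows without bound.

```python
# Sketch of the scaling mismatch. All numbers are hypothetical:
# output grows with agent count; verification capacity is fixed.
verifier_hours_per_day = 8 * 10      # ten human reviewers, 8h each
hours_to_verify_one_output = 0.5

for agents in (10, 100, 1000):
    outputs_per_day = agents * 20    # each agent produces 20 artifacts/day
    hours_needed = outputs_per_day * hours_to_verify_one_output
    backlog_growth = max(hours_needed - verifier_hours_per_day, 0)
    print(f"{agents:5d} agents: need {hours_needed:7.0f}h of review/day, "
          f"backlog grows {backlog_growth:7.0f}h/day")
```

Adding compute moves only the left side of that inequality; the right side is headcount.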
The measurability gap
The paper introduces a concept called the "measurability gap." Quantifiable tasks are automated first. What's left are the tasks that require judgment, context, and liability: what the authors call n-hard or n-legal processes.
The dangerous part isn't that AI produces wrong answers. It's that the wrong answers look right. CIO.com put it well: "Almost-right code is insidious. It compiles. It runs. It passes the basic unit tests. But it contains subtle logical flaws or edge-case failures that aren't immediately obvious." Finding what's missing in 100 lines of AI-generated code is harder than writing the 100 lines yourself.
The METR randomized controlled trial found experienced developers using AI tools took 19% longer than without AI. Before the study, they predicted AI would make them 24% faster. After the slowdown, they still believed it had sped them up by 20%. That's a 39-point gap between what people feel and what actually happened.
The HBR study from UC Berkeley tracked 40 workers over 8 months and found something similar: in the moment, people described momentum. When they stepped back, they described feeling busier, more stretched, less able to disconnect. 62% of associates reported burnout. AI didn't reduce work. It intensified it.
Hollow economy vs. augmented economy
The paper's central warning: without verification infrastructure, the market drifts toward what they call a "Hollow Economy." Measured activity explodes. Human control hollows out. GDP goes up. Understanding goes down.
The alternative is an "Augmented Economy" where verification scales alongside automation. That means treating verification as a production technology, not a compliance checkbox. Cryptographic provenance, liability underwriting, evaluation records, audit trails. The ability to insure outcomes, not just generate them.
One failure mode I keep thinking about: the expertise decay loop. Routine tasks automate, entry-level positions disappear. Those positions were where future expert verifiers got trained. The system slowly undermines its own ability to check itself.
Where this leaves us
Intent -> execution -> verification. Humans set intent. Machines execute. Humans verify and take responsibility.
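That loop can be sketched as a code pattern. The names and shapes here are illustrative, not from the paper; the point is that the human gate is the one step that doesn't scale with compute:

```python
# Sketch of the intent -> execution -> verification loop.
# All names are illustrative placeholders.

def run_task(intent: str, execute, human_verify):
    """Execute cheaply, but ship only what a human has signed off on."""
    draft = execute(intent)            # near-zero marginal cost
    approved = human_verify(draft)     # bounded by human attention
    if not approved:
        raise ValueError("verification failed; a human must revise or reject")
    return draft                       # the human takes responsibility

# Toy usage with stand-in functions:
result = run_task(
    "summarize the quarterly numbers",
    execute=lambda intent: f"draft output for: {intent}",
    human_verify=lambda draft: "quarterly" in draft,  # stand-in for human review
)
print(result)
```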
Compute gets cheaper every quarter. Human attention does not. Every organization deploying AI at scale is discovering this firsthand. Not from reading papers, but because their review queues are backing up, their senior engineers spend more time reading generated code than writing their own, and their confidence in what shipped last Tuesday is lower than it was a year ago.
The real cost of AI productivity isn't compute. It's the attention of the people who still have to decide whether the output is worth trusting.