In early 2026, the research lab METR attempted to replicate a landmark experiment measuring developer productivity with and without AI assistance. The study could not proceed: developers declined to participate because they were unwilling to work without AI, even in a controlled research setting. The refusal itself became a data point, illustrating how deeply AI tools are entrenched in software development.
The original 2025 study had yielded a puzzling contradiction. Developers self-reported that AI made them more productive, but objective measurements showed they actually took longer to complete tasks. The extra time went to correcting AI-generated errors, steering the models, and waiting for results. Unable to rerun the experiment, METR pivoted to a survey, which again showed a wide gap between perception and reality.
The tokenmaxxing trend
Corporate adoption of AI coding tools has given rise to a phenomenon called "tokenmaxxing": using token consumption as a proxy for productivity. In April 2026, the Financial Times reported that Amazon had shut down an internal token-tracking leaderboard called Kirorank after employees began gaming the system by using AI agents excessively, driving up costs without delivering proportional value. The episode suggested that more tokens do not necessarily mean more productivity.
Similarly, The Information revealed that Uber blew through its entire 2026 AI budget within the first four months of the year. COO Andrew Macdonald admitted on a podcast that the spending had not led to a measurable increase in projects or productivity. Both companies, among the most technically sophisticated in the world, failed to demonstrate a return on their AI investments.
Growing code quality concerns
The deeper problem lies in code quality. Programmer and author James Shore argued in a viral post that faster code generation without reduced maintenance costs is a trap. "You write code twice as quick now? Better hope you’ve halved your maintenance costs," he wrote. "Otherwise, you’re screwed. You’re trading a temporary speed boost for permanent indenture."
Data from multiple sources supports this warning. Entelligence AI, a reliability engineering startup, claims that companies spend 44% of their AI tokens on fixing bugs that the AI itself generated. CodeRabbit, a code review tool, analyzed open-source pull requests and found that AI-generated code produced 1.7 times more problems than human-written code. While both companies have a vested interest in selling review tools, independent research from Singapore Management University reached a similar conclusion in April 2026: AI-generated code introduces long-term maintenance costs into real software projects. The code ships faster, but bugs arrive later and maintenance debt compounds.
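Shore's argument and the compounding effect described above can be made concrete with a toy model. The sketch below is an illustration under stated assumptions, not a model taken from Shore or any of the studies cited: it assumes a team with a fixed effort budget per period, that every unit of existing code costs a constant amount of upkeep each period, and that whatever effort is left goes into writing new code. The function name and the parameters `speed` and `upkeep` are hypothetical.

```python
def maintenance_share(speed: float, upkeep: float, periods: int) -> float:
    """Fraction of a fixed effort budget consumed by maintenance after `periods`.

    speed  -- units of code written per unit of effort (assumed constant)
    upkeep -- effort needed per period to maintain one unit of code (assumed constant)
    """
    code = 0.0  # total code the team has shipped so far
    for _ in range(periods):
        # Existing code claims its upkeep first, capped at the whole budget (1.0).
        maintenance = min(1.0, upkeep * code)
        # Only the remaining effort turns into new code.
        code += speed * (1.0 - maintenance)
    return min(1.0, upkeep * code)


baseline = maintenance_share(speed=1.0, upkeep=0.05, periods=20)
faster_same_upkeep = maintenance_share(speed=2.0, upkeep=0.05, periods=20)
faster_half_upkeep = maintenance_share(speed=2.0, upkeep=0.025, periods=20)

print(f"baseline:             {baseline:.0%}")
print(f"2x speed, same upkeep: {faster_same_upkeep:.0%}")
print(f"2x speed, half upkeep: {faster_half_upkeep:.0%}")
```

Under these assumptions, after 20 periods the baseline team spends roughly 64% of its effort on maintenance, while the team that doubled its writing speed without cutting upkeep spends roughly 88%. Only the team that also halved its per-unit upkeep stays at the baseline share, which is the condition Shore describes.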
Industry responses
Salesforce projects $300 million in Anthropic token spending this year. CEO Marc Benioff has called for an "intermediary layer" that intelligently routes tokens between frontier and cheaper models, implicitly acknowledging that not all tokens produce value. The industry is now scrambling to build quality assurance infrastructure, routing layers, and review processes to ensure that faster code production does not become faster technical debt production.
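What such a routing layer might look like can be sketched in a few lines. This is a hypothetical illustration, not a description of anything Salesforce or Anthropic has built: the model names, the prices, the difficulty score, and the threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # illustrative price, not a real quote


# Hypothetical tiers; real names and prices would differ.
CHEAP = ModelTier("cheap-model", 0.5)
FRONTIER = ModelTier("frontier-model", 10.0)


def route(task_difficulty: float, threshold: float = 0.7) -> ModelTier:
    """Send a task to the frontier model only if it looks hard enough.

    `task_difficulty` is an assumed score between 0 and 1; how to estimate
    it is the hard part and is outside the scope of this sketch.
    """
    return FRONTIER if task_difficulty >= threshold else CHEAP


def estimated_cost(tasks: list[tuple[float, int]], threshold: float = 0.7) -> float:
    """Total cost for a list of (difficulty, token_count) tasks."""
    return sum(
        route(difficulty, threshold).cost_per_1k_tokens * tokens / 1000
        for difficulty, tokens in tasks
    )


tasks = [(0.2, 1000), (0.9, 2000)]
print(route(0.2).name)        # cheap-model
print(route(0.9).name)        # frontier-model
print(estimated_cost(tasks))  # 20.5
```

The design choice that matters is the threshold: set it too low and every task goes to the expensive model, set it too high and difficult work lands on a model that may produce more of the errors described earlier.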
Scott Wu, founder of Cognition, which makes the AI coding agent Devin, admits that the tool’s skill level lies between that of a junior and a mid-level programmer, depending on the task. He emphasizes that it is not a hand-off-and-forget solution. Researchers recommend treating AI output like code from a junior developer: review everything, maintain strong QA systems, and keep humans responsible for architecture and security design.
The job market paradox
Companies are hiring "vibe coders" and forward-deployed engineers at unprecedented rates while simultaneously discovering that the tools these roles depend on may not deliver the quality gains they assumed. The AI coding market is growing faster than the evidence that it works. Developers will not go back to coding without AI; that ship has sailed. The question is whether the industry will build the quality systems needed to keep that speed from turning into technical debt. Right now, the answer remains uncertain.
The evidence from Amazon, Uber, METR, and independent researchers all points in the same direction: AI coding tools offer speed but at a hidden cost. Companies that measure success solely by token consumption risk building a house of cards. Until quality assurance catches up, the productivity gains from AI will remain more perceived than real.