
The Competence Penalty

Tokenmaxxing exists because engineers are doing math on their reputations. The adoption numbers make a lot more sense once you see that.

Engineers at Meta, OpenAI, and Shopify are competing on internal leaderboards that track how many AI tokens they consume. One OpenAI engineer burned through 210 billion tokens in a single week. Jensen Huang pitched giving engineers up to half their base salary in token budgets. The trend has a name: tokenmaxxing.

It looks like enthusiasm. It’s actually a symptom.

When adoption has to be gamified, incentivized, and tracked on a leaderboard, something else is going on. Companies don’t build leaderboards for tools people want to use. Nobody gamified the adoption of Slack or GitHub. Engineers picked them up because they were obviously useful and there was no professional cost to being seen using them.

AI tools are obviously useful too. Leadership knows it. Most engineers know it. The tools have been purchased, the training has been run, and adoption is sitting around 40%. The default diagnosis is that it’s a training problem, or that people resist change, or that the tools aren’t good enough yet.

None of those explanations hold up. The tools are good. The training was fine. And engineers, as a population, don’t resist useful tools. They resist things that carry professional risk.

The Competence Penalty

In 2025, researchers at King’s College London studied 28,698 software engineers at a major technology company that had rolled out an AI coding assistant. The company did everything right: pre-installed the tool on every device, integrated it into standard workflows, promoted it for twelve months. Adoption sat at 41%.

Why? The researchers ran an experiment to find out. Engineers reviewed identical snippets of Python code, but some reviewers were told AI had helped write it. Same code, same quality, same output. When reviewers believed AI was involved, they rated the engineer’s competence 9% lower. The person became the problem, not the code.

The penalty landed hardest on women and older workers. Female-identifying engineers who used AI received significantly lower competence ratings, while their male-identifying counterparts weren’t penalized at all. (The engineers most likely to impose the penalty? Senior men who hadn’t adopted AI themselves.) Female engineers adopted at 31%. Engineers over 40 adopted at 39%. The people with the most to lose from a reputation hit were the ones least likely to take the chance.

The researchers called it the competence penalty. Once you see it, the adoption numbers stop being mysterious: every engineer who looked at that tool and decided not to use it was making a rational calculation about what would happen to their standing if they did.

The Mandate Trap

Meta built a performance-tracking platform called Checkpoint that aggregates over 200 data points per engineer, including how many lines of code are generated with AI. In 2026, “AI-driven impact” became a core expectation for every role. They created an internal game called Level Up where employees earn badges for hitting AI usage milestones; top performers are eligible for bonus multipliers of 200%, with a special Meta Award paying out 300%. If you’re an engineer at Meta right now, I’m sorry, but AI is a condition for advancement. It’s not optional.

They’re not alone. Microsoft told employees that AI is “no longer optional.”

Shopify’s CEO told teams not to request new headcount without first demonstrating that AI couldn’t do the work.

Google and Amazon have pushed similar mandates.

The logic seems airtight: you want adoption? Incentivize adoption, measure it, and tie it to compensation.

This is where tokenmaxxing stops being an amusing quirk and starts making sense. When your performance review rewards AI usage, your company tracks it on a leaderboard, and your bonus depends on demonstrating “AI-driven impact,” you don’t just use the tools. You perform using them. You run agents overnight to rack up token counts. You optimize for the metric, because the metric is what gets measured, and what gets measured is what gets rewarded.

The problem is that none of this changes what happens in the hallway, or in a code review, or in the quiet moment when an engineer decides whether to mention that AI helped with a tricky refactor. The mandate says “use AI.” The culture says “but not where anyone can see you.” KPMG and the University of Melbourne surveyed nearly 50,000 workers across 47 countries and found that 57% of employees hide their AI use at work and present AI-generated output as their own.

Now it’s underground.

The Calculation

Every explanation offered for stalled adoption treats it as a capability problem: the tools aren’t good enough yet, the training didn’t stick, or the incentives aren’t strong enough. But the tools are good, the training happened, and Meta just showed everyone what stronger incentives look like.

The engineers who aren’t adopting aren’t confused. They’ve seen what happens when AI gets flagged in a code review: “Walk me through this.” They’ve done the math on whether being seen using AI helps or hurts them professionally, and they’ve landed on “not worth the risk.”

When a CTO talks about adoption, they talk about “leverage” and “utilization.” They talk about “enablement.”

When an engineer decides whether to use AI on a piece of work that matters, they’re asking a different question: would they be safe using it?

The whole thing is a trust problem.

The Conversation

The instinct after reading something like this is to go build a policy. Don’t. The problem is perception, and the only way to know how it’s showing up on your team is to ask.

Start by naming the dynamic out loud in a room, with your team, as a real conversation. “I think some of us have been calculating whether it’s professionally safe to be seen using AI, and I want to talk about that.” Most leaders are surprised by what comes out when they actually ask.

Then look at the incentives. If tokens consumed counts as performance while code reviews penalize work that looks AI-assisted, you’re running two systems that contradict each other. Which one feels more dangerous? The social one. The review that happens in a PR comment thread carries more weight than anything in Checkpoint, because that’s where reputation actually lives.

An engineer who uses AI thoughtfully on a tricky refactor and can walk you through every decision they made is doing better work than someone who burned through 210 billion tokens to climb a leaderboard. If the system can’t tell the difference, people already know that, and they’re behaving accordingly.

Some people are burning through 210 billion tokens to prove they’re adopting. Others are hiding their usage entirely. Both are telling you the same thing.
