Trust as Infrastructure
Your org rolled out AI tools and adoption looks fine on paper. So why do your best engineers seem uncomfortable?
Six months ago, your org rolled out AI tools. Training happened, accounts got provisioned, and some teams started shipping wins. By most metrics, adoption is going fine, but the aggregate hides the shape of it: a handful of power users drive most of the AI usage, and beyond them adoption is thin. Leadership sees the topline numbers and concludes the rollout is working.
The conversations have changed, though. Code reviews feel different, and people are careful about what they say and how they say it in ways that weren’t true a year ago. The people raising the most concerns are some of the most experienced engineers in the org.
The org enabled AI but didn’t rebuild the trust model around it. Trust has always been partly personal. When you’ve watched someone work for three years, that shapes how you read their diffs. That kind of familiarity breaks down when output increases faster than anyone can verify and the old ways of knowing who did what stop being reliable.
The missing piece is how people come to trust work they didn’t do themselves, and how the organization recognizes good judgment when you can’t just look at who did it.
The problem with trust that lives in people
Personal trust worked because of conditions that used to be easy to take for granted. Output was slower, so there was time to observe. Authorship was obvious, so reputation followed the work. Teams stayed together long enough that learning someone’s patterns paid off over years.
When those conditions erode, you can’t hire your way past it. Every new hire starts at zero trust, earned the slow way by being observed over time. Meanwhile, the people who have earned trust get more and more overloaded. You end up with the same few people in every critical review, burning out while the rest of the org waits, one departure or one vacation away from a crisis.
The weight falls on the people who built the most
The judgment experienced engineers built over years doesn’t evaporate just because output can come from somewhere other than their hands. Intuition for how systems fail and where problems hide is harder to acquire than the ability to type fast.
The work is shifting toward orchestration, directing AI-produced output, verifying it, and knowing where the risks are. A head chef shapes everything that leaves the kitchen without cooking every dish, and senior engineering can work the same way.
Most orgs haven’t built infrastructure that makes the shift recognizable. There’s no clear way to say “I shaped this, I verified it, I’m accountable for it” and have that count the same as “I wrote it.” Until that exists, experienced engineers are stuck in a system that doesn’t recognize what they’d bring.
What breaks
Teams develop their own sense of what “done” means, and nobody notices until work crosses a team boundary and something breaks. When speed gets rewarded and doubt gets side-eyed, people stop surfacing risk: they hide uncertainty, hide AI use, and stop raising concerns because the social cost isn’t worth it. Postmortems happen but nothing changes, because nobody owns the follow-through. Leadership tends to misread the whole thing as a tools problem or a culture problem, because the actual gap is invisible; from the outside, all that quiet hedging looks like reasonable caution.
There’s a subtler version of the same problem. AI is blurring role boundaries, and each role is starting to believe it can cover the others. Engineers ship design work that’s good enough. PMs prototype features that mostly work. When work that used to come from a specialist now comes from someone with AI assistance and general skill, the old question of “did the right team review it?” doesn’t go far enough. Someone has to know how to evaluate it, and that’s less obvious than it used to be.
Putting it in place
For risky work, it helps to write down what the team agrees on. What are you trying to do? What must not break? What could go wrong? What did you check? And how do you undo it if you’re wrong? Clear enough to verify against, light enough that people actually use it, and treated as living documentation that gets updated when the team learns something. Not everything needs this level of scrutiny, though. Two questions calibrate it: what’s the blast radius if this is wrong, and how hard is it to undo? When everything gets treated as high-stakes, people find workarounds.
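For teams that want a starting point, those questions can live in a short template attached to the change itself. A minimal sketch (the field names are illustrative; adapt them to your own review tooling):

```markdown
## Change: <one-line summary>

- Intent: what are we trying to do?
- Invariants: what must not break?
- Risks: what could go wrong?
- Verification: what did we check, and how?
- Rollback: how do we undo it if we're wrong?
- Blast radius: low / medium / high — high requires a second reviewer
```

The point is not the format; it’s that the answers are written down where the next reader, human or agent, will actually see them.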
The things that get rewarded tend to be the things that get repeated, and if shipping fast is what gets celebrated, people will optimize for speed over safety. Reward decisions that hold up under pressure and problems caught early. It can take up to 2,000 hours (yes, roughly a working year) of steady use before someone feels genuinely comfortable relying on these tools for production-level work, and the norms have to make room for that learning curve.

Leaders usually have to go first, showing their own uncertainty and treating bad news as valuable instead of discouraging it. There’s a practical upside too: leaders who use AI as a thinking partner consistently report the highest productivity gains of any role. Going first makes you better at the job, and the team benefits from that. When engineers see their VP drafting a standards doc with AI and putting their judgment into what it says, people stop wondering whether using AI makes them look less serious.

The reverse is just as visible. When leadership ships AI-generated output without reading it closely enough to strip the formatting artifacts, everyone who opens that document draws the same conclusion: nobody thought about this before sending it. One unreviewed deliverable undoes months of credibility. Using AI visibly only works if the judgment is visible too. And when postmortems lead to meaningful changes, learning compounds across the org.
None of this maintains itself. The work of keeping standards current and norms healthy needs clear accountability, whether that’s a working group, a rotating charter, or a senior leader with an explicit mandate. The infrastructure should survive any individual leaving, which is the whole point of moving trust out of people and into systems.
In practice
A cross-team incident caught by the infrastructure
An engineer uses an agent to clean up some test utilities. The agent notices that two test base classes have nearly identical names and consolidates them, aliasing one to the other. It’s the kind of cleanup a human would never think to do, which is exactly why it slips past the first review. The code looks fine and the tests pass.
A few days later, a different team’s integration tests start behaving strangely. Database state is leaking between tests in ways that shouldn’t be possible. It takes half a day to trace it back: the agent had aliased Django’s TestCase to TransactionTestCase, subtly changing how test isolation works. The tests still ran, but the transactional behavior that kept them independent was gone.
Without the infrastructure, the problem surfaces after a week of flaky tests that get blamed on unrelated changes. “Can someone let us know if they’re touching shared test utilities?” “Why wasn’t this caught?” “This is what happens when you let the AI touch shared code.”
But the team had added a custom lint rule after a similar incident: flag any import that aliases one class as another with a similar name. The agent’s PR gets flagged automatically before merge. A human looks at it, realizes what’s happening, and rejects the change with a note explaining why those two classes need to stay separate. The agent’s cleanup instinct was reasonable; the infrastructure is what turned it into a five-minute rejection instead of a week of flaky tests.
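A rule like that is small enough to sketch with the standard library’s `ast` module. The similarity heuristic below is a stand-in; a real rule would tune it to the codebase’s naming conventions:

```python
import ast


def _confusable(name: str, alias: str) -> bool:
    """Rough heuristic: names are confusable when one is contained in the
    other (TestCase / TransactionTestCase). A real rule would tune this."""
    a, b = name.lower(), alias.lower()
    if min(len(a), len(b)) < 4:  # ignore short idiomatic aliases like np, pd
        return False
    return a in b or b in a


def find_confusable_aliases(source: str) -> list[tuple[int, str, str]]:
    """Flag imports that rename one class to a near-identical name,
    e.g. `from django.test import TransactionTestCase as TestCase`.
    Returns (line number, original name, alias) for each finding."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if (
                    alias.asname
                    and alias.asname != alias.name
                    and _confusable(alias.name, alias.asname)
                ):
                    findings.append((node.lineno, alias.name, alias.asname))
    return findings
```

Wired into CI as a pre-merge check, this catches exactly the rename that caused the incident above, while leaving conventional short aliases alone.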
Onboarding that works without tribal knowledge
In their first months at most orgs, new hires absorb norms by osmosis, learning who to ask, what “good” looks like, what’s expected versus what’s buried in a wiki that rarely gets updated. On top of that, they’re figuring out how the team uses AI, which workflows are sanctioned, and whether asking for a second opinion on agent output is normal or a sign you’re not keeping up.
When the standards are written down in places people actually look and referenced in reviews, that ambiguity collapses. Some standards are machine-readable, fed directly to agents as context before they generate code, which dramatically reduces the issues that make it to review. New hires see the same standards the agents see. They see PRs with comments like “can I get a gut-check on this part?” and nobody treats it as unusual. Within a few weeks, they’re running their own agent workflows and shipping changes that pass review without drama, because they cleared the bar that was written down, not one they had to guess at.
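One way “machine-readable” can work in practice is to keep the standards in a structured form and assemble them into the context an agent receives before it generates code. A minimal sketch; the function and field names here are hypothetical, not a specific tool’s API:

```python
def build_agent_context(task: str, standards: dict[str, str]) -> str:
    """Assemble an agent prompt that leads with the team's written
    standards, so generated code is held to the same bar reviewers
    apply. `standards` maps a rule name to its text."""
    parts = ["# Team standards (apply before generating code)"]
    for name, rule in standards.items():
        parts.append(f"- {name}: {rule}")
    parts.append("\n# Task\n" + task)
    return "\n".join(parts)
```

Because the same file feeds both the agents and the onboarding docs, new hires and agents really do see one bar, and updating it in one place updates it everywhere.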
A retro where uncertainty is just information
Team retro after migrating the search indexing pipeline. Someone presents their approach: “I ran three agents in parallel on different sections. One for the data model changes, one for the rollback strategy, one for the monitoring setup. Here’s where they got it wrong.”
The rollback agent completely missed that they’d need to rebuild the index from scratch if the new schema failed. It assumed they could just swap back to the old code. The monitoring agent suggested metrics the system doesn’t actually collect. Both mistakes were caught during review because the team treats agent context as code: versioned, maintained, improved over time based on what goes wrong.
Nobody flinches. Someone asks a follow-up: “Are those common patterns? Should we add them to the agent prompts for migration work?” They update the shared context that agents receive for infrastructure migrations.
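Treated as code, the outcome of that retro might land as a small change to a versioned context file. The path and wording here are illustrative:

```markdown
<!-- agent-context/migrations.md — fed to agents before infrastructure migration work -->
- A rollback plan must state whether the old version can be restored in
  place, or whether state (indexes, caches) must be rebuilt from scratch.
- Only propose alerts on metrics the system actually collects; check the
  monitoring config before suggesting new dashboards.
```

The next migration’s agents start with both lessons already in context, which is what “versioned, maintained, improved over time” means in practice.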
The conversations only you can have
Teams stall on AI adoption for reasons that rarely get said out loud: fear of looking incompetent or being replaced, and confusion about what’s actually expected of them. Someone has to name the thing people are thinking but aren’t sure they’re allowed to say.
When a leader says “I think some of us are worried that using these tools makes us look less capable, and I want to talk about that,” the room changes. People who were quietly stuck start asking questions. The thing that was blocking adoption turns out to be a conversation that hadn’t happened yet. Framing it as mandatory has the opposite effect.
Leaders who don’t make this shift tend to get routed around. Decisions start happening without them because waiting isn’t practical, and by the time they find out about problems, the problems are already in production.
If you’re wondering whether it’s working, look for social signs before metrics. Seniors aren’t the bottleneck for every risky decision, people mention AI in PR descriptions without hedging, concerns get raised early instead of surfacing as surprises, and new hires absorb the norms in weeks instead of months.
The infrastructure outlasts the current generation of tools. The specific mistakes will change as AI gets more capable, but the need for shared standards and fast feedback doesn’t go away. If anything, it matters more as the volume of output grows.
A practical takeaway
Find the last incident that crossed a team boundary. Ask whether the teams involved shared a definition of what “done” meant for the work that broke. If they didn’t, get them in a room and write one together: what must stay true, what to check before shipping, and how to undo it if something goes wrong. See how the next review in that area feels different.