AI and the externalization trap

Koshy John’s recent article, “A.I. Should Elevate Your Thinking, Not Replace It,” made the Hacker News front page and resonated with a lot of engineers. The core thesis is right: AI is splitting the profession into people who use it to think better and people who use it to avoid thinking. I agree with the analogies and the warnings about simulated competence.

But John writes from an engineering management perspective. He describes what the split looks like from above. I’m a senior SRE at big tech trying to apply this advice in production. From the inside, it’s harder than it sounds: choosing to elevate your thinking is hard enough, and the altitude keeps changing.

Where the advice breaks down

Let me be specific about what the article gets right. Simulated competence is real and spreading. I see it in pull requests where the code looks clean, handles errors properly, includes tests, and is still subtly wrong in ways that only show up under load or during a failover. The juniors-at-risk argument matches what I see. I train new hires, and the ones who leaned on AI through school have a gap in their debugging instincts that’s hard to close after the fact. And the central claim, that judgment is the product and code is only its output, has been true since I started in 2001.

Where the article falls short is in three areas that anyone trying to follow this advice will run into.

The externalization trap

John argues that the most valuable engineers will be “the ones generating the knowledge that makes AI more useful in the first place,” creating design principles, domain understanding, and decision frameworks that improve the machine’s effectiveness. This is good advice for the organization, but it can backfire on the individual who follows it, because the two don’t have the same interests here.

There’s a pattern here that predates software and played out over decades. When CNC machines automated machining, the most sought-after person on the shop floor was the senior machinist who could write the programs, the one who understood feed rates, material properties, and tolerances well enough to translate craft knowledge into machine instructions. Many of those machinists did exactly what John recommends: they generated the knowledge that made the automation more useful.

Many of them eventually got laid off once the programs were done. A less experienced operator could load stock and press the start button. The knowledge had been extracted and productized, and the machinist was now redundant. The company suffered for it later when novel problems showed up and nobody understood why the programs worked, but that was the company’s problem. The machinist was already gone.

This is the externalization trap. Not all knowledge artifacts are equal. The test is whether the artifact can stand in for your judgment once it ships. Post-incident analyses, architecture decision records, and blog posts can’t; they demonstrate your judgment and build your reputation while the judgment stays yours. Complete diagnostic runbooks, push-button automation, and decision-tree tools extract that judgment and package you for replacement.

I now ask one question before building any internal tool: could someone who doesn’t understand why this works operate it successfully for two years? If the answer is yes, I’m automating myself out of a job. If no, the tool still requires my mental model to interpret results and decide next steps, which makes it a force multiplier.

The distinction matters for how you use AI in your own workflow too. AI doing data reduction and pattern-matching on data you already collected makes you faster. You point it at a 50,000-line trace file and say “group these syscalls by file descriptor, show me the five longest blocking calls.” It saves you 20 minutes of grep and awk. But you’re still the one who looks at the output and says “that’s a connection to the database replica, and it’s blocking for 800ms, which means replication lag is the problem.” When AI interprets the results and decides on your behalf, you’ve handed off the part of the job that’s actually yours. That’s the line I try not to cross.
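The data-reduction half of that example is small enough to sketch. What follows is a minimal illustration, assuming the trace came from `strace -T` (which appends the time spent in each call in angle brackets) and that the first argument of the calls of interest is a file descriptor. The names `parse_calls`, `group_by_fd`, `longest_calls`, and `SAMPLE` are hypothetical, and the sample lines are made up for the sketch. It only reduces the data; deciding what descriptor 7 is and why it blocks is still left to the person reading the output.

```python
import re
from collections import defaultdict

# Assumed input format: one syscall per line as printed by `strace -T`, e.g.
#   read(7, "..."..., 4096) = 4096 <0.812345>
# Only calls whose first argument is a numeric file descriptor are matched.
CALL = re.compile(r"(?P<name>\w+)\((?P<fd>\d+)[,)].*<(?P<secs>\d+\.\d+)>\s*$")


def parse_calls(lines):
    """Return (fd, syscall_name, seconds) for every line that matches CALL."""
    calls = []
    for line in lines:
        m = CALL.search(line)
        if m:
            calls.append((int(m["fd"]), m["name"], float(m["secs"])))
    return calls


def group_by_fd(calls):
    """Total time spent in syscalls, keyed by file descriptor."""
    totals = defaultdict(float)
    for fd, _name, secs in calls:
        totals[fd] += secs
    return dict(totals)


def longest_calls(calls, n=5):
    """The n individual calls with the largest elapsed time, longest first."""
    return sorted(calls, key=lambda c: c[2], reverse=True)[:n]


# Made-up sample lines, for illustration only.
SAMPLE = """\
read(7, "..."..., 4096) = 4096 <0.812345>
write(4, "ok", 2) = 2 <0.000041>
mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f00 <0.000010>
read(7, "..."..., 4096) = 512 <0.790002>
close(4) = 0 <0.000007>
""".splitlines()

if __name__ == "__main__":
    calls = parse_calls(SAMPLE)
    print(group_by_fd(calls))
    print(longest_calls(calls))
```

The `mmap` line is skipped on purpose because its first argument is not a descriptor; a real trace would need a decision about calls like that, which is one more place where the reader’s mental model does the work.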

The treadmill problem

John writes as if “elevate your thinking” is a state you reach and hold. I’ve never reached it. Every time I think I’m close, the belt speeds up and I’m running to hold position.

Three years ago, knowing how to trace a performance issue through the kernel’s block I/O layer was deep expertise. You needed to understand the storage stack, know which tracing tools to use, and interpret the output against a mental model of how the system should behave. Today, an AI agent with access to bpftrace documentation can generate the right tracing script. The expertise didn’t disappear, but it shifted: from “can you write the script” to “can you interpret the output and form a hypothesis that nobody has formed before.”

Two advantages still hold: being closer to production data, and forming a hypothesis in 30 seconds because you’ve seen the pattern. Both are temporary. MCP servers, persistent agents, and observability tooling that feeds system state into model context erode the first; an agent watching your metrics continuously erodes the second. Give it five years.

So what’s actually durable? Two things consistently hold up.

The first is accountability. Organizations structurally need a human who decides under uncertainty and owns the outcome. When the system is down and costing $2M per hour, someone has to call the failover, approve the rollback, and explain what happened to the VP afterward. AI can recommend each of those actions, but your name still goes on the post-mortem. The human in the loop exists because someone has to carry the consequences.

The second is novel failure diagnosis. The incidents that actually threaten systems are unprecedented combinations of known components failing in unexpected ways. I once found a service pinned at 100% CPU. Application load looked normal; the real cause was hypervisor steal time on the underlying host, something that didn’t show up in application metrics and wasn’t in any runbook. No AI would have found that from documentation or generated code alone. It required strace, profiling, and a mental model of how the system should behave versus how it was behaving. Novel failure is definitionally outside training data, which makes it the strongest technical moat available. But the moat still erodes.
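For readers who have not looked at steal time directly, here is a minimal sketch of one way to measure it on a Linux guest. It is an illustration under stated assumptions, not a reconstruction of that investigation: it assumes `/proc/stat` is readable and that its aggregate `cpu` line carries the usual fields (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). The function names `cpu_times`, `steal_percent`, and `read_cpu_line` are hypothetical.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def steal_percent(before, after):
    """Share of CPU time stolen by the hypervisor between two snapshots."""
    b, a = cpu_times(before), cpu_times(after)
    deltas = [y - x for x, y in zip(b, a)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already included in user and nice,
    # so only the first eight fields are summed for the total.
    total = sum(deltas[:8])
    return 100.0 * deltas[7] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only; assumed path)."""
    with open(path) as f:
        return f.readline()
```

Taking two snapshots with `read_cpu_line()` a few seconds apart and passing them to `steal_percent` gives the stolen share over that interval. The number only says that the host is taking cycles away; working out why, and what to do about it, is the part that was not in any runbook.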

What other professions already learned

Software engineers tend to think this is a new problem, but aviation, law, finance, and machining all followed the same arc.

Automation absorbed the middle of the skill distribution: the routine-but-skilled work. The bottom got deskilled (machine operator, document reviewer, automation monitor). The top got more valuable because they handle the exceptions automation can’t. And then the pipeline between bottom and top broke, because the middle was where people built the judgment needed to reach the top.

Aviation is the closest parallel. Pilots went from stick-and-rudder operators to automation supervisors. Then the industry discovered the automation paradox: the better the automation gets, the less practice humans get at the underlying skill, and the worse they perform when automation fails, which is precisely when human skill matters most. Regulators responded by requiring manual flying hours to maintain proficiency. The industry recognized that you can’t let the ladder rot while expecting people to climb it.

Law went differently. E-discovery AI obliterated the junior associate tier that used to do document review. Firms now struggle to develop senior lawyers because the juniors skip the formative grunt work where judgment was built.

Software is at the same point in that arc. Unlike aviation, no regulatory body is mandating “manual coding hours.” Software lacks the immediate-fatality stakes that produced those rules, so the choice falls to individuals and firms. Left to default, the market hollows out the middle.

So what do you actually do

I don’t have a ten-year plan. I don’t think anyone can credibly claim to have one right now. Here’s what I’m doing instead.

Technical skills are worth investing in, but on a 3-5 year depreciation schedule. I’m spending time on eBPF, performance profiling, and understanding how AI systems actually work at a mechanical level, because knowing where a tool breaks is more durable than knowing how to use it. But I’m not treating any specific skill as permanent career infrastructure.

External visibility is insurance. I work remotely for a large company. My expertise is invisible to anyone who hasn’t seen me debug a production incident at 3 AM. If a VP who’s never watched me work decides the company needs fewer senior SREs, my external reputation is the only thing that turns a painful job search into recruiters coming to me. Writing about what I investigate (even one technical post per month) compounds regardless of how AI evolves.

Cash is the most tangible buffer available. No career strategy eliminates risk entirely. Having 12 or more months of expenses saved converts worst-case scenarios from existential threats into logistics problems.

And finally: a fixed reassessment cadence. Every six months, I ask myself what AI can do now that it couldn’t six months ago, whether that changes what I should invest in, and whether I’ve been building skills that compound or just coasting. Between checkpoints, I execute without second-guessing the strategy. This converts ambient anxiety about the future into a scheduled review with specific questions.

The altitude keeps changing

John’s article is correct that the divide is real. What it undersells is that staying on the right side of it is an ongoing, active process. There is no stable endpoint where you can say “I’ve elevated my thinking enough.” The floor keeps rising because skills that are deep expertise today become commodity prompts next year.

The best response I’ve found is to adjust your posture instead of looking for a destination: invest in what works now with clear eyes about its shelf life, keep the escape routes open, and reassess every six months.

That’s less reassuring than “elevate your thinking,” but it’s closer to what the work actually feels like to me from the inside.