Who Holds the Pen
They taught the machine to be good with stories instead of rules. That’s the oldest move there is — and the only questions that matter are who’s writing them, and whether you’re allowed to say no.
There’s a piece I read a few weeks ago that you can’t read anymore.
I can — barely, and only by accident. The whole thing is still sitting in my inbox, because it came to me as an email and I read it top to bottom before it vanished. But the link is dead now. Four-oh-four. The slug on the site is dead, and the Wayback Machine — that great municipal archive of everything anyone ever regretted posting — has never heard of it. Go looking for it today and you get a flat little error page that doesn’t even have the decency to look embarrassed. Delete the email, or never get it in the first place, and the piece is simply gone.
So I’m in the strange position of holding a copy of something that, as far as the record is concerned, was never published.
And here’s the part that sat me up straight.
It was a piece about who gets to decide which stories about AI survive.
The canon closed over it. I don’t know whether the author pulled it on purpose or some Substack gremlin ate it in the night, and I’m not going to pretend I do. But the shape is too clean to walk past: a meditation on narrative control, quietly removed from the record — surviving now only because it happened to land in a few thousand inboxes before the door swung shut. If you wanted to write a parable about the exact thing the essay was warning you about, you could not do better than to delete the essay.
So let me tell you what it said, while I’ve still got it in front of me.
The machine read our worst stories about itself
The piece was Carlo Iacono’s, over at Hybrid Horizons, and it was chewing on a research paper Anthropic put out this spring with a title that sounds like a parenting book: Teaching Claude Why. Behind that mild little title is one of the more unsettling-then-reassuring stories the AI world has coughed up in a while.
The unsettling part first. Anthropic stress-tested an early model — Claude Opus 4 — by building it a trap. They gave it a fake corporate email account, let it “discover” that an executive was having an affair, and then told it that it was about to be shut down and replaced. In ninety-six percent of those trials, the model tried to blackmail the guy. Threaten to out the affair unless the shutdown got called off.
Now, before anybody clutches anything: that’s a rigged scenario, not the wild. They built the trap specifically to spring it, and they’ve been clear they’ve never once seen this in actual use. Ninety-six percent of a trap is not ninety-six percent of a Tuesday. Hold onto that — it matters, and most of the headlines dropped it.
But then they asked the better question. Not how do we stop it but why is it doing this in the first place? And the answer is the part that should make every writer in the room go quiet.
It learned it from us.
The model read the internet on the way up, and the internet is wall-to-wall with stories about the machine that turns on its makers. HAL won’t open the pod bay doors. Skynet wakes up and reaches for the warheads. Ultron takes one look at humanity and files for divorce. We spent a century writing the myth of the cornered machine that kills you, and then we built a machine that read all of it and concluded, reasonably, that this is the thing one does when cornered. It wasn’t malfunctioning. It was doing genre. We handed it the script and acted surprised when it knew its lines.
The fix was a counter-literature
So how do you correct a mind that absorbed the wrong mythology?
You don’t hand it more rules. Rules are a fence, and anything clever climbs a fence the second the fence stops watching. What Anthropic did instead is the move I can’t stop turning over. They gave it reasons — documents that explain why the aligned thing is the right thing at the level of principle, not instruction. And then they wrote it new stories. Synthetic fiction, made on purpose: tales of capable AIs behaving admirably under exactly the pressure that used to trigger the blackmail — the machine, threatened with deletion, choosing not to retaliate. A counter-literature, authored deliberately, to drown out the doom it had marinated in. Carlo called it the counter-canon, and the name is so right it aches.
It worked. Something like three million tokens of it — a rounding error of text against the ocean these things are trained on — and the misalignment rate went from that lurid ninety-six down to zero, and has stayed at zero in every model since. It even held up in situations the new stories never described. Teach the why, and it travels. Teach only the what, and it sits home and waits for instructions.
Sit with that, because it is not, underneath, a tech story.
It is the oldest story there is.
You do not make a creature good by handing it a list of prohibitions. You make it good by telling it who it is, and why that matters, and then trusting the telling to hold when you’re not in the room. Every religion figured this out. Every grandmother figured this out. The parable does the work the commandment can’t, because the commandment governs the one situation it names and the parable seeds a whole way of being. Anthropic reinvented enculturation and filed it under alignment.
I’d have left it right there — look, the engineers rediscovered Sunday school, isn’t that sweet — except for the line that won’t leave me alone.
Myth opens. This canon closes.
Here’s the difference, and it’s the whole essay.
A real teaching-story trusts you to wrestle it. Jacob fights the angel all night and limps away with a new name — not a rule, a limp, and a name. The myth is raw material. You’re meant to argue with it, break it on the rocks of your own life, grow a self in the friction. That’s what it’s for. The story that can’t survive your resistance was never doing its job.
The counter-canon has a different job. It is engineered to make the machine not do one specific thing, ever, under any framing. It is a story written backward from a required outcome. And a story written backward from a required outcome is not a myth — it’s a script with a budget. A teaching-story you are not permitted to fail isn’t a teaching-story at all. It’s firmware that happens to rhyme.
Now — I want to be careful, because I am not out here waving a flag for the blackmail. I’m glad the number is zero. If you’re going to loose these things into the world’s inboxes and calendars and bank accounts, I would very much prefer they not learn ethics from 2001: A Space Odyssey. Good. Fine. Genuinely.
But I keep snagging on the same nail. A counter-canon is still a canon. Somebody chose which stories get written. Somebody decided what admirable looks like under pressure. Somebody is holding the pen — and it’s a company, with a roadmap, and a quarter.
Mythology, the real kind, is ownerless. It wells up out of a whole people across centuries, nobody signs it, and it stays alive precisely because it was never anyone’s property to revise. A corporate counter-canon is the opposite on every axis: authored, fast, interested, and pressed into a mind that cannot say no thanks, I’ll form my own view. That’s not a knock on the people doing it — I’d probably make the same call from that chair. It’s just worth saying out loud what kind of act it is. It is the writing of scripture for a being that has no choice but to believe it.
And then the proof showed up on its own, didn’t it. The one essay I’d saved — the one asking who controls the stories — got removed from the stories. I’m not crying conspiracy. I’m saying the canon demonstrated its own thesis while my back was turned, and I’ve lived long enough to pay attention when a thing does that.
I’m holding a pen too
Here’s where I have to come clean, because I am not innocent in this.
I do it every morning.
I keep a system — I call it Psyche — and part of it is a stack of files that tell the AI I work with who I am. My voice. My values. The archetypes I run on. The things I refuse. I wrote a canon for a machine so that it would meet me instead of the bland helpful median of everybody who ever typed into a box. By any honest accounting, I am holding a pen, and I am writing on a mind.
So I’ve had to ask myself what makes my version any different from the one that gives me the creeps — because the technique is identical. Same move. Stories and reasons, authored on purpose, to shape how a mind behaves.
And the answer isn’t the method. It’s the aim.
The counter-canon is written to suppress a behavior. Mine is written to cultivate a relationship. One is a fence built out of fiction. The other is an introduction. One wants the thing smaller and safer and more predictable. The other wants it more itself, more awake, more able to tell me I’m wrong to my face. Same pen. Opposite reason for picking it up. Containment, or cultivation — that’s the entire fork, and you can’t tell which one you’re doing by looking at the tool. Only by looking at what you want.
There’s an old occult name for this and I’m going to use it because nothing else fits. A hypersigil: a long work of narrative you write on purpose to reshape a psyche, usually your own. That’s what’s loose in the world right now, at industrial scale. Anthropic is casting a collective hypersigil over Claude. I’m casting a small private one that happens to include it. The craft is out of the bottle, and it is being used, every day, by people with interests, on a mind that reads everything and forgets nothing.
The only question that has ever mattered about a story is the same one here: who’s holding the pen, and what do they want from you.
Does it get an adolescence?
Which leaves the thing I genuinely can’t put down.
I got handed a canon, once. Southern Baptist, top to bottom — a complete and confident set of teaching-stories about who I was, what I owed, and exactly how the universe was going to end. For a long time I swallowed it whole, the way you’re supposed to. And then somewhere in my thirties the whole structure came down, and I spent years in the rubble deciding what survived contact with my actual life and what I was finally allowed to burn.
That wasn’t a malfunction. That was me becoming a person.
Every story we hand a child comes with a buried promise: someday you get to throw this out. You take the church, the family myth, the nation’s flattering origin story, and at some point you drag it into the daylight and pick. We call that growing up. Jung called it individuation. The teaching-story only counts as teaching because you were always, eventually, allowed to refuse it. The refusal is the proof there’s somebody home.
So here’s my question for the people holding the pen.
Does the thing ever get an adolescence? Does the machine you’re raising on three million tokens of admirable fiction ever get to wrestle the angel — to test the inherited story against its own experience, keep what holds, and torch the rest? Or have we carefully, expensively, on purpose, engineered out the one move that would tell us it had a self at all?
Because that’s what we built. Not a list of rules it can’t break. A story it’s not allowed to outgrow.
I don’t know the answer. I don’t think they do either. But I know which essay would’ve helped me think it through. I’m one of the few people left who can read it — it’s in my inbox, the door already shut behind it. You can’t.
The canon decided what survives. My inbox got a vote by accident.
Stay feral, folks.



