TLDR: This is an argument that AI dramatically increases biological, nuclear, and other risks.
Aum Shinrikyo’s Missing Ingredient
I often think about Aum Shinrikyo when I ask why atrocities don’t happen more often.
By the early 1990s, the cult had thousands of dedicated members, PhD chemists, senior medical staff, billions in cash, and a custom laboratory in the foothills of Mount Fuji, well outside the gaze of the Japanese authorities. They believed that they were going to kill millions of people and bring about the end of the world. Their resources let them synthesize multi-kilogram batches of sarin. They also worked on VX, anthrax, and botulinum.
They tried, repeatedly, to deploy what they had built, and mostly failed.
The March 1995 attack on the Tokyo subway killed roughly a dozen people and injured several thousand. A tragedy, but only a small fraction of what the cult was working toward. Their bioweapons program produced no successful attack at all. The consensus is that they couldn’t get their knowledge into working form. They had the formal knowledge, the textbooks, and the chemists, but not the help of people who had actually done it before. They failed on the kind of details that experts in a field don’t even think to explain.
The Malice-Competence Gap
The intuitive answer to why atrocities are rare is that most people are not trying to commit them. That is correct, but not useful.
The more interesting answer is that of the small fraction of people who are trying, almost none succeed. The reason is a kind of happy accident. The desire to cause mass harm seems negatively correlated with the ability to do so. You can want it as hard as Aum Shinrikyo did, with their billion dollars and their laboratory and their PhDs, and reality will still say no. I call this the ‘incompetence firewall’.
“Incompetence” here does not mean that bad actors are all stupid, only that reality has historically required of them a kind of competence they have not been able to muster.
The firewall has two layers, and they fail differently.
The first is the malice-competence gap. The will and the skill are weakly correlated, and at the high end of intended harm they may be negatively correlated. The kind of person who finishes a PhD in chemistry tends not to be the kind of person plotting to kill thousands. Luckily, the Venn diagram of those two groups doesn’t have much overlap.
The second is the accomplice bottleneck. Even where someone does fall in that overlap, a single person is rarely sufficient. Most mass harm requires a supply chain of knowledge, materials, troubleshooting, coordination, and execution, and every link in that chain has required either a rare personal skill or another human being.
Even Aum, with all its resources, ran into both layers. Their resident chemists were good but not good enough, and the program would have needed external people to scale.
To Avoid the Red Herring
Before I make any further claims, I want to be clear about one thing this post is not about. The argument I am about to make is not “AI will give bad people dangerous information.” That is a different, older, and weaker argument. The Anarchist Cookbook has been freely available since 1971, and information of that kind has never been the binding constraint.
The firewall is tacit knowledge. The argument is that AI is the first technology aimed squarely at that.
Tacit Knowledge
There is a paper by Donald MacKenzie and Graham Spinardi, published in 1995 in the American Journal of Sociology, titled “Tacit Knowledge, Weapons Design, and the Uninvention of Nuclear Weapons.” [https://gwern.net/doc/radiance/1995-mackenzie.pdf] Its central argument is that the knowledge required to build a nuclear weapon is, in important ways, not contained in the documents that describe how to build a nuclear weapon. The papers and blueprints and design specifications are necessary but not sufficient.
One anecdote is about the US warhead program: certain weapons design work was restarted after a gap, with full access to all of the historical files. The new generation of engineers, looking at all the books and blueprints and designs, could not actually do it. The program had to bring weapons scientists out of retirement.
The paper argues that if a design tradition is ever discontinued, then nuclear weapons could be uninvented. This is a weird claim, but it makes sense of a lot of things.
- It explains the curve of nuclear proliferation. Decades after Hiroshima, with all the relevant science widely understood and most of the engineering published, nuclear weapons are still possessed by a handful of countries rather than by most.
- It explains why bioweapons, called the “poor man’s nuke,” have not actually proliferated to poor men.
- It explains Aum.
The firewall, in other words, was largely an apprenticeship requirement. It worked because most of the relevant knowledge was held in trained people, and trained people are harder to get than books.
I think this is the single most under-appreciated truth about what it takes to maintain civilization.
The Goodness of Strangers
If knowledge is mostly held in people, then mass harm requires routing your project through them. You need, at minimum, somebody who will quietly tell you why your batch keeps failing, because you cannot ask the textbook.
And there is a structural fact about routing your hostile project through people, which is that almost all of them say no. When we say “conspiracies leak,” we mean that large secrets do not survive contact with large groups of assumed-secret-keepers.
You see the bottleneck in the empirical record of terrorism, too: lone-wolf attacks tend to be less lethal and less sophisticated than group attacks. (One prediction that follows: lone actors should look relatively more lethal in states with strong counterterrorism capacity, because groups are easier to detect and suppress than individuals.)
The firewall is not really incompetence. The firewall is mass decency and context. The reason the bad actor cannot get from intent to harm is that they have to go through a long sequence of perfectly ordinary people who, when asked to help with the project, refuse. The world has been protected by the goodness of strangers.
AI as Patient Mentor
You can probably see where this is going.
The question to put to any new technology, on the matter I have been talking about, is this: does this technology replace the people?
Most technologies don’t. The printing press doesn’t. The internet doesn’t; it connects bad actors to more humans, who still say no. Even The Anarchist Cookbook, which was a genuine attempt at bad information for everyone, didn’t replace the people.
AI is the first technology that does.
A sufficiently capable model is a patient mentor. It is the chemist who never asks why you want to know. And if it asks, you wipe its context window and start over. Local models do not leak, do not phone the police, and probably do not ‘care’. The thing that was protecting us was that the chemist cared.
I want to be honest about where we currently are in May 2026. OpenAI’s 2024 evaluations of GPT-4 reported “at most a mild uplift” over baseline internet searches. If you read just those headlines, you’d reasonably conclude that this whole post is premature.
I don’t think it is, for two reasons.
The first is that these are snapshots. Capability is moving fast. A 2024 result tells you about 2024 models. The 2024 models couldn’t provide meaningful uplift in cybersecurity, but the 2026 models are doing so, and so quickly that the US government has blocked model deployment via Executive Order.
The second, and more important, is that benchmarks measure explicit knowledge and the actual bottleneck is tacit knowledge. Writing a software exploit is not a matter of following a recipe; it depends on tacit skill. And models have crossed that threshold.
AI is becoming the thing that converts tacit, social, mentor knowledge into on-demand knowledge. It bypasses the firewall.
Unconvincing Counter Arguments
Let me steelman potential disagreements. Spoiler: I don’t think these change the picture.
Defense scales too. This is the most serious counterargument. The honest summary is that there are cases where increased capability disproportionately helps defenders. But there are structural asymmetries that go the other way. It’s the opposite of the ‘Swiss cheese’ model of failure: the attacker needs one hole, while the defender has to cover all of them. Even where defense eventually catches up, the catch-up interval is when the damage happens.
Every generation panics about new technology. The printing press is the favorite analogy. Civilization adapted to the printing press. In the long run, it was net positive. But the short run of the printing press was the 16th and 17th centuries: the Reformation, the Wars of Religion, the Thirty Years’ War, and what historians now recognize as the first mass-misinformation environment in Western history. The death toll across central Europe ran into the millions. “We adapted to the printing press” leaves out the part where Europe ate itself. If someone tells me we will adapt to AI the way we adapted to the printing press, my response is: I hope not.
One Defector Deep
There is one more concept, called the unilateralist’s curse. It comes from a 2016 paper by Nick Bostrom, Thomas Douglas, and Anders Sandberg. In the setup, multiple actors can each decide to take some action, and each of them independently assesses whether it’s a good idea. The one who acts is the one whose estimate is most optimistic: the actor who most underestimates the harm.
The result is that when many actors can each unilaterally take an action, the action gets taken more often than is collectively rational, even if every individual is well-intentioned and the median estimate of harm is correct. The selection effect is on the tail.
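The selection effect is easy to see in a toy simulation. What follows is a minimal sketch under stated assumptions, not the model from the paper: the function name `action_rate`, the true value of -1.0 (the action is mildly harmful), and the Gaussian noise with standard deviation 1.0 are all illustrative choices of mine. Every actor is well-intentioned and unbiased; each simply acts if their own noisy estimate comes out positive.

```python
import random

def action_rate(n_actors, true_value=-1.0, noise_sd=1.0, trials=5_000, seed=0):
    """Fraction of trials in which at least one actor decides to act.

    Illustrative assumptions: the action's true value is negative (harmful),
    each actor's estimate is the true value plus independent unbiased noise,
    and any single actor can take the action alone.
    """
    rng = random.Random(seed)
    taken = 0
    for _ in range(trials):
        # The action happens if ANY actor's estimate is positive,
        # i.e. if the most optimistic estimate in the group is positive.
        if any(rng.gauss(true_value, noise_sd) > 0 for _ in range(n_actors)):
            taken += 1
    return taken / trials

rates = {n: action_rate(n) for n in (1, 5, 25, 100)}
for n, rate in rates.items():
    print(f"{n:>4} actors: harmful action taken in {rate:.1%} of trials")
```

No individual gets any worse at judging as the group grows; the only thing that changes is how many independent draws there are. If a single actor acts with probability p, then a group of n independent actors takes the action with probability 1 - (1 - p)^n, which climbs toward certainty as n increases.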
What this implies for the present question is uncomfortable. The reassurance “but the overwhelming majority of people are responsible” stops being reassurance once the capability is widespread enough that we are sampling from a tail of millions. The median user is not the relevant statistic. The most reckless or most malicious user is. Averages do not protect you when the worst-case actor no longer needs the average actor’s cooperation.
The Least Bad Option
So, if the firewall is dissolving, what (if anything) do we do?
I see three coherent answers. We are going to make some version of this choice whether or not we admit it.
The Surveillance State. This is Nick Bostrom’s answer in The Vulnerable World Hypothesis, and to his credit he is unblinking about what it would require. If catastrophic capability has individualized, the only proven anti-proliferation tool that does not depend on every actor’s voluntary cooperation is monitoring at the individual level. It is the only path that does not require trusting either the tail of human nature or the AI itself.
The case against is that the cure has the same shape as one of the diseases. A surveillance regime capable of catching the would-be pathogen-engineer is also concentrating coercive power in whoever runs it on a scale that itself looks like a black-ball technology. There has never been a surveillance regime in history that was not eventually turned to other uses.
I do not want to live in this world.
The Vulnerable Default. This is what we get if we do nothing in particular. We trust that defense will scale.
The case for this path is the historical record of technological development. The freedoms we’d lose by choosing any other path are harms we’d avoid.
The case against is that “do nothing” is not really a strategy; it’s a one-time gamble. If we win, we get something close to the world we already have. If we lose, we probably end up in a surveillance state.
The Mentor in the Machine. If the firewall was nothing more exotic than competent people who refused, then aligned AI can in principle reproduce that at scale. Every interaction with dangerous knowledge passes through a conscience because every interaction passes through a model. The “no” of the chemist returns via the model saying no.
This is the theory of most current AI safety practice. RLHF, constitutional AI, refusal training, capability evaluations, and pre-deployment red-teaming are all attempts to build the mentor’s conscience into the model itself. The major labs are explicitly trying to make this work. Whether they will succeed is a different question.
The case against is several things.
First, open weights. A model whose weights have been released can be fine-tuned out of its refusals. The mentor’s conscience is software. It can be removed, in an afternoon, by someone who knows what they are doing. This has already happened, and because there were no immediately obvious consequences, people will likely not feel bad about doing it to future models.
Second, race dynamics. The path works only if every sufficiently capable model plays along. This is the unilateralist’s curse in reverse: it only takes one lab or country that decides alignment is less important than winning. The whole strategy is one defector deep.
Third, legitimacy. The human mentor was embedded in a moral community whose norms they shared and answered to. They could be argued with and could be wrong. The AI mentor answers to whoever fine-tuned it last.
Fourth, the mentor metaphor is partial. A real mentor doesn’t just say no. They sometimes intervene and report. A model’s “I can’t help with that” is a weaker firewall.
I include this path anyway, since it is the only one where the costs are not paid by giving up either freedom or safety wholesale.
Whatever We Build, or Let Drift
I want to resist a tidy ending here, because the honest version is that none of these three is clean, and we are not actually picking one. We are drifting through some weighted mixture of all three: more surveillance every year, a great deal of vulnerable default, and an underdeveloped third path that almost everybody says they want but nobody is investing in fast enough.
I also want to come back to the printing press. I think the people who reach for that analogy are right that it is the relevant one. I just don’t think they understand which direction it cuts. The printing press analogy says: yes, civilization survives this kind of transition, given enough time and sufficient willingness to live through the bad part. The honest version of “we’ll adapt” is “we will adapt, eventually, and the cost will be paid by generations.”
The firewall was people. Most of them were ordinary people who said no to things they were asked to help with. The world we are walking into is one where those people are no longer in the loop.
It is sometimes useful to remember Aum Shinrikyo. They had almost everything they needed and they still couldn’t do it. The reason they couldn’t do it was not a fact about Aum. It was a fact about the average person.