Why Your P(doom) Number Needs a Horizon, a Threshold, and a Conditioning Event
Most stated p(doom) numbers are not probabilities. They are slogans. A number that does not specify what it is a probability of, over what time horizon, or conditional on what, is not a number a risk committee can act on. It is a number a person can say at a dinner party. The distinction matters because those numbers are now showing up in board decks, insurance filings, and regulatory submissions, where slogans get priced as if they were probabilities.
That is a calibration failure. It is increasingly a governance one.
The reframe
Every serious p(doom) estimate is a conditional probability with three variables that usually go unstated. The time horizon. The threshold for "doom." The conditioning event. A number without those three specified is not wrong. It is incoherent. Most public estimates, including the ones quoted most often in executive contexts, do not specify any of the three.
When Dario Amodei says 25 percent, he is stating a probability without a specified horizon, without a specified threshold, and without a specified conditioning event. When Yann LeCun says less than 0.01 percent, he is doing the same thing. The two numbers cannot be compared because they are not measurements of the same underlying quantity. The public debate treats them as if they were. The result is that executive audiences absorb a spread of estimates that differ by four orders of magnitude and conclude that the experts disagree by four orders of magnitude. The experts may agree more than they appear to. The numbers are not telling you what you think they are telling you.
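To make the missing structure concrete, here is a minimal sketch, in illustrative Python, of what a fully specified estimate would have to carry. The field names and the comparison rule are assumptions for illustration, not an established schema; the two example records encode only what was publicly quoted, which is the probability and nothing else.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PDoomEstimate:
    """A p(doom) claim, carrying the three variables that make it a probability."""
    probability: float            # e.g. 0.25
    horizon_years: Optional[int]  # None means the horizon was never stated
    threshold: Optional[str]      # e.g. "extinction", "civilizational collapse"
    conditioning: Optional[str]   # e.g. "AGI developed", "current architectures only"

    def is_specified(self) -> bool:
        return None not in (self.horizon_years, self.threshold, self.conditioning)

def comparable(a: PDoomEstimate, b: PDoomEstimate) -> bool:
    """Two estimates measure the same quantity only if all three variables match."""
    return (a.is_specified() and b.is_specified()
            and a.horizon_years == b.horizon_years
            and a.threshold == b.threshold
            and a.conditioning == b.conditioning)

# As publicly quoted, both headline numbers leave all three fields unspecified,
# so the headline gap is not a comparison of the same underlying quantity.
amodei_as_quoted = PDoomEstimate(0.25, None, None, None)
lecun_as_quoted = PDoomEstimate(0.0001, None, None, None)
assert not comparable(amodei_as_quoted, lecun_as_quoted)
```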
Observed versus inferred
What is documented. The 2023 survey of AI researchers specified a horizon (100 years) and a threshold (human extinction or similarly severe and permanent disempowerment) and produced a mean of 14.4 percent and a median of 5 percent. Those are defensible numbers because the conditions are stated. Almost every public estimate from a frontier lab CEO since has omitted those specifications.
What follows. The gap between Amodei's 25 percent and LeCun's 0.01 percent is not primarily a gap in their views about AI risk. Much of it is a gap in what they are measuring. Amodei's number is closer to an unconditional estimate over an unspecified horizon, including all paths to a catastrophic outcome. LeCun's number is closer to a conditional estimate, given current architectures, over a near-term horizon, for a specific failure mode. Neither has stated this. Both are cited as if they were the same number.
What does not follow. That the disagreement is illusory, that all public estimates secretly agree, or that the disagreement dissolves once specification is demanded. Demanding specification reveals real disagreement and eliminates fake disagreement. The residual is smaller than the public debate suggests and still non-zero.
The four specification failures
Most unspecified p(doom) numbers fail in one of four predictable ways. The reader who learns to pattern-match the four can triage any public p(doom) estimate in seconds.
Horizon collapse. The number is stated without a timeframe. "25 percent" is the canonical example. Over the next year, decade, century, or ever? A 25 percent probability over 100 years is a rational basis for long-horizon governance investment. A 25 percent probability over 10 years is a rational basis for halting deployment. The two require different actions. The stated number requires neither because it has specified neither.
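The arithmetic behind that contrast, under the simplifying assumption of a constant annual hazard rate (a modeling convenience, not a claim about how AI risk actually accrues over time):

```python
def annual_hazard(cumulative_p: float, horizon_years: float) -> float:
    """Constant annual hazard implied by a cumulative probability over a horizon."""
    return 1 - (1 - cumulative_p) ** (1 / horizon_years)

# The same "25 percent" headline, read against two different horizons.
print(f"{annual_hazard(0.25, 100):.4%} per year over 100 years")  # ~0.29% per year
print(f"{annual_hazard(0.25, 10):.4%} per year over 10 years")    # ~2.84% per year
```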
Threshold drift. The number mixes extinction with severe disruption. Some estimators mean "humanity ends." Others mean "catastrophic civilizational setback from which recovery takes centuries." Others mean "severe economic or political disruption." These are different events with different probabilities. A single number that averages across them is a weighted mean of quantities the estimator has not weighted.
Conditioning silence. The number does not specify whether it assumes AGI, transformative AI, current systems, or some unspecified future capability. A p(doom) conditional on AGI development is a different number from an unconditional p(doom) that includes the probability of AGI not arriving. Most public estimates conflate the two. The conflation obscures the underlying disagreement, which is usually about whether AGI arrives, not about what happens if it does.
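The relationship between the conditional and unconditional versions is just the law of total probability. The values below are placeholders chosen to show the size of the conflation, not anyone's published estimates:

```python
def unconditional_p_doom(p_doom_given_agi: float,
                         p_agi: float,
                         p_doom_given_no_agi: float = 0.0) -> float:
    """Law of total probability over the single event 'AGI arrives within the horizon'."""
    return p_doom_given_agi * p_agi + p_doom_given_no_agi * (1 - p_agi)

# Illustrative only: a 25% conditional estimate and a 40% chance AGI arrives
# within the horizon imply an unconditional ~10%. Quoting "25%" and "10%" as
# the same kind of number is exactly the conflation described above.
print(unconditional_p_doom(p_doom_given_agi=0.25, p_agi=0.40))  # ~0.10
```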
Reference class confusion. The number mixes the estimator's personal belief with their read of research consensus. When a CEO says 25 percent, is that their view, or their summary of the field's view, or their median of the five researchers they trust most? Most estimators do not distinguish. The listener cannot tell whether they are hearing one opinion or a weighted average of many.
The two-question p(doom) test
The full specification framework is necessary for governance documents. For the reader who just needs to triage an incoming p(doom) number in real time, the entire apparatus compresses to two questions.
Over what horizon? If the source cannot answer, the number is not actionable. Note it, move on, do not price it.
Conditional on what? If the source cannot answer, the number is a mood statement, not a probability. It tells you what the speaker feels. It does not tell you what the speaker believes about any specific event.
A number that survives both questions is a probability. A number that fails either is a slogan. The distinction is usually decidable in the thirty seconds it takes to ask the questions. The executive who runs every incoming p(doom) estimate through the two-question filter will triage the discourse more cleanly than ninety percent of their peers, with no additional analytical infrastructure.
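For completeness, the filter as a few lines of illustrative code; the labels simply mirror the two questions above, and a missing field stands in for a source that cannot answer:

```python
from typing import Optional

def triage(horizon_years: Optional[int], conditioning: Optional[str]) -> str:
    """The two-question filter: no horizon or no conditioning event means slogan."""
    if horizon_years is None:
        return "slogan: no horizon, not actionable"
    if conditioning is None:
        return "slogan: no conditioning event, a mood statement"
    return "probability: specified enough to price"

print(triage(None, None))             # slogan: no horizon, not actionable
print(triage(100, "AGI developed"))   # probability: specified enough to price
```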
The three numbers you actually need
For the reader's own position, as opposed to the triage of others' numbers, the specification discipline produces a triple: three separate estimates, each with an explicit horizon, threshold, and conditioning event.
The unconditional 10-year number. Probability of catastrophic AI outcomes within the next decade, from today's conditions, no assumed triggering event. Captures near-term tail risk. For most readers, this number is low. For most frontier lab CEOs, it is also low. The 10-year number is where public estimates converge most.
The unconditional 100-year number. Probability of catastrophic AI outcomes within the next century, no assumed triggering event. Captures long-horizon integration of risk across successive AI generations, geopolitical shifts, and governance failures. The gap between this number and the 10-year number is the reader's implicit view on whether AI risk is front-loaded or back-loaded. Most readers have never made that view explicit.
The conditional-on-AGI number. Probability of catastrophic outcomes conditional on the development of artificial general intelligence. Almost always the highest of the three. Captures the reader's view on the technical alignment problem and the governance coordination problem, with the question of whether AGI arrives taken off the table. That separation from timeline doubt is where many readers discover their number is higher than they thought.
The spread between the three is where the reader's actual beliefs live. A reader whose numbers are 3, 8, and 40 holds a different position than a reader whose numbers are 3, 8, and 10. Both might say "8 percent" when asked. The single number hides the structure.
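One way to see the structure the single number hides: if catastrophic outcomes without AGI are treated as negligible, and the conditional number is read against the same century horizon (both simplifying assumptions), then the 100-year number and the conditional-on-AGI number jointly imply a probability of AGI arriving this century. The two readers above differ almost entirely in that implied timeline belief:

```python
def implied_p_agi(unconditional_100yr: float, conditional_on_agi: float) -> float:
    """AGI-arrival probability implied by a reader's own triple, assuming
    (simplistically) that catastrophic outcomes without AGI are negligible."""
    return unconditional_100yr / conditional_on_agi

# The two readers from the text, both of whom would "say 8 percent":
print(f"{implied_p_agi(0.08, 0.40):.0%}")  # 20% -- doubts AGI arrives this century
print(f"{implied_p_agi(0.08, 0.10):.0%}")  # 80% -- expects AGI, is sanguine about it
```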
The stake
A p(doom) number without horizon, threshold, and conditioning event is a slogan wearing the costume of a probability. The executive environment is absorbing those slogans at increasing volume and pricing them as if they were probabilities. The only question that matters is whether the numbers you cite, and the numbers you accept from others, are specified well enough to be defensible when someone asks what they actually measure.