Why Your P(doom) Number Needs a Horizon, a Threshold, and a Conditioning Event
Most stated p(doom) numbers are not probabilities. They are slogans. A number that does not specify what it is a probability of, over what time horizon, or conditional on what, is not a number a risk committee can act on. It is a number a person can say at a dinner party. The distinction matters because those numbers are now showing up in board decks, insurance filings, and regulatory submissions, where slogans get priced as if they were probabilities.
That is a calibration failure. It is increasingly a governance one.
The reframe
Every serious p(doom) estimate is a conditional probability with three variables that usually go unstated. The time horizon. The threshold for "doom." The conditioning event. A number without those three specified is not wrong. It is incoherent. Most public estimates, including the ones quoted most often in executive contexts, do not specify any of the three.
When Dario Amodei says 25 percent, he is stating a probability without a specified horizon, without a specified threshold, and without a specified conditioning event. When Yann LeCun says less than 0.01 percent, he is doing the same thing. The two numbers cannot be compared because they are not measurements of the same underlying quantity. The public debate treats them as if they were. The result is that executive audiences absorb a spread of estimates that differ by four orders of magnitude and conclude that the experts disagree by four orders of magnitude. The experts may agree more than they appear to. The numbers are not telling you what you think they are telling you.
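To make the missing structure concrete, here is a minimal sketch, in illustrative Python, of what a fully specified estimate would have to carry. The field names and the comparison rule are assumptions for illustration, not an established schema; the two example records encode only what was publicly quoted, which is the probability and nothing else.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PDoomEstimate:
    """A p(doom) claim, carrying the three variables that make it a probability."""
    probability: float            # e.g. 0.25
    horizon_years: Optional[int]  # None means the horizon was never stated
    threshold: Optional[str]      # e.g. "extinction", "civilizational collapse"
    conditioning: Optional[str]   # e.g. "AGI developed", "current architectures only"

    def is_specified(self) -> bool:
        return None not in (self.horizon_years, self.threshold, self.conditioning)

def comparable(a: PDoomEstimate, b: PDoomEstimate) -> bool:
    """Two estimates measure the same quantity only if all three variables match."""
    return (a.is_specified() and b.is_specified()
            and a.horizon_years == b.horizon_years
            and a.threshold == b.threshold
            and a.conditioning == b.conditioning)

# As publicly quoted, both headline numbers leave all three fields unspecified,
# so the headline gap is not a comparison of the same underlying quantity.
amodei_as_quoted = PDoomEstimate(0.25, None, None, None)
lecun_as_quoted = PDoomEstimate(0.0001, None, None, None)
assert not comparable(amodei_as_quoted, lecun_as_quoted)
```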
Observed versus inferred
What is documented. The 2023 survey of AI researchers specified a horizon (100 years) and a threshold (human extinction or similarly severe and permanent disempowerment) and produced a mean of 14.4 percent and a median of 5 percent. Those are defensible numbers because the conditions are stated. Almost every public estimate from a frontier lab CEO since has omitted those specifications.
What follows. The gap between Amodei's 25 percent and LeCun's 0.01 percent is not primarily a gap in their views about AI risk. Much of it is a gap in what they are measuring. Amodei's number is closer to an unconditional estimate over an unspecified horizon, including all paths to a catastrophic outcome. LeCun's number is closer to a conditional estimate, given current architectures, over a near-term horizon, for a specific failure mode. Neither has stated this. Both are cited as if they were the same number.
What does not follow. That the disagreement is illusory, that all public estimates secretly agree, or that the disagreement dissolves once specification is demanded. Demanding specification reveals real disagreement and eliminates fake disagreement. The residual is smaller than the public debate suggests and still non-zero.
The four specification failures
Most unspecified p(doom) numbers fail in one of four predictable ways. The reader who learns to pattern-match the four can triage any public p(doom) estimate in seconds.
Horizon collapse. The number is stated without a timeframe. "25 percent" is the canonical example. Over the next year, decade, century, or ever? A 25 percent probability over 100 years is a rational basis for long-horizon governance investment. A 25 percent probability over 10 years is a rational basis for halting deployment. The two require different actions. The stated number requires neither because it has specified neither.
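The arithmetic behind that contrast, under the simplifying assumption of a constant annual hazard rate (a modeling convenience, not a claim about how AI risk actually accrues over time):

```python
def annual_hazard(cumulative_p: float, horizon_years: float) -> float:
    """Constant annual hazard implied by a cumulative probability over a horizon."""
    return 1 - (1 - cumulative_p) ** (1 / horizon_years)

# The same "25 percent" headline, read against two different horizons.
print(f"{annual_hazard(0.25, 100):.4%} per year over 100 years")  # ~0.29% per year
print(f"{annual_hazard(0.25, 10):.4%} per year over 10 years")    # ~2.84% per year
```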
Threshold drift. The number mixes extinction with severe disruption. Some estimators mean "humanity ends." Others mean "catastrophic civilizational setback from which recovery takes centuries." Others mean "severe economic or political disruption." These are different events with different probabilities. A single number that averages across them is a weighted mean of quantities the estimator has not weighted.
Conditioning silence. The number does not specify whether it assumes AGI, transformative AI, current systems, or some unspecified future capability. A p(doom) conditional on AGI development is a different number from an unconditional p(doom) that includes the probability of AGI not arriving. Most public estimates conflate the two. The conflation obscures the underlying disagreement, which is usually about whether AGI arrives, not about what happens if it does.
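The relationship between the conditional and unconditional versions is just the law of total probability. The values below are placeholders chosen to show the size of the conflation, not anyone's published estimates:

```python
def unconditional_p_doom(p_doom_given_agi: float,
                         p_agi: float,
                         p_doom_given_no_agi: float = 0.0) -> float:
    """Law of total probability over the single event 'AGI arrives within the horizon'."""
    return p_doom_given_agi * p_agi + p_doom_given_no_agi * (1 - p_agi)

# Illustrative only: a 25% conditional estimate and a 40% chance AGI arrives
# within the horizon imply an unconditional ~10%. Quoting "25%" and "10%" as
# the same kind of number is exactly the conflation described above.
print(unconditional_p_doom(p_doom_given_agi=0.25, p_agi=0.40))  # ~0.10
```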
Reference class confusion. The number mixes the estimator's personal belief with their read of research consensus. When a CEO says 25 percent, is that their view, or their summary of the field's view, or their median of the five researchers they trust most? Most estimators do not distinguish. The listener cannot tell whether they are hearing one opinion or a weighted average of many.
The two-question p(doom) test
The full specification framework is necessary for governance documents. For the reader who just needs to triage an incoming p(doom) number in real time, the entire apparatus compresses to two questions.
Over what horizon? If the source cannot answer, the number is not actionable. Note it, move on, do not price it.
Conditional on what? If the source cannot answer, the number is a mood statement, not a probability. It tells you what the speaker feels. It does not tell you what the speaker believes about any specific event.
A number that survives both questions is a probability. A number that fails either is a slogan. The distinction is usually decidable in the thirty seconds it takes to ask the questions. The executive who runs every incoming p(doom) estimate through the two-question filter will triage the discourse more cleanly than ninety percent of their peers, with no additional analytical infrastructure.
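For completeness, the filter as a few lines of illustrative code; the labels simply mirror the two questions above, and a missing field stands in for a source that cannot answer:

```python
from typing import Optional

def triage(horizon_years: Optional[int], conditioning: Optional[str]) -> str:
    """The two-question filter: no horizon or no conditioning event means slogan."""
    if horizon_years is None:
        return "slogan: no horizon, not actionable"
    if conditioning is None:
        return "slogan: no conditioning event, a mood statement"
    return "probability: specified enough to price"

print(triage(None, None))             # slogan: no horizon, not actionable
print(triage(100, "AGI developed"))   # probability: specified enough to price
```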
The three numbers you actually need
For the reader's own position, as opposed to the triage of others' numbers, the specification discipline produces a triple: three separate estimates, each with an explicit horizon, threshold, and conditioning event.
The unconditional 10-year number. Probability of catastrophic AI outcomes within the next decade, from today's conditions, no assumed triggering event. Captures near-term tail risk. For most readers, this number is low. For most frontier lab CEOs, it is also low. The 10-year number is where public estimates converge most.
The unconditional 100-year number. Probability of catastrophic AI outcomes within the next century, no assumed triggering event. Captures long-horizon integration of risk across successive AI generations, geopolitical shifts, and governance failures. The gap between this number and the 10-year number is the reader's implicit view on whether AI risk is front-loaded or back-loaded. Most readers have never made that view explicit.
The conditional-on-AGI number. Probability of catastrophic outcomes conditional on the development of artificial general intelligence. Almost always the highest of the three. Captures the reader's view on the technical alignment problem and the governance coordination problem, with the question of whether AGI arrives taken off the table. That separation from timeline doubt is where many readers discover their number is higher than they thought.
The spread between the three is where the reader's actual beliefs live. A reader whose numbers are 3, 8, and 40 holds a different position than a reader whose numbers are 3, 8, and 10. Both might say "8 percent" when asked. The single number hides the structure.
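One way to see the structure the single number hides: if catastrophic outcomes without AGI are treated as negligible, and the conditional number is read against the same century horizon (both simplifying assumptions), then the 100-year number and the conditional-on-AGI number jointly imply a probability of AGI arriving this century. The two readers above differ almost entirely in that implied timeline belief:

```python
def implied_p_agi(unconditional_100yr: float, conditional_on_agi: float) -> float:
    """AGI-arrival probability implied by a reader's own triple, assuming
    (simplistically) that catastrophic outcomes without AGI are negligible."""
    return unconditional_100yr / conditional_on_agi

# The two readers from the text, both of whom would "say 8 percent":
print(f"{implied_p_agi(0.08, 0.40):.0%}")  # 20% -- doubts AGI arrives this century
print(f"{implied_p_agi(0.08, 0.10):.0%}")  # 80% -- expects AGI, is sanguine about it
```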
The stake
A p(doom) number without horizon, threshold, and conditioning event is a slogan wearing the costume of a probability. The executive environment is absorbing those slogans at increasing volume and pricing them as if they were probabilities. The only question that matters is whether the numbers you cite, and the numbers you accept from others, are specified well enough to be defensible when someone asks what they actually measure.