The Strange Mathematics of Random Numbers: From Lottery Drums to Cryptographic Generators
Random numbers are simultaneously the most familiar and the most subtle objects in mathematics. The century of false starts, the philosophical puzzle of what randomness even means, the practical engineering of generators that fool sophisticated tests, and the cryptographic stakes when the engin
Most people think they know what a random number is until they are asked to define it. The intuition is "a number that could be anything with equal probability," which dissolves on contact with the question "the probability of what, measured how, generated by what?" Mathematicians, physicists, statisticians, and cryptographers have all spent serious effort trying to nail down what randomness is, and the answers they arrived at do not entirely agree with each other. The history is a story of repeated false starts, hard-won philosophical clarity, and the slow realization that the mathematical theory of randomness is harder than the engineering of practical generators that the theory was supposed to support.
The lottery-drum era
For most of human history, random numbers came from physical processes that nobody had a theory for but everybody trusted: drawing lots, rolling dice, dealing cards, spinning wheels. The Roman sortes Vergilianae divined the future by opening Virgil at random and reading the first line. The Han Dynasty I Ching used yarrow stalks. The Inca quipu sometimes encoded chance outcomes for civic decisions. The mechanism mattered less than the social agreement that the result was unpredictable to anyone with a stake in the outcome.
The first systematic mathematical treatment of randomness came from gambling. Cardano's Liber de Ludo Aleae in the 1560s analyzed dice probabilities; Pascal and Fermat's 1654 correspondence on the problem-of-points gave probability theory its first axioms; Bernoulli's 1713 Ars Conjectandi proved the law of large numbers. By the early 1700s, probability had a coherent mathematical structure. But probability is not the same thing as randomness — probability is a measure on outcomes, randomness is a property of sequences — and the conflation is responsible for much of the confusion that followed.
Tippett, Kendall, and the table era
The early 20th century needed random numbers for statistical sampling and Monte Carlo methods. Tippett's 1927 Random Sampling Numbers was a 41,600-digit table generated by sampling census-figure middle-digits. Kendall's 1939 work refined the methodology with mechanical generation. The 1955 RAND Corporation publication A Million Random Digits with 100,000 Normal Deviates generated digits by sampling thermal noise from a randomly-pulsing electronic vacuum tube, processed through a one-bit-per-pulse correlation-removing filter, and ran the result through statistical tests. The book sold for $25 in 1955 dollars and was the standard source for Monte Carlo work for two decades.
The table era exposed a strange property: the tables had to be tested, but the tests themselves could not establish randomness in any deep sense. They could establish that the tables passed certain statistical tests (uniform digit distribution, no obvious patterns at short lags, no biases in pair frequencies) but could not establish the absence of patterns the tests had not been designed to detect. Tests were necessary but never sufficient. The RAND book's introduction acknowledges this directly and recommends users run their own additional tests appropriate to their use case.
Pseudorandomness and the algorithmic turn
By the 1940s, computers needed random numbers in volumes far beyond what tables could provide. The response was pseudorandom generators — deterministic algorithms that produce sequences with statistical properties indistinguishable from true randomness for most practical purposes. Von Neumann's middle-square method (1949) was the first influential attempt and was promptly shown to have short cycles and bias. Lehmer's 1951 linear congruential generator x_{n+1} = (a*x_n + c) mod m with carefully chosen constants became the standard for decades and is still embedded in many runtime libraries.
The deep insight from pseudorandomness was that randomness is not a property of individual numbers — every individual number is just a number — but of the sequence and how it relates to the test that is examining it. A sequence is random with respect to a test if the test cannot distinguish it from a truly random sequence. Stronger tests demand more from the generator. Mersenne Twister (1997) passes most statistical tests but fails some advanced lattice-structure tests. PCG (2014) and xoshiro (2018) pass tests that Mersenne Twister fails. The arms race continues, with the realization that no fixed deterministic generator can pass all possible tests — there is always a test that knows the generator's algorithm and exploits it.
The Kolmogorov-Chaitin definition
The deepest mathematical attempt to pin down randomness came from Kolmogorov in the 1960s and Chaitin independently in the 1970s. The definition: a finite string is random if it cannot be produced by any computer program shorter than itself. The complexity of a string is the length of the shortest program that produces it; a string is random if its complexity is approximately equal to its length, meaning no program can describe it more compactly than just listing it.
The definition is mathematically beautiful and operationally hopeless. The function that maps a string to its Kolmogorov complexity is uncomputable — there is no algorithm that can determine, given a string, what the shortest program producing it is. The definition tells us what randomness is but not how to verify it for any specific case. It also tells us a striking fact: almost all infinite strings are random in the Kolmogorov sense, but we cannot exhibit any specific infinite string and prove it is random. Randomness is everywhere and yet uncatchable.
The Kolmogorov definition aligns with the intuition that randomness is incompressibility. A run of a billion zeros has very low Kolmogorov complexity (the program is "print zero one billion times"). A run of a billion bits from a quantum noise source has Kolmogorov complexity approximately equal to one billion (no shorter description exists). The randomness lives in the inability of any algorithm to predict the next bit from the previous bits.
The cryptographic stakes
Most randomness applications can tolerate small biases or short-range correlations without catastrophic consequence. Cryptography cannot. Cryptographic protocols rely on attackers being unable to predict random values that are used as keys, nonces, or session tokens. If the random values are predictable, the entire security collapses, often in ways that are subtle and remain undetected for years.
The history of cryptographic randomness failures is the history of catastrophic breaches. Netscape's 1995 SSL implementation seeded its random generator with the current time and process id, both of which an attacker could guess; researchers broke session keys in seconds. Debian's 2006-2008 OpenSSL package had a code change that effectively reduced the generator's seed entropy to 15 bits; thousands of SSH keys generated during that period were predictable, and the affected systems remained vulnerable for years until they replaced their keys. The Sony PS3 ECDSA signing flaw (2010) used the same nonce for every signature, allowing the private key to be computed from a few public signatures.
The modern response is cryptographically secure pseudorandom number generators (CSPRNGs) seeded from operating-system entropy pools that mix hardware noise sources (timer jitter, network packet timing, disk activity, CPU thermal noise) into a high-entropy seed, then expand it through a cryptographic primitive (HMAC, CTR-DRBG, the new ChaCha20-based generators in modern kernels). The generators are designed so that even an attacker who knows the algorithm and can observe outputs cannot predict future outputs without knowing the seed, and cannot recover past outputs from compromised current state.
True randomness from physics
For applications that demand more than algorithmic pseudorandomness, hardware random number generators (HRNGs) extract entropy from physical processes whose outcomes are believed to be fundamentally random. Thermal noise in resistors, shot noise in diodes, jitter in oscillator circuits, radioactive decay, atmospheric noise, and quantum measurements all serve as sources. Modern Intel and AMD CPUs include on-chip HRNGs (Intel's RDRAND/RDSEED instructions); modern ARM chips include hardware-RNG peripherals; commercial entropy beacons like NIST's Randomness Beacon publish hashed quantum-source outputs every minute as a public time-stamped randomness service.
The deep philosophical question is whether quantum mechanics is genuinely random or merely unpredictable to observers. The orthodox Copenhagen interpretation holds that quantum measurements have intrinsically random outcomes; hidden-variable interpretations hold that the apparent randomness reflects unmeasured deterministic dynamics. Bell's theorem and the experimental violations of Bell inequalities by Aspect (1981), Hensen (2015) and others have ruled out local hidden-variable theories. Within the constraints of locality and realism, quantum measurements are genuinely random. The practical upshot is that quantum-source HRNGs are the closest thing to "true" randomness we can engineer, and the foundational randomness is unconditional — no observer can predict the next bit, in principle.
The deepest puzzle
What remains philosophically interesting is that randomness is one of the few mathematical concepts where the limit of practical engineering meets the limit of formal definition. Pseudorandom generators are mathematically deterministic but practically unpredictable to bounded computational adversaries. Hardware generators are physically random under the assumptions of quantum mechanics but operationally limited by extraction efficiency and conditioning. The Kolmogorov definition captures what randomness is but offers no way to verify any specific case. The statistical tests that practitioners actually use are arbitrary in choice and conservative in scope.
The pragmatic resolution is that randomness in practice is purpose-relative. The right generator for a roleplaying game is different from the right generator for a casino is different from the right generator for a cryptographic protocol is different from the right generator for a Monte Carlo simulation. Each context has different threat models, different statistical demands, and different cost-benefit tradeoffs. The mathematician can give us coherent definitions; the engineer has to pick which definition matters for which use case. The convergence of formal mathematics, computational complexity, and quantum physics into a working theory of randomness is one of the underappreciated triumphs of 20th-century mathematics, and the convergence is still ongoing.