Numerics/Random Numbers
From charlesreid1
Random numbers are one of those things that sound dead simple — "just give me a number I can't predict, right?" — but then the rabbit hole opens up and suddenly you're comparing Mersenne Twister state spaces and wondering if your Monte Carlo simulation is actually converged.
This page is the Python-centric tour through random number generation: what actually works, what's secretly broken, and how to wield these tools without accidentally turning your simulation into a deterministic paperweight. If you stumbled here looking for C++ rand() quirks, check out the sister page Random Numbers — but honestly, for numerical work, you want to be here.
The Two Kinds of Random
There are really only two flavors of random numbers in the universe, and knowing which one you're dealing with changes everything:
| Type | Where it comes from | Use it for... |
|---|---|---|
| True random | Physical entropy — radioactive decay, thermal noise, keyboard mash-timing, hardware RNGs, /dev/random |
Cryptography, key generation, anything where predictability = disaster |
| Pseudorandom (PRNG) | A deterministic algorithm starting from a seed | Simulations, Monte Carlo, randomized algorithms, games, reproducible science |
True random is beautiful but slow and you can't replay it. Pseudorandom is fast, reproducible, and — if the algorithm is good — statistically indistinguishable from the real thing for most purposes. The rest of this page is about PRNGs, because that's what 99% of numerical work needs.
Pseudorandom Number Generators: Controlled Chaos
The basic idea behind a PRNG is almost disappointingly simple: you take a number (the state), scramble it through some mathematical function, output part of the result as your "random" number, and keep the rest as the new state. Lather, rinse, repeat.
The OG: Linear Congruential Generators
$ X_{n+1} = (aX_n + c) \bmod m $
This is the LCG — the Honda Civic of random number generators. It's everywhere (including C's rand()), it's fast, and it's... fine, for very loose definitions of "fine." LCGs have lousy statistical properties in low dimensions — if you plot consecutive pairs $ (X_n, X_{n+1}) $ as points in 2D, they fall on a small number of parallel planes. This is the Marsaglia effect and it's honestly kind of horrifying once you see it.
The Workhorse: Mersenne Twister (MT19937)
Python's random module uses the Mersenne Twister under the hood — specifically MT19937, with a period of $ 2^{19937} - 1 $ (yes, that's a Mersenne prime, hence the name). The period is so absurdly long that you will never, ever exhaust it. The state is 624 32-bit integers, or about 2.5 KB. It passes most statistical test suites with flying colors.
It is not cryptographically secure. If an attacker can observe 624 consecutive outputs, they can reconstruct the entire internal state and predict every future number. For science and simulation this is irrelevant — for crypto it's a dealbreaker. Keep reading.
Python's random Module: Batteries Included
The random module is your everyday driver for random numbers in Python. It wraps MT19937 and gives you a clean, Pythonic API.
The Core Functions You'll Actually Use
| Function | What it gives you |
|---|---|
random.random() |
Float in [0.0, 1.0). This is the one you'll call 90% of the time. |
random.randint(a, b) |
Random integer in [a, b] — inclusive on both ends. Yes, randint(0, 10) can give you 10.
|
random.randrange(start, stop, step) |
Like range() but returns one random element. randrange(0, 10) gives 0–9.
|
random.choice(seq) |
Grab a random element from a sequence. O(1) for lists. |
random.shuffle(seq) |
Fisher-Yates in-place shuffle. Honestly kind of elegant. |
random.sample(population, k) |
k unique random elements without replacement. Reservoir sampling when needed.
|
random.gauss(mu, sigma) |
Gaussian (normal) distribution. Slightly faster than normalvariate but uses a different algorithm.
|
random.uniform(a, b) |
Float in [a, b]. Also random.triangular(), random.betavariate(), random.expovariate(), etc.
|
Seeding: The Ritual
import random
random.seed(42)
print(random.random()) # Always 0.6394267984578837 on CPython 3.x
Call seed() once at the start of your program if you want reproducible runs. If you don't seed, Python seeds from os.urandom() on modern versions, which is fine — but your results won't be reproducible between runs.
One seed to rule them all: if you import other modules that also use random, seeding the global instance affects them too. This is either a feature or a deeply confusing bug depending on your perspective.
NumPy's Random: When the Stakes Get Higher
The random module is fine for one-at-a-time random numbers, but numerical work often needs millions of them, fast. NumPy's random subsystem does exactly that — vectorized, tight C loops, and way more distributions than the stdlib.
The API Schism: Old Way vs. New Way
NumPy has been through a random-number glow-up. Here's the deal:
| Approach | API | Verdict |
|---|---|---|
| Old school | numpy.random.rand(), numpy.random.randn(), numpy.random.randint() |
Still works, still everywhere in legacy code. Uses a global RandomState singleton. Fine for quick scripts, annoying for reproducibility.
|
| The new hotness | numpy.random.default_rng() → Generator object |
Way better. Uses PCG64 by default (better statistical properties than MT19937). Explicit state, explicit seeding. No global state leakage. |
The new way is the recommended path and it's honestly just cleaner:
import numpy as np
rng = np.random.default_rng(seed=42)
x = rng.random(1_000_000) # A million uniform floats, just like that
y = rng.normal(0, 1, size=1000) # A thousand standard normals
z = rng.integers(0, 100, size=500) # 500 ints in [0, 100)
The Generator API gives you basically every distribution known to statistics — uniform, normal, exponential, gamma, beta, binomial, Poisson, chi-square, F, t, Laplace, logistic, lognormal, multinomial, multivariate normal, Dirichlet, Wishart... the list goes on. It's a statistical candy store.
Why PCG64 Over MT19937?
PCG64 is a permuted congruential generator. It has better statistical properties than MT19937, a smaller state (128 bits — just two integers), and supports multiple independent streams via SeedSequence. It's also faster. The only real win MT19937 has is that the period is absurdly large, but PCG64's period of $ 2^{128} $ is still "you'll never exhaust it in a billion lifetimes" territory.
A Tour Through the Distributions
Honestly, half the power of random numbers is in the transformations. You start with uniform [0,1) and then shape it into whatever distribution your model needs. Here are the greatest hits:
The Inverse Transform Method
This is one of those ideas that's so elegant it hurts. If you have a CDF $ F(x) $ with inverse $ F^{-1} $, and you generate $ U \sim \text{Uniform}(0,1) $, then $ X = F^{-1}(U) $ follows the distribution with CDF $ F $.
That's it. That's the whole trick. It's why the exponential distribution is $ -\lambda \ln(U) $ — the CDF inversion just works out that way. Some distributions (normal) don't have a closed-form inverse CDF, so fancier methods like Box-Muller or Ziggurat step in. But the inverse transform is the conceptual backbone.
NumPy's Distribution Buffet
rng = np.random.default_rng()
rng.uniform(0, 1, size=1000) # The building block
rng.normal(mu, sigma, size=1000) # The bell curve
rng.exponential(scale, size=1000) # Waiting times
rng.poisson(lam, size=1000) # Count data
rng.binomial(n, p, size=1000) # k successes in n trials
rng.gamma(shape, scale, size=1000) # Waiting time for k events
rng.beta(a, b, size=1000) # Proportions, Bayesian priors
rng.chisquare(df, size=1000) # Sum of squared normals
rng.choice([1,2,3,4,5], size=100) # Discrete sampling with/without replacement
rng.permutation(arr) # Shuffle in place
Seeds and Reproducibility: Control Your Chaos
Reproducibility is the quiet superpower of PRNGs. Set the same seed, get the same sequence — every single time, on any machine, any OS. This makes debugging stochastic code actually possible and lets other people replicate your results exactly.
The Golden Rules
- Seed once, at the top. Never inside a loop. You already knew this, but everyone does it at least once.
- Use
SeedSequencefor parallel work. If you're running 8 MPI processes or 32joblibworkers, you don't want them all using the same stream.SeedSequencederives independent seeds from one parent seed — it's basically entropy-aware seed spawning and it's kind of genius. - Record your seed. Stick it in a config file, a log line, or a comment. Future you will thank present you.
from numpy.random import SeedSequence, default_rng
ss = SeedSequence(12345)
child_seeds = ss.spawn(4) # 4 independent streams
rngs = [default_rng(s) for s in child_seeds]
Don't Seed in a Loop (Seriously)
# DON'T DO THIS
for i in range(1000):
random.seed(42)
x = random.random() # x is the SAME every iteration
You'd be surprised how often this shows up in actual code. The first random number after seeding with a fixed value is a deterministic function of that seed. Seeding inside a loop with a constant seed just gives you the same "random" number over and over. Use a single seed at the top and let the generator do its thing.
When Random Isn't Random Enough: Cryptography
Sometimes "statistically random" isn't good enough — you need "your adversary, armed with a supercomputer and a copy of your algorithm, cannot predict the next bit." That's the crypto-grade bar.
secrets and os.urandom()
Python 3.6+ ships the secrets module, which wraps os.urandom() (the OS's cryptographically secure RNG — on Linux this comes from /dev/urandom and the kernel's entropy pool):
import secrets
token = secrets.token_hex(32) # 64-char hex string
key = secrets.randbits(256) # 256 random bits
url_safe = secrets.token_urlsafe() # For session IDs, CSRF tokens, etc.
Why MT19937 Is NOT Crypto-Safe
The Mersenne Twister is a linear recurrence. Observe 624 consecutive 32-bit outputs and you can solve for the entire internal state with linear algebra. After that you can predict every future output and reconstruct every past output. The random module even warns about this in its docs — it's not suitable for security purposes. Use secrets or os.urandom() directly.
Practical Gotchas: Stuff That Will Bite You
The randint Inclusive Surprise
random.randint(a, b) includes b. numpy.random.randint(a, b) in the new API (rng.integers) does NOT include b. And to make it extra confusing, numpy.random.randint in the OLD API follows the Python convention and DOES include b for some argument patterns. Read the docs for whatever API you're using. This has caused real bugs in real papers.
Slow Generation Patterns
Generating numbers one at a time in a Python loop is death by a thousand function calls. If you need a million random numbers, ask NumPy for them all at once:
# Slow (Python-loop overhead dominates):
vals = [random.random() for _ in range(1_000_000)]
# Fast (vectorized, C-level):
vals = rng.random(1_000_000)
Parallel RNG: Independent Streams or Bust
If you spawn 8 parallel processes and each seeds with the current time, you might get collisions. Worse, if you seed them all with the same seed (or with seeds 0,1,2,...7), the streams might be correlated. SeedSequence + PCG64 makes this basically foolproof — use it.
The random.random() * N Subtle Bias
Multiplying a uniform float by N and flooring to get an integer index can introduce tiny biases due to floating-point representation. For casual use it's fine, but if you're doing something like lottery draw simulations where every bit of uniformity counts, use randint or randrange — they use rejection sampling internally to guarantee uniformity.
See Also
- Random Numbers — The C++ page exploring
rand()weirdness (the one that kicked all this off) - Numerics/Statistical Descriptions of Data — Making sense of the numbers once you've generated them