Numerics/Random Numbers

Random numbers are one of those things that sound dead simple — "just give me a number I can't predict, right?" — but then the rabbit hole opens up and suddenly you're comparing Mersenne Twister state spaces and wondering if your Monte Carlo simulation is actually converged.

This page is the Python-centric tour through random number generation: what actually works, what's secretly broken, and how to wield these tools without accidentally turning your simulation into a deterministic paperweight. If you stumbled here looking for C++ rand() quirks, check out the sister page Random Numbers — but honestly, for numerical work, you want to be here.

The Two Kinds of Random

There are really only two flavors of random numbers in the universe, and knowing which one you're dealing with changes everything:

Type	Where it comes from	Use it for...
True random	Physical entropy — radioactive decay, thermal noise, keyboard mash-timing, hardware RNGs, `/dev/random`	Cryptography, key generation, anything where predictability = disaster
Pseudorandom (PRNG)	A deterministic algorithm starting from a seed	Simulations, Monte Carlo, randomized algorithms, games, reproducible science

True random is beautiful but slow and you can't replay it. Pseudorandom is fast, reproducible, and — if the algorithm is good — statistically indistinguishable from the real thing for most purposes. The rest of this page is about PRNGs, because that's what 99% of numerical work needs.

Pseudorandom Number Generators: Controlled Chaos

The basic idea behind a PRNG is almost disappointingly simple: you take a number (the state), scramble it through some mathematical function, output part of the result as your "random" number, and keep the rest as the new state. Lather, rinse, repeat.

The OG: Linear Congruential Generators

$X_{n+1} = (aX_n + c) \bmod m$

This is the LCG — the Honda Civic of random number generators. It's everywhere (including C's rand()), it's fast, and it's... fine, for very loose definitions of "fine." LCGs have lousy statistical properties in low dimensions — if you plot consecutive pairs $(X_n, X_{n+1})$ as points in 2D, they fall on a small number of parallel planes. This is the Marsaglia effect and it's honestly kind of horrifying once you see it.

The Workhorse: Mersenne Twister (MT19937)

Python's random module uses the Mersenne Twister under the hood — specifically MT19937, with a period of $2^{19937} - 1$ (yes, that's a Mersenne prime, hence the name). The period is so absurdly long that you will never, ever exhaust it. The state is 624 32-bit integers, or about 2.5 KB. It passes most statistical test suites with flying colors.

It is not cryptographically secure. If an attacker can observe 624 consecutive outputs, they can reconstruct the entire internal state and predict every future number. For science and simulation this is irrelevant — for crypto it's a dealbreaker. Keep reading.

Python's `random` Module: Batteries Included

The random module is your everyday driver for random numbers in Python. It wraps MT19937 and gives you a clean, Pythonic API.

The Core Functions You'll Actually Use

Function	What it gives you
`random.random()`	Float in [0.0, 1.0). This is the one you'll call 90% of the time.
`random.randint(a, b)`	Random integer in [a, b] — inclusive on both ends. Yes, `randint(0, 10)` can give you 10.
`random.randrange(start, stop, step)`	Like `range()` but returns one random element. `randrange(0, 10)` gives 0–9.
`random.choice(seq)`	Grab a random element from a sequence. O(1) for lists.
`random.shuffle(seq)`	Fisher-Yates in-place shuffle. Honestly kind of elegant.
`random.sample(population, k)`	`k` unique random elements without replacement. Reservoir sampling when needed.
`random.gauss(mu, sigma)`	Gaussian (normal) distribution. Slightly faster than `normalvariate` but uses a different algorithm.
`random.uniform(a, b)`	Float in [a, b]. Also `random.triangular()`, `random.betavariate()`, `random.expovariate()`, etc.

Seeding: The Ritual

import random
random.seed(42)
print(random.random())  # Always 0.6394267984578837 on CPython 3.x

Call seed() once at the start of your program if you want reproducible runs. If you don't seed, Python seeds from os.urandom() on modern versions, which is fine — but your results won't be reproducible between runs.

One seed to rule them all: if you import other modules that also use random, seeding the global instance affects them too. This is either a feature or a deeply confusing bug depending on your perspective.

NumPy's Random: When the Stakes Get Higher

The random module is fine for one-at-a-time random numbers, but numerical work often needs millions of them, fast. NumPy's random subsystem does exactly that — vectorized, tight C loops, and way more distributions than the stdlib.

The API Schism: Old Way vs. New Way

NumPy has been through a random-number glow-up. Here's the deal:

Approach	API	Verdict
Old school	`numpy.random.rand()`, `numpy.random.randn()`, `numpy.random.randint()`	Still works, still everywhere in legacy code. Uses a global `RandomState` singleton. Fine for quick scripts, annoying for reproducibility.
The new hotness	`numpy.random.default_rng()` → `Generator` object	Way better. Uses PCG64 by default (better statistical properties than MT19937). Explicit state, explicit seeding. No global state leakage.

The new way is the recommended path and it's honestly just cleaner:

import numpy as np
rng = np.random.default_rng(seed=42)
x = rng.random(1_000_000)        # A million uniform floats, just like that
y = rng.normal(0, 1, size=1000)  # A thousand standard normals
z = rng.integers(0, 100, size=500)  # 500 ints in [0, 100)

The Generator API gives you basically every distribution known to statistics — uniform, normal, exponential, gamma, beta, binomial, Poisson, chi-square, F, t, Laplace, logistic, lognormal, multinomial, multivariate normal, Dirichlet, Wishart... the list goes on. It's a statistical candy store.

Why PCG64 Over MT19937?

PCG64 is a permuted congruential generator. It has better statistical properties than MT19937, a smaller state (128 bits — just two integers), and supports multiple independent streams via SeedSequence. It's also faster. The only real win MT19937 has is that the period is absurdly large, but PCG64's period of $2^{128}$ is still "you'll never exhaust it in a billion lifetimes" territory.

A Tour Through the Distributions

Honestly, half the power of random numbers is in the transformations. You start with uniform [0,1) and then shape it into whatever distribution your model needs. Here are the greatest hits:

The Inverse Transform Method

This is one of those ideas that's so elegant it hurts. If you have a CDF $$ F(x) $$ with inverse $F^{-1}$ , and you generate $U \sim \text{Uniform}(0,1)$ , then $X = F^{-1}(U)$ follows the distribution with CDF $$ F $$ .

That's it. That's the whole trick. It's why the exponential distribution is $-\lambda \ln(U)$ — the CDF inversion just works out that way. Some distributions (normal) don't have a closed-form inverse CDF, so fancier methods like Box-Muller or Ziggurat step in. But the inverse transform is the conceptual backbone.

NumPy's Distribution Buffet

rng = np.random.default_rng()

rng.uniform(0, 1, size=1000)       # The building block
rng.normal(mu, sigma, size=1000)    # The bell curve
rng.exponential(scale, size=1000)   # Waiting times
rng.poisson(lam, size=1000)         # Count data
rng.binomial(n, p, size=1000)       # k successes in n trials
rng.gamma(shape, scale, size=1000)  # Waiting time for k events
rng.beta(a, b, size=1000)           # Proportions, Bayesian priors
rng.chisquare(df, size=1000)        # Sum of squared normals
rng.choice([1,2,3,4,5], size=100)   # Discrete sampling with/without replacement
rng.permutation(arr)                # Shuffle in place

Seeds and Reproducibility: Control Your Chaos

Reproducibility is the quiet superpower of PRNGs. Set the same seed, get the same sequence — every single time, on any machine, any OS. This makes debugging stochastic code actually possible and lets other people replicate your results exactly.

The Golden Rules

Seed once, at the top. Never inside a loop. You already knew this, but everyone does it at least once.
Use SeedSequence for parallel work. If you're running 8 MPI processes or 32 joblib workers, you don't want them all using the same stream. SeedSequence derives independent seeds from one parent seed — it's basically entropy-aware seed spawning and it's kind of genius.
Record your seed. Stick it in a config file, a log line, or a comment. Future you will thank present you.

from numpy.random import SeedSequence, default_rng

ss = SeedSequence(12345)
child_seeds = ss.spawn(4)  # 4 independent streams
rngs = [default_rng(s) for s in child_seeds]

Don't Seed in a Loop (Seriously)

# DON'T DO THIS
for i in range(1000):
    random.seed(42)
    x = random.random()  # x is the SAME every iteration

You'd be surprised how often this shows up in actual code. The first random number after seeding with a fixed value is a deterministic function of that seed. Seeding inside a loop with a constant seed just gives you the same "random" number over and over. Use a single seed at the top and let the generator do its thing.

When Random Isn't Random Enough: Cryptography

Sometimes "statistically random" isn't good enough — you need "your adversary, armed with a supercomputer and a copy of your algorithm, cannot predict the next bit." That's the crypto-grade bar.

`secrets` and `os.urandom()`

Python 3.6+ ships the secrets module, which wraps os.urandom() (the OS's cryptographically secure RNG — on Linux this comes from /dev/urandom and the kernel's entropy pool):

import secrets
token = secrets.token_hex(32)     # 64-char hex string
key = secrets.randbits(256)        # 256 random bits
url_safe = secrets.token_urlsafe() # For session IDs, CSRF tokens, etc.

Why MT19937 Is NOT Crypto-Safe

The Mersenne Twister is a linear recurrence. Observe 624 consecutive 32-bit outputs and you can solve for the entire internal state with linear algebra. After that you can predict every future output and reconstruct every past output. The random module even warns about this in its docs — it's not suitable for security purposes. Use secrets or os.urandom() directly.

Practical Gotchas: Stuff That Will Bite You

The `randint` Inclusive Surprise

random.randint(a, b) includes b. numpy.random.randint(a, b) in the new API (rng.integers) does NOT include b. And to make it extra confusing, numpy.random.randint in the OLD API follows the Python convention and DOES include b for some argument patterns. Read the docs for whatever API you're using. This has caused real bugs in real papers.

Slow Generation Patterns

Generating numbers one at a time in a Python loop is death by a thousand function calls. If you need a million random numbers, ask NumPy for them all at once:

# Slow (Python-loop overhead dominates):
vals = [random.random() for _ in range(1_000_000)]

# Fast (vectorized, C-level):
vals = rng.random(1_000_000)

Parallel RNG: Independent Streams or Bust

If you spawn 8 parallel processes and each seeds with the current time, you might get collisions. Worse, if you seed them all with the same seed (or with seeds 0,1,2,...7), the streams might be correlated. SeedSequence + PCG64 makes this basically foolproof — use it.

The `random.random() * N` Subtle Bias

Multiplying a uniform float by N and flooring to get an integer index can introduce tiny biases due to floating-point representation. For casual use it's fine, but if you're doing something like lottery draw simulations where every bit of uniformity counts, use randint or randrange — they use rejection sampling internally to guarantee uniformity.

Numerics/Random Numbers

From charlesreid1

Contents

The Two Kinds of Random

Pseudorandom Number Generators: Controlled Chaos

The OG: Linear Congruential Generators

The Workhorse: Mersenne Twister (MT19937)

Python's `random` Module: Batteries Included

The Core Functions You'll Actually Use

Seeding: The Ritual

NumPy's Random: When the Stakes Get Higher

The API Schism: Old Way vs. New Way

Why PCG64 Over MT19937?

A Tour Through the Distributions

The Inverse Transform Method

NumPy's Distribution Buffet

Seeds and Reproducibility: Control Your Chaos

The Golden Rules

Don't Seed in a Loop (Seriously)

When Random Isn't Random Enough: Cryptography

`secrets` and `os.urandom()`

Why MT19937 Is NOT Crypto-Safe

Practical Gotchas: Stuff That Will Bite You

The `randint` Inclusive Surprise

Slow Generation Patterns

Parallel RNG: Independent Streams or Bust

The `random.random() * N` Subtle Bias

See Also

Numerics/Random Numbers

From charlesreid1

The Two Kinds of Random

Pseudorandom Number Generators: Controlled Chaos

The OG: Linear Congruential Generators

The Workhorse: Mersenne Twister (MT19937)

Python's random Module: Batteries Included

The Core Functions You'll Actually Use

Seeding: The Ritual

NumPy's Random: When the Stakes Get Higher

The API Schism: Old Way vs. New Way

Why PCG64 Over MT19937?

A Tour Through the Distributions

The Inverse Transform Method

NumPy's Distribution Buffet

Seeds and Reproducibility: Control Your Chaos

The Golden Rules

Don't Seed in a Loop (Seriously)

When Random Isn't Random Enough: Cryptography

secrets and os.urandom()

Why MT19937 Is NOT Crypto-Safe

Practical Gotchas: Stuff That Will Bite You

The randint Inclusive Surprise

Slow Generation Patterns

Parallel RNG: Independent Streams or Bust

The random.random() * N Subtle Bias

See Also

Python's `random` Module: Batteries Included

`secrets` and `os.urandom()`

The `randint` Inclusive Surprise

The `random.random() * N` Subtle Bias