Numerics/Random Numbers
From charlesreid1
DEVELOPERS! DEVELOPERS! DEVELOPERS! — and the random numbers they need.
If you are reaching for C++ to generate random numbers, STOP. Put down that `<random>` header. Step away from the Mersenne Twister boilerplate. You are about to write eight lines of ceremonial incantation just to get a number between 1 and 6. Python gives you that in one line. And it will be correct. And it will be readable. And you will not have to explain to your cryptographer friend why you seeded with `time(NULL)` (spoiler: that is how you get owned).
Python's random-number story is the best in the business for numerical work. You get three battle-tested modules: `random` and `secrets` ship with the standard library, `numpy.random` comes with NumPy (a third-party install), and each one knows its job. This page covers all three, calls out the footguns, and shows you the one obvious way to do it.
The Holy Trinity of Python Randomness
| Module | Use Case | Speed | Cryptographic? |
|---|---|---|---|
| `random` | Simulations, games, shuffling, sampling | Fast (Mersenne Twister) | NO (never for security) |
| `secrets` | Passwords, tokens, session keys, auth | Slower (OS entropy) | YES (designed for it) |
| `numpy.random` | Massive arrays, Monte Carlo, stats | Vectorized (whole arrays generated in C) | NO (never for security) |
Memorize this table. Getting it wrong is how you ship a broken cryptosystem or a simulation that takes 100× too long.
`random` — Your Daily Driver
The `random` module uses the Mersenne Twister (MT19937): period of $ 2^{19937} - 1 $, passes Diehard and most of TestU01 (it is known to fail the linear-complexity tests in Crush and BigCrush), and ships with CPython. It is not cryptographically secure. Do not use it for secrets. Are we clear? Good.
The One-Liners You Will Use Every Day
import random
# Integer in [a, b] — inclusive on BOTH ends
die_roll = random.randint(1, 6)
# Integer in [a, b) — exclusive upper, like range()
idx = random.randrange(0, len(my_list))
# Float in [0.0, 1.0)
u = random.random()
# Float in [a, b] (whether b itself can appear depends on floating-point rounding)
f = random.uniform(2.5, 7.5)
# Pick one
winner = random.choice(["Alice", "Bob", "Charlie", "Dana"])
# Pick k WITHOUT replacement (no duplicates)
sample = random.sample(population, k=10)
# Shuffle IN PLACE — Fisher-Yates under the hood
random.shuffle(deck)
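The list above covers sampling without replacement. For draws with replacement, optionally weighted, the standard library also has `random.choices` (Python 3.6+). A minimal sketch; the loot table and its weights are invented for illustration:

```python
import random

# Hypothetical loot table; names and weights are illustrative only.
items = ["common", "rare", "epic"]
weights = [90, 9, 1]

rng = random.Random(2024)   # private generator, leaves the global one alone

# choices() draws WITH replacement, so the same item can come up repeatedly.
drops = rng.choices(items, weights=weights, k=5)

# sample() draws WITHOUT replacement, so every card in the hand is distinct.
hand = rng.sample(range(52), k=5)
```

Using a private `random.Random` instance here is a design choice, not a requirement: it keeps the example from disturbing whatever else depends on the module-level generator.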
Seeds: Control Your Chaos
random.seed(42) # Deterministic run — great for tests
random.seed() # OS entropy via os.urandom; falls back to system time if unavailable
random.seed("hello world", version=2) # Hash a string into a seed
OPINION: Always seed explicitly in scientific code. Reproducibility is not optional. If your results cannot be regenerated from a known seed, you do not have results — you have an anecdote. Save the seed alongside your output. Future you will weep with gratitude.
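One way to follow that advice, sketched under the assumption that your "experiment" is a function taking a seed. `run_experiment` and the `record` dict are hypothetical names, not part of any library:

```python
import random
import secrets

def run_experiment(seed: int, n: int = 5) -> list:
    """Toy 'experiment': n uniform draws from a private generator."""
    rng = random.Random(seed)   # private instance, not the global state
    return [rng.random() for _ in range(n)]

seed = secrets.randbits(64)     # fresh seed for this run
result = run_experiment(seed)

# Store the seed right next to the output it produced.
record = {"seed": seed, "result": result}

# Anyone holding the record can replay the run exactly.
replayed = run_experiment(record["seed"])
```

Drawing the seed from `secrets` and then logging it gives you both properties at once: runs differ from each other by default, and any single run can be regenerated later.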
Distributions: Beyond Uniform
# Gaussian (mu=0, sigma=1)
random.gauss(0, 1) # Slightly faster, but not thread-safe
random.normalvariate(0, 1) # Slightly slower, thread-safe
# Exponentially distributed, lambda=1.5
random.expovariate(1.5)
# Gamma, Beta, von Mises, Pareto, Weibull...
random.gammavariate(alpha=2.0, beta=3.0)
random.betavariate(alpha=0.5, beta=0.5)
random.paretovariate(alpha=1.5)
random.weibullvariate(alpha=1.5, beta=2.0)
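A cheap sanity check when you reach for one of these: draw a large sample and compare a summary statistic with theory. This sketch assumes a sample size of 100,000 and a fixed seed of 0, both arbitrary choices:

```python
import random
import statistics

rng = random.Random(0)
lam = 1.5
n = 100_000

exp_draws = [rng.expovariate(lam) for _ in range(n)]
norm_draws = [rng.gauss(0, 1) for _ in range(n)]

exp_mean = statistics.fmean(exp_draws)    # theory: 1 / lam, about 0.667
norm_sd = statistics.pstdev(norm_draws)   # theory: 1.0

print(f"exponential mean: {exp_mean:.3f}   normal std dev: {norm_sd:.3f}")
```

If the printed values land far from the theoretical ones, the likeliest culprit is a swapped or misread parameter (rate versus scale is the classic).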
The Trap: `SystemRandom` (Just Use `secrets` Instead)
# Exists, works, but WHY ARE YOU DOING THIS?
sr = random.SystemRandom()
token = sr.randrange(2**256)
`SystemRandom` wraps `/dev/urandom` through the `random.Random` API. Fine, it works. But Python 3.6 gave us `secrets`, which has a purpose-built API for exactly this. Use the right tool.
`secrets` — When Random Isn't Just Random
If the outcome affects money, privacy, authentication, or user safety, you are in `secrets` territory. This module pulls entropy from the operating system's CSPRNG through `os.urandom()` (`getrandom()` or `/dev/urandom` on Linux; `BCryptGenRandom` on Windows as of Python 3.11, `CryptGenRandom` before that), and the API is deliberately narrow: you cannot accidentally use it for Monte Carlo.
import secrets
# Cryptographically random integer in [0, n)
secrets.randbelow(2**256)
# Random integer with k random bits
secrets.randbits(256)
# Hex token: nbytes random bytes → 2 * nbytes hex characters
secrets.token_hex(32) # 64 hex chars
# URL-safe Base64 token: nbytes → ceil(nbytes * 4/3) characters (padding stripped)
secrets.token_urlsafe(32) # 43 URL-safe chars
# Pick one element using the OS CSPRNG
secrets.choice(["primary", "secondary", "fallback"])
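Two everyday jobs built from those primitives, as a sketch. `make_password` and `tokens_match` are hypothetical helper names; the 62-character alphabet and the length of 16 are assumptions, not recommendations from this page:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits   # 62 symbols

def make_password(length: int = 16) -> str:
    """Random alphanumeric password drawn from the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def tokens_match(supplied: str, expected: str) -> bool:
    """Compare two ASCII tokens without leaking where they first differ."""
    return secrets.compare_digest(supplied, expected)

password = make_password()
token = secrets.token_urlsafe(32)
```

`secrets.compare_digest` is there because a plain `==` can return early at the first mismatching character, which is the kind of timing signal an attacker likes.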
OPINION: If you type `random` when you should have typed `secrets`, you have introduced a vulnerability. Full stop. The Mersenne Twister's entire internal state can be reconstructed from 624 consecutive 32-bit outputs, after which every future output is predictable. That is not a theoretical attack; it is a Saturday-afternoon script. Use `secrets` for anything adversarial.
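Here is roughly what that Saturday-afternoon script looks like. It is a sketch that assumes the attacker sees raw `getrandbits(32)` outputs in order; the helper names are made up, and the shift and mask constants are the published MT19937 tempering parameters:

```python
import random

def undo_right(y: int, shift: int) -> int:
    """Invert y ^= y >> shift on a 32-bit word."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def undo_left(y: int, shift: int, mask: int) -> int:
    """Invert y ^= (y << shift) & mask on a 32-bit word."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF

def untemper(y: int) -> int:
    """Recover the raw state word behind one 32-bit output."""
    y = undo_right(y, 18)
    y = undo_left(y, 15, 0xEFC60000)
    y = undo_left(y, 7, 0x9D2C5680)
    y = undo_right(y, 11)
    return y

victim = random.Random()   # seeded from OS entropy; the "attacker" never sees the seed
observed = [victim.getrandbits(32) for _ in range(624)]

# Rebuild the 624-word state; index 624 means "regenerate on the next draw".
clone = random.Random()
clone.setstate((3, tuple(untemper(v) for v in observed) + (624,), None))

predicted = [clone.getrandbits(32) for _ in range(5)]
actual = [victim.getrandbits(32) for _ in range(5)]
print(predicted == actual)
```

Real targets rarely hand over clean 32-bit words, so an actual attack needs more bookkeeping, but the core weakness is exactly this: the output function is invertible and the state is small.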
`numpy.random` — The Heavy Artillery
When you need millions of random numbers, Python's `random` module becomes a bottleneck. Each call crosses the Python/C boundary. NumPy's `numpy.random` generates entire arrays in C, vectorized, and the new API (1.17+) uses the superior PCG-64 generator by default.
The New API (Use This, Not the Old One)
import numpy as np
# Create a generator — PCG64 is the new default
rng = np.random.default_rng(seed=2024)
# Uniform [0, 1) — 10 million floats in < 100 ms
u = rng.random(10_000_000)
# Integers in [1, 6]: endpoint=True makes the upper bound inclusive
dice = rng.integers(1, 6, size=1000, endpoint=True) # d6 × 1000
# Normal, vectorized
z = rng.standard_normal((1000, 1000)) # 1M standard normals
# Shuffle along axis
rng.shuffle(arr, axis=0)
# Choice with replacement and probabilities
rng.choice(["H", "T"], size=1000, p=[0.5, 0.5])
The Old API (Detect It, Then Kill It)
# LEGACY: hidden global state, MT19937 under the hood, slower than Generator
np.random.seed(42)
np.random.rand(100) # Uniform
np.random.randn(100) # Normal
np.random.randint(0, 10, 100)
# EVERY call after np.random.seed() is a footgun in threaded code.
# Use np.random.default_rng() instead.
OPINION: `np.random.seed()` is a code smell in 2024. The global RandomState is shared across all threads, libraries, and modules that import NumPy. If any of them call `np.random.seed()` or draw from the global state, your "deterministic" run is silently corrupted. The `Generator` API (`default_rng`) gives each component its own isolated stream. Use it.
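A sketch of what "each component gets its own stream" can look like, using NumPy's `SeedSequence.spawn`. The root seed value and the count of three workers are arbitrary assumptions:

```python
import numpy as np

root = np.random.SeedSequence(20240501)   # one logged seed for the whole run
children = root.spawn(3)                  # independent child seeds
workers = [np.random.default_rng(child) for child in children]

# Each worker draws from its own stream; no global state is touched.
draws = [worker.random(4) for worker in workers]

# Rebuilding from the same root seed reproduces every stream.
replay = [
    np.random.default_rng(child).random(4)
    for child in np.random.SeedSequence(20240501).spawn(3)
]
```

You log one number (the root seed) and still get a separate, reproducible generator per thread, process, or library component.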
Generators: Pick Your Poison
from numpy.random import PCG64, Philox, SFC64, MT19937
# PCG64 — default, excellent all-rounder, tiny state (128 bits)
rng = np.random.Generator(PCG64(seed=42))
# Philox — counter-based, parallel- and GPU-friendly
rng = np.random.Generator(Philox(seed=42))
# SFC64 — fastest, small state, good statistical quality
rng = np.random.Generator(SFC64(seed=42))
# MT19937 — the old warhorse. For compatibility only.
rng = np.random.Generator(MT19937(seed=42))
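To see how the pieces relate: in current NumPy releases, `default_rng(seed)` behaves like `Generator(PCG64(seed))`, while swapping the bit generator gives an unrelated stream from the same seed. A small check, offered as a sketch:

```python
import numpy as np
from numpy.random import Generator, PCG64, Philox

a = np.random.default_rng(42).random(3)
b = Generator(PCG64(42)).random(3)    # same bit generator, same seed
c = Generator(Philox(42)).random(3)   # different algorithm, different stream

same_stream = np.array_equal(a, b)
different_stream = not np.array_equal(a, c)
print(same_stream, different_stream)
```

The practical upshot is that a seed alone does not identify a stream. Record the bit generator name alongside it.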
The Laws of Random Number Hygiene
Law 1: Seeds Are Sacred
Log your seed. Better yet, log the entire RNG state if you checkpoint. If you cannot replay your simulation bit-for-bit, your paper is a PDF full of hopes and feelings.
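Checkpointing the full state, sketched with NumPy's `bit_generator.state` property. The seed and the number of draws before the checkpoint are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(12345)
_ = rng.random(1000)                   # the simulation runs for a while

checkpoint = rng.bit_generator.state   # plain dict describing the generator state
expected = rng.random(5)               # what the run produces next

# Later, or in another process: build a generator and restore the state.
resumed = np.random.default_rng()
resumed.bit_generator.state = checkpoint
replayed = resumed.random(5)

print(np.array_equal(expected, replayed))
```

Because the state is an ordinary dict, it can be written out with whatever serialization your checkpoints already use. The standard library offers the same idea through `random.getstate()` and `random.setstate()`.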
Law 2: `random` ≠ `secrets`
Print this on a sticky note and attach it to your monitor:
`random` is for dice. `secrets` is for keys.
If your code generates a session ID, password reset token, or API key with `random`, delete it and start over.
Law 3: Vectorize or Die
Generating 10 million random numbers in a Python `for` loop calling `random.random()` is the computational equivalent of eating soup with a fork. Use `np.random.default_rng().random(10_000_000)`. It is typically one to two orders of magnitude faster (the exact ratio depends on your machine), and your CPU will not file a grievance.
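Do not take the speedup on faith; measure it. A rough timing sketch, with the sample size cut to one million so the slow path finishes quickly. The numbers it prints will vary by machine and are not claimed here:

```python
import random
import timeit

import numpy as np

n = 1_000_000
np_rng = np.random.default_rng()

def python_loop():
    """One Python-level call per number."""
    return [random.random() for _ in range(n)]

def numpy_vectorized():
    """One call that fills the whole array in C."""
    return np_rng.random(n)

t_loop = timeit.timeit(python_loop, number=3) / 3
t_numpy = timeit.timeit(numpy_vectorized, number=3) / 3
print(f"loop: {t_loop:.4f} s   numpy: {t_numpy:.4f} s   ratio: {t_loop / t_numpy:.0f}x")
```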
Law 4: Never Seed With System Time in a Loop
Seeding with `int(time.time())`, or any clock coarser than your loop, inside a tight loop (we see this in the wild) produces identical "random" sequences on every iteration because the clock hasn't ticked. This is not a subtle bug; it is a disaster with a straight face. Seed once, outside the loop. If you truly must reseed quickly, use `secrets.randbits(128)` as the seed.
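The failure is easy to reproduce. In this sketch, `frozen_clock` is a hypothetical stand-in for `time.time()` during a loop that finishes within one clock tick, which keeps the demonstration deterministic:

```python
import random

def frozen_clock() -> float:
    """Hypothetical stand-in for time.time() inside a loop faster than one tick."""
    return 1_700_000_000.25

# Anti-pattern: reseed from the clock on every iteration.
bad = []
for _ in range(5):
    rng = random.Random(int(frozen_clock()))
    bad.append(rng.randint(1, 6))

# Fix: seed once, outside the loop, then keep drawing from the same generator.
rng = random.Random(int(frozen_clock()))
good = [rng.randint(1, 6) for _ in range(50)]

print(len(set(bad)), len(set(good)))
```

Every entry in `bad` is the same die roll, because every iteration rebuilt the same generator from the same seed.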
Law 5: Beware the Birthday Paradox
Collisions become likely after only about $ \sqrt{n} $ samples. For 32-bit random IDs, $ \sqrt{2^{32}} = 65{,}536 $, and the chance of at least one duplicate reaches 50% at roughly 77,000 IDs. If you are generating IDs with `random.getrandbits(32)`, expect a duplicate somewhere around row 65,536 to 77,000. Use 128-bit tokens (`secrets.token_hex(16)`) and the collision probability becomes cosmically negligible.
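The arithmetic behind those figures, using the standard birthday approximation $ p \approx 1 - e^{-k(k-1)/2n} $ for $ k $ IDs drawn uniformly from $ n $ values. `collision_probability` is a hypothetical helper name:

```python
import math

def collision_probability(k: int, bits: int) -> float:
    """Approximate P(at least one duplicate) among k uniform IDs of the given width."""
    n = 2 ** bits
    # Birthday approximation: 1 - exp(-k(k-1) / 2n); expm1 keeps tiny values accurate.
    return -math.expm1(-k * (k - 1) / (2 * n))

p_32_sqrt = collision_probability(65_536, 32)   # about 0.39
p_32_half = collision_probability(77_163, 32)   # about 0.50
p_128 = collision_probability(10**12, 128)      # a trillion 128-bit tokens

print(f"{p_32_sqrt:.2f}  {p_32_half:.2f}  {p_128:.1e}")
```

Even a trillion 128-bit tokens leave the collision probability far below anything worth losing sleep over, which is the whole argument for `secrets.token_hex(16)`.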
Common Patterns, Done Right
Pattern: Monte Carlo Integration
import numpy as np
def estimate_pi(n: int, rng=None) -> float:
    """Estimate π by throwing darts at a unit square."""
    if rng is None:
        rng = np.random.default_rng()
    x = rng.random(n)
    y = rng.random(n)
    inside = np.sum(x**2 + y**2 <= 1.0)
    return 4.0 * inside / n
rng = np.random.default_rng(seed=42)
print(estimate_pi(10_000_000, rng=rng)) # ≈ 3.14 (standard error about 5e-4 at this n)
Pattern: Reservoir Sampling (Streaming Data)
import random
def reservoir_sample(stream, k: int, rng=None):
    """Reservoir-sample k items from a streaming iterable."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
Pattern: Cryptographic Salt / Token
import secrets
def make_session_id() -> str:
    """256-bit session ID, URL-safe. ~43 characters."""
    return secrets.token_urlsafe(32)
def make_api_key() -> str:
    """Hex-encoded 256-bit API key. 64 characters."""
    return secrets.token_hex(32)
Pattern: Train/Test Split (Reproducible)
import numpy as np
rng = np.random.default_rng(seed=8675309)
indices = rng.permutation(len(data))
split = int(0.8 * len(data))
train_idx, test_idx = indices[:split], indices[split:]
What About `os.urandom`?
`os.urandom(n)` is the bedrock — it returns n bytes from the OS CSPRNG. `secrets` is a thin, opinionated wrapper around it. Use `secrets` for structured randomness (tokens, integers, choices). Use `os.urandom` directly only when you need raw bytes or are building your own crypto primitives (and if you are doing that, you already know why you are here).
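The relationship in practice, as a sketch. The 16-byte length is an arbitrary choice:

```python
import os
import secrets

raw = os.urandom(16)                  # 16 bytes straight from the OS CSPRNG
as_int = int.from_bytes(raw, "big")   # the same bytes viewed as a 128-bit integer
as_hex = raw.hex()                    # 32 hex characters

# The secrets helpers cover the same ground with less ceremony.
raw2 = secrets.token_bytes(16)
hex2 = secrets.token_hex(16)
```

If all you want is bytes, hex, or an integer, the `secrets` call says what you mean; dropping to `os.urandom` mostly makes sense when some other API insists on raw bytes.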
What About C++'s `<random>`?
Look, I have written C++. I have loved C++. But generating a random integer in modern C++ looks like this:
#include <random>
#include <iostream>
int main() {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dist(1, 6);
    std::cout << dist(gen) << '\n';
}
Python:
import random
print(random.randint(1, 6))
The C++ version is eight lines. It pulls in two headers. It instantiates three objects from three different classes. For a die roll. And the sharp edges are everywhere: `std::random_device` can be deterministic on older MinGW toolchains. A single `rd()` call seeds `std::mt19937` with only 32 bits. Reach for `gen() % 6` instead of `uniform_int_distribution` and you get biased results. The boilerplate-to-value ratio is off the charts.
Use Python. Your numerical code will be shorter, correcter, and you will finish before lunch.
See Also
- Numerics — the full numerical recipes catalog
- Numerics/Monte Carlo — Monte Carlo methods done right
- Numerics/Statistical Descriptions of Data — descriptive statistics
- Numerics/Classification and Inference — Bayesian and frequentist inference
- Python `random` documentation
- Python `secrets` documentation
- NumPy random documentation
- PCG Random — the PCG family explained