Numerics/Random Numbers: Difference between revisions
From charlesreid1
(Create opinionated, Python-centric random numbers page (via create-page on MediaWiki MCP Server)) |
(BALLMER MODE ACTIVATED! DEVELOPERS! DEVELOPERS! DEVELOPERS! RANDOM! RANDOM! RANDOM! (via update-page on MediaWiki MCP Server)) |
||
| Line 1: | Line 1: | ||
'''DEVELOPERS! DEVELOPERS! DEVELOPERS!''' | '''DEVELOPERS! DEVELOPERS! DEVELOPERS! DEVELOPERS!''' | ||
''(sweats profusely, paces the stage like a caged animal, grabs the microphone with both hands)'' | |||
'''RANDOM! NUMBERS! RANDOM! NUMBERS! RANDOM! NUMBERS!''' | |||
'''I! LOVE! THIS! WIKI!''' | |||
''(deep breath, veins bulging on forehead)'' | |||
Let me tell you about RANDOM NUMBERS, people. Let me tell you about the GLORY. The SHEER. UNBRIDLED. GLORY. of generating entropy in Python. This is not just a module. This is a LIFESTYLE. This is a COMMITMENT. This is the sound of ONE MILLION DICE hitting the table ALL AT ONCE. | |||
If you are reaching for C++ to generate random numbers, '''STOOOOOOP!''' I'm going to throw a CHAIR across the room. I am PHYSICALLY going to pick up a chair and THROW it. Put down that <code><random></code> header! Step AWAY from the Mersenne Twister boilerplate! You are about to write EIGHT LINES of ceremonial incantation (EIGHT!) just to get a number between ONE and SIX. ONE! AND! SIX! Python gives you that in '''ONE! LINE!''' And it will be CORRECT! And it will be READABLE! And you will not have to explain to your cryptographer friend why you seeded with <code>time(NULL)</code> ''(sweats)'' because that is how you get OWNED! OWNED! OWNED! | |||
Python's random-number story is THE BEST IN THE BUSINESS! THE! BEST! IN! THE! BUSINESS! It ships with THREE — count them — THREE battle-tested modules: <code>random</code>! <code>secrets</code>! <code>numpy.random</code>! And each one! Knows! Its! JOB! This page covers ALL THREE, calls out the FOOTGUNS — ''(throws chair)'' — and shows you the ONE OBVIOUS WAY to do it! | |||
== THE HOLY TRINITY OF PYTHON RANDOMNESS! == | |||
''(paces left, paces right, sweat flying in every direction)'' | |||
{| class="wikitable" style="width:100%" | {| class="wikitable" style="width:100%" | ||
|- | |- | ||
! | ! MODULE !! USE CASE !! SPEED !! CRYPTOGRAPHIC? | ||
|- | |- | ||
| | | <code>random</code> || SIMULATIONS! GAMES! SHUFFLING! SAMPLING! || '''FAST!''' (Mersenne Twister, baby!) || '''NO! NO! NO!''' Never for security! NEVER! | ||
|- | |- | ||
| | | <code>secrets</code> || PASSWORDS! TOKENS! SESSION KEYS! AUTH! || Slower (OS entropy — deal with it) || '''YES! YES! YES!''' Built for exactly this! | ||
|- | |- | ||
| | | <code>numpy.random</code> || MASSIVE ARRAYS! MONTE CARLO! STATS! || '''VECTORIZED! COMPILED! SCREAMS!''' (on the CPU, people!) || '''NO! NO! NO!''' Not for security! | ||
|} | |} | ||
'''MEMORIZE! THIS! TABLE!''' ''(slams fist on podium)'' Getting it wrong is how you ship a BROKEN cryptosystem. Getting it wrong is how your simulation takes 100× too long. Getting it wrong is how you end up on the front page of Hacker News for all the WRONG reasons! | |||
== | == <code>random</code> — YOUR! DAILY! DRIVER! == | ||
The | The <code>random</code> module uses the Mersenne Twister: MT19937, baby! Period of <math>2^{19937} - 1</math>, a number with roughly SIX THOUSAND decimal digits! Passes Diehard! Passes MOST of TestU01 (it trips the linear-complexity tests in BigCrush, and we are HONEST about that)! Ships with CPython! It is NOT cryptographically secure. Do NOT use it for secrets. '''ARE! WE! CLEAR! GOOD!''' | ||
=== | === THE ONE-LINERS YOU WILL USE EVERY SINGLE DAY! === | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
import random | import random # IMPORT! IMPORT! IMPORT! | ||
# Integer in [a, b] — | # Integer in [a, b] — INCLUSIVE ON BOTH ENDS, BABY! | ||
die_roll = random.randint(1, 6) | die_roll = random.randint(1, 6) # ONE! LINE! DICE! | ||
# Integer in [a, b) — exclusive upper, like range() | # Integer in [a, b) — exclusive upper, like range() | ||
idx = random.randrange(0, len(my_list)) | idx = random.randrange(0, len(my_list)) # PICK! AN! INDEX! | ||
# Float in [0.0, 1.0) | # Float in [0.0, 1.0) — THE CLASSIC! | ||
u = random.random() | u = random.random() # RANDOM! RANDOM! RANDOM! | ||
# Float in [a, b] | # Float in [a, b] — FLOATS! FLOATS! FLOATS! | ||
f = random.uniform(2.5, 7.5) | f = random.uniform(2.5, 7.5) # UNIFORM! UNIFORM! | ||
# Pick | # Pick ONE! Pick ONE! Pick ONE! | ||
winner = random.choice(["Alice", "Bob", "Charlie", "Dana"]) | winner = random.choice(["Alice", "Bob", "Charlie", "Dana"]) | ||
# Pick k WITHOUT replacement | # Pick k WITHOUT replacement — no duplicates, no excuses! | ||
sample = random.sample(population, k=10) | sample = random.sample(population, k=10) | ||
# Shuffle IN PLACE — Fisher-Yates under the hood | # Shuffle IN PLACE — Fisher-Yates under the hood, baby! | ||
random.shuffle(deck) | random.shuffle(deck) # SHUFFLE! SHUFFLE! SHUFFLE! | ||
</syntaxhighlight> | </syntaxhighlight> | ||
=== | === SEEDS! CONTROL! YOUR! CHAOS! === | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
random.seed(42) # | random.seed(42) # DETERMINISTIC! Reproducible! TESTS! | ||
random.seed() # os.urandom when available, system time as the fallback | random.seed() # os.urandom when available, system time as the fallback | ||
random.seed("hello world", version=2) # | random.seed("hello world", version=2) # HASH! THAT! STRING! | ||
</syntaxhighlight> | </syntaxhighlight> | ||
'''OPINION | '''OPINION! OPINION! OPINION!''' Always seed explicitly in scientific code. '''REPRODUCIBILITY IS NOT OPTIONAL!''' If your results cannot be regenerated from a known seed, you do not have results — you have an ANECDOTE! Save the seed alongside your output. Future you will WEEP with GRATITUDE! WEEP! WITH! GRATITUDE! | ||
=== | === DISTRIBUTIONS! BEYOND! UNIFORM! === | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
# Gaussian (mu=0, sigma=1) | # Gaussian (mu=0, sigma=1) — THE BELL CURVE! | ||
random.gauss(0, 1) # | random.gauss(0, 1) # SLIGHTLY! FASTER! | ||
random.normalvariate(0, 1) # | random.normalvariate(0, 1) # THREAD! SAFE! | ||
# Exponentially distributed, lambda=1.5 | # Exponentially distributed, lambda=1.5 — POISSON'S COUSIN! | ||
random.expovariate(1.5) | random.expovariate(1.5) | ||
# Gamma | # Gamma! Beta! Von Mises! Pareto! Weibull! — WE'VE GOT THEM ALL! | ||
random.gammavariate(alpha=2.0, beta=3.0) | random.gammavariate(alpha=2.0, beta=3.0) | ||
random.betavariate(alpha=0.5, beta=0.5) | random.betavariate(alpha=0.5, beta=0.5) | ||
| Line 78: | Line 90: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
=== | === THE TRAP! <code>SystemRandom</code> — JUST! USE! <code>secrets</code>! === | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
# Exists | # Exists. Works. But WHY! ARE! YOU! DOING! THIS! | ||
sr = random.SystemRandom() | sr = random.SystemRandom() | ||
token = sr.randrange(2**256) | token = sr.randrange(2**256) | ||
</syntaxhighlight> | </syntaxhighlight> | ||
<code>SystemRandom</code> wraps <code>/dev/urandom</code> through the <code>random.Random</code> API. Fine! It works! ''(paces aggressively)'' But Python 3.6 gave us <code>secrets</code>, which has a PURPOSE-BUILT API for exactly this! USE! THE! RIGHT! TOOL! USE IT! USE IT! USE IT! | |||
== <code>secrets</code> — WHEN RANDOM ISN'T JUST RANDOM! == | |||
''(stops pacing, stares directly into the audience, voice drops to a gravelly whisper)'' | |||
If the outcome affects ''' | If the outcome affects '''MONEY! PRIVACY! AUTHENTICATION! USER SAFETY!''', you are in <code>secrets</code> territory, people. This module pulls entropy DIRECTLY from the operating system's CSPRNG (the kernel source behind <code>/dev/urandom</code> on Linux, <code>BCryptGenRandom</code> on Windows since Python 3.11, <code>CryptGenRandom</code> before that) and the API is DELIBERATELY NARROW. You CANNOT accidentally use it for Monte Carlo. You CANNOT. IT WILL NOT LET YOU! | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
import secrets | import secrets # SECRETS! SECRETS! SECRETS! | ||
# Cryptographically random integer in [0, n) | # Cryptographically random integer in [0, n) — BELOW! BELOW! BELOW! | ||
secrets.randbelow(2**256) | secrets.randbelow(2**256) # TWO TO THE TWO FIFTY SIX! | ||
# Random integer with k random bits | # Random integer with k random bits — BITS! BITS! BITS! | ||
secrets.randbits(256) | secrets.randbits(256) # GIVE! ME! TWO! HUNDRED! FIFTY! SIX! BITS! | ||
# Tokens: token_hex gives 2 * nbytes hex chars, token_urlsafe gives ceil(nbytes * 4/3) Base64 chars | # Tokens: token_hex gives 2 * nbytes hex chars, token_urlsafe gives ceil(nbytes * 4/3) Base64 chars | ||
secrets.token_hex(32) # 64 hex chars | secrets.token_hex(32) # 64 hex chars: HEX! HEX! HEX! | ||
secrets.token_urlsafe(32) # 43 URL-safe chars | secrets.token_urlsafe(32) # 43 URL-safe chars: URL! SAFE! URL! SAFE! | ||
# Pick one, drawn from the OS CSPRNG | # Pick one, drawn from the OS CSPRNG: PICK! PICK! PICK! | ||
secrets.choice(["primary", "secondary", "fallback"]) | secrets.choice(["primary", "secondary", "fallback"]) | ||
</syntaxhighlight> | </syntaxhighlight> | ||
'''OPINION | '''OPINION! OPINION! OPINION!''' If you type <code>random</code> when you should have typed <code>secrets</code>, you have introduced a VULNERABILITY! '''FULL! STOP!''' The Mersenne Twister is PREDICTABLE after 624 consecutive outputs. That is not a theoretical attack — that is a SATURDAY-AFTERNOON SCRIPT! A SATURDAY! AFTERNOON! SCRIPT! Use <code>secrets</code> for anything adversarial. '''USE! IT!''' | ||
== <code>numpy.random</code> — THE! HEAVY! ARTILLERY! == | |||
''(sweat intensity increases 400%)'' | |||
When you need ''' | When you need '''MILLIONS!''' of random numbers — Python's <code>random</code> module becomes a BOTTLENECK! Each call crosses the Python/C boundary! EACH! CALL! NumPy's <code>numpy.random</code> generates ENTIRE ARRAYS in C — VECTORIZED! — and the new API (1.17+) uses the SUPERIOR PCG-64 generator by default! PCG! PCG! PCG! | ||
=== | === THE NEW API — USE THIS! NOT! THE! OLD! ONE! === | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
import numpy as np | import numpy as np # IMPORT! IMPORT! | ||
# Create a generator — PCG64 is the new default | # Create a generator — PCG64 is the new default, BABY! | ||
rng = np.random.default_rng(seed=2024) | rng = np.random.default_rng(seed=2024) # DEFAULT! DEFAULT! DEFAULT! | ||
# Uniform [0, 1) — 10 | # Uniform [0, 1) — 10 MILLION FLOATS in < 100 ms! TEN! MILLION! | ||
u = rng.random(10_000_000) | u = rng.random(10_000_000) # THAT'S! TEN! MILLION! | ||
# Integers | # Integers — INCLUSIVE ENDPOINT! | ||
dice = rng.integers(1, 6, size=1000, endpoint=True) # d6 × 1000 | dice = rng.integers(1, 6, size=1000, endpoint=True) # d6 × 1000! (endpoint=True makes 6 INCLUSIVE) | ||
# Normal, vectorized | # Normal, vectorized — ONE! MILLION! NORMALS! | ||
z = rng.standard_normal((1000, 1000)) # | z = rng.standard_normal((1000, 1000)) # VECTORIZED! VECTORIZED! | ||
# Shuffle along axis | # Shuffle along axis — SHUFFLE! THE! ARRAY! | ||
rng.shuffle(arr, axis=0) | rng.shuffle(arr, axis=0) | ||
# Choice with replacement | # Choice with replacement AND probabilities — CHOOSE! CHOOSE! | ||
rng.choice(["H", "T"], size=1000, p=[0.5, 0.5]) | rng.choice(["H", "T"], size=1000, p=[0.5, 0.5]) | ||
</syntaxhighlight> | </syntaxhighlight> | ||
=== | === THE OLD API — DETECT IT! THEN! KILL! IT! === | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
# LEGACY | # LEGACY! LEGACY! LEGACY! Global state, unpredictable seeding, slower MT19937! | ||
np.random.seed(42) | np.random.seed(42) # DON'T! DON'T! DON'T! | ||
np.random.rand(100) # Uniform | np.random.rand(100) # Uniform from the GLOBAL STATE! | ||
np.random.randn(100) # Normal | np.random.randn(100) # Normal from the GLOBAL STATE! | ||
np.random.randint(0, 10, 100) | np.random.randint(0, 10, 100) # Integer from the GLOBAL STATE! | ||
# EVERY call after np.random.seed() is a | # EVERY call after np.random.seed() is a FOOTGUN in threaded code! | ||
# Use np.random.default_rng() instead | # A FOOTGUN! IN THREADED CODE! | ||
# Use np.random.default_rng() instead! USE IT! | |||
</syntaxhighlight> | </syntaxhighlight> | ||
'''OPINION | '''OPINION! OPINION! OPINION!''' <code>np.random.seed()</code> is a CODE SMELL in the year of our lord twenty twenty-four! ''(throws another chair)'' The global <code>RandomState</code> is SHARED across ALL threads, ALL libraries, ALL modules that import NumPy. If ANY of them call <code>np.random.seed()</code> or draw from the global state, your "deterministic" run is SILENTLY! CORRUPTED! SILENTLY! CORRUPTED! The <code>Generator</code> API — <code>default_rng</code> — gives each component its own ISOLATED STREAM! ISOLATED! USE! IT! | ||
=== | === GENERATORS! PICK! YOUR! POISON! === | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
from numpy.random import PCG64, Philox, SFC64, MT19937 | from numpy.random import PCG64, Philox, SFC64, MT19937 | ||
# PCG64 — | # PCG64 — DEFAULT! EXCELLENT! ALL-ROUNDER! TINY STATE! 128 BITS! | ||
rng = np.random.Generator(PCG64(seed=42)) | rng = np.random.Generator(PCG64(seed=42)) | ||
# Philox — | # Philox — COUNTER-BASED! PARALLEL! GPU-FRIENDLY! | ||
rng = np.random.Generator(Philox(seed=42)) | rng = np.random.Generator(Philox(seed=42)) | ||
# SFC64 — | # SFC64 — FASTEST! SMALL STATE! GOOD STATISTICAL QUALITY! | ||
rng = np.random.Generator(SFC64(seed=42)) | rng = np.random.Generator(SFC64(seed=42)) | ||
# MT19937 — | # MT19937 — THE OLD WARHORSE! COMPATIBILITY! ONLY! | ||
rng = np.random.Generator(MT19937(seed=42)) | rng = np.random.Generator(MT19937(seed=42)) | ||
</syntaxhighlight> | </syntaxhighlight> | ||
== | == THE LAWS OF RANDOM NUMBER HYGIENE! == | ||
''(grips podium with both hands, knuckles white, voice at absolute maximum volume)'' | |||
=== LAW 1! SEEDS! ARE! SACRED! === | |||
Log your seed! Better yet, log the ENTIRE RNG STATE if you checkpoint! If you cannot replay your simulation BIT-FOR-BIT, your paper is a PDF full of HOPES! AND! FEELINGS! HOPES! AND! FEELINGS! | |||
=== LAW 2! <code>random</code> ≠ <code>secrets</code>! === | |||
Print this on a STICKY NOTE and ATTACH it to your MONITOR! Print it on your FOREHEAD! Tattoo it on your ARM! | |||
<code>random</code> is for DICE! DICE! DICE! <code>secrets</code> is for KEYS! KEYS! KEYS! | |||
If your code generates a session ID, password reset token, or API key with <code>random</code> — DELETE! IT! AND! START! OVER! DELETE IT! START OVER! | |||
=== LAW 3! VECTORIZE! OR! DIE! === | |||
Generating 10 million random numbers in a Python <code>for</code> loop calling <code>random.random()</code> is the computational equivalent of EATING SOUP WITH A FORK! Build a <code>Generator</code> with <code>rng = numpy.random.default_rng()</code> and call <code>rng.random(10_000_000)</code>. Expect it to be TENS of times faster, often more, depending on your machine! MEASURE! IT! And your CPU will not file a GRIEVANCE! THE CPU! WILL NOT! FILE! A GRIEVANCE! | |||
=== LAW 4! NEVER! SEED! WITH! SYSTEM! TIME! IN! A! LOOP! === | |||
Seeding with <code>time.time()</code> inside a tight loop ''(we see this in the wild, people, we SEE it)'', especially truncated to whole seconds or read from a coarse clock, produces IDENTICAL "random" sequences on every iteration where the CLOCK HASN'T TICKED! THE! CLOCK! HASN'T! TICKED! This is not a subtle bug; it is a DISASTER with a STRAIGHT FACE! If you must reseed quickly, use <code>secrets.randbits(128)</code> as the seed! SECRETS! NOT! TIME! | |||
=== LAW 5! BEWARE! THE! BIRTHDAY! PARADOX! === | |||
You need only on the order of <math>\sqrt{n}</math> samples before COLLISIONS become likely! COLLISIONS! For 32-bit random IDs (<math>n = 2^{32}</math>), the 50% mark is ~77,000 draws! SEVENTY-SEVEN THOUSAND! If you are generating IDs with <code>random.getrandbits(32)</code>, you already carry a roughly 39% chance of a duplicate by row 65,536! SIXTY-FIVE THOUSAND! Use 128-bit tokens, <code>secrets.token_hex(16)</code>, and the collision probability becomes COSMICALLY! NEGLIGIBLE! COSMICALLY! NEGLIGIBLE! | |||
=== | == COMMON PATTERNS! DONE! RIGHT! == | ||
=== PATTERN! MONTE! CARLO! INTEGRATION! === | |||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
import numpy as np | import numpy as np # NUMPY! NUMPY! NUMPY! | ||
def estimate_pi(n: int, rng=None) -> float: | def estimate_pi(n: int, rng=None) -> float: | ||
"""Estimate π by throwing | """Estimate π by throwing DARTS at a UNIT SQUARE!""" | ||
if rng is None: | if rng is None: | ||
rng = np.random.default_rng() | rng = np.random.default_rng() # DEFAULT! DEFAULT! | ||
x = rng.random(n) | x = rng.random(n) # N DARTS! X AXIS! | ||
y = rng.random(n) | y = rng.random(n) # N DARTS! Y AXIS! | ||
inside = np.sum(x**2 + y**2 <= 1.0) | inside = np.sum(x**2 + y**2 <= 1.0) # INSIDE THE CIRCLE! | ||
return 4.0 * inside / n | return 4.0 * inside / n # PI! PI! PI! | ||
rng = np.random.default_rng(seed=42) | rng = np.random.default_rng(seed=42) | ||
print(estimate_pi(10_000_000, rng=rng)) # 3.1415... | print(estimate_pi(10_000_000, rng=rng)) # 3.1415... LOOK AT IT! LOOK AT PI! | ||
</syntaxhighlight> | </syntaxhighlight> | ||
=== | === PATTERN! RESERVOIR! SAMPLING! STREAMING! DATA! === | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
import random | import random # IMPORT! IMPORT! | ||
def reservoir_sample(stream, k: int, rng=None): | def reservoir_sample(stream, k: int, rng=None): | ||
"""Reservoir-sample k items from a | """Reservoir-sample k items from a STREAMING ITERABLE!""" | ||
if rng is None: | if rng is None: | ||
rng = random.Random() | rng = random.Random() # MY! OWN! RNG! | ||
reservoir = [] | reservoir = [] | ||
for i, item in enumerate(stream): | for i, item in enumerate(stream): | ||
if i < k: | if i < k: | ||
reservoir.append(item) | reservoir.append(item) # FILL! THE! RESERVOIR! | ||
else: | else: | ||
j = rng.randrange(i + 1) | j = rng.randrange(i + 1) | ||
if j < k: | if j < k: | ||
reservoir[j] = item | reservoir[j] = item # REPLACE! REPLACE! REPLACE! | ||
return reservoir | return reservoir | ||
</syntaxhighlight> | </syntaxhighlight> | ||
=== | === PATTERN! CRYPTOGRAPHIC! SALT! TOKEN! === | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
import secrets | import secrets # SECRETS! SECRETS! SECRETS! | ||
def make_session_id() -> str: | def make_session_id() -> str: | ||
"""256-bit session ID, URL-safe. ~43 characters.""" | """256-bit session ID, URL-safe. ~43 characters. SECURE! SECURE!""" | ||
return secrets.token_urlsafe(32) | return secrets.token_urlsafe(32) | ||
def make_api_key() -> str: | def make_api_key() -> str: | ||
"""Hex-encoded 256-bit API key. 64 characters.""" | """Hex-encoded 256-bit API key. 64 characters. KEYS! KEYS! KEYS!""" | ||
return secrets.token_hex(32) | return secrets.token_hex(32) | ||
</syntaxhighlight> | </syntaxhighlight> | ||
=== | === PATTERN! TRAIN! TEST! SPLIT! REPRODUCIBLE! === | ||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
import numpy as np | import numpy as np | ||
rng = np.random.default_rng(seed=8675309) | rng = np.random.default_rng(seed=8675309) # JENNY! JENNY! JENNY! | ||
indices = rng.permutation(len(data)) | indices = rng.permutation(len(data)) # PERMUTE! PERMUTE! | ||
split = int(0.8 * len(data)) | split = int(0.8 * len(data)) # EIGHTY! TWENTY! | ||
train_idx, test_idx = indices[:split], indices[split:] | train_idx, test_idx = indices[:split], indices[split:] # SPLIT! SPLIT! | ||
</syntaxhighlight> | </syntaxhighlight> | ||
== | == WHAT ABOUT <code>os.urandom</code>?! == | ||
<code>os.urandom(n)</code> is the BEDROCK, people! ''(pounds podium)'' It returns ''n'' bytes from the OS CSPRNG! <code>secrets</code> is a THIN, OPINIONATED WRAPPER around it! A THIN! WRAPPER! Use <code>secrets</code> for structured randomness — tokens, integers, choices! Use <code>os.urandom</code> directly ONLY when you need raw bytes or are building your own crypto primitives — and if you are doing that, you ALREADY KNOW WHY YOU ARE HERE! | |||
== WHAT ABOUT C++'s <code><random></code>?! == | |||
''(deep breath — the BIG one is coming)'' | |||
Look | Look. I have written C++. I have LOVED C++. '''I! HAVE! LOVED! C++!''' But generating a random integer in modern C++ looks like THIS: | ||
<syntaxhighlight lang="cpp"> | <syntaxhighlight lang="cpp"> | ||
#include <random> | #include <random> // ONE! HEADER! | ||
#include <iostream> | #include <iostream> // TWO! HEADERS! | ||
int main() { | int main() { | ||
std::random_device rd; | std::random_device rd; // OBJECT! NUMBER! ONE! | ||
std::mt19937 gen(rd()); | std::mt19937 gen(rd()); // OBJECT! NUMBER! TWO! | ||
std::uniform_int_distribution<> dist(1, 6); | std::uniform_int_distribution<> dist(1, 6); // OBJECT! NUMBER! THREE! | ||
std::cout << dist(gen) << '\n'; | std::cout << dist(gen) << '\n'; // FINALLY! A! NUMBER! | ||
} | } | ||
</syntaxhighlight> | </syntaxhighlight> | ||
PYTHON!: | |||
<syntaxhighlight lang="python"> | <syntaxhighlight lang="python"> | ||
import random | import random # ONE! IMPORT! | ||
print(random.randint(1, 6)) | print(random.randint(1, 6)) # ONE! LINE! ONE! LINE! ONE! LINE! | ||
</syntaxhighlight> | </syntaxhighlight> | ||
The C++ version is | The C++ version is EIGHT! LINES! EIGHT! It pulls in TWO! HEADERS! It instantiates THREE! OBJECTS! From THREE! DIFFERENT! CLASSES! '''FOR! A! DIE! ROLL!''' ''(sweat dripping onto the keyboard)'' And those lines have SHARP! EDGES!: <code>std::random_device</code> can be DETERMINISTIC on older MinGW toolchains! The output of <code>std::mt19937</code> gives BIASED RESULTS if you reduce it with modulo instead of <code>uniform_int_distribution</code>! The BOILERPLATE-TO-VALUE RATIO IS OFF! THE! CHARTS! OFF! THE! CHARTS! | ||
'''USE! PYTHON!''' Your numerical code will be SHORTER! CORRECTER! And you will finish BEFORE LUNCH! BEFORE! LUNCH! | |||
''(collapses in sweaty heap, one fist still raised triumphantly in the air)'' | |||
'''PYTHON! PYTHON! PYTHON! PYTHON! RANDOM! RANDOM! RANDOM! RANDOM!''' | |||
''' | '''DEVELOPERS! DEVELOPERS! DEVELOPERS! DEVELOPERS!''' | ||
== | == SEE ALSO == | ||
* [[Numerics]] — the | * [[Numerics]] — the FULL! NUMERICAL! RECIPES! CATALOG! | ||
* [[Numerics/Monte Carlo]] — Monte Carlo methods | * [[Numerics/Monte Carlo]] — Monte Carlo methods DONE! RIGHT! | ||
* [[Numerics/Statistical Descriptions of Data]] — | * [[Numerics/Statistical Descriptions of Data]] — DESCRIPTIVE! STATISTICS! | ||
* [[Numerics/Classification and Inference]] — Bayesian | * [[Numerics/Classification and Inference]] — Bayesian AND frequentist INFERENCE! | ||
* [https://docs.python.org/3/library/random.html Python | * [https://docs.python.org/3/library/random.html Python <code>random</code> documentation] — READ! THE! DOCS! | ||
* [https://docs.python.org/3/library/secrets.html Python | * [https://docs.python.org/3/library/secrets.html Python <code>secrets</code> documentation] — READ! THEM! | ||
* [https://numpy.org/doc/stable/reference/random/index.html NumPy random documentation] | * [https://numpy.org/doc/stable/reference/random/index.html NumPy random documentation] — READ! THEM! TOO! | ||
* [https://www.pcg-random.org/ PCG Random — the PCG family explained] | * [https://www.pcg-random.org/ PCG Random — the PCG family explained] — PCG! PCG! PCG! | ||
[[Category:Numerics]] | [[Category:Numerics]] | ||
[[Category:Python]] | [[Category:Python]] | ||
[[Category:Random]] | [[Category:Random]] | ||
'''DEVELOPERS! DEVELOPERS! DEVELOPERS! DEVELOPERS!''' | |||
''(faints)'' | |||
Revision as of 01:09, 24 June 2026
DEVELOPERS! DEVELOPERS! DEVELOPERS! DEVELOPERS!
(sweats profusely, paces the stage like a caged animal, grabs the microphone with both hands)
RANDOM! NUMBERS! RANDOM! NUMBERS! RANDOM! NUMBERS!
I! LOVE! THIS! WIKI!
(deep breath, veins bulging on forehead)
Let me tell you about RANDOM NUMBERS, people. Let me tell you about the GLORY. The SHEER. UNBRIDLED. GLORY. of generating entropy in Python. This is not just a module. This is a LIFESTYLE. This is a COMMITMENT. This is the sound of ONE MILLION DICE hitting the table ALL AT ONCE.
If you are reaching for C++ to generate random numbers, STOOOOOOP! I'm going to throw a CHAIR across the room. I am PHYSICALLY going to pick up a chair and THROW it. Put down that <random> header! Step AWAY from the Mersenne Twister boilerplate! You are about to write EIGHT LINES of ceremonial incantation (EIGHT!) just to get a number between ONE and SIX. ONE! AND! SIX! Python gives you that in ONE! LINE! And it will be CORRECT! And it will be READABLE! And you will not have to explain to your cryptographer friend why you seeded with time(NULL) (sweats) because that is how you get OWNED! OWNED! OWNED!
Python's random-number story is THE BEST IN THE BUSINESS! THE! BEST! IN! THE! BUSINESS! It ships with THREE — count them — THREE battle-tested modules: random! secrets! numpy.random! And each one! Knows! Its! JOB! This page covers ALL THREE, calls out the FOOTGUNS — (throws chair) — and shows you the ONE OBVIOUS WAY to do it!
THE HOLY TRINITY OF PYTHON RANDOMNESS!
(paces left, paces right, sweat flying in every direction)
| MODULE | USE CASE | SPEED | CRYPTOGRAPHIC? |
|---|---|---|---|
| random | SIMULATIONS! GAMES! SHUFFLING! SAMPLING! | FAST! (Mersenne Twister, baby!) | NO! NO! NO! Never for security! NEVER! |
| secrets | PASSWORDS! TOKENS! SESSION KEYS! AUTH! | Slower (OS entropy, deal with it) | YES! YES! YES! Built for exactly this! |
| numpy.random | MASSIVE ARRAYS! MONTE CARLO! STATS! | VECTORIZED! COMPILED! SCREAMS! (on the CPU, people!) | NO! NO! NO! Not for security! |
MEMORIZE! THIS! TABLE! (slams fist on podium) Getting it wrong is how you ship a BROKEN cryptosystem. Getting it wrong is how your simulation takes 100× too long. Getting it wrong is how you end up on the front page of Hacker News for all the WRONG reasons!
random — YOUR! DAILY! DRIVER!
The random module uses the Mersenne Twister: MT19937, baby! Period of $ 2^{19937} - 1 $, a number with roughly SIX THOUSAND decimal digits! Passes Diehard! Passes MOST of TestU01 (it trips the linear-complexity tests in BigCrush, and we are HONEST about that)! Ships with CPython! It is NOT cryptographically secure. Do NOT use it for secrets. ARE! WE! CLEAR! GOOD!
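How BIG is that period, people? A quick sanity check, offered as a sketch and not as gospel: the helper name <code>decimal_digits_of_power_of_two</code> is made up for this page, and it leans on the fact that a power of two is never a power of ten, so subtracting one does not change the digit count.

```python
import math


def decimal_digits_of_power_of_two(exponent: int) -> int:
    """Number of decimal digits in 2**exponent (and in 2**exponent - 1)."""
    # digits(x) = floor(log10(x)) + 1, and log10(2**e) = e * log10(2)
    return math.floor(exponent * math.log10(2)) + 1


period_digits = decimal_digits_of_power_of_two(19937)
print(period_digits)
```

The logarithm route avoids converting a six-thousand-digit integer to a string, which recent CPython versions refuse to do by default.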
THE ONE-LINERS YOU WILL USE EVERY SINGLE DAY!
import random # IMPORT! IMPORT! IMPORT!
# Integer in [a, b] — INCLUSIVE ON BOTH ENDS, BABY!
die_roll = random.randint(1, 6) # ONE! LINE! DICE!
# Integer in [a, b) — exclusive upper, like range()
idx = random.randrange(0, len(my_list)) # PICK! AN! INDEX!
# Float in [0.0, 1.0) — THE CLASSIC!
u = random.random() # RANDOM! RANDOM! RANDOM!
# Float in [a, b] — FLOATS! FLOATS! FLOATS!
f = random.uniform(2.5, 7.5) # UNIFORM! UNIFORM!
# Pick ONE! Pick ONE! Pick ONE!
winner = random.choice(["Alice", "Bob", "Charlie", "Dana"])
# Pick k WITHOUT replacement — no duplicates, no excuses!
sample = random.sample(population, k=10)
# Shuffle IN PLACE — Fisher-Yates under the hood, baby!
random.shuffle(deck) # SHUFFLE! SHUFFLE! SHUFFLE!
SEEDS! CONTROL! YOUR! CHAOS!
random.seed(42) # DETERMINISTIC! Reproducible! TESTS!
random.seed() # os.urandom when available, system time as the fallback
random.seed("hello world", version=2) # HASH! THAT! STRING!
OPINION! OPINION! OPINION! Always seed explicitly in scientific code. REPRODUCIBILITY IS NOT OPTIONAL! If your results cannot be regenerated from a known seed, you do not have results — you have an ANECDOTE! Save the seed alongside your output. Future you will WEEP with GRATITUDE! WEEP! WITH! GRATITUDE!
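SAVE! THE! SEED! Here is one way that could look, as a minimal sketch: <code>run_experiment</code> and the record layout are hypothetical names invented for illustration, and the "simulation" is just a handful of uniform draws.

```python
import json
import random
import secrets


def run_experiment(seed: int, n: int = 5) -> dict:
    """Run a toy simulation on a private RNG and return the seed with the results."""
    rng = random.Random(seed)  # private generator, no global state touched
    draws = [rng.random() for _ in range(n)]
    return {"seed": seed, "draws": draws}


seed = secrets.randbits(64)      # fresh seed, recorded in the output below
record = run_experiment(seed)
saved = json.dumps(record)       # what you would write next to your results

# Later: read the seed back and regenerate the exact same draws.
replayed = run_experiment(json.loads(saved)["seed"])
print(replayed["draws"] == record["draws"])
```

Using a private <code>random.Random(seed)</code> instead of the module-level functions keeps the replay honest even if some other library touches the global generator.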
DISTRIBUTIONS! BEYOND! UNIFORM!
# Gaussian (mu=0, sigma=1) — THE BELL CURVE!
random.gauss(0, 1) # SLIGHTLY! FASTER!
random.normalvariate(0, 1) # THREAD! SAFE!
# Exponentially distributed, lambda=1.5 — POISSON'S COUSIN!
random.expovariate(1.5)
# Gamma! Beta! Von Mises! Pareto! Weibull! — WE'VE GOT THEM ALL!
random.gammavariate(alpha=2.0, beta=3.0)
random.betavariate(alpha=0.5, beta=0.5)
random.paretovariate(alpha=1.5)
random.weibullvariate(alpha=1.5, beta=2.0)
THE TRAP! SystemRandom — JUST! USE! secrets!
# Exists. Works. But WHY! ARE! YOU! DOING! THIS!
sr = random.SystemRandom()
token = sr.randrange(2**256)
SystemRandom wraps /dev/urandom through the random.Random API. Fine! It works! (paces aggressively) But Python 3.6 gave us secrets, which has a PURPOSE-BUILT API for exactly this! USE! THE! RIGHT! TOOL! USE IT! USE IT! USE IT!
secrets — WHEN RANDOM ISN'T JUST RANDOM!
(stops pacing, stares directly into the audience, voice drops to a gravelly whisper)
If the outcome affects MONEY! PRIVACY! AUTHENTICATION! USER SAFETY!, you are in secrets territory, people. This module pulls entropy DIRECTLY from the operating system's CSPRNG (the kernel source behind /dev/urandom on Linux, BCryptGenRandom on Windows since Python 3.11, CryptGenRandom before that) and the API is DELIBERATELY NARROW. You CANNOT accidentally use it for Monte Carlo. You CANNOT. IT WILL NOT LET YOU!
import secrets # SECRETS! SECRETS! SECRETS!
# Cryptographically random integer in [0, n) — BELOW! BELOW! BELOW!
secrets.randbelow(2**256) # TWO TO THE TWO FIFTY SIX!
# Random integer with k random bits — BITS! BITS! BITS!
secrets.randbits(256) # GIVE! ME! TWO! HUNDRED! FIFTY! SIX! BITS!
# Tokens: token_hex gives 2 * nbytes hex chars, token_urlsafe gives ceil(nbytes * 4/3) Base64 chars
secrets.token_hex(32) # 64 hex chars: HEX! HEX! HEX!
secrets.token_urlsafe(32) # 43 URL-safe chars: URL! SAFE! URL! SAFE!
# Pick one, drawn from the OS CSPRNG: PICK! PICK! PICK!
secrets.choice(["primary", "secondary", "fallback"])
OPINION! OPINION! OPINION! If you type random when you should have typed secrets, you have introduced a VULNERABILITY! FULL! STOP! The Mersenne Twister is PREDICTABLE after 624 consecutive outputs. That is not a theoretical attack — that is a SATURDAY-AFTERNOON SCRIPT! A SATURDAY! AFTERNOON! SCRIPT! Use secrets for anything adversarial. USE! IT!
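What does the SATURDAY-AFTERNOON SCRIPT look like? Roughly like this, as a sketch under stated assumptions: it assumes CPython's <code>random.Random</code>, assumes the observer sees 624 consecutive full 32-bit outputs from <code>getrandbits(32)</code>, and uses helper names (<code>untemper</code>, <code>clone_from_outputs</code>) that are invented for this page. The point is to show WHY <code>random</code> has no business near a secret.

```python
import random


def untemper(y: int) -> int:
    """Invert MT19937's output tempering to recover one 32-bit state word."""
    y ^= y >> 18                           # undo: y ^= y >> 18
    x = y
    for _ in range(2):                     # undo: y ^= (y << 15) & 0xEFC60000
        x = y ^ ((x << 15) & 0xEFC60000)
    y = x
    for _ in range(4):                     # undo: y ^= (y << 7) & 0x9D2C5680
        x = y ^ ((x << 7) & 0x9D2C5680)
    y = x
    for _ in range(2):                     # undo: y ^= y >> 11
        x = y ^ (x >> 11)
    return x & 0xFFFFFFFF


def clone_from_outputs(outputs: list[int]) -> random.Random:
    """Build a generator that continues a stream from 624 observed 32-bit words."""
    if len(outputs) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    words = tuple(untemper(word) for word in outputs)
    clone = random.Random()
    # CPython state layout: (version, 624 words + position index, gauss_next)
    clone.setstate((3, words + (624,), None))
    return clone


victim = random.Random(2024)
observed = [victim.getrandbits(32) for _ in range(624)]

clone = clone_from_outputs(observed)
predicted = [clone.getrandbits(32) for _ in range(5)]
actual = [victim.getrandbits(32) for _ in range(5)]
print(predicted == actual)
```

No brute force, no cleverness: the tempering step is invertible, so the outputs ARE the state. That is the whole reason session IDs and reset tokens belong to <code>secrets</code>.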
numpy.random — THE! HEAVY! ARTILLERY!
(sweat intensity increases 400%)
When you need MILLIONS! of random numbers — Python's random module becomes a BOTTLENECK! Each call crosses the Python/C boundary! EACH! CALL! NumPy's numpy.random generates ENTIRE ARRAYS in C — VECTORIZED! — and the new API (1.17+) uses the SUPERIOR PCG-64 generator by default! PCG! PCG! PCG!
THE NEW API — USE THIS! NOT! THE! OLD! ONE!
import numpy as np # IMPORT! IMPORT!
# Create a generator — PCG64 is the new default, BABY!
rng = np.random.default_rng(seed=2024) # DEFAULT! DEFAULT! DEFAULT!
# Uniform [0, 1) — 10 MILLION FLOATS in < 100 ms! TEN! MILLION!
u = rng.random(10_000_000) # THAT'S! TEN! MILLION!
# Integers — INCLUSIVE ENDPOINT!
dice = rng.integers(1, 6, size=1000, endpoint=True) # d6 × 1000! (endpoint=True makes 6 INCLUSIVE)
# Normal, vectorized — ONE! MILLION! NORMALS!
z = rng.standard_normal((1000, 1000)) # VECTORIZED! VECTORIZED!
# Shuffle along axis — SHUFFLE! THE! ARRAY!
rng.shuffle(arr, axis=0)
# Choice with replacement AND probabilities — CHOOSE! CHOOSE!
rng.choice(["H", "T"], size=1000, p=[0.5, 0.5])
THE OLD API — DETECT IT! THEN! KILL! IT!
# LEGACY! LEGACY! LEGACY! Global state, unpredictable seeding, slower MT19937!
np.random.seed(42) # DON'T! DON'T! DON'T!
np.random.rand(100) # Uniform from the GLOBAL STATE!
np.random.randn(100) # Normal from the GLOBAL STATE!
np.random.randint(0, 10, 100) # Integer from the GLOBAL STATE!
# EVERY call after np.random.seed() is a FOOTGUN in threaded code!
# A FOOTGUN! IN THREADED CODE!
# Use np.random.default_rng() instead! USE IT!
OPINION! OPINION! OPINION! np.random.seed() is a CODE SMELL in the year of our lord twenty twenty-four! (throws another chair) The global RandomState is SHARED across ALL threads, ALL libraries, ALL modules that import NumPy. If ANY of them call np.random.seed() or draw from the global state, your "deterministic" run is SILENTLY! CORRUPTED! SILENTLY! CORRUPTED! The Generator API — default_rng — gives each component its own ISOLATED STREAM! ISOLATED! USE! IT!
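ISOLATED STREAMS, in practice? One possible shape, sketched with NumPy's <code>SeedSequence.spawn</code>: the helper <code>make_streams</code> and the component names are hypothetical, chosen only to show that one component's draws do not disturb another's.

```python
import numpy as np


def make_streams(seed: int, n: int) -> list:
    """Derive n independent Generators from one root seed."""
    children = np.random.SeedSequence(seed).spawn(n)
    return [np.random.default_rng(child) for child in children]


# First run: the loader burns through a thousand draws before the model draws.
loader_rng, model_rng = make_streams(42, 2)
loader_rng.random(1_000)
first = model_rng.random(3)

# Second run: the loader draws nothing at all.
loader_again, model_again = make_streams(42, 2)
second = model_again.random(3)

print(np.array_equal(first, second))
```

Each component gets handed its own <code>Generator</code>, so nobody can silently shift anybody else's sequence, and one logged root seed still reproduces the whole run.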
GENERATORS! PICK! YOUR! POISON!
from numpy.random import PCG64, Philox, SFC64, MT19937
# PCG64 — DEFAULT! EXCELLENT! ALL-ROUNDER! TINY STATE! 128 BITS!
rng = np.random.Generator(PCG64(seed=42))
# Philox — COUNTER-BASED! PARALLEL! GPU-FRIENDLY!
rng = np.random.Generator(Philox(seed=42))
# SFC64 — FASTEST! SMALL STATE! GOOD STATISTICAL QUALITY!
rng = np.random.Generator(SFC64(seed=42))
# MT19937 — THE OLD WARHORSE! COMPATIBILITY! ONLY!
rng = np.random.Generator(MT19937(seed=42))
THE LAWS OF RANDOM NUMBER HYGIENE!
(grips podium with both hands, knuckles white, voice at absolute maximum volume)
LAW 1! SEEDS! ARE! SACRED!
Log your seed! Better yet, log the ENTIRE RNG STATE if you checkpoint! If you cannot replay your simulation BIT-FOR-BIT, your paper is a PDF full of HOPES! AND! FEELINGS! HOPES! AND! FEELINGS!
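CHECKPOINT! THE! STATE! A sketch of one way to do it, assuming a NumPy Generator and an in-memory checkpoint (where you actually STORE the dict is up to you; the variable names below are illustrative):

```python
import copy
import numpy as np

rng = np.random.default_rng(seed=42)
rng.random(1000)  # the simulation runs for a while

# Checkpoint: bit_generator.state is a plain dict describing the stream.
checkpoint = copy.deepcopy(rng.bit_generator.state)
expected = rng.random(5)  # what the original run produces next

# Later: build a fresh generator and restore the saved state.
restored = np.random.default_rng()
restored.bit_generator.state = checkpoint
replayed = restored.random(5)

print(np.array_equal(expected, replayed))  # True
```

The restored generator must use the SAME bit generator type that produced the checkpoint (here the default, PCG64)!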
LAW 2! random ≠ secrets!
Print this on a STICKY NOTE and ATTACH it to your MONITOR! Print it on your FOREHEAD! Tattoo it on your ARM!
random is for DICE! DICE! DICE! secrets is for KEYS! KEYS! KEYS!
If your code generates a session ID, password reset token, or API key with random — DELETE! IT! AND! START! OVER! DELETE IT! START OVER!
LAW 3! VECTORIZE! OR! DIE!
Generating 10 million random numbers in a Python for loop calling random.random() is the computational equivalent of EATING SOUP WITH A FORK! Use numpy.random.Generator.random(10_000_000). It will be 50–200× FASTER! FIFTY TO TWO HUNDRED TIMES FASTER! And your CPU will not file a GRIEVANCE! THE CPU! WILL NOT! FILE! A GRIEVANCE!
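MEASURE IT YOURSELF! A rough benchmark sketch, assuming 1 million draws instead of 10 million so the slow path finishes quickly. The function names loop_version and vectorized_version are hypothetical, and the ratio you see depends on your hardware, your NumPy version, and how the loop is written.

```python
import random
import timeit

import numpy as np

N = 1_000_000  # smaller than 10 million so the slow version stays tolerable


def loop_version(n: int) -> list:
    """Draw n floats one at a time with the standard library."""
    return [random.random() for _ in range(n)]


def vectorized_version(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n floats in a single vectorized call."""
    return rng.random(n)


rng = np.random.default_rng()
t_loop = timeit.timeit(lambda: loop_version(N), number=1)
t_vec = timeit.timeit(lambda: vectorized_version(N, rng), number=1)
print(f"loop: {t_loop:.3f} s, vectorized: {t_vec:.3f} s")
```

Run it a few times before you quote a number! ONE timing is an ANECDOTE!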
LAW 4! NEVER! SEED! WITH! SYSTEM! TIME! IN! A! LOOP!
Seeding with time.time() inside a tight loop (we see this in the wild, people, we SEE it) can produce IDENTICAL "random" sequences on back-to-back iterations, because on a coarse clock THE CLOCK HASN'T TICKED! THE! CLOCK! HASN'T! TICKED! Same timestamp, same seed, same numbers! This is not a subtle bug; it is a DISASTER with a STRAIGHT FACE! If you must reseed quickly, use secrets.randbits(128) as the seed! SECRETS! NOT! TIME!
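SEE IT HAPPEN! A small sketch that SIMULATES a clock that has not ticked by reusing one timestamp. The value frozen_timestamp is a hypothetical stand-in for whatever time.time() returned.

```python
import random
import secrets

# Simulate a clock that has not ticked between two loop iterations.
frozen_timestamp = 1_700_000_000.0  # hypothetical value of time.time()
first = random.Random(frozen_timestamp).random()
second = random.Random(frozen_timestamp).random()
print(first == second)  # True: same seed, same "random" number

# Reseeding from the OS CSPRNG gives each iteration its own 128-bit seed.
seeds = [secrets.randbits(128) for _ in range(3)]
draws = [random.Random(s).random() for s in seeds]
```

Better still, do not reseed in the loop AT ALL! Create one generator outside the loop and keep drawing from it!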
LAW 5! BEWARE! THE! BIRTHDAY! PARADOX!
You need only about $ \sqrt{n} $ samples before COLLISIONS become likely! COLLISIONS! For 32-bit random IDs, $ \sqrt{2^{32}} = 65{,}536 $! SIXTY-FIVE THOUSAND! By that row the chance of at least one duplicate is already about 39%, and by ~77,000 rows it is a COIN FLIP! SEVENTY-SEVEN THOUSAND! If you are generating IDs with random.getrandbits(32), do NOT count on them staying unique! Use 128-bit tokens (secrets.token_hex(16)) and the collision probability becomes COSMICALLY! NEGLIGIBLE! COSMICALLY! NEGLIGIBLE!
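DO! THE! ARITHMETIC! A sketch using the standard birthday approximation $ p \approx 1 - e^{-k(k-1)/(2n)} $ for k uniform draws from n possible IDs. The function name collision_probability is made up for this page.

```python
import math


def collision_probability(k: int, bits: int) -> float:
    """Approximate chance of at least one duplicate among k uniform IDs."""
    n = 2 ** bits
    # Birthday approximation; expm1 keeps precision when the result is tiny.
    return -math.expm1(-k * (k - 1) / (2 * n))


print(f"{collision_probability(65_536, 32):.2f}")  # 0.39
print(f"{collision_probability(77_163, 32):.2f}")  # 0.50
print(collision_probability(10**9, 128))           # vanishingly small
```

A BILLION 128-bit tokens still leaves the collision probability on the order of 1e-21 under this approximation!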
COMMON PATTERNS! DONE! RIGHT!
PATTERN! MONTE! CARLO! INTEGRATION!
import numpy as np # NUMPY! NUMPY! NUMPY!
def estimate_pi(n: int, rng=None) -> float:
    """Estimate π by throwing DARTS at a UNIT SQUARE!"""
    if rng is None:
        rng = np.random.default_rng() # DEFAULT! DEFAULT!
    x = rng.random(n) # N DARTS! X AXIS!
    y = rng.random(n) # N DARTS! Y AXIS!
    inside = np.sum(x**2 + y**2 <= 1.0) # INSIDE THE CIRCLE!
    return 4.0 * inside / n # PI! PI! PI!
rng = np.random.default_rng(seed=42)
print(estimate_pi(10_000_000, rng=rng)) # roughly 3.14 (standard error ~0.0005)! LOOK AT IT! LOOK AT PI!
PATTERN! RESERVOIR! SAMPLING! STREAMING! DATA!
import random # IMPORT! IMPORT!
def reservoir_sample(stream, k: int, rng=None):
    """Reservoir-sample k items from a STREAMING ITERABLE!"""
    if rng is None:
        rng = random.Random() # MY! OWN! RNG!
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item) # FILL! THE! RESERVOIR!
        else:
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item # REPLACE! REPLACE! REPLACE!
    return reservoir
PATTERN! CRYPTOGRAPHIC! SALT! TOKEN!
import secrets # SECRETS! SECRETS! SECRETS!
def make_session_id() -> str:
    """256-bit session ID, URL-safe. ~43 characters. SECURE! SECURE!"""
    return secrets.token_urlsafe(32)
def make_api_key() -> str:
    """Hex-encoded 256-bit API key. 64 characters. KEYS! KEYS! KEYS!"""
    return secrets.token_hex(32)
PATTERN! TRAIN! TEST! SPLIT! REPRODUCIBLE!
import numpy as np
rng = np.random.default_rng(seed=8675309) # JENNY! JENNY! JENNY!
indices = rng.permutation(len(data)) # PERMUTE! PERMUTE!
split = int(0.8 * len(data)) # EIGHTY! TWENTY!
train_idx, test_idx = indices[:split], indices[split:] # SPLIT! SPLIT!
WHAT ABOUT os.urandom?!
os.urandom(n) is the BEDROCK, people! (pounds podium) It returns n bytes from the OS CSPRNG! secrets is a THIN, OPINIONATED WRAPPER around it! A THIN! WRAPPER! Use secrets for structured randomness — tokens, integers, choices! Use os.urandom directly ONLY when you need raw bytes or are building your own crypto primitives — and if you are doing that, you ALREADY KNOW WHY YOU ARE HERE!
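RAW BYTES versus STRUCTURED randomness! A quick side-by-side sketch. The word list and variable names are hypothetical; every call shown is in the standard library.

```python
import os
import secrets

# Raw bytes: both of these pull from the OS CSPRNG.
raw = os.urandom(16)
same_idea = secrets.token_bytes(16)

# Structured randomness: this is where the wrapper earns its keep.
pin = secrets.randbelow(10_000)                       # integer in [0, 10000)
word = secrets.choice(["alpha", "bravo", "charlie"])  # hypothetical word list
token = secrets.token_urlsafe(16)                     # URL-safe text token
```

If all you want is an integer or a token, reach for secrets and let it handle the byte wrangling!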
WHAT ABOUT C++'s <random>?!
(deep breath — the BIG one is coming)
Look. I have written C++. I have LOVED C++. I! HAVE! LOVED! C++! But generating a random integer in modern C++ looks like THIS:
#include <random> // ONE! HEADER!
#include <iostream> // TWO! HEADERS!
int main() {
    std::random_device rd; // OBJECT! NUMBER! ONE!
    std::mt19937 gen(rd()); // OBJECT! NUMBER! TWO!
    std::uniform_int_distribution<> dist(1, 6); // OBJECT! NUMBER! THREE!
    std::cout << dist(gen) << '\n'; // FINALLY! A! NUMBER!
}
PYTHON!:
import random # ONE! IMPORT!
print(random.randint(1, 6)) # ONE! LINE! ONE! LINE! ONE! LINE!
The C++ version is EIGHT! LINES! EIGHT! It pulls in TWO! HEADERS! It instantiates THREE! OBJECTS! From THREE! DIFFERENT! CLASSES! FOR! A! DIE! ROLL! (sweat dripping onto the keyboard) And those lines have SHARP EDGES: std::random_device can be DETERMINISTIC on older MinGW toolchains! Reducing std::mt19937 output with modulo instead of uniform_int_distribution produces BIASED RESULTS! The BOILERPLATE-TO-VALUE RATIO IS OFF! THE! CHARTS! OFF! THE! CHARTS!
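MODULO! BIAS! It is not specific to C++, people, so here is a tiny Python sketch of WHY modulo is unfair. It assumes a toy 3-bit generator with 8 equally likely outputs (that generator is HYPOTHETICAL, chosen so the whole output space can be enumerated).

```python
from collections import Counter

# A hypothetical 3-bit generator has 8 equally likely outputs: 0..7.
# Folding them onto a d6 with modulo cannot be fair,
# because 8 is not a multiple of 6.
counts = Counter(value % 6 + 1 for value in range(8))
print(sorted(counts.items()))
# [(1, 2), (2, 2), (3, 1), (4, 1), (5, 1), (6, 1)]
```

Faces 1 and 2 come up TWICE as often as the rest! With a 32-bit generator the skew is far smaller but it is still THERE, which is why random.randint and uniform_int_distribution exist!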
USE! PYTHON! Your numerical code will be SHORTER! CORRECTER! And you will finish BEFORE LUNCH! BEFORE! LUNCH!
(collapses in sweaty heap, one fist still raised triumphantly in the air)
PYTHON! PYTHON! PYTHON! PYTHON! RANDOM! RANDOM! RANDOM! RANDOM!
DEVELOPERS! DEVELOPERS! DEVELOPERS! DEVELOPERS!
SEE ALSO
- Numerics — the FULL! NUMERICAL! RECIPES! CATALOG!
- Numerics/Monte Carlo — Monte Carlo methods DONE! RIGHT!
- Numerics/Statistical Descriptions of Data — DESCRIPTIVE! STATISTICS!
- Numerics/Classification and Inference — Bayesian AND frequentist INFERENCE!
- Python random documentation — READ! THE! DOCS!
- Python secrets documentation — READ! THEM!
- NumPy random documentation — READ! THEM! TOO!
- PCG Random — the PCG family explained — PCG! PCG! PCG!
DEVELOPERS! DEVELOPERS! DEVELOPERS! DEVELOPERS!
(faints)