Random Numbers
From charlesreid1
Exploring the quirks of C's venerable rand() function — and why seeding with time(NULL) can produce "random" numbers that are anything but.
If you're looking for a proper, Python-centric treatment of random numbers, head over to Numerics/Random Numbers — that's the grown-up table. This page is about what happens when you reach for the C standard library's random facilities without thinking too hard.
The Program
Here's a minimal C++ program that generates five "random" numbers:
#include <stdlib.h> // srand, rand
#include <time.h> // time
#include <iostream>
using std::cout;
using std::endl;
int main()
{
srand(time(NULL)); // seed with the current Unix timestamp
for (int i = 0; i < 5; i++) {
cout << rand() << endl;
}
return 0;
}
Yes, we're using the C headers (stdlib.h, time.h) instead of the C++ wrappers (cstdlib, ctime). For a quick-and-dirty demo this is fine — but in anything resembling production code you'd want <random> (we'll get to that).
Compilation
$ g++ myrand.cpp -o myrand
Output
Here's the program run four times, one second apart:
charles @ cronus [ 2016-06-21 - 15:00:12 - 41 ] /temp $ ./a.out 1575746672 805981500 1951708871 1705770619 2127589730 charles @ cronus [ 2016-06-21 - 15:00:13 - 42 ] /temp $ ./a.out 1575763479 1088456749 1426875297 543230630 1124215013 charles @ cronus [ 2016-06-21 - 15:00:14 - 43 ] /temp $ ./a.out 1575780286 1370931998 902041723 1528174288 120840296 charles @ cronus [ 2016-06-21 - 15:00:15 - 44 ] /temp $ ./a.out 1575797093 1653407247 377208149 365634299 1264949226
The First Number Is Suspiciously Predictable
Look at the first number from each run:
- 1575746672
- 1575763479
- 1575780286
- 1575797093
They all start with 15757... and they increase by roughly 16,800 each run. That's not a coincidence — it's literally the Unix timestamp marching forward one second at a time. In mid-2016, time(NULL) returns a value around 1.57 billion. Since we're feeding it directly into srand() as the seed, the first rand() output is heavily correlated with that seed value.
Here's the thing: rand() is a linear congruential generator (on glibc, anyway). Given similar seeds, the first few outputs will be similar. You're not getting randomness — you're getting a deterministic function of the current second. Anyone who knows approximately when you ran the program can guess your first "random" number with embarrassing precision.
I hope the people who design cryptosystems know about this! (They absolutely do — this is Crypto 101, and nobody's been using rand() for key material since approximately forever. But it's still a great cautionary tale.)
The One-Second Problem
time(NULL) only ticks once per second. If you run the program multiple times within the same second, every invocation gets the identical seed — and rand() is fully deterministic. Same seed, same sequence, every single time:
$ ./a.out && ./a.out && ./a.out && ./a.out && ./a.out 1584133365 27210049 2052760379 1418900798 1807295698 1584133365 27210049 2052760379 1418900798 1807295698 1584133365 27210049 2052760379 1418900798 1807295698 1584133365 27210049 2052760379 1418900798 1807295698 1584133365 27210049 2052760379 1418900798 1807295698
Five identical outputs because the shell launched them all in a fraction of a second. This is honestly kind of beautiful — a perfect demonstration of why timestamp seeding is a disaster for anything that actually needs unpredictability.
What You Should Actually Use
C++11 introduced the <random> header, and it's a massive upgrade over rand():
#include <random>
#include <iostream>
int main()
{
std::random_device rd; // non-deterministic seed from /dev/urandom
std::mt19937 gen(rd()); // Mersenne Twister engine
std::uniform_int_distribution<> dis(1, 6); // fair dice roll, no modulo bias
for (int i = 0; i < 5; i++) {
std::cout << dis(gen) << std::endl;
}
return 0;
}
The big wins here:
std::random_devicepulls entropy from the operating system (on Linux, it reads/dev/urandom). No more timestamp-as-seed nonsense.- Mersenne Twister (
std::mt19937) is a well-studied PRNG with a period of 219937−1 and good statistical properties. It passes most randomness test suites. - Distribution objects let you generate numbers in specific ranges without the modulo bias that
rand() % Nfamously introduces.
rand() has been in the C standard library since 1989, quietly producing terrible random numbers in codebases everywhere. Studying its failure modes turns out to be a fantastic way to understand why randomness is genuinely hard — and why the language committee finally gave us proper tools in C++11.
See Also
- Numerics/Random Numbers — the Python way, with NumPy and all the good stuff
- cppreference: Random number generation