Random Numbers

Exploring the quirks of C's venerable rand() function — and why seeding with time(NULL) can produce "random" numbers that are anything but.

If you're looking for a proper, Python-centric treatment of random numbers, head over to Numerics/Random Numbers — that's the grown-up table. This page is about what happens when you reach for the C standard library's random facilities without thinking too hard.

The Program

Here's a minimal C++ program that generates five "random" numbers:

#include <stdlib.h>     // srand, rand
#include <time.h>       // time
#include <iostream>

using std::cout;
using std::endl;

int main()
{
    srand(time(NULL));  // seed with the current Unix timestamp
    for (int i = 0; i < 5; i++) {
        cout << rand() << endl;
    }
    return 0;
}

Yes, we're using the C headers (stdlib.h, time.h) instead of the C++ wrappers (cstdlib, ctime). For a quick-and-dirty demo this is fine — but in anything resembling production code you'd want <random> (we'll get to that).

Compilation

$ g++ myrand.cpp -o myrand

Output

Here's the program run four times, one second apart:

charles @ cronus [ 2016-06-21 - 15:00:12 - 41 ] /temp $ ./a.out
1575746672
805981500
1951708871
1705770619
2127589730

charles @ cronus [ 2016-06-21 - 15:00:13 - 42 ] /temp $ ./a.out
1575763479
1088456749
1426875297
543230630
1124215013

charles @ cronus [ 2016-06-21 - 15:00:14 - 43 ] /temp $ ./a.out
1575780286
1370931998
902041723
1528174288
120840296

charles @ cronus [ 2016-06-21 - 15:00:15 - 44 ] /temp $ ./a.out
1575797093
1653407247
377208149
365634299
1264949226

The First Number Is Suspiciously Predictable

Look at the first number from each run:

1575746672
1575763479
1575780286
1575797093

They all start with 15757... and they increase by roughly 16,800 each run. That's not a coincidence — it's literally the Unix timestamp marching forward one second at a time. In mid-2016, time(NULL) returns a value around 1.57 billion. Since we're feeding it directly into srand() as the seed, the first rand() output is heavily correlated with that seed value.

Here's the thing: rand() is a linear congruential generator (on glibc, anyway). Given similar seeds, the first few outputs will be similar. You're not getting randomness — you're getting a deterministic function of the current second. Anyone who knows approximately when you ran the program can guess your first "random" number with embarrassing precision.

I hope the people who design cryptosystems know about this! (They absolutely do — this is Crypto 101, and nobody's been using rand() for key material since approximately forever. But it's still a great cautionary tale.)

The One-Second Problem

time(NULL) only ticks once per second. If you run the program multiple times within the same second, every invocation gets the identical seed — and rand() is fully deterministic. Same seed, same sequence, every single time:

$ ./a.out && ./a.out && ./a.out && ./a.out && ./a.out
1584133365
27210049
2052760379
1418900798
1807295698

1584133365
27210049
2052760379
1418900798
1807295698

1584133365
27210049
2052760379
1418900798
1807295698

1584133365
27210049
2052760379
1418900798
1807295698

1584133365
27210049
2052760379
1418900798
1807295698

Five identical outputs because the shell launched them all in a fraction of a second. This is honestly kind of beautiful — a perfect demonstration of why timestamp seeding is a disaster for anything that actually needs unpredictability.

What You Should Actually Use

C++11 introduced the <random> header, and it's a massive upgrade over rand():

#include <random>
#include <iostream>

int main()
{
    std::random_device rd;                      // non-deterministic seed from /dev/urandom
    std::mt19937 gen(rd());                     // Mersenne Twister engine
    std::uniform_int_distribution<> dis(1, 6);  // fair dice roll, no modulo bias

    for (int i = 0; i < 5; i++) {
        std::cout << dis(gen) << std::endl;
    }
    return 0;
}

The big wins here:

std::random_device pulls entropy from the operating system (on Linux, it reads /dev/urandom). No more timestamp-as-seed nonsense.
Mersenne Twister (std::mt19937) is a well-studied PRNG with a period of 2¹⁹⁹³⁷−1 and good statistical properties. It passes most randomness test suites.
Distribution objects let you generate numbers in specific ranges without the modulo bias that rand() % N famously introduces.

rand() has been in the C standard library since 1989, quietly producing terrible random numbers in codebases everywhere. Studying its failure modes turns out to be a fantastic way to understand why randomness is genuinely hard — and why the language committee finally gave us proper tools in C++11.