Code repair 1: the problem with rand()

Many people have troubles when it comes to generating random numbers in their projects. Here's something inspired by a code snippet similar to something I saw over on stackoverflow that highlights some fairly typical issues that people face with random number generation:

The specification:

Generate a sequence comprised of uniformly distributed random integers in the range $[0,9]$

The code:

#include <iostream>
#include <ctime>
#include <cstdlib>
using namespace std;

int main()
{
    bool x = true;
    while(x == true)
    {
        int num = 1;
        srand(time(NULL));
        num = rand();
        cout<<num%10<<endl;
    }
}

The problem:

The code is supposed to produce a uniformly distributed integer in the range 0-9. Currently the code produces bad output that's clearly not random:

6666666666777777777700000000003333333333

There's always sequences of repeated numbers for all runs, the sequences are usually the same length.

Making the improvements

Basic improvements

As with all of these things a good starting point is improving the "low hanging fruit". This means fixing the relatively easy to fix problems like not initializing variables, removing unneeded code and other stylistic issues. Compiling with -Wall and -Wextra can help find many of these problems and should be something you try when you are debugging if you aren't using these flags already. These are the things that don't take much effort or imagination but make the code better to read:

#include <iostream>
#include <ctime>
using namespace std;

int main()
{
    while( true )
    {
        srand(time(0));
        cout << rand()%10 << endl;
    }
}

We removed the completely unnecessary variable x and removed the NULL parameter to time(). This also allows us to remove the cstdlib header. num currently isn't being used either for anything other than a placeholder so we can get rid of that too.

The code still doesn't produce the right output because we haven't changed the logic but it's a bit nicer now.

Debugging the logic

The main issue here is the that random number generator is getting seeded in the loop that its being used in. It is almost always a red flag when you see this. Random number generators are usually meant to be created only once then used from there on in.

Right now it is seeding it in the loop and using time(0) just means the seed changes once per second. Because running a random number generator from the same seed produces the same sequence we will see the same number produced until time(0) changes which will be once per second. This gives you the bad output described earlier.

So to fix this we want to put the seed, srand(), outside of the loop:

int main()
{
    srand(time(0));
    while(true)
    {
        cout << rand()%10 << endl;
    }
}

Now the output is looking a lot better.

16376652915209540254

But looks can be deceiving.....

Subtle problems

If this random number choice is just a quick-and-dirty way of introducing randomness that doesn't really need good statistical properties then just going with this simple approach might be fine.

However if you really need good statistical properties then you are going to have problems. For example I was programming a Monte Carlo simulator for some science research and this became a serious problem. The first time I came across this I wasted 2 days trying to find out why the RNG was producing poor quality results. Hopefully I can help spare someone that time and frustration by outlining some of the common problems:

If you really care about the random numbers generated you might want to use something other than rand(). The reason is that rand() has poor statistical properties for pseudo random number generation, as it is often implemented as a Linear congruential generator. If you need high quality randomness then you should prefer something else such as the new c++ random number generators http://www.cplusplus.com/reference/random/. In fact there's even a report on depreciating the old rand() to try to push people to use the newer c++ standard library random functions. I do like this idea, I'm sure I wouldn't have wasted those days debugging if I got a depreciation warning from my compiler.

In this particular case taking the modulus of rand() causes a few subtle problems:

num = rand();
cout <<num%10<<endl;

Even if rand() was perfectly uniformly distributed then taking the modulus you will get a non-uniform distribution as a result. This happens whenever the divisor of the maximum value returned by rand() is not zero. Here's a quick explanation:

say rand() returned values in the range of [0,25] then taking the modulus would do the following.

before after modulus
[0-9] [0-9]
[10-19] [0-9]
[20-25] [0-5]

You'll see that you are more likely to get [0-5] than [6-9] which means you now no longer have a uniform number being generated. Note that this small range is for educational purposes only, the maximum value of rand() is mandated by the standard to be at least 32767. However it illustrates an important point, the larger the maximum generated number the better.

This uniformity of distribution problem aside the modulus has the particularly insidious effect of decreasing the quality of the pseudo-randomness even further for some implementations.

You could solve the non-uniformity issue by discarding the last range of numbers. So for example using the 0-25 rand from before:

before after modulus
[0-9] [0-9]
[10-19] [0-9]
[20-25] discarded

Now we get a uniform distribution at the expense of some efficiency as we have to spend some clock cycles on discarding generated values. These RNG problems can be solved by writing your own code but it's a massive pain in the ass to get it right. More importantly the standard library writers have solved this problem for you.

Using std::uniform_int_distribution avoids many implementation issues so I would recommend changing your existing code to use the new random generation library. Adapting the old code would look like this:

#include <iostream>
#include <random>

int main()
{
    std::default_random_engine generator;
    generator.seed( time(0) );
    std::uniform_int_distribution<int> distribution(0,9);//note the min and max parameters are inclusive here
    while(true)
    {
        std::cout << distribution(generator) << std::endl;
    }
}

There's just one last problem left here (that I am aware of at least :) ) If you need these numbers to be generated for security purposes and the program is being used by multiple people then seeding the random number generator with time(0) might be a problem. If 2 people were using this routine at the same time then they would have the same stream of "random" numbers.

To deal with this issue we want to use a better way of seeding the RNG. The easiest approach is to use std::random_device as the standard dictates that this is supposed to be a cryptographically secure means of generating random numbers if the device can support it.

#include <iostream>
#include <random>

int main()
{
    std::random_device rd;
    std::default_random_engine generator;
    generator.seed( rd() );
    std::uniform_int_distribution<int> distribution(0,9);
    while(true)
    {
        std::cout << distribution(generator) << std::endl;
    }
}

By using the standard libraries this program solves almost all of the problems faced by the initial program.

One potential problem is that some systems just don't support std::random_device. Some implementations simply haven't implemented it and some platforms have inherent limitations preventing true random generation. Some implementations have just used some form of PRNG to do this (in contravention of the standard).

If it was crucial that you generated a random seed then perhaps you could check the entropy by doing:

std::random_device rd;
if(rd.entropy() > 0){
    generator.seed( rd() );
}else{
    //do something else
}

entropy() is supposed to return as estimate of the number of bits of entropy that are available. (Note that as of June 2014 the support for entropy() is overall very poor with the major compilers, so make sure you check the details of the implementation beforehand if this really matters)

If there's still issues with this last program please do contact me, if there's a bug or weakness I am unaware of I'd love to know more!

blogroll

social