Better Random Numbers in PHP using /dev/urandom

Posted by on July 19, 2012

The design of PHP’s basic random number generators rand and it’s newer variant mt_rand is based off the C Standard Library. For better or worse, both use a single global state and this can be reset using stand (or mt_srand). This means anyone (a developer, a third party module, a library) could set the state to a fixed value and every random number following will be the same for every request. Sometimes this is the desired behavior but this can also have disastrous consequences. For instance, everyone’s password reset code could end up being the same.

Recently, Argyros and Kiayias in I Forgot Your Password: Randomness Attacks Against PHP Applications suggests there might be more fundamental problems in how PHP constructs the state of the random number generator. Just by seeing the output of a few calls to rand or mt_rand, one can predict the next output. With this, and certain password reset implementations, an attacker could perform account takeover. (This paper is also going to be presented on July 25 at Black Hat USA).

Quite some time ago, Etsy switched over to a different way of generating random numbers by using /dev/urandom that prevents both issues. /dev/urandom is a special psuedo-file on unix-like operating systems that generates “mostly random” bytes and is non-blocking. /dev/random (with no “u“) is for truly cryptographic applications such as key generation and is blocking. Once you exhaust it’s supply of randomness it blocks until it distills new randomness from the environment. Therefore, you don’t want to use /dev/random in your web application. To see why, connect to a (non-production!) remote machine and type in “cat /dev/random > /dev/null“, and the in another window try to log in. You won’t be able to, since SSH can’t read from /dev/random and therefore can’t complete the connection.

A pedagogical replacement of rand, mt_rand with /dev/urandom using the mcrypt module might be:

// equiv to rand, mt_rand
// returns int in *closed* interval [$min,$max]
function devurandom_rand($min = 0, $max = 0x7FFFFFFF) {
    $diff = $max - $min;
    if ($diff < 0 || $diff > 0x7FFFFFFF) {
	throw new RuntimeException("Bad range");
    }
    $bytes = mcrypt_create_iv(4, MCRYPT_DEV_URANDOM);
    if ($bytes === false || strlen($bytes) != 4) {
        throw new RuntimeException("Unable to get 4 bytes");
    }
    $ary = unpack("Nint", $bytes);
    $val = $ary['int'] & 0x7FFFFFFF;   // 32-bit safe
    $fp = (float) $val / 2147483647.0; // convert to [0,1]
    return round($fp * $diff) + $min;
}

A long time ago, Etsy didn’t even have mcrypt installed and so we read directly from /dev/urandom using open and fread (see also stream_set_read_buffer).

Using /dev/urandom is quite a bit slower than using the native php functions. However, as a percentage of total page load time, it’s effectively free and unnoticeable.  Do we graph this? Yes we do!  Here’s a days worth of random:

Note that the above code converting bytes to an integer will demonstrate a slight bias with very large ranges, so we can’t use for it with Etsy’s monte-carlo long-range simulation forecasting hand-made supercomputer but for all the other (non-cryptographic) web applications likely to be. For other algorithms and details on this topic, the main reference is Knuth’s Art of Computer Programming: Seminumerical Algorithms. A more modern treatment can be found in any of the Numerical Recipes books. The Java source code for java.util.Random is also a good reference. Enjoy!

Posted by on July 19, 2012
Category: security

6 Comments

[…] an article on Code As Craft, Etsy’s code blog about why it really is only Probably Random and you should never assume it […]

What kind of applications do you have for random numbers (other than password reset tokens)? For password reset tokens do you use anything else to ensure uniqueness?

It’s its: “Once you exhaust it’s supply of randomness […]”

Just to expand on what code is provided here in the article – I took the time to implement 64-bit support.

function randomInt($min = 0, $max = PHP_INT_MAX)
{
$diff = $max – $min;
if ($diff PHP_INT_MAX) {
throw new RuntimeException(‘Bad Range’);
}
$fh = fopen(‘/dev/urandom’, ‘r’);
stream_set_read_buffer($fh, PHP_INT_SIZE);
$bytes = fread($fh, PHP_INT_SIZE );
if ($bytes === false || strlen($bytes) != PHP_INT_SIZE ) {
throw new RuntimeException(“Unable to get ” . PHP_INT_SIZE . ” bytes”);
}
fclose($fh); // Closing handle.
if (PHP_INT_SIZE == 8) { // 64-bit versions
list($higher, $lower) = array_values(unpack(‘N2’, $bytes));
$value = $higher << 32 | $lower;
}
else { // 32-bit versions
list($value) = array_values(unpack('Nint', $bytes));

}
$val = $value & PHP_INT_MAX;
$fp = (float)$val / PHP_INT_MAX; // convert to [0,1]

return (int)(round($fp * $diff) + $min);
}

Hi Nick. Thanks for this article. Helped me a lot while writing an better PHP rng class for generating random numbers. No plain PHP implementation (cause of performance) but an abstraction like in your example.

btw: You have a typo at the beginning of your article: stand (should be srand) 🙂

Oh, for shame guys. This RNG has a pretty bad bias, and it all boils down to how you boil down your 32-bit urandom input into a custom sized output:

return round($fp * $diff) + $min;

This guarantees that both min and max will be output *exactly half* as often as any other number in between them.

Perhaps with very big numbers this bias gets lost in the shuffle, but if you wanted to emulate say a d6 your snake-eyes would come up literally 4 times less frequently than they should.

Here is my suggestion for (max)32 bit urandom sampling:

function myrand($min = 0, $max = 0x7FFFFFFF)
{
$diff = $max – $min + 1;
if ($diff 0x7FFFFFFF)
{ throw new RuntimeException(“Bad range”); }

$cieling = floor(0x7FFFFFFF/$diff)*$diff; // The largest *multiple* of diff less than our sample

do {
$bytes = mcrypt_create_iv(4, MCRYPT_DEV_URANDOM);
if ($bytes === false || strlen($bytes) != 4)
{ throw new RuntimeException(“Unable to get 4 bytes”); }
$ary = unpack(“Nint”, $bytes);
$val = $ary[‘int’] & 0x7FFFFFFF; // 32-bit safe
} while($val>$cieling);
// In the unlikely case our sample is bigger than largest multiple,
// just do over until it’s not any more. Perfectly even sampling in
// our 0<output<diff domain is mathematically impossible unless
// the total number of *valid* inputs is an exact multiple of diff.

return $val % $diff + $min; // Modulo for President, 2016! 😀
}