/* * Uniform random floats: How to generate a double-precision * floating-point number in [0, 1] uniformly at random given a uniform * random source of bits. * * Copyright (c) 2014, Taylor R Campbell * * Verbatim copying and distribution of this entire article are * permitted worldwide, without royalty, in any medium, provided * this notice, and the copyright notice, are preserved. * * The naive approach of generating an integer in {0, 1, ..., 2^k - 1} * for some k and dividing by 2^k, as used by, e.g., Boost.Random and * GCC 4.8's implementation of C++11 uniform_real_distribution, gives a * nonuniform distribution: * * - If k is 64, a natural choice for the source of bits, then because * the set {0, 1, ..., 2^53 - 1} is represented exactly, whereas many * subsets of {2^53, 2^53 + 1, ..., 2^64 - 1} are rounded to common * floating-point numbers, outputs in [0, 2^-11) are underrepresented. * * - If k is 53, in an attempt to avoid nonuniformity due to rounding, * then numbers in (0, 2^-53) and 1 will never be output. These * outputs have very small, but nonnegligible, probability: the Bitcoin * network today randomly guesses solutions every ten minutes to * problems for which random guessing has much, much lower probability * of success, closer to 2^-64. * * What is the uniform' distribution we want, anyway? It is obviously * not the uniform discrete distribution on the finite set of * floating-point numbers in [0, 1] -- that would be silly. For our * uniform' distribution, we would like to imagine[*] drawing a real * number in [0, 1] uniformly at random, and then choosing the nearest * floating-point number to it. * * To formalize this, start with the uniform continuous distribution on * [0, 1] in the real numbers, whose measure is mu([a, b]) = b - a for * any [a, b] contained in [0, 1]. Next, let rho be the default * (round-to-nearest/ties-to-even) rounding map from real numbers to * floating-point numbers, so that, e.g., rho(0.50000000000000001) = * 0.5. * * Note that the preimage under rho of any floating-point number -- * that is, for a floating-point number x, the set of real numbers that * will be rounded to x, or { u \in [0, 1] : rho(u) = x } -- is an * interval. Its measure, mu(rho^-1(x)), is the measure of the set of * numbers in [0, 1] that will be rounded to x, and that is precisely * the probability we want for x in our uniform' distribution. * * Instead of drawing from real numbers in [0, 1] uniformly at random, * we can imagine drawing from the space of infinite sequences of bits * uniformly at random, say (0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, ...), and * interpreting the result as the fractional part of the binary * expansion of a real number in [0, 1]: * * 0*(1/2) + 0*(1/4) + 0*(1/8) + 1*(1/16) + 0*(1/32) + 1*(1/64) + ... * * Then if we round that number, we'll get a floating-point number with * the desired probability. But we can't just directly compute that * sum one term at a time from left to right: once we get to 53 * significant bits, further bits will be rounded away by addition, so * the result will be biased to be even', i.e. to have zero least * significant bit. And we obviously can't choose uniformly at random * from the space of infinite sequences of bits and round the result. * * We can, however, choose finite sequences of bits uniformly at * random, and, except for a detail about ties, after a certain * constant number of bits, no matter what further bits we choose they * won't change what floating-point number we get by rounding. So we * can choose a sufficiently large finite sequence of bits and round * that, knowing that the result of rounding would not change no matter * how many extra bits we add. * * The detail about ties is that if we simply choose, say, 1088 bits -- * the least multiple of 64 greater than the exponent of the smallest * possible (subnormal) floating-point number, 2^-1074 -- and round it * in the default round-to-nearest/ties-to-even mode, the result would * be biased to be even. * * The reason is that the probability of seeing a tie in any finite * sequence of bits is nonzero, but but real ties occur only in sets of * measure zero -- they happen only when *every* remaining bit in the * infinite sequence is also zero. To avoid this, before rounding, we * can simulate a sticky bit by setting the least significant bit of * the finite sequence we choose, in order to prevent rounding what * merely looks approximately like a tie to even. * * One little, perhaps counterintuitive, detail remains: the boundary * values, 0 and 1, have half the probability you might naively expect, * because we exclude from consideration the half of the interval * rho^-1(0) below 0 and the half of the interval rho^-1(1) above 1. * * Some PRNG libraries provide variations on the naive algorithms that * omit one or both boundary values, in order to allow, e.g., computing * log(x) or log(1 - x). For example, instead of drawing from {0, 1, * ..., 2^53 - 1} and dividing by 2^53, some * * - divide by 2^53 - 1 instead, to allow both boundary points; * - draw from {1, 2, ..., 2^53} instead, to omit 0 but allow 1; or * - draw from {-2^52, -2^52 + 1, ..., 2^52 - 2, 2^52 - 1} (i.e., the * signed rather than unsigned 53-bit integers) and then add 1/2 before * dividing, to omit both boundary points. * * These algorithms still have the biases described above, however. * Given a uniform distribution on [0, 1], you can always turn it into * a uniform distribution on (0, 1) or (0, 1] or [0, 1) by rejection * sampling if 0 or 1 is a problem. * * The code below demonstrates all three algorithms with brief notes: * - random_real_64, which chooses and normalizes a 64-bit integer; * - random_real_53, which chooses and normalizes a 53-bit integer; and * - random_real, which chooses and rounds a binary expansion. * They are written in terms of a uniform random source of 64-bit * integers spelled random64. This is usually more convenient than a * source of fair coin flips. * * WARNING: I have only proved the code correct; I have not tested it. * * Generating 64-bit integers uniformly at random is left as an * exercise for the crypto-minded reader. * * Thanks to Steve Canon and Alex Barnett for explaining the binary * expansion approach to me and talking the problem over. * * [*] For the pedantic who may stumble upon this before reading ahead, * we cannot, of course, assign a meaningful probability to a choice of * real number in [0, 1]. What we can do, however, is imagine that it * were meaningful, and then formalize it in the next sentence with the * correct sense of measure. */ #include #include uint64_t random64(void); /* * random_real_64: Pick an integer in {0, 1, ..., 2^64 - 1} uniformly * at random, convert it to double, and divide it by 2^64. Values in * [2^-11, 1] are overrepresented, small exponents have low precision, * and exponents below -64 are not possible. */ double random_real_64(void) { return ldexp((double)random64(), -64); } /* * random_real_53: Pick an integer in {0, 1, ..., 2^53 - 1} uniformly * at random, convert it to double, and divide it by 2^53. Many * possible outputs are not represented: 2^-54, 1, &c. There are a * little under 2^62 floating-point values in [0, 1], but only 2^53 * possible outputs here. */ double random_real_53(void) { return ldexp((double)(random64() & ((1ULL << 53) - 1)), -53); } #define clz64 __builtin_clzll /* XXX GCCism */ /* * random_real: Generate a stream of bits uniformly at random and * interpret it as the fractional part of the binary expansion of a * number in [0, 1], 0.00001010011111010100...; then round it. */ double random_real(void) { int exponent = -64; uint64_t significand; unsigned shift; /* * Read zeros into the exponent until we hit a one; the rest * will go into the significand. */ while (__predict_false((significand = random64()) == 0)) { exponent -= 64; /* * If the exponent falls below -1074 = emin + 1 - p, * the exponent of the smallest subnormal, we are * guaranteed the result will be rounded to zero. This * case is so unlikely it will happen in realistic * terms only if random64 is broken. */ if (__predict_false(exponent < -1074)) return 0; } /* * There is a 1 somewhere in significand, not necessarily in * the most significant position. If there are leading zeros, * shift them into the exponent and refill the less-significant * bits of the significand. Can't predict one way or another * whether there are leading zeros: there's a fifty-fifty * chance, if random64 is uniformly distributed. */ shift = clz64(significand); if (shift != 0) { exponent -= shift; significand <<= shift; significand |= (random64() >> (64 - shift)); } /* * Set the sticky bit, since there is almost surely another 1 * in the bit stream. Otherwise, we might round what looks * like a tie to even when, almost surely, were we to look * further in the bit stream, there would be a 1 breaking the * tie. */ significand |= 1; /* * Finally, convert to double (rounding) and scale by * 2^exponent. */ return ldexp((double)significand, exponent); }