Created: 9 years, 7 months ago by Jói
Modified: 9 years, 6 months ago
CC: chromium-reviews, Paweł Hajdan Jr., brettw-cc_chromium.org
Base URL: svn://svn.chromium.org/chrome/trunk/src
Visibility: Public
Description

Fix base::RandGenerator bug (it had a non-uniform random distribution). Add a test that would have caught the bug. Also add a test to verify that our random generators are at least somewhat random.
BUG=84221
TEST=base_unittests --gtest_filter=RandUtilTest.*
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=87244
Patch Set 1 #
Total comments: 15
Patch Set 2 : Respond to review comments. #
Total comments: 3
Patch Set 3 : Fix issue pointed out by jar@ #
Total comments: 1
Patch Set 4 : Add "all bit values" test as suggested by jar@. #
Total comments: 2
Patch Set 5 : Switch to 3/4ths R for max. #
Patch Set 6 : Switch to bitwise operations. Max to 1000. #

Messages
Total messages: 19 (0 generated)
Ilya: Main reviewer. Brett: base/OWNERS approval. Cheers, Jói
http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc
File base/rand_util_unittest.cc (right):

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:65: TEST(RandUtilTest, RandGeneratorIsUniform) {
I'm not super excited about this test. It seems by definition flaky. You've tried to make it less flaky by running it a million times. How long does this take?
It takes 5 seconds to run if it never converges (on a not-so-fast Linux box). When the implementation under test is not broken, it very infrequently prints the status after 10K samples, as it's already converged. I could make it 10 million if you like. I think at the present levels, the test will pass something like 99.9999% of the time already. I could try to work out the math if you like :)

Cheers,
Joi
I'm interested in Pawel's opinion on whether this test is useful.

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc
File base/rand_util_unittest.cc (right):

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:96: if (count >= kReportEveryNAttempts && count % kReportEveryNAttempts == 0) {
I don't think tests should be printing stuff out. This typically just confuses people when they see it in the logs.
Well, the test is useful in that it shows the previous implementation did not generate a uniform random distribution. It generally passes in less than 40 ms. You shouldn't expect it to take multiple seconds, unless the implementation that it's testing is actually broken.

Cheers,
Joi
Just as a further data point, running the test with --gtest_repeat=10000 took 2m23s wall-clock time, for an average run-time of 14 ms per cycle of the test, including gtest overhead. There were no failures.

I suspect that, given the current parameters for the maximum number of tries and the size of the expected interval (+/- 1% from the expected value), the test is already less likely to fail (unless the implementation gets broken) than almost any other test you could pick, because of gamma rays or memory glitches on your machine. I'd be happy to increase the maximum number of tries or widen the expected interval if you think it's important, but the average of a bunch of samples of a uniform random distribution converges very, very quickly; see e.g. http://en.wikipedia.org/wiki/Expected_value

I'd hate to remove the test, since it verifies one of the most important pieces of behavior that a "number from a random interval" type of generator can have, i.e. that it is uniform. If it is ever flaky, I will of course widen the interval or do something else about it, but until it's failed at least once I don't think we should do anything.

Cheers,
Jói
+ Jim for comment on probabilistic tests, as I think he's contributed a few to the Chromium test suite.

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc
File base/rand_util_unittest.cc (right):

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:89: if (count > 1000) {
nit: As long as you're making everything else a named constant, let's make 1000 a named constant too.

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:96: if (count >= kReportEveryNAttempts && count % kReportEveryNAttempts == 0) {
On 2011/05/27 18:22:20, brettw wrote:
> I don't think tests should be printing stuff out. This typically just confuses
> people when they see it in the logs.

Yes, please remove these printf()'s, or find a way to log them only on failure.
Some comments, no need to wait for me. One note though: if more and more things start to depend on rand_util and require various probabilistic/statistical properties, it may be worth doing a more in-depth evaluation of that code and fixing all the problems we can detect at once.

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc
File base/rand_util_unittest.cc (right):

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:65: TEST(RandUtilTest, RandGeneratorIsUniform) {
On 2011/05/27 18:13:21, brettw wrote:
> I'm not super excited about this test. It seems by definition flaky. You've
> tried to make it less flaky by running it a million times. How long does this
> take?

Right, everything based on randomness is quite hard to test. I think it's important to keep this test fast, ideally no longer than a few seconds, or even less.

I wonder if it would be useful to do something slightly different - rather than calling RandGenerator many times, feed it different values of |max| and |base::RandUint64()|, and see how it behaves. That should be deterministic, and still test the right thing...

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:78: const uint64 kAllowedVariance = kExpectedAverage / 100L; // 1% either way.
nit: 1% seems quite strict, how about something more like 5%? Initially rand_util was not intended for anything other than test code iirc, and it has no guarantees about strength or quality anyway.

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:84: while (count < kMaxAttempts) {
nit: Why not a for loop then?

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:96: if (count >= kReportEveryNAttempts && count % kReportEveryNAttempts == 0) {
On 2011/05/27 18:22:20, brettw wrote:
> I don't think tests should be printing stuff out. This typically just confuses
> people when they see it in the logs.

Yeah, and also it's better to use LOG or gtest macros instead of printf.
Thanks for the reviews guys. It seems several pieces of production code depend on RandGenerator and the derived function RandInt:

http://codesearch.google.com/codesearch?as_q=base%3A%3ARandGenerator&btnG=Sea...
http://codesearch.google.com/codesearch?as_q=base%3A%3ARandInt&btnG=Search+Co...

One of them actually uses RandGenerator and duplicates the bug I am fixing; I will send a separate review to that code's owner.

Cheers,
Jói

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc
File base/rand_util_unittest.cc (right):

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:65: TEST(RandUtilTest, RandGeneratorIsUniform) {
On 2011/05/27 19:28:11, Paweł Hajdan Jr. wrote:
> I wonder if it would be useful to do something slightly different - rather than
> calling RandGenerator many times, feed it different values of |max| and
> |base::RandUint64()|, and see how it behaves. That should be deterministic, and
> still test the right thing...

That wouldn't verify that the random distribution is uniform. Asserting that it converges to close to the expected average value is the only good way to do that AFAIK.

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:78: const uint64 kAllowedVariance = kExpectedAverage / 100L; // 1% either way.
On 2011/05/27 19:28:11, Paweł Hajdan Jr. wrote:
> nit: 1% seems quite strict, how about something more like 5% ?
>
> Initially rand_util was not intended for anything other than test code iirc, and
> it has no guarantees about strength or quality anyway.

The test already passes in an average of 14 ms wall-clock time on a somewhat slow machine, and in about 12 thousand test runs I've never seen it need more than 50K iterations (which takes a couple of hundred milliseconds or so on a reasonable machine) to converge. I've changed it to +/- 2% though, as the failure mode that existed before was much more skewed than that, but I'd prefer not to make the interval even wider in case future failure modes are more subtle.

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:84: while (count < kMaxAttempts) {
On 2011/05/27 19:28:11, Paweł Hajdan Jr. wrote:
> nit: Why not a for loop then?

Because I want to test the value of count after the loop.

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:89: if (count > 1000) {
On 2011/05/27 19:23:26, Ilya Sherman wrote:
> nit: As long as you're making everything else a named constant, let's make 1000
> a named constant too.

Done.

http://codereview.chromium.org/7080005/diff/1/base/rand_util_unittest.cc#newc...
base/rand_util_unittest.cc:96: if (count >= kReportEveryNAttempts && count % kReportEveryNAttempts == 0) {
On 2011/05/27 18:22:20, brettw wrote:
> I don't think tests should be printing stuff out. This typically just confuses
> people when they see it in the logs.

Done.

On 2011/05/27 19:23:26, Ilya Sherman wrote:
> Yes, please remove these printf()'s, or find a way to log them only on failure.

Done.

On 2011/05/27 19:28:11, Paweł Hajdan Jr. wrote:
> Yeah, and also it's better to use LOG or gtest macros instead of printf.

Done.
LGTM
http://codereview.chromium.org/7080005/diff/4003/base/rand_util.cc
File base/rand_util.cc (right):

http://codereview.chromium.org/7080005/diff/4003/base/rand_util.cc#newcode59
base/rand_util.cc:59: } while (value > max_acceptable_value);
Very cute.... but I think there is a typo, and this should read:

  while (value >= max_acceptable_value)

Am I confused?? I think this currently gives a higher probability of returning 0 than any other number in the range.
http://codereview.chromium.org/7080005/diff/4003/base/rand_util.cc
File base/rand_util.cc (right):

http://codereview.chromium.org/7080005/diff/4003/base/rand_util.cc#newcode59
base/rand_util.cc:59: } while (value > max_acceptable_value);
On 2011/05/27 20:25:14, jar wrote:
> Very cute.... but I think there is a typo, and this should read:
>
> while (value >= max_acceptable_value)
>
> I think this currently gives a higher probability of returning 0 than any other
> number in the range.

You're right, thanks for catching that. Fixed and uploaded a new version.
Note: I don't think I wrote the original code... and I'm not blocking the change or being defensive... but I am curious.

LGTM with the additional change in the test below. If you really want this class of test of rand, please choose the optimal ratio (or tell me why your ratio is optimal... see comment in test).

If the bias of randomness is critical to our application, we may perchance want to add some tests for the quality of the rand on the various platforms (e.g., check that at least each bit in a rand assumes both values sometimes... and maybe that, looking at only one bit at a time, it is reasonably random).

I wouldn't want to make this a doctoral study, or waste a ton of test cycles doing this, but it might be worth having some tests of at least some simple things (example: with maximal modulus, for each single bit, it is possible to get a zero and to get a one).

If you're going to add tests to watch out for this, you may as well write a few "fast" tests (since you suggest it will impact your use) and be done with it while you're hacking in this area. (...and I can believe on some platforms this will be broken big time, such as by building a wimpy rand from a clock.)

http://codereview.chromium.org/7080005/diff/4003/base/rand_util.cc
File base/rand_util.cc (right):

http://codereview.chromium.org/7080005/diff/4003/base/rand_util.cc#newcode59
base/rand_util.cc:59: } while (value > max_acceptable_value);
OK... With that change I'm convinced it is "right," but when I've seen this code historically in many codebases (despite being aware of the bias), I never thought it was significant (with uint64 max being a whopping 2^64). The fact that your first patch introduced this class of bias (due to a typo) and was virtually undetectable partially substantiates my wonder.

For my edification, can you cite how this bias was burning you in your code?

For example, when max is 3, the bias in randomness amounts to roughly 1 in 2^62. I strongly suspect the underlying rand to have biases of that order. I'd expect you'd need to perform a LOT of experiments before you could detect that bias.

You tested the other extreme, where max is very close to 2^64 (see interesting comment in the test). When max is that large, each sample comes from a GIANT range. As a result, any small number of samples will come with a large bias, because we don't come anywhere near to getting enough samples to prove the range of answers are all even possible :-/

What is the example where this was harming you in a measurable way?

Again, the change is IMO "right." rand is a real slow function usually, so I don't like to see too many extra calls... but the likelihood of such extras is small for most all users (who stay away from a giant max that is not nearly a power of 2).

http://codereview.chromium.org/7080005/diff/4006/base/rand_util_unittest.cc
File base/rand_util_unittest.cc (right):

http://codereview.chromium.org/7080005/diff/4006/base/rand_util_unittest.cc#n...
base/rand_util_unittest.cc:73: // that is 2/3rds of the way to MAX_UINT64, in which case the bottom
You got me to wondering about this ratio, and it wasn't clear that 2/3 was optimally problematic. I applied calculus (which I've only done about 4 times in my life, after gettin' out'a school), and concluded that the maximum delta between the broken-code mean and the correct (range over 2) mean is actually achieved with a ratio R of:

  3/4

(If I'm wrong, please correct me! I'm notorious for clever ideas... and typos in my arithmetic!)

As always, after grinding through the calculus, there is an easy way to see this result (that the delta in the means is maximized with R = 3/4).

We only consider R in the range of 1/2 to 1, so that we never have to consider cases where the rand() values map more than twice into any return value.

The difference function is then a quadratic in R. You can pretty much (in retrospect) guess that, since the calculation of the means is done by summing all numbers in a range and dividing by the count of numbers. We just have two different ranges that we need to sum (one after the modular re-mapping). Everything in that equation is based on R and (1-R), and the sum of numbers from 1 to K is quadratic. So you know it is *some* quadratic (i.e., a parabola).

If the ratio were 1/2, there would be no difference in the means (no bias would be introduced). Similarly, if the ratio were 1, there would be no bias. Hence we know that this parabola has a zero value at R = 1/2 and R = 1.

Since parabolas are nice and symmetric, the zero-derivative (max) point is halfway between those two equal values.

The precise equation (if I didn't screw up) for the difference in the means is:

  MaxInt * (R^2 - 3R/2 + 1/2)

Again, feel free to correct my math.
Hi Jim,

First off, this bug wasn't affecting me in any practical way (Ilya asked the same question in the bug report). It's just something I noticed when reading the code and figured I'd fix, since (a) the fix is simple and (b) making it right avoids any possibility of extremely subtle future bugs caused by the non-uniformity of base::RandGenerator.

I agree that for small intervals, the non-uniformity would be hard to detect, and for intervals that are powers of 2 there is no bias. For small intervals, the fix adds very close to zero overhead, as in these cases it's very unlikely that we call the underlying base::RandUint64 more than once (and if we do, it's because otherwise there could have been bias).

I didn't really mean to spend much time in this area; I thought it would be a quick, end-of-Friday-afternoon kind of fix :) But since you got me interested, I wrote another quick test that verifies that our random generator (RandUint64) generates randomness where we at least see every bit take both possible values after looking at some number of random values (in practice, on my Linux box, this only takes 10-20 iterations or less). Please take a quick look at the new version I uploaded, so I can get this checked in.

Your math, and the argument that the difference in the means is a parabola, looks right to me. However, if I do the math another way, just calculating the expected value, then for max = 2/3 * MAX_UINT64 I get an expected value of max*(2*1/4 + 3/4)/3 = max*0.41667, and if I use your max = 3/4 * MAX_UINT64, I get the same value, max*(2*1/6 + 3/6 + 5/6)/4 = max*0.41667 (other values yield different results, e.g. max = 4/5 * MAX_UINT64 gives an expected value of max*0.34). Not sure why the discrepancy, since I agree with your argument. Anyway, I don't want to get too deep into it, so I just changed the comment in the file to not state that 2/3 is absolutely the worst case, just that it is a degenerate case:

  // A degenerate case for such an implementation is e.g. a top of range
  // that is 2/3rds of the way to MAX_UINT64, in which case the bottom
  // half of the range would be twice as likely to occur as the top
  // half.

Hope that covers it.

Cheers,
Jói
On Mon, May 30, 2011 at 8:31 AM, Jói Sigurðsson <joi@chromium.org> wrote: > Hi Jim, > > First off, this bug wasn't affecting me in any practical way (Ilya > asked the same question in the bug report). It's just something I > noticed when reading the code and figured I'd fix since (a) the fix is > simple and (b) making it right avoids any possibility of extremely > subtle future bugs caused by the non-uniformity of > base::RandGenerator. > Agreed. I think I use similar "biased" code in my field trials. I can probably switch over to using your central (fixed) code. > I agree that for small intervals, the non-uniformity would be hard to > detect, and for intervals that are powers of 2 there is no bias. For > small intervals, the fix adds very close to zero overhead as in these > cases it's very unlikely that we call the underlying base::RandUint64 > more than once (and if we do, it's because otherwise there could have > been bias). > Agreed. > > I didn't really mean to spend much time in this area, thought it would > be a quick, end-of-Friday-afternoon kind of fix :) But since you got > me interested, I wrote another quick test that verifies that our > random generator (RandUint64) generates randomness where we at least > see every bit take both possible values after looking at some number > of random values (in practice, on my Linux box, this only takes 10-20 > iterations or less). Please take a quick look at the new version I > uploaded, so I can get this checked in. > > Agreed. > Your math, and the argument that the expected value would be a > hyperbola, looks right to me. However, if I do the math another way, > just calculating the expected value, then for max = 2/3 * MAX_UINT64, > I get an expected value of max*(2*1/4 + 3/4)/3 = max*0.41667, The key is that max is different in the two cases. As a result,the actual biased means are distinct in the two cases, and the unbiased means are similarly distinct (both scaled by the fraction in each case). 
That in turn causes the measurable delta twixt the biased and unbiased means to be vary depending on the fraction. In fact, the measurable delta is also scaled by the factor 3/4 or by 2/3, depending on the case. Hence the measurable delta is larger in the 3/4 case. > Anyway, I > don't want to get too deep into it so I just changed the comment in > the file to not state that 2/3 is absolutely the worst case, just that > it is a degenerate case: > > // A degenerate case for such an implementation is e.g. a top of range > // that is 2/3rds of the way to MAX_UINT64, in which case the bottom > // half of the range would be twice as likely to occur as the top > // half. > OK. I too spent too much time... but it was a fun toy problem to see what the optimal was. > > Hope that covers it. > > Cheers, > Jói > > SGTM. Jim > > > On Sat, May 28, 2011 at 1:26 PM, <jar@chromium.org> wrote: > > > > Note: I don't think I wrote the original code... and I'm not blocking the > change > > or being defensive... but I am curious. > > > > LGTM with the additional change in the test below. If you really want > this class > > of test of rand, please choose the optimal ratio (or tell me why your > ratio is > > optimal.. see comment in test). > > > > If the bias of randomness is critical to our application, we may > perchance want > > to add some tests for the quality of the rand on the various platforms > (e.g., > > check that at least each bit in a rand assumes both values sometimes... > and > > maybe that looking at only one bit at a time, that it is reasonably > random). > > > > I wouldn't want to make this a doctoral study, or waste a ton of test > cycles > > doing this, but it might be worth having some tests of at least some > simple > > things. (example: with maximal modulus, for each single bit, it is > possible to > > get a zero and to get a one). 
> > > > If you're going to add tests to watch out for this you may as well write > a few > > "fast" tests (since you suggest it will impact your use) and be done with > it > > while you're hacking in this area. (...and I can believe on some > platforms this > > will be broken big time, such as by building a wimpy rand from a clock), > > > > > > > > > > > > http://codereview.chromium.org/7080005/diff/4003/base/rand_util.cc > > File base/rand_util.cc (right): > > > > > http://codereview.chromium.org/7080005/diff/4003/base/rand_util.cc#newcode59 > > base/rand_util.cc:59: } while (value > max_acceptable_value); > > OK... With that change I'm convinced it is "right," but when I've seen > > this code historically in many codebases, (despite being aware of the > > bias), I never thought it was significant (with uint64 max being a > > whopping 2^64). The fact that your first patch introduced this class of > > bias (due to a typo) and was virtually undetectable partially > > substantiates my wonder. > > > > For my edification, can you cite how this bias was burning you in your > > code? > > > > For example, when max is 3, the bias in randomness amounts to roughly 1 > > in 2^62. I strongly suspect the underlying rand to have biases of that > > order. I'd expect you'd need to perform a LOT of experiments before > > you could detect that bias. > > > > You tested the other extreme, where max is very close to 2^64. (see > > interesting comment in the test). > > > > When max is that large, each sample comes from a GIANT range. As a > > result, any small number of samples will come with a large bias because > > we don't come any where near to getting enough samples to prove the > > range of answers are all even possible :-/. > > > > What is the example where this was harming you in a measurable way? > > > > Again, the change is IMO "right." rand is a real slow function usually, > > so I don't like to see too many extra calls... 
but the likelihood of > > such extras are small for most all users (who stay away from a giant > > max, that is not nearly a power of 2). > > > > > > On 2011/05/27 23:18:01, Jói wrote: > >> > >> On 2011/05/27 20:25:14, jar wrote: > >> > Very cute.... but I think there is a typo, and this should read: > >> > > >> > while(value >= max_acceptable_value) > >> > > >> > Am I confused?? > >> > > >> > I think this currently gives a higher probability of returning 0 > > > > than any > >> > >> other > >> > number in the range. > > > >> You're right, thanks for catching that. Fixed and uploaded a new > > > > version. > > > > > http://codereview.chromium.org/7080005/diff/4006/base/rand_util_unittest.cc > > File base/rand_util_unittest.cc (right): > > > > > http://codereview.chromium.org/7080005/diff/4006/base/rand_util_unittest.cc#n... > > base/rand_util_unittest.cc:73: // that is 2/3rds of the way to > > MAX_UINT64, in which case the bottom > > You got me to wondering about this ratio, and it wasn't clear that 2/3 > > was optimally problematic. I applied calculus (which I've only done > > about 4 times in my life, after gettin' out'a school), and concluded > > that the maximum delta between the broken-code-mean and the correct > > (range over 2 mean) is actually achieved with a ratio R of: > > > > 3/4 > > > > (If I'm wrong, please correct me! I'm notorious for clever ideas... and > > typos in my arithmetic!). > > > > As always, after grinding through the calculus, there is an easy way to > > see this result (that the delta in the means is maximized with R=3/4) > > > > We only consider R in the range of 1/2 to 1, so that we never have to > > consider cases where the rand() values mapped more than twice into any > > return value. > > > > The difference function is then a quadratic in R, where R is the ratio. 
> > You can pretty much (in retrospect) guess this, since the calculation of > > the means is done by summing all numbers in a range and dividing by the > > count of numbers. We just have two different ranges that we need to sum > > (one after the modular re-mapping). Everything in that equation is > > based on R and (1-R), and the sum of numbers from 1 to K is quadratic. > > > > So you know it is *some* quadratic (i.e., a parabola). > > > > If the ratio was 1/2, then there would be no difference in the mean (no > > bias would be introduced). Similarly, if the ratio were 1, there would > > be no bias. Hence we know that this parabola has a zero value at R=1/2, > > and R = 1. > > > > Since parabolas are nice and symmetric, the zero-derivative (max) > > point is half-way between those two equal values. > > > > The precise equation (if I didn't screw up) for the difference in the > > means is: > > > > MaxInt * (R^2 - 3R/2 + 1/2) > > > > Again, feel free to correct my math. > > > > http://codereview.chromium.org/7080005/ >
Thanks Jim. Let me know when I have your LGTM for the added test. > [...] In fact, the measurable delta is also scaled by > the factor 3/4 or by 2/3, depending on the case. Hence the measurable delta > is larger in the 3/4 case. Right, you're absolutely right. I've switched to 3/4ths in the test. > OK. I too spent too much time... but it was a fun toy problem to see what > the optimal was. :-) It's a fun problem to think about. Cheers, Jói On Mon, May 30, 2011 at 12:40 PM, Jim Roskind <jar@chromium.org> wrote: > On Mon, May 30, 2011 at 8:31 AM, Jói Sigurðsson <joi@chromium.org> wrote: >> >> Hi Jim, >> >> First off, this bug wasn't affecting me in any practical way (Ilya >> asked the same question in the bug report). It's just something I >> noticed when reading the code and figured I'd fix since (a) the fix is >> simple and (b) making it right avoids any possibility of extremely >> subtle future bugs caused by the non-uniformity of >> base::RandGenerator. > > Agreed. I think I use similar "biased" code in my field trials. I can > probably switch over to using your central (fixed) code. > >> >> I agree that for small intervals, the non-uniformity would be hard to >> detect, and for intervals that are powers of 2 there is no bias. For >> small intervals, the fix adds very close to zero overhead as in these >> cases it's very unlikely that we call the underlying base::RandUint64 >> more than once (and if we do, it's because otherwise there could have >> been bias). > > Agreed. >> >> I didn't really mean to spend much time in this area, thought it would >> be a quick, end-of-Friday-afternoon kind of fix :) But since you got >> me interested, I wrote another quick test that verifies that our >> random generator (RandUint64) generates randomness where we at least >> see every bit take both possible values after looking at some number >> of random values (in practice, on my Linux box, this only takes 10-20 >> iterations or less). 
Please take a quick look at the new version I >> uploaded, so I can get this checked in. >> > > Agreed. > >> >> Your math, and the argument that the difference in expected values would be a >> parabola, looks right to me. However, if I do the math another way, >> just calculating the expected value, then for max = 2/3 * MAX_UINT64, >> I get an expected value of max*(2*1/4 + 3/4)/3 = max*0.41667, > > The key is that max is different in the two cases. As a result, the actual > biased means are distinct in the two cases, and the unbiased means are > similarly distinct (both scaled by the fraction in each case). That in turn > causes the measurable delta twixt the biased and unbiased means to vary > depending on the fraction. In fact, the measurable delta is also scaled by > the factor 3/4 or by 2/3, depending on the case. Hence the measurable delta > is larger in the 3/4 case. > >> >> Anyway, I >> don't want to get too deep into it so I just changed the comment in >> the file to not state that 2/3 is absolutely the worst case, just that >> it is a degenerate case: >> >> // A degenerate case for such an implementation is e.g. a top of range >> // that is 2/3rds of the way to MAX_UINT64, in which case the bottom >> // half of the range would be twice as likely to occur as the top >> // half. > > OK. I too spent too much time... but it was a fun toy problem to see what > the optimal was. > >> >> Hope that covers it. >> >> Cheers, >> Jói >> > > SGTM. > Jim > >> >> On Sat, May 28, 2011 at 1:26 PM, <jar@chromium.org> wrote: >> > >> > Note: I don't think I wrote the original code... and I'm not blocking >> > the change >> > or being defensive... but I am curious. >> > >> > LGTM with the additional change in the test below. If you really want >> > this class >> > of test of rand, please choose the optimal ratio (or tell me why your >> > ratio is >> > optimal... see comment in test). 
>> > >> > If the bias of randomness is critical to our application, we may >> > perchance want >> > to add some tests for the quality of the rand on the various platforms >> > (e.g., >> > check that at least each bit in a rand assumes both values sometimes... >> > and >> > maybe that looking at only one bit at a time, that it is reasonably >> > random). >> > >> > I wouldn't want to make this a doctoral study, or waste a ton of test >> > cycles >> > doing this, but it might be worth having some tests of at least some >> > simple >> > things. (example: with maximal modulus, for each single bit, it is >> > possible to >> > get a zero and to get a one). >> > >> > If you're going to add tests to watch out for this you may as well write >> > a few >> > "fast" tests (since you suggest it will impact your use) and be done >> > with it >> > while you're hacking in this area. (...and I can believe on some >> > platforms this >> > will be broken big time, such as by building a wimpy rand from a clock), >> > >> > >> > >> > >> > >> > http://codereview.chromium.org/7080005/diff/4003/base/rand_util.cc >> > File base/rand_util.cc (right): >> > >> > >> > http://codereview.chromium.org/7080005/diff/4003/base/rand_util.cc#newcode59 >> > base/rand_util.cc:59: } while (value > max_acceptable_value); >> > OK... With that change I'm convinced it is "right," but when I've seen >> > this code historically in many codebases, (despite being aware of the >> > bias), I never thought it was significant (with uint64 max being a >> > whopping 2^64). The fact that your first patch introduced this class of >> > bias (due to a typo) and was virtually undetectable partially >> > substantiates my wonder. >> > >> > For my edification, can you cite how this bias was burning you in your >> > code? >> > >> > For example, when max is 3, the bias in randomness amounts to roughly 1 >> > in 2^62. I strongly suspect the underlying rand to have biases of that >> > order. 
I'd expect you'd need to perform a LOT of experiments before >> > you could detect that bias. >> > >> > You tested the other extreme, where max is very close to 2^64. (see >> > interesting comment in the test). >> > >> > When max is that large, each sample comes from a GIANT range. As a >> > result, any small number of samples will come with a large bias because >> > we don't come any where near to getting enough samples to prove the >> > range of answers are all even possible :-/. >> > >> > What is the example where this was harming you in a measurable way? >> > >> > Again, the change is IMO "right." rand is a real slow function usually, >> > so I don't like to see too many extra calls... but the likelihood of >> > such extras are small for most all users (who stay away from a giant >> > max, that is not nearly a power of 2). >> > >> > >> > On 2011/05/27 23:18:01, Jói wrote: >> >> >> >> On 2011/05/27 20:25:14, jar wrote: >> >> > Very cute.... but I think there is a typo, and this should read: >> >> > >> >> > while(value >= max_acceptable_value) >> >> > >> >> > Am I confused?? >> >> > >> >> > I think this currently gives a higher probability of returning 0 >> > >> > than any >> >> >> >> other >> >> > number in the range. >> > >> >> You're right, thanks for catching that. Fixed and uploaded a new >> > >> > version. >> > >> > >> > http://codereview.chromium.org/7080005/diff/4006/base/rand_util_unittest.cc >> > File base/rand_util_unittest.cc (right): >> > >> > >> > http://codereview.chromium.org/7080005/diff/4006/base/rand_util_unittest.cc#n... >> > base/rand_util_unittest.cc:73: // that is 2/3rds of the way to >> > MAX_UINT64, in which case the bottom >> > You got me to wondering about this ratio, and it wasn't clear that 2/3 >> > was optimally problematic. 
I applied calculus (which I've only done >> > about 4 times in my life, after gettin' out'a school), and concluded >> > that the maximum delta between the broken-code-mean and the correct >> > (range over 2 mean) is actually achieved with a ratio R of: >> > >> > 3/4 >> > >> > (If I'm wrong, please correct me! I'm notorious for clever ideas... and >> > typos in my arithmetic!). >> > >> > As always, after grinding through the calculus, there is an easy way to >> > see this result (that the delta in the means is maximized with R=3/4) >> > >> > We only consider R in the range of 1/2 to 1, so that we never have to >> > consider cases where the rand() values mapped more than twice into any >> > return value. >> > >> > The difference function is then a quadratic in R, where R is the ratio. >> > You can pretty much (in retrospect) guess this, since the calculation of >> > the means is done by summing all numbers in a range and dividing by the >> > count of numbers. We just have two different ranges that we need to sum >> > (one after the modular re-mapping). Everything in that equation is >> > based on R and (1-R), and the sum of numbers from 1 to K is quadratic. >> > >> > So you know it is *some* quadratic (i.e., a parabola). >> > >> > If the ratio was 1/2, then there would be no difference in the mean (no >> > bias would be introduced). Similarly, if the ratio were 1, there would >> > be no bias. Hence we know that this parabola has a zero value at R=1/2, >> > and R = 1. >> > >> > Since parabolas are nice and symmetric, the zero-derivative (max) >> > point is half-way between those two equal values. >> > >> > The precise equation (if I didn't screw up) for the difference in the >> > means is: >> > >> > MaxInt * (R^2 - 3R/2 + 1/2) >> > >> > Again, feel free to correct my math. >> > >> > http://codereview.chromium.org/7080005/ > >
Thanks for adding this as well. See comment. If you really like the current form of the test (rather than the bitwise suggestion), then LGTM with a smaller limit on the loop. http://codereview.chromium.org/7080005/diff/8/base/rand_util_unittest.cc File base/rand_util_unittest.cc (right): http://codereview.chromium.org/7080005/diff/8/base/rand_util_unittest.cc#newc... base/rand_util_unittest.cc:114: for (size_t i = 0; i < 1000000; ++i) { Each bit *should* have a 50-50 probability. Hence over 100 cycles without getting a bit to toggle would be ridiculously improbable, and probably 1000 would be wildly conservative. Then again... this test should never make it even near to that bound unless it is broken. The advantage of ending the test on this boundary is that we avoid running so long that the test simply "doesn't finish" and we have a less clear failure. http://codereview.chromium.org/7080005/diff/8/base/rand_util_unittest.cc#newc... base/rand_util_unittest.cc:115: uint64 value = base::RandUint64(); FYI: You could have used bitwise AND and bitwise OR to see how long it took to get a 0 in each position, and a 1 in each position, rather than the vector manipulation: uint64 found_ones = 0x0;  // A set bit means a 1 was seen at that position. uint64 found_zeros = ~found_ones;  // A cleared bit means a 0 was seen. while (found_zeros != 0 || ~found_ones != 0) { uint64 value = base::RandUint64(); found_ones |= value; found_zeros &= value; }
Heh, no, the bitwise approach is much nicer, I've updated to that. I blame not having had my second coffee of the day for writing it in that weird way. I also set the maximum iterations to 1000. The test already completed in several seconds with the other maximum, but as per your logic there is only a 1 in ~10^301 chance that the test won't complete within 1000 iterations assuming the random generator gives each bit a close to equal chance of either value. Cheers, Jói On Mon, May 30, 2011 at 1:02 PM, <jar@chromium.org> wrote: > Thanks for adding this as well. See comment. > > If you really like the current form of the test (rather than the bitwise > suggestion), then LGTM with a smaller limit on the loop. > > > http://codereview.chromium.org/7080005/diff/8/base/rand_util_unittest.cc > File base/rand_util_unittest.cc (right): > > http://codereview.chromium.org/7080005/diff/8/base/rand_util_unittest.cc#newc... > base/rand_util_unittest.cc:114: for (size_t i = 0; i < 1000000; ++i) { > Each bit *should* have a 50-50 probabilty. Hence over 100 cycles > without getting a bit to toggle would be ridiculously improbable, and > probably 1000 would be wildly conservative. > > Then again... this test should never make it even near to that bound > unless it is broken. The advantage of ending the test on this boundary, > is we avoid running so long that the test simply "doesn't finish" and we > have a less clear failure. > > http://codereview.chromium.org/7080005/diff/8/base/rand_util_unittest.cc#newc... > base/rand_util_unittest.cc:115: uint64 value = base::RandUint64(); > FYI: You could have used bit-wise AND and bitwise OR to see how long it > took to get a 0 in each position, and a 1 in each position, rather than > the vector manipulation: > uint64 found_ones = 0x0; // bit of 1 shows we found a 1. > uint64 found_zeros = ~found_ones; // bit 0 means found a 0. 
> while (found_zeros != 0 || ~found_ones != 0) { > uint64 value = base::RandUint64(); > found_ones |= value; > found_zeros &= value; > } > > http://codereview.chromium.org/7080005/ >
LGTM