Friday, 17 August 2007

pr.probability - Skewing the distribution of random values over a range

Let's clean this up. First, assume a = 0 (if not, just add it on at the end). Second, don't deal with closed intervals (0 <= n <= M, 0 <= n' <= b) but with half-open intervals (0 <= n < M, 0 <= n' < b). This second simplification is exactly the 'bit of tricky math' in the comment, which just replaces M by M+1 and b by b+1.



So now we have a random number 0 <= n < M, and we want to transform it into a random number 0 <= n' < b, where b <= M. This is an interesting question (if not necessarily the one that the OP asked). The solution in the OP's code fragment is to take n' = nb/M. Assuming that b doesn't divide M exactly (for then there is no problem), n' is not uniformly distributed: it takes a number of values (in fact, M mod b of them) with probability ⌈M/b⌉/M, and the remaining values with probability ⌊M/b⌋/M. Exactly the same is true if we take n'= n mod b; there is no statistical difference.



If M/b is big enough, then ⌈M/b⌉/M and ⌊M/b⌋/M are nearly equal; and this may be good enough for your application. If not, then the simplest solution is the following:
Loop:
Generate random 0 <= n < M
if (n < M - (M mod b)) return n mod b
goto Loop



The problem is that this algorithm is not guaranteed to halt! But you can get a closer and closer approximation to uniformity by setting a bigger and bigger bound on the number of times the loop is executed.



Note that for this algorithm, it is important to use n mod b instead of nb/M, because the check for uniformity (n < M - (M mod b)) is simple.

No comments:

Post a Comment