Expand a random range from 1-5 to 1-7
Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.
- What is a simple solution?
- What is an effective solution to reduce memory usage or run on a slower CPU?
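This is equivalent to Adam Rosenfield's solution, but may be a bit clearer for some readers. It assumes rand5() is a function that returns a statistically random integer in the range 1 through 5 inclusive.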
int rand7()
{
    int vals[5][5] = {
        { 1, 2, 3, 4, 5 },
        { 6, 7, 1, 2, 3 },
        { 4, 5, 6, 7, 1 },
        { 2, 3, 4, 5, 6 },
        { 7, 0, 0, 0, 0 }
    };

    int result = 0;
    while (result == 0)
    {
        int i = rand5();
        int j = rand5();
        result = vals[i-1][j-1];
    }
    return result;
}
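How does it work? Think of it like this: imagine printing out this two-dimensional array on paper, tacking it up to a dart board and randomly throwing darts at it. If you hit a non-zero value, it's a statistically random value between 1 and 7, since each value from 1 to 7 appears the same number of times (exactly three). If you hit a zero, just keep throwing darts until you hit a non-zero value. That's what this code is doing: the i and j indexes randomly select a location on the dart board, and if we don't get a good result, we keep throwing darts.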
Like Adam said, this can run forever in the worst case, but statistically the worst case never happens. 🙂
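One quick way to check the dartboard argument is to count the cells. The following is a minimal Python sketch. It assumes a stand-in `rand5()` built on `random.randint(1, 5)`, because the C code only says that `rand5()` returns 1 to 5. The sketch ports the table and checks that every value from 1 to 7 occupies the same number of cells.

```python
import random
from collections import Counter

# The 5x5 "dartboard" from the C answer; 0 marks a miss.
VALS = [
    [1, 2, 3, 4, 5],
    [6, 7, 1, 2, 3],
    [4, 5, 6, 7, 1],
    [2, 3, 4, 5, 6],
    [7, 0, 0, 0, 0],
]

def rand5():
    # Assumed stand-in for the given generator: uniform integer in 1..5.
    return random.randint(1, 5)

def rand7():
    result = 0
    while result == 0:              # a 0 means "missed", so throw again
        i, j = rand5(), rand5()     # a uniformly random cell of the board
        result = VALS[i - 1][j - 1]
    return result

# Fairness check: each of 1..7 must cover the same number of cells.
cell_counts = Counter(v for row in VALS for v in row)
print(cell_counts)
```

Each non-zero value covers 3 of the 25 cells, and the remaining 4 cells are misses. An accepted throw is therefore equally likely to land on any of the 7 values.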
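There is no (exactly correct) solution which will run in a constant amount of time, since 1/7 is an infinite decimal in base 5. One simple solution would be to use rejection sampling, e.g.: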
int i;
do
{
    i = 5 * (rand5() - 1) + rand5();  // i is now uniformly random between 1 and 25
} while(i > 21);
// i is now uniformly random between 1 and 21
return i % 7 + 1;  // result is now uniformly random between 1 and 7
The expected number of loop iterations is 25/21 = 1.19, but the probability of looping forever is infinitesimally small.
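The rejection loop only ever sees 25 equally likely pairs of `rand5()` results, so its fairness and cost can be checked exactly instead of by sampling. The following small Python sketch does that. The enumeration is an illustration and is not part of the original answer.

```python
from collections import Counter
from fractions import Fraction

# Enumerate every equally likely pair (a, b) of rand5() results (1..5 each).
accepted = Counter()
for a in range(1, 6):
    for b in range(1, 6):
        i = 5 * (a - 1) + b        # uniform over 1..25
        if i <= 21:                # the loop exits only when i <= 21
            accepted[i % 7 + 1] += 1

# Each output value should be produced by exactly 3 of the 21 accepted pairs.
print(sorted(accepted.items()))

# Iterations are geometric with success probability 21/25.
expected_iterations = Fraction(25, 21)
expected_calls = 2 * expected_iterations   # two rand5() calls per iteration
print(float(expected_iterations), float(expected_calls))
```

The expected cost is therefore 50/21, or about 2.38 calls to `rand5()` per output. This figure is a useful baseline for the other answers below.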
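In addition to my first answer, I'd like to add another one. This answer attempts to minimize the number of calls to rand5() per call to rand7(), to maximize the usage of randomness. That is, if you consider randomness to be a precious resource, we want to use as much of it as possible, without throwing away any random bits. This answer also has some similarities to the logic presented in Ivan's answer.
The entropy of a random variable is a well-defined quantity. For a random variable which takes on N states with equal probabilities (a uniform distribution), the entropy is log2 N. Thus, rand5() has approximately 2.32193 bits of entropy, and rand7() has about 2.80735 bits of entropy. If we hope to maximize our use of randomness, we need to use all 2.32193 bits of entropy from each call to rand5(), and apply them to generating the 2.80735 bits of entropy needed for each call to rand7(). The fundamental limit, then, is that we can do no better than log(7)/log(5) = 1.20906 calls to rand5() per call to rand7().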
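The entropy figures and the limit above are easy to reproduce. The following short Python check uses only `math.log2` and `math.log`.

```python
import math

# Entropy of a uniform variable with N states is log2(N) bits.
h5 = math.log2(5)     # bits delivered by one rand5() call
h7 = math.log2(7)     # bits needed for one rand7() output

# Minimum rand5() calls per rand7() output; the log base cancels out.
ratio = h7 / h5
print(f"{h5:.5f} {h7:.5f} {ratio:.5f}")   # 2.32193 2.80735 1.20906
```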
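Side notes: all logarithms in this answer will be base 2 unless specified otherwise. rand5() will be assumed to return numbers in the range [0, 4], and rand7() will be assumed to return numbers in the range [0, 6]. Adjusting the ranges to [1, 5] and [1, 7] respectively is trivial.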
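So how do we do it? We generate an infinitely precise random real number between 0 and 1 (pretend for the moment that we could actually compute and store such an infinitely precise number; we'll fix this later). We can generate such a number by generating its digits in base 5: we pick the random number 0.a_1 a_2 a_3 ..., where each digit a_i is chosen by a call to rand5(). For example, if our RNG chose a_i = 1 for all i, then, ignoring the fact that that isn't very random, that would correspond to the real number 1/5 + 1/5^2 + 1/5^3 + ... = 1/4 (the sum of a geometric series).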
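OK, so we've picked a random real number between 0 and 1. I now claim that such a random number is uniformly distributed. Intuitively, this is easy to understand, since each digit was picked uniformly, and the number is infinitely precise. However, a formal proof of this is somewhat more involved, since now we're dealing with a continuous distribution instead of a discrete distribution, so we need to prove that the probability that our number lies in an interval [a, b] equals the length of that interval, b - a. The proof is left as an exercise for the reader =).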
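Now that we have a random real number selected uniformly from the range [0, 1], we need to convert it to a series of uniformly random numbers in the range [0, 6] to generate the output of rand7(). How do we do this? Just the reverse of what we just did: we convert it to an infinitely precise decimal in base 7, and then each base-7 digit will correspond to one output of rand7().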
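Taking the example from earlier, if our rand5() produces an infinite stream of 1s, then our random real number will be 1/4. Converting 1/4 to base 7, we get the infinite decimal 0.15151515..., so we will produce as output 1, 5, 1, 5, 1, 5, etc.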
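OK, so we have the main idea, but we have two problems left. First, we can't actually compute or store an infinitely precise real number, so how do we deal with only a finite portion of it? Second, how do we actually convert it to base 7?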
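One way we can convert a number between 0 and 1 to base 7 is as follows:
- Multiply by 7
- The integral part of the result is the next base-7 digit
- Subtract off the integral part, leaving only the fractional part
- Go back to the first step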
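The four steps above can be tried directly on exact rationals. The following Python sketch uses `fractions.Fraction` so that no precision is lost. The helper name `base_digits` is made up for illustration, and the sketch lets the base be a parameter so the same loop also recovers the base-5 digits of 1/4.

```python
from fractions import Fraction

def base_digits(x, base, count):
    """First `count` digits of x (0 <= x < 1) in `base`.
    Hypothetical helper that implements the multiply/split loop."""
    digits = []
    for _ in range(count):
        x *= base          # step 1: multiply by the base
        d = int(x)         # step 2: the integral part is the next digit
        digits.append(d)
        x -= d             # step 3: keep only the fractional part
    return digits          # step 4 is the loop itself

quarter = Fraction(1, 4)
print(base_digits(quarter, 5, 6))   # [1, 1, 1, 1, 1, 1]
print(base_digits(quarter, 7, 6))   # [1, 5, 1, 5, 1, 5]
```

The output matches the example in the text: 1/4 is 0.111... in base 5 and 0.1515... in base 7.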
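To deal with the problem of infinite precision, we compute a partial result, and we also store an upper bound on what the result could be. That is, suppose we've called rand5() twice and it returned 1 both times. The number we've generated so far is 0.11 (base 5). Whatever the rest of the infinite series of calls to rand5() produces, the random real number we're generating will never be larger than 0.12: it is always true that 0.11 ≤ 0.11xyz... < 0.12.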
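So, keeping track of the current number so far and the maximum value it could ever take, we convert both numbers to base 7. If they agree on the first k digits, then we can safely output the next k digits: regardless of what the infinite stream of base-5 digits is, it will never affect the next k digits of the base-7 representation!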
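Here is a small Python sketch of the "agree on the first k digits" rule, using exact fractions and the [0, 4] digit convention from the side notes. The helper `safe_base7_digits` is hypothetical and written only for illustration. It recomputes everything from scratch instead of keeping running integer state.

```python
from fractions import Fraction

def safe_base7_digits(base5_prefix):
    """Base-7 digits already fixed by a base-5 prefix (digits 0..4).

    The unknown real lies in [lo, hi) with hi = lo + 5**-len(prefix).
    A base-7 digit is safe once lo and hi give the same digit. Using hi
    itself (which is excluded) is slightly conservative: it can delay a
    digit but never emit a wrong one."""
    lo = Fraction(0)
    for k, a in enumerate(base5_prefix, start=1):
        lo += Fraction(a, 5 ** k)
    hi = lo + Fraction(1, 5 ** len(base5_prefix))

    digits = []
    while True:
        lo, hi = lo * 7, hi * 7
        d_lo, d_hi = int(lo), int(hi)
        if d_lo != d_hi:       # next digit still depends on unseen input
            return digits
        digits.append(d_lo)
        lo, hi = lo - d_lo, hi - d_lo

print(safe_base7_digits([1, 1]))    # [1]: 0.11 and 0.12 (base 5) agree on one digit
print(safe_base7_digits([1] * 8))   # [1, 5, 1, 5, 1, 5]
```

With the two 1s from the example, only the first base-7 digit is settled. Eight base-5 digits settle six base-7 digits, which is roughly the ratio log(5)/log(7).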
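And that's the algorithm: to generate the next output of rand7(), we generate only as many digits of rand5() as we need to ensure that we know with certainty the value of the next digit in the conversion of the random real number to base 7. Here is a Python implementation, with a test harness: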
import math
import random

rand5_calls = 0

def rand5():
    global rand5_calls
    rand5_calls += 1
    return random.randint(0, 4)

def rand7_gen():
    state = 0
    pow5 = 1
    pow7 = 7
    while True:
        # integer division (//) keeps all arithmetic exact
        if state // pow5 == (state + pow7) // pow5:
            result = state // pow5
            state = (state - result * pow5) * 7
            pow7 *= 7
            yield result
        else:
            state = 5 * state + pow7 * rand5()
            pow5 *= 5

if __name__ == '__main__':
    r7 = rand7_gen()
    N = 10000
    x = list(next(r7) for i in range(N))
    distr = [x.count(i) for i in range(7)]
    expmean = N / 7.0
    expstddev = math.sqrt(N * (1.0/7.0) * (6.0/7.0))

    print('%d TRIALS' % N)
    print('Expected mean: %.1f' % expmean)
    print('Expected standard deviation: %.1f' % expstddev)
    print()
    print('DISTRIBUTION:')
    for i in range(7):
        print('%d: %d (%+.3f stddevs)' % (i, distr[i], (distr[i] - expmean) / expstddev))
    print()
    print('Calls to rand5: %d (average of %f per call to rand7)' % (rand5_calls, float(rand5_calls) / N))
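Note that rand7_gen() returns a generator, since it has internal state involving the conversion of the number to base 7. The test harness calls next(r7) 10000 times to produce 10000 random numbers, and then it measures their distribution. Only integer math is used, so the results are exactly correct.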
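Also note that the numbers here get very big, very fast, because the powers of 5 and 7 grow quickly. Hence, performance will start to degrade noticeably after generating lots of random numbers, due to the bignum arithmetic. But remember that my goal here was to maximize the usage of random bits, not to maximize performance (although that is a secondary goal).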
In one run of this, I made 10000 calls to rand7(). The average number of rand5() calls per rand7() call matched the minimum of log(7)/log(5) to 4 significant figures, and the resulting output was uniform.
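To port this code to a language that doesn't have arbitrarily large integers built in, you'll have to cap the values of pow5 and pow7 at the maximum value of your native integral type: if they get too big, then reset everything and start over. This will increase the average number of calls to rand5() per call to rand7() very slightly, but hopefully not by much, even for 32- or 64-bit integers.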
(I've stolen Adam Rosenfield's answer and made it run about 7% faster.)
Assume that rand5() returns one of {0,1,2,3,4} with equal probability, and that the goal is to return one of {0,1,2,3,4,5,6} with equal probability.
int rand7() {
  int i = 5 * rand5() + rand5();
  int max = 25;
  //i is uniform among {0 ... max-1}
  while(i < max%7) {
    //i is uniform among {0 ... (max%7 - 1)}
    i *= 5;
    i += rand5();
    //i is uniform {0 ... (((max%7)*5) - 1)}
    max %= 7;
    max *= 5;
    //once again, i is uniform among {0 ... max-1}
  }
  return(i%7);
}
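We're keeping track of the largest value that the loop can produce in the variable max. If the result so far is between max%7 and max-1, then the result is uniformly distributed in that range. That range contains a multiple of 7 values, so i%7 is uniform. If not, we use the remainder, which is random between 0 and max%7-1, plus another call to rand5(), to create a new number and a new max. Then we start again.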
Edit: The expected number of calls to rand5() is x in this equation:
x = 2 * 21/25
  + 3 * 4/25 * 14/20
  + 4 * 4/25 * 6/20 * 28/30
  + 5 * 4/25 * 6/20 * 2/30 * 7/10
  + 6 * 4/25 * 6/20 * 2/30 * 3/10 * 14/15
  + (6+x) * 4/25 * 6/20 * 2/30 * 3/10 * 1/15

x = about 2.21 calls to rand5()
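The equation can be checked by walking through the loop's states mechanically instead of by hand. From a given `max`, the loop continues with probability (max % 7)/max, and the next value of `max` is (max % 7)*5. The following Python sketch follows that chain with exact fractions and compares the result with the hand-written equation solved for x. The helper name `expected_rand5_calls` is made up for illustration.

```python
from fractions import Fraction

def expected_rand5_calls():
    """Exact expected rand5() calls for the loop above (rand5 in 0..4)."""
    # After the first two calls, max = 25 and i is uniform in 0..max-1.
    # From a given max the loop continues with probability (max % 7) / max,
    # costing one more call, and the new max is (max % 7) * 5.
    # Write E(25) = a + b * E(max) and follow the chain back to 25.
    a, b = Fraction(0), Fraction(1)
    m = 25
    while True:
        stay = Fraction(m % 7, m)
        a += b * stay
        b *= stay
        m = (m % 7) * 5
        if m == 25:
            break
    return 2 + a / (1 - b)

# The equation from the answer, solved for x: x = A + q*x.
F = Fraction
q = F(4, 25) * F(6, 20) * F(2, 30) * F(3, 10) * F(1, 15)
A = (2 * F(21, 25)
     + 3 * F(4, 25) * F(14, 20)
     + 4 * F(4, 25) * F(6, 20) * F(28, 30)
     + 5 * F(4, 25) * F(6, 20) * F(2, 30) * F(7, 10)
     + 6 * F(4, 25) * F(6, 20) * F(2, 30) * F(3, 10) * F(14, 15)
     + 6 * q)
x_equation = A / (1 - q)

x_chain = expected_rand5_calls()
print(x_chain == x_equation, float(x_chain))
```

Both routes give the same exact value, about 2.2123 calls. That is below the 50/21 ≈ 2.38 calls of the plain rejection loop.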
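Algorithm:
7 can be represented as a sequence of 3 bits.
Use rand(5) to fill each bit randomly with 0 or 1. For example, call rand(5) and:
- if the result is 1 or 2, fill the bit with 0
- if the result is 4 or 5, fill the bit with 1
- if the result is 3, ignore it and try again (rejection)
This way we can fill 3 bits randomly with 0/1 and get a number from 1-7 (an all-zero result is rejected and the 3 bits are drawn again).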
Edit: This seems like the simplest and most efficient answer, so here's some code for it:
public static int random_7() {
    int returnValue = 0;
    while (returnValue == 0) {
        for (int i = 1; i <= 3; i++) {
            returnValue = (returnValue << 1) + random_5_output_2();
        }
    }
    return returnValue;
}

private static int random_5_output_2() {
    while (true) {
        int flip = random_5();

        if (flip < 3) {
            return 0;
        }
        else if (flip > 3) {
            return 1;
        }
    }
}
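The following is a rough Python port of the Java code. It also computes the expected cost exactly, so the method can be compared with the other answers. The sketch assumes a stand-in `random_5()` built on `random.randint(1, 5)`. Each bit needs 5/4 calls on average (3 is rejected), an attempt uses 3 bits, and an attempt is accepted with probability 7/8.

```python
import random
from fractions import Fraction

def random_5():
    # Assumed stand-in for the given generator: uniform integer in 1..5.
    return random.randint(1, 5)

def random_5_output_2():
    # 1 or 2 -> bit 0, 4 or 5 -> bit 1, 3 -> reject and draw again.
    while True:
        flip = random_5()
        if flip < 3:
            return 0
        if flip > 3:
            return 1

def random_7():
    value = 0
    while value == 0:            # 3 bits give 0..7; an all-zero draw is redone
        for _ in range(3):
            value = (value << 1) + random_5_output_2()
    return value

# Expected random_5() calls: 5/4 per bit, 3 bits per attempt, 8/7 attempts.
expected_calls = Fraction(5, 4) * 3 * Fraction(8, 7)
print(expected_calls, float(expected_calls))
```

The result is 30/7, about 4.29 calls per output. For comparison, Adam's rejection loop needs 2 × 25/21, about 2.38 calls.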
int randbit( void )
{
    while( 1 )
    {
        int r = rand5();
        if( r <= 4 ) return(r & 1);
    }
}

int randint( int nbits )
{
    int result = 0;
    while( nbits-- )
    {
        result = (result<<1) | randbit();
    }
    return( result );
}

int rand7( void )
{
    while( 1 )
    {
        int r = randint( 3 ) + 1;
        if( r <= 7 ) return( r );
    }
}
int rand7(void)
{
    int ans = 0;
    int r;
    while (ans == 0) {
        for (int i = 0; i < 3; i++) {
            while ((r = rand5()) == 3) {}   /* reject 3 */
            ans += (r < 3) << i;            /* 1,2 -> bit 1; 4,5 -> bit 0 */
        }
    }
    return ans;
}
rand7() = (rand5()+rand5()+rand5()+rand5()+rand5()+rand5()+rand5())%7+1
Edit: That doesn't quite work. It's off by about 2 parts in 1000 (assuming a perfect rand5). The buckets get:
value  Count  Error%
1      11158  -0.0035
2      11144  -0.0214
3      11144  -0.0214
4      11158  -0.0035
5      11172  +0.0144
6      11177  +0.0208
7      11172  +0.0144
By switching to a sum of n calls:
n   Error%
10  +/- 1e-3
12  +/- 1e-4
14  +/- 1e-5
16  +/- 1e-6
...
28  +/- 3e-11
It seems to gain an order of magnitude for every 2 calls added.
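By the way: the error table above was not generated by sampling but by the following recurrence relation, where p[x,n] is the number of ways that output=x can happen given n calls to rand5: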
p[1,1] ... p[5,1] = 1
p[6,1] ... p[7,1] = 0

p[1,n] = p[7,n-1] + p[6,n-1] + p[5,n-1] + p[4,n-1] + p[3,n-1]
p[2,n] = p[1,n-1] + p[7,n-1] + p[6,n-1] + p[5,n-1] + p[4,n-1]
p[3,n] = p[2,n-1] + p[1,n-1] + p[7,n-1] + p[6,n-1] + p[5,n-1]
p[4,n] = p[3,n-1] + p[2,n-1] + p[1,n-1] + p[7,n-1] + p[6,n-1]
p[5,n] = p[4,n-1] + p[3,n-1] + p[2,n-1] + p[1,n-1] + p[7,n-1]
p[6,n] = p[5,n-1] + p[4,n-1] + p[3,n-1] + p[2,n-1] + p[1,n-1]
p[7,n] = p[6,n-1] + p[5,n-1] + p[4,n-1] + p[3,n-1] + p[2,n-1]
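The recurrence can be evaluated directly. The following Python sketch indexes the buckets by residue 0..6 of the sum instead of by x = 1..7. That relabeling is an assumption made here for convenience, so the bucket order may differ from the table, but the set of counts should not. It also reports the error the same way the table appears to do, as the deviation from total/7 expressed as a percentage of the total.

```python
def residue_counts(n):
    """counts[r] = number of the 5**n outcomes of n rand5() calls
    (values 1..5) whose sum is congruent to r mod 7."""
    counts = [1, 0, 0, 0, 0, 0, 0]          # zero calls: the empty sum is 0
    for _ in range(n):
        # A new roll d in 1..5 moves residue (r - d) to residue r.
        counts = [sum(counts[(r - d) % 7] for d in range(1, 6))
                  for r in range(7)]
    return counts

def max_error_percent(n):
    counts = residue_counts(n)
    total = 5 ** n
    # Largest deviation from total/7, as a percentage of all outcomes.
    return max(abs(c - total / 7) for c in counts) / total * 100

print(sorted(residue_counts(7)))
for n in (7, 10, 12, 14):
    print(n, max_error_percent(n))
```

For n = 7, the seven counts are the same multiset as the table (11144 twice, 11158 twice, 11172 twice, 11177 once). The largest error is 0.0214, and the error shrinks as n grows.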
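The following uses a random number generator that produces a uniform distribution on {1,2,3,4,5} to produce a uniform distribution on {1,2,3,4,5,6,7}. The code is messy, but the logic is clear.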
public static int random_7(Random rg) {
    int returnValue = 0;
    while (returnValue == 0) {
        for (int i = 1; i <= 3; i++) {
            returnValue = (returnValue << 1) + SimulateFairCoin(rg);
        }
    }
    return returnValue;
}

private static int SimulateFairCoin(Random rg) {
    while (true) {
        int flipOne = random_5_mod_2(rg);
        int flipTwo = random_5_mod_2(rg);

        if (flipOne == 0 && flipTwo == 1) {
            return 0;
        }
        else if (flipOne == 1 && flipTwo == 0) {
            return 1;
        }
    }
}

private static int random_5_mod_2(Random rg) {
    return random_5(rg) % 2;
}

private static int random_5(Random rg) {
    return rg.Next(5) + 1;
}
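Suppose we add the constraint of giving the most efficient answer. Given an input stream I of length m of uniformly distributed integers from 1-5, we want to output a stream O of uniformly distributed integers from 1-7 that is as long as possible relative to m. Call this length L(m).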
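The simplest way to analyse this is to treat the streams I and O as 5-ary and 7-ary numbers respectively. The main answer does this by mapping the stream a1, a2, a3, ... -> a1+5*a2+5^2*a3+..., and similarly for stream O.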
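Now take a section of the input stream of length m, and choose n such that 5^m-7^n=c, where c>0 and c is as small as possible. There is a uniform map from the input stream of length m to the integers from 1 to 5^m. There is another uniform map from the integers from 1 to 7^n to the output stream of length n. We may have to discard a few cases from the input stream, when the mapped integer exceeds 7^n.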
So this gives a value for L(m) of around m (log5/log7), which is approximately 0.827m.
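The above analysis has two difficulties. The equation 5^m-7^n=c is not easy to solve exactly. And when the uniform value from 1 to 5^m exceeds 7^n, we lose efficiency.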
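The question is how close we can get to the best possible value of m (log5/log7). For example, when this number is close to an integer, can we find a way to achieve exactly that integral number of output values?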
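If 5^m-7^n=c, then from the input stream we effectively generate a uniform random number from 0 to (5^m)-1 and don't use any values higher than 7^n. However, these values can be rescued and used again: they effectively generate a uniform sequence of numbers from 1 to 5^m-7^n. So we can then try to use these and convert them into 7-ary numbers so that we can create more output values.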
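Let T7(X) be the average length of the output sequence of random(1-7) integers derived from a uniform input of size X, and assume that 5^m=7^n0+7^n1+7^n2+...+7^nr+s, with s<7.
Then T7(5^m) = n0*7^n0/5^m + ((5^m-7^n0)/5^m) * T7(5^m-7^n0). We get a sequence of length n0 with probability 7^n0/5^m. With probability (5^m-7^n0)/5^m we instead have a residual uniform input of size 5^m-7^n0.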
If we just keep substituting, we obtain:
T7(5^m) = n0*7^n0/5^m + n1*7^n1/5^m + ... + nr*7^nr/5^m = (n0*7^n0 + n1*7^n1 + ... + nr*7^nr)/5^m
Hence
L(m) = T7(5^m) = (n0*7^n0 + n1*7^n1 + ... + nr*7^nr)/(7^n0+7^n1+7^n2+...+7^nr+s)
Another way of putting it is:
If 5^m has 7-ary representation a0 + a1*7 + a2*7^2 + a3*7^3 + ... + ar*7^r, then L(m) = (a1*7 + 2*a2*7^2 + 3*a3*7^3 + ... + r*ar*7^r)/(a0 + a1*7 + a2*7^2 + a3*7^3 + ... + ar*7^r)
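The base-7 form of L(m) is straightforward to compute exactly. The following Python sketch implements the formula as written. The function name `L` simply mirrors the notation above. The sketch compares the result with the ideal m (log5/log7).

```python
import math
from fractions import Fraction

def L(m):
    """L(m) from the formula above: write 5**m in base 7 as
    a0 + a1*7 + ... + ar*7**r and return sum(k*ak*7**k) / 5**m."""
    n = 5 ** m
    total, k, rest = 0, 0, n
    while rest:
        rest, a_k = divmod(rest, 7)     # next base-7 digit a_k
        total += k * a_k * 7 ** k
        k += 1
    return Fraction(total, n)

for m in (2, 11):
    print(m, L(m), float(L(m)), m * math.log(5) / math.log(7))
```

For m = 2, 25 = 3*7 + 4 gives L(2) = 21/25. This matches the simple rejection scheme, which yields one output 21 times out of 25. For m = 11, L is about 8.77, compared with the bound of about 9.10.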
The best possible case is my original one above, where 5^m=7^n+s, with s<7.
Then T7(5^m) = n*(7^n)/(7^n+s) = n+o(1) = m (Log5/Log7)+o(1)
The worst case is when we can only find k and s such that 5^m = k*7+s.
Then T7(5^m) = 1*(k*7)/(k*7+s) = 1+o(1)
Other cases are somewhere in between. It would be interesting to see how well we can do for very large m, i.e. how good we can get the error term:
T7(5^m) = m (Log5/Log7)+e(m)
It seems impossible to achieve e(m) = o(1) in general, but hopefully we can prove e(m)=o(m).
The whole thing then rests on the distribution of the 7-ary digits of 5^m for various values of m.
I'm sure there is a lot of theory out there that covers this. I may have a look and report back at some point.
Are homework questions allowed here?
This function does rough "base 5" math to generate a number between 0 and 6.
function rnd7() {
    let r1, r2, result;
    do {
        r1 = rnd5() - 1;
        do {
            r2 = rnd5() - 1;
        } while (r2 > 1);
        result = r2 * 5 + r1;
    } while (result > 6);

    return result + 1;
}
Here is a working Python implementation of Adam's answer.
import random

def rand5():
    return random.randint(1, 5)

def rand7():
    while True:
        r = 5 * (rand5() - 1) + rand5()
        #r is now uniformly random between 1 and 25
        if (r <= 21):
            break
    #result is now uniformly random between 1 and 7
    return r % 7 + 1
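I like to throw algorithms I'm looking at into Python so I can play around with them. I thought I'd post it here in the hope that it is useful to someone out there; it didn't take long to put together.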
Why not keep it simple?
int random7() {
  return random5() + (random5() % 3);
}
The chances of getting 1 or 7 are lower with this solution. However, if you just want a quick and readable solution, this is the way to go.
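Assuming that rand(n) here means "random integer in a uniform distribution from 0 to n-1", here's a code sample using Python's randint, which has that effect when called as randint(0, n-1). It uses only rand(5), i.e. randint(0, 4), and constants to produce the effect of rand(7). A little silly, actually.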
from random import randint

total = 7
while total >= 7:
    first = randint(0, 4)      # rand(5): uniform over 0..4
    toadd = 9999
    while toadd > 1:
        toadd = randint(0, 4)  # keep drawing until we get 0 or 1
    if toadd:
        total = first + 5
    else:
        total = first
assert 7 > total >= 0
print(total)
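The premise behind Adam Rosenfield's correct answer is:
- x = 5^n (in his case: n = 2)
- manipulate n rand5 calls to get a number y within the range [1, x]
- z = ((int)(x / 7)) * 7
- if y > z, try again; otherwise return y % 7 + 1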
When n equals 2, you have 4 throw-away possibilities: y = {22, 23, 24, 25}. If you use n equals 6, you only have 1 throw-away: y = {15625}.
5 ^ 6 = 15625
7 * 2232 = 15624
You call rand5 more times. However, you have a much lower chance of getting a throw-away value (or an infinite loop). If there is a way to get no possible throw-away value for y, I haven't found it yet.
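The following is a minimal Python sketch of the premise above, with the block size n as a parameter. It also includes the exact expected number of `rand5()` calls, so the tradeoff between calling `rand5` more times and throwing away less can be seen numerically. The stand-in `rand5()` built on `random.randint(1, 5)` and the helper names are assumptions made for illustration.

```python
import random
from fractions import Fraction

def rand5():
    # Assumed stand-in for the given generator: uniform integer in 1..5.
    return random.randint(1, 5)

def rand7_block(n):
    """The premise above with n rand5() calls per attempt (n=2 is Adam's)."""
    x = 5 ** n
    z = (x // 7) * 7                # largest multiple of 7 not above x
    while True:
        y = 0
        for _ in range(n):          # build y uniform over 0..x-1 ...
            y = 5 * y + (rand5() - 1)
        y += 1                      # ... then shift it to 1..x
        if y <= z:
            return y % 7 + 1

def throwaways(n):
    return 5 ** n - (5 ** n // 7) * 7

def expected_calls(n):
    # n calls per attempt; an attempt succeeds with probability z / x.
    x = 5 ** n
    return n * Fraction(x, (x // 7) * 7)

for n in (2, 6):
    print(n, throwaways(n), float(expected_calls(n)))
```

With n = 6, almost no attempts are thrown away, but each output always costs about 6 calls. With n = 2, each output costs 50/21, about 2.38 calls, on average.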
Here's my answer:
int rand5 (void);  /* assumed: returns 1..5 */

static struct rand_buffer {
  unsigned v, count;
} buf2, buf3;

void push (struct rand_buffer *buf, unsigned n, unsigned v)
{
  buf->v = buf->v * n + v;
  ++buf->count;
}

#define PUSH(n, v)  push (&buf##n, n, v)

int rand16 (void)
{
  int v = buf2.v & 0xf;
  buf2.v >>= 4;
  buf2.count -= 4;
  return v;
}

int rand9 (void)
{
  int v = buf3.v % 9;
  buf3.v /= 9;
  buf3.count -= 2;
  return v;
}

int rand7 (void)
{
  if (buf3.count >= 2) {
    int v = rand9 ();

    if (v < 7)
      return v % 7 + 1;

    PUSH (2, v - 7);
  }

  for (;;) {
    if (buf2.count >= 4) {
      int v = rand16 ();

      if (v < 14) {
        PUSH (2, v / 7);
        return v % 7 + 1;
      }

      PUSH (2, v - 14);
    }

    // Get a number between 0 & 24
    int v = 5 * (rand5 () - 1) + rand5 () - 1;

    if (v < 21) {
      PUSH (3, v / 7);
      return v % 7 + 1;
    }

    v -= 21;
    PUSH (2, v & 1);
    PUSH (2, v >> 1);
  }
}
It's a little more complicated than others, but I believe it minimises the calls to rand5. As with other solutions, there's a small probability that it could loop for a long time.
As long as there aren't seven possibilities left to choose from, draw another random number, which multiplies the number of possibilities by five. In Perl:
$num = 0;
$possibilities = 1;

sub rand7
{
  while( $possibilities < 7 )
  {
    $num = $num * 5 + int(rand(5));
    $possibilities *= 5;
  }
  my $result = $num % 7;
  $num = int( $num / 7 );
  $possibilities /= 7;
  return $result;
}
Simple and efficient:
int rand7 ( void )
{
    return 4; // this number has been calculated using
              // rand5() and is in the range 1..7
}
(Inspired by What's your favorite "programmer" cartoon? ).
I don't like ranges starting from 1, so I'll start from 0 🙂
unsigned rand5()
{
    return rand() % 5;
}

unsigned rand7()
{
    int r;

    do
    {
        r =         rand5();
        r = r * 5 + rand5();
        r = r * 5 + rand5();
        r = r * 5 + rand5();
        r = r * 5 + rand5();
        r = r * 5 + rand5();
    } while (r > 15623);

    return r / 2232;
}
I know it has been answered, but this seems to work OK, though I can't tell you whether it has a bias. My 'testing' suggests it is, at least, reasonable.
Perhaps Adam Rosenfield would be kind enough to comment?
My (naive?) idea is this:
Accumulate rand5's until there is enough random bits to make a rand7. This takes at most 2 rand5's. To get the rand7 number I use the accumulated value mod 7.
To avoid the accumulator overflowing, and since only the accumulator's value mod 7 matters, I keep the accumulator reduced mod 7:
(5a + rand5) % 7 = (k*7 + (5a%7) + rand5) % 7 = ( (5a%7) + rand5) % 7
The rand7() function follows:
(I let the range of rand5 be 0-4 and rand7 is likewise 0-6.)
int rand7(){
  static int a=0;
  static int e=0;
  int r;
  a = a * 5 + rand5();
  e = e + 5;        // added 5/7ths of a rand7 number
  if ( e<7 ){
    a = a * 5 + rand5();
    e = e + 5;      // another 5/7ths
  }
  r = a % 7;
  e = e - 7;        // removed a rand7 number
  a = a % 7;
  return r;
}
Edit: Added results for 100 million trials.
'Real' rand functions mod 5 or 7
rand5  : avg=1.999802 0:20003944 1:19999889 2:20003690 3:19996938 4:19995539
rand7  : avg=3.000111 0:14282851 1:14282879 2:14284554 3:14288546 4:14292388 5:14288736 6:14280046
My rand7
Average looks ok and number distributions look ok too.
randt : avg=3.000080 0:14288793 1:14280135 2:14287848 3:14285277 4:14286341 5:14278663 6:14292943
There are elegant algorithms cited above, but here's one way to approach it, although it might be roundabout. I am assuming values generated from 0.
R2 = random number generator giving values less than 2 (sample space = {0, 1})
R8 = random number generator giving values less than 8 (sample space = {0, 1, 2, 3, 4, 5, 6, 7})
In order to generate R8 from R2, you run R2 three times and use the combined result of all 3 runs as a binary number with 3 digits. Here is the range of values when R2 is run three times:
0 0 0 -> 0
.
.
1 1 1 -> 7
Now to generate R7 from R8, we simply run R8 again if it returns 7:
int R7() {
    int x;
    do {
        x = R8();
    } while (x > 6);
    return x;
}
The roundabout solution is to generate R2 from R5 (just like we generated R7 from R8), then R8 from R2 and then R7 from R8.
There you go, uniform distribution and zero rand5 calls.
def rand7():
    global seed
    seed += 1
    if seed >= 7:
        seed = 0
    return seed
Need to set seed beforehand.
Here's a solution that fits entirely within integers and is within about 4% of optimal (ie uses 1.26 random numbers in {0..4} for every one in {0..6}). The code's in Scala, but the math should be reasonably clear in any language: you take advantage of the fact that 7^9 + 7^8 is very close to 5^11. So you pick an 11 digit number in base 5, and then interpret it as a 9 digit number in base 7 if it's in range (giving 9 base 7 numbers), or as an 8 digit number if it's over the 9 digit number, etc.:
abstract class RNG {
  def apply(): Int
}

class Random5 extends RNG {
  val rng = new scala.util.Random
  var count = 0
  def apply() = { count += 1 ; rng.nextInt(5) }
}

class FiveSevener(five: RNG) {
  val sevens = new Array[Int](9)
  var nsevens = 0
  val to9 = 40353607;
  val to8 = 5764801;
  val to7 = 823543;
  def loadSevens(value: Int, count: Int) {
    nsevens = 0;
    var remaining = value;
    while (nsevens < count) {
      sevens(nsevens) = remaining % 7
      remaining /= 7
      nsevens += 1
    }
  }
  def loadSevens {
    var fivepow11 = 0;
    var i=0
    while (i<11) { i+=1 ; fivepow11 = five() + fivepow11*5 }
    if (fivepow11 < to9) { loadSevens(fivepow11 , 9) ; return }
    fivepow11 -= to9
    if (fivepow11 < to8) { loadSevens(fivepow11 , 8) ; return }
    fivepow11 -= to8
    if (fivepow11 < 3*to7) loadSevens(fivepow11 % to7 , 7)
    else loadSevens
  }
  def apply() = {
    if (nsevens==0) loadSevens
    nsevens -= 1
    sevens(nsevens)
  }
}
If you paste a test into the interpreter (REPL actually), you get:
scala> val five = new Random5
five: Random5 = Random5@e9c592

scala> val seven = new FiveSevener(five)
seven: FiveSevener = FiveSevener@143c423

scala> val counts = new Array[Int](7)
counts: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0)

scala> var i=0 ; while (i < 100000000) { counts( seven() ) += 1 ; i += 1 }
i: Int = 100000000

scala> counts
res0: Array[Int] = Array(14280662, 14293012, 14281286, 14284836, 14287188, 14289332, 14283684)

scala> five.count
res1: Int = 125902876
The distribution is nice and flat (within about 10k of 1/7 of 10^8 in each bin, as expected from an approximately-Gaussian distribution).
int rand7() {
  int value = rand5()
            + rand5() * 2
            + rand5() * 3
            + rand5() * 4
            + rand5() * 5
            + rand5() * 6;
  return value%7;
}
Unlike the chosen solution, the algorithm will run in constant time. It does, however, always make 6 calls to rand5, whereas the chosen solution makes 2 × 25/21 ≈ 2.38 calls on average.
Note that this generator is not perfect (the number 0 has 0.0064% more chance than any other number), but for most practical purposes the guarantee of constant time probably outweighs this inaccuracy.
Explanation
This solution is derived from the fact that the number 15,624 is divisible by 7 and thus if we can randomly and uniformly generate numbers from 0 to 15,624 and then take mod 7 we can get a near-uniform rand7 generator. Numbers from 0 to 15,624 can be uniformly generated by rolling rand5 6 times and using them to form the digits of a base 5 number as follows:
rand5 * 5^5 + rand5 * 5^4 + rand5 * 5^3 + rand5 * 5^2 + rand5 * 5 + rand5
Properties of mod 7 however allow us to simplify the equation a bit:
5^5 = 3 mod 7
5^4 = 2 mod 7
5^3 = 6 mod 7
5^2 = 4 mod 7
5^1 = 5 mod 7
So
rand5 * 5^5 + rand5 * 5^4 + rand5 * 5^3 + rand5 * 5^2 + rand5 * 5 + rand5
becomes
rand5 * 3 + rand5 * 2 + rand5 * 6 + rand5 * 4 + rand5 * 5 + rand5
Theory
The number 15,624 was not chosen randomly. It can be discovered using Fermat's little theorem, which states that if p is a prime number and a is not divisible by p, then
a^(p-1) = 1 mod p
So this gives us,
(5^6)-1 = 0 mod 7
(5^6)-1 is equal to
4 * 5^5 + 4 * 5^4 + 4 * 5^3 + 4 * 5^2 + 4 * 5 + 4
This is a number in base 5 form and thus we can see that this method can be used to go from any random number generator to any other random number generator. Though a small bias towards 0 is always introduced when using the exponent p-1.
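The residues of the powers of 5 and the size of the bias toward 0 can be checked by brute force over all 5^6 = 15625 equally likely outcomes. The following Python sketch does this, assuming rand5 returns 0..4 as in the derivation. It uses the weights 1..6 from the code above. Those weights are a rearrangement of the place-value residues, so the distribution is the same as for the base-5 number mod 7.

```python
from collections import Counter
from itertools import product

# Residues of the base-5 place values 5^0 .. 5^5 modulo 7.
print([pow(5, k, 7) for k in range(6)])      # [1, 5, 4, 6, 2, 3]

# Enumerate all 5**6 equally likely outcomes of six rand5() calls (0..4),
# weighted 1..6 as in rand7() above, and tally the result mod 7.
counts = Counter(
    sum(w * r for w, r in zip(range(1, 7), rolls)) % 7
    for rolls in product(range(5), repeat=6)
)
print(sorted(counts.items()))
```

Residue 0 is hit 2233 times and every other residue 2232 times. Residue 0 therefore has an excess probability of 1/15625, which is the 0.0064% quoted above.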
By using a rolling total, you can both
- maintain an equal distribution; and
- not have to sacrifice any element in the random sequence.
Both these problems are an issue with the simplistic rand(5)+rand(5)...-type solutions. The following Python code shows how to implement it (most of this is proving the distribution).
import random

x = []
for i in range (0,7):
    x.append (0)

t = 0
tt = 0
for i in range (0,700000):
    ########################################
    #####            qq.py             #####
    r = int (random.random () * 5)
    t = (t + r) % 7
    ########################################
    #####       qq_notsogood.py        #####
    #r = 20
    #while r > 6:
        #r = int (random.random () * 5)
        #r = r + int (random.random () * 5)
    #t = r
    ########################################
    x[t] = x[t] + 1
    tt = tt + 1

high = x[0]
low = x[0]
for i in range (0,7):
    print ("%d: %7d %.5f" % (i, x[i], 100.0 * x[i] / tt))
    if x[i] < low:
        low = x[i]
    if x[i] > high:
        high = x[i]
diff = high - low
print ("Variation = %d (%.5f%%)" % (diff, 100.0 * diff / tt))
And this output shows the results:
pax$ python qq.py
0:   99908 14.27257
1:  100029 14.28986
2:  100327 14.33243
3:  100395 14.34214
4:   99104 14.15771
5:   99829 14.26129
6:  100408 14.34400
Variation = 1304 (0.18629%)

pax$ python qq.py
0:   99547 14.22100
1:  100229 14.31843
2:  100078 14.29686
3:   99451 14.20729
4:  100284 14.32629
5:  100038 14.29114
6:  100373 14.33900
Variation = 922 (0.13171%)

pax$ python qq.py
0:  100481 14.35443
1:   99188 14.16971
2:  100284 14.32629
3:  100222 14.31743
4:   99960 14.28000
5:   99426 14.20371
6:  100439 14.34843
Variation = 1293 (0.18471%)
A simplistic rand(5)+rand(5), ignoring those cases where this returns more than 6, has a typical variation of 18%, 100 times that of the method shown above:
pax$ python qq_notsogood.py
0:   31756 4.53657
1:   63304 9.04343
2:   95507 13.64386
3:  127825 18.26071
4:  158851 22.69300
5:  127567 18.22386
6:   95190 13.59857
Variation = 127095 (18.15643%)

pax$ python qq_notsogood.py
0:   31792 4.54171
1:   63637 9.09100
2:   95641 13.66300
3:  127627 18.23243
4:  158751 22.67871
5:  126782 18.11171
6:   95770 13.68143
Variation = 126959 (18.13700%)

pax$ python qq_notsogood.py
0:   31955 4.56500
1:   63485 9.06929
2:   94849 13.54986
3:  127737 18.24814
4:  159687 22.81243
5:  127391 18.19871
6:   94896 13.55657
Variation = 127732 (18.24743%)
And, on the advice of Nixuz, I've cleaned the script up so you can just extract and use the rand7... stuff:
import random

# rand5() returns 0 through 4 inclusive.

def rand5():
    return int (random.random () * 5)

# rand7() generator returns 0 through 6 inclusive (using rand5()).

def rand7():
    rand7ret = 0
    while True:
        rand7ret = (rand7ret + rand5()) % 7
        yield rand7ret

# Number of test runs.

count = 700000

# Work out distribution.

distrib = [0,0,0,0,0,0,0]
rgen = rand7()
for i in range (0,count):
    r = next(rgen)
    distrib[r] = distrib[r] + 1

# Print distributions and calculate variation.

high = distrib[0]
low = distrib[0]
for i in range (0,7):
    print ("%d: %7d %.5f" % (i, distrib[i], 100.0 * distrib[i] / count))
    if distrib[i] < low:
        low = distrib[i]
    if distrib[i] > high:
        high = distrib[i]
diff = high - low
print ("Variation = %d (%.5f%%)" % (diff, 100.0 * diff / count))
This answer is more an experiment in obtaining the most entropy possible from the Rand5 function. It is therefore somewhat unclear and almost certainly a lot slower than other implementations.
Assuming the uniform distribution from 0-4 and resulting uniform distribution from 0-6:
using System;

public class SevenFromFive
{
    public SevenFromFive()
    {
        // this outputs a uniform distribution but for some reason including it
        // screws up the output distribution
        // open question Why?
        this.fifth = new ProbabilityCondensor(5, b => {});
        this.eigth = new ProbabilityCondensor(8, AddEntropy);
    }

    private static Random r = new Random();
    private static uint Rand5()
    {
        return (uint)r.Next(0,5);
    }

    private class ProbabilityCondensor
    {
        private readonly int samples;
        private int counter;
        private int store;
        private readonly Action<bool> output;

        public ProbabilityCondensor(int chanceOfTrueReciprocal, Action<bool> output)
        {
            this.output = output;
            this.samples = chanceOfTrueReciprocal - 1;
        }

        public void Add(bool bit)
        {
            this.counter++;
            if (bit)
                this.store++;

            if (counter == samples)
            {
                bool? e;
                if (store == 0)
                    e = false;
                else if (store == 1)
                    e = true;
                else
                    e = null;// discard for now
                counter = 0;
                store = 0;
                if (e.HasValue)
                    output(e.Value);
            }
        }
    }

    ulong buffer = 0;
    const ulong Mask = 7UL;
    int bitsAvail = 0;
    private readonly ProbabilityCondensor fifth;
    private readonly ProbabilityCondensor eigth;

    private void AddEntropy(bool bit)
    {
        buffer <<= 1;
        if (bit)
            buffer |= 1;
        bitsAvail++;
    }

    private void AddTwoBitsEntropy(uint u)
    {
        buffer <<= 2;
        buffer |= (u & 3UL);
        bitsAvail += 2;
    }

    public uint Rand7()
    {
        uint selection;
        do
        {
            while (bitsAvail < 3)
            {
                var x = Rand5();
                if (x < 4)
                {
                    // put the two low order bits straight in
                    AddTwoBitsEntropy(x);
                    fifth.Add(false);
                }
                else
                {
                    fifth.Add(true);
                }
            }
            // read 3 bits
            selection = (uint)((buffer & Mask));
            bitsAvail -= 3;
            buffer >>= 3;
            if (selection == 7)
                eigth.Add(true);
            else
                eigth.Add(false);
        }
        while (selection == 7);
        return selection;
    }
}
The number of bits added to the buffer per call to Rand5 is currently 4/5 * 2 so 1.6. If the 1/5 probability value is included that increases by 0.05 so 1.65 but see the comment in the code where I have had to disable this.
Bits consumed by call to Rand7 = 3 + 1/8 * (3 + 1/8 * (3 + 1/8 * (…
This is 3 + 3/8 + 3/64 + 3/512 … so approx 3.42
By extracting information from the sevens I reclaim 1/8*1/7 bits per call so about 0.018
This gives a net consumption 3.4 bits per call which means the ratio is 2.125 calls to Rand5 for every Rand7. The optimum should be 2.1.
I would imagine this approach is significantly slower than many of the other ones here unless the cost of the call to Rand5 is extremely expensive (say calling out to some external source of entropy).
In PHP:
function rand1to7()
{
    do
    {
        $output_value = 0;
        for ($i = 0; $i < 28; $i++)
        {
            $output_value += rand1to5();
        }
    }
    while ($output_value == 140);
    $output_value -= 12;
    return floor($output_value / 16);
}
This loops to produce a random number between 16 and 127, divides by sixteen to create a float between 1 and 7.9375, then rounds down to get an int between 1 and 7. If I am not mistaken, there is a 16/112 chance of getting any one of the 7 outcomes.
extern int r5();

int r7() {
    return ((r5() & 0x01) << 2 ) | ((r5() & 0x01) << 1 ) | (r5() & 0x01);
}
The function you need is rand1_7(). I wrote rand1_5() so that you can test it and plot it.
import numpy

def rand1_5():
    return numpy.random.randint(5)+1

def rand1_7():
    q = 0
    for i in range(7):
        q += rand1_5()
    return q%7 + 1
Just scale your output from your first function:
0) you have a number in range 1-5
1) subtract 1 to make it in range 0-4
2) multiply by (7-1)/(5-1) to make it in range 0-6
3) add 1 to increment the range: now your result is between 1-7