在Python中加速位串/位操作?

我使用Eratosthenes和Python 3.1的Sieve编写了一个素数生成器。 代码在ideone.com上以0.32秒正常和正常地运行,以生成高达1,000,000的素数。

# from bitstring import BitString def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) flags = [False, False] + [True] * (limit - 2) # flags = BitString(limit) # Step through all the odd numbers for i in range(3, limit, 2): if flags[i] is False: # if flags[i] is True: continue yield i # Exclude further multiples of the current prime number if i <= sub_limit: for j in range(i*3, limit, i<<1): flags[j] = False # flags[j] = True 

问题是,当我尝试产生高达10亿的数字时,内存不足。

  flags = [False, False] + [True] * (limit - 2) MemoryError 

正如你可以想象的那样,分配10亿个布尔值(每个Python中有1个字节 4或者8个字节(见注释))实际上是不可行的,所以我研究了比特串 。 我想,每个标志使用1位会更有效率的内存。 但是,该程序的性能急剧下降 – 运行时间为24秒,素数高达1,000,000。 这可能是由于bitstring的内部实现。

您可以评论/取消注释这三行,看看我改变使用BitString,作为上面的代码片段。

我的问题是, 有没有一种方法来加快我的程序,有没有bittring?

编辑:请在发布之前亲自testing代码。 自然,我不能接受比现有代码慢的答案。

再次编辑:

我已经在我的机器上编译了基准列表。

你的版本有几个小的优化。 通过颠倒True和False的angular色,可以将“ if flags[i] is False: ”改为“ if flags[i]: ”。 而第二range语句的起始值可以是i*i而不是i*3 。 您的原始版本在我的系统上需要0.166秒。 随着这些变化,下面的版本在我的系统上需要0.156秒。

 def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) flags = [True, True] + [False] * (limit - 2) # Step through all the odd numbers for i in range(3, limit, 2): if flags[i]: continue yield i # Exclude further multiples of the current prime number if i <= sub_limit: for j in range(i*i, limit, i<<1): flags[j] = True 

虽然这不会帮助你解决内存问题。

进入C扩展的世界,我使用了gmpy的开发版本。 (免责声明:我是其中一个维护者。)开发版本被称为gmpy2,并支持称为xmpz的可变整数。 使用gmpy2和下面的代码,我有一个0.140秒的运行时间。 运行时间为10亿次的限制是158秒。

 import gmpy2 def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) # Actual number is 2*bit_position + 1. oddnums = gmpy2.xmpz(1) current = 0 while True: current += 1 current = oddnums.bit_scan0(current) prime = 2 * current + 1 if prime > limit: break yield prime # Exclude further multiples of the current prime number if prime <= sub_limit: for j in range(2*current*(current+1), limit>>1, prime): oddnums.bit_set(j) 

推动优化,并牺牲清晰度,我得到0.107和123秒的运行时间与下面的代码:

 import gmpy2 def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) # Actual number is 2*bit_position + 1. oddnums = gmpy2.xmpz(1) f_set = oddnums.bit_set f_scan0 = oddnums.bit_scan0 current = 0 while True: current += 1 current = f_scan0(current) prime = 2 * current + 1 if prime > limit: break yield prime # Exclude further multiples of the current prime number if prime <= sub_limit: list(map(f_set,range(2*current*(current+1), limit>>1, prime))) 

编辑:基于这个练习,我修改了gmpy2来接受xmpz.bit_set(iterator) 。 使用以下代码,所有素数less于1,000,000,000的运行时间对于Python 2.7是56秒,对于Python 3.2是74秒。 (正如在评论中指出的, xrangerange更快。)

 import gmpy2 try: range = xrange except NameError: pass def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) oddnums = gmpy2.xmpz(1) f_scan0 = oddnums.bit_scan0 current = 0 while True: current += 1 current = f_scan0(current) prime = 2 * current + 1 if prime > limit: break yield prime if prime <= sub_limit: oddnums.bit_set(iter(range(2*current*(current+1), limit>>1, prime))) 

编辑#2:再试一次! 我修改了gmpy2来接受xmpz.bit_set(slice) 。 使用下面的代码,对于Python 2.7和Python 3.2,所有素数less于1,000,000,000的运行时间大约是40秒。

 from __future__ import print_function import time import gmpy2 def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) flags = gmpy2.xmpz(1) # pre-allocate the total length flags.bit_set((limit>>1)+1) f_scan0 = flags.bit_scan0 current = 0 while True: current += 1 current = f_scan0(current) prime = 2 * current + 1 if prime > limit: break yield prime if prime <= sub_limit: flags.bit_set(slice(2*current*(current+1), limit>>1, prime)) start = time.time() result = list(prime_numbers(1000000000)) print(time.time() - start) 

编辑#3:我已经更新了gmpy2,以正确支持在xmpz的位级切片。 性能没有变化,但有一个很好的API。 我已经做了一些调整,我已经到了37秒。 (请参阅编辑#4以更改gmpy2 2.0.0b1。)

 from __future__ import print_function import time import gmpy2 def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' sub_limit = int(limit**0.5) flags = gmpy2.xmpz(1) flags[(limit>>1)+1] = True f_scan0 = flags.bit_scan0 current = 0 prime = 2 while prime <= sub_limit: yield prime current += 1 current = f_scan0(current) prime = 2 * current + 1 flags[2*current*(current+1):limit>>1:prime] = True while prime <= limit: yield prime current += 1 current = f_scan0(current) prime = 2 * current + 1 start = time.time() result = list(prime_numbers(1000000000)) print(time.time() - start) 

编辑#4:我在gmpy2 2.0.0b1中做了一些改变,破坏了前面的例子。 gmpy2不再把True当成一个特殊的值来提供一个无限的1位来源。 应该使用-1来代替。

 from __future__ import print_function import time import gmpy2 def prime_numbers(limit=1000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' sub_limit = int(limit**0.5) flags = gmpy2.xmpz(1) flags[(limit>>1)+1] = 1 f_scan0 = flags.bit_scan0 current = 0 prime = 2 while prime <= sub_limit: yield prime current += 1 current = f_scan0(current) prime = 2 * current + 1 flags[2*current*(current+1):limit>>1:prime] = -1 while prime <= limit: yield prime current += 1 current = f_scan0(current) prime = 2 * current + 1 start = time.time() result = list(prime_numbers(1000000000)) print(time.time() - start) 

编辑#5:我对gmpy2 2.0.0b2做了一些改进。 您现在可以遍历所有已设置或清除的位。 运行时间提高了约30%。

 from __future__ import print_function import time import gmpy2 def sieve(limit=1000000): '''Returns a generator that yields the prime numbers up to limit.''' # Increment by 1 to account for the fact that slices do not include # the last index value but we do want to include the last value for # calculating a list of primes. sieve_limit = gmpy2.isqrt(limit) + 1 limit += 1 # Mark bit positions 0 and 1 as not prime. bitmap = gmpy2.xmpz(3) # Process 2 separately. This allows us to use p+p for the step size # when sieving the remaining primes. bitmap[4 : limit : 2] = -1 # Sieve the remaining primes. for p in bitmap.iter_clear(3, sieve_limit): bitmap[p*p : limit : p+p] = -1 return bitmap.iter_clear(2, limit) if __name__ == "__main__": start = time.time() result = list(sieve(1000000000)) print(time.time() - start) print(len(result)) 

好的,所以这是我的第二个答案,但速度是本质的,我认为我不得不提到bitarray模块 – 尽pipe它是bitstring的克星:)。 它非常适合这种情况,因为它不仅是一个C扩展(比纯Python有希望的速度更快),而且还支持片分配。 尽pipe如此,它还不适用于Python 3。

我甚至没有试图优化这个,我只是重写bitstring版本。 在我的机器上,我获得了一百万以下的素数0.16秒。

十亿分之一秒完成,2分31秒完成。

 import bitarray def prime_bitarray(limit=1000000): yield 2 flags = bitarray.bitarray(limit) flags.setall(False) sub_limit = int(limit**0.5) for i in xrange(3, limit, 2): if not flags[i]: yield i if i <= sub_limit: flags[3*i:limit:i*2] = True 

好吧,我今天晚上做了一个(接近完整的)综合基准testing,看看哪个代码运行速度最快。 希望有人会发现这个名单有用。 我省略了超过30秒的任何事情,在我的机器上完成。

我想感谢大家提出的意见。 我从你的努力中获得了很多的见解,我也希望你也有。

我的机器:AMD ZM-86,2.40 Ghz双核,配备4GB内存。 这是惠普TouchSmart Tx2笔记本电脑。 请注意,虽然我可能已经链接到一个pastebin, 我在我自己的机器上testing了以下所有内容。

一旦我能够构build它,我将添加gmpy2基准。

所有的基准testing在Python 2.6 x86

返回一个素数列表n(高达1,000,000:)( 使用 Python生成器)

塞巴斯蒂安的numpy生成器版本(更新) – 121毫秒@

马克的筛+车轮 – 154毫秒

罗伯特的版本与切片 – 159毫秒

我的改进版与切片 – 205毫秒

具有枚举的Numpy生成器 – 249 ms @

马克的基本筛 – 317毫秒

casevh改进了我原来的解决scheme – 343毫秒

我修改numpy发生器解决scheme – 407毫秒

我在问题中的原始方法 – 409毫秒

Bitarray解决scheme – 414毫秒@

纯Python与bytearray – 1394毫秒@

Scott的BitString解决scheme – 6659 ms @

'@'表示这种方法能够在我的机器设置上产生高达n <1,000,000,000的数据。

另外,如果你不需要发生器,只需要一次列出整个列表:

来自RosettaCode的numpy解决scheme – 32 ms @

(这个numpy的解决scheme也能产生高达10亿次,花了61.6259秒,我怀疑内存换了一次,所以是双倍的时间。)

相关的问题。
在python中列出N中所有的素数的最快方法

嗨,我也在python中寻找一个代码,以尽可能快地生成质量达到10 9,这是因为内存问题。 到现在我想出了这个产生质量高达10 6和10 * 7(我的旧机器分别为0.171s和1,764s),这似乎是稍快(至less在我的电脑)比“我的改进版本切片“,可能是因为我在其他代码中使用了n // ii +1(参见下文)而不是类似的len(flags [i2 :: i << 1])。 如果我错了,请纠正我。 任何改善build议都非常受欢迎。

 def primes(n): """ Returns a list of primes < n """ sieve = [True] * n for i in xrange(3,int(n**0.5)+1,2): if sieve[i]: sieve[i*i::2*i]=[False]*((ni*i-1)/(2*i)+1) return [2] + [i for i in xrange(3,n,2) if sieve[i]] 

在他的一个代码中,Xavier使用标志[i2 :: i << 1]和len(flags [i2 :: i << 1]),我用这个事实来计算大小, ni i)// 2 * i的倍数因为我们要计算i 我也总计1所以len(sieve [i i :: 2 * i])等于(ni * i)//(2 * i )+1这使得代码更快。 基于上述代码的基本生成器将是:

 def primesgen(n): """ Generates all primes <= n """ sieve = [True] * n yield 2 for i in xrange(3,int(n**0.5)+1,2): if sieve[i]: yield i sieve[i*i::2*i] = [False]*((ni*i-1)/(2*i)+1) for i in xrange(i+2,n,2): if sieve[i]: yield i 

稍微修改一下就可以编写一个稍微慢一点的代码,上面的代码以sieve = [True] *(n // 2)的一半的筛子开始,并且对于相同的n有效。 不知道你是如何反映在内存的问题。 作为一个实现的例子,这里是numpy rosetta代码(也许更快)的修改版本,从一半大小的筛子开始。

 import numpy def primesfrom3to(n): """ Returns a array of primes, 3 <= p < n """ sieve = numpy.ones(n/2, dtype=numpy.bool) for i in xrange(3,int(n**0.5)+1,2): if sieve[i/2]: sieve[i*i/2::i] = False return 2*numpy.nonzero(sieve)[0][1::]+1 

一个更快和更记忆的发电机将是:

 import numpy def primesgen1(n): """ Input n>=6, Generates all primes < n """ sieve1 = numpy.ones(n/6+1, dtype=numpy.bool) sieve5 = numpy.ones(n/6 , dtype=numpy.bool) sieve1[0] = False yield 2; yield 3; for i in xrange(int(n**0.5)/6+1): if sieve1[i]: k=6*i+1; yield k; sieve1[ ((k*k)/6) ::k] = False sieve5[(k*k+4*k)/6::k] = False if sieve5[i]: k=6*i+5; yield k; sieve1[ ((k*k)/6) ::k] = False sieve5[(k*k+2*k)/6::k] = False for i in xrange(i+1,n/6): if sieve1[i]: yield 6*i+1 if sieve5[i]: yield 6*i+5 if n%6 > 1: if sieve1[i+1]: yield 6*(i+1)+1 

或多一点的代码:

 import numpy def primesgen(n): """ Input n>=30, Generates all primes < n """ size = n/30 + 1 sieve01 = numpy.ones(size, dtype=numpy.bool) sieve07 = numpy.ones(size, dtype=numpy.bool) sieve11 = numpy.ones(size, dtype=numpy.bool) sieve13 = numpy.ones(size, dtype=numpy.bool) sieve17 = numpy.ones(size, dtype=numpy.bool) sieve19 = numpy.ones(size, dtype=numpy.bool) sieve23 = numpy.ones(size, dtype=numpy.bool) sieve29 = numpy.ones(size, dtype=numpy.bool) sieve01[0] = False yield 2; yield 3; yield 5; for i in xrange(int(n**0.5)/30+1): if sieve01[i]: k=30*i+1; yield k; sieve01[ (k*k)/30::k] = False sieve07[(k*k+ 6*k)/30::k] = False sieve11[(k*k+10*k)/30::k] = False sieve13[(k*k+12*k)/30::k] = False sieve17[(k*k+16*k)/30::k] = False sieve19[(k*k+18*k)/30::k] = False sieve23[(k*k+22*k)/30::k] = False sieve29[(k*k+28*k)/30::k] = False if sieve07[i]: k=30*i+7; yield k; sieve01[(k*k+ 6*k)/30::k] = False sieve07[(k*k+24*k)/30::k] = False sieve11[(k*k+16*k)/30::k] = False sieve13[(k*k+12*k)/30::k] = False sieve17[(k*k+ 4*k)/30::k] = False sieve19[ (k*k)/30::k] = False sieve23[(k*k+22*k)/30::k] = False sieve29[(k*k+10*k)/30::k] = False if sieve11[i]: k=30*i+11; yield k; sieve01[ (k*k)/30::k] = False sieve07[(k*k+ 6*k)/30::k] = False sieve11[(k*k+20*k)/30::k] = False sieve13[(k*k+12*k)/30::k] = False sieve17[(k*k+26*k)/30::k] = False sieve19[(k*k+18*k)/30::k] = False sieve23[(k*k+ 2*k)/30::k] = False sieve29[(k*k+ 8*k)/30::k] = False if sieve13[i]: k=30*i+13; yield k; sieve01[(k*k+24*k)/30::k] = False sieve07[(k*k+ 6*k)/30::k] = False sieve11[(k*k+ 4*k)/30::k] = False sieve13[(k*k+18*k)/30::k] = False sieve17[(k*k+16*k)/30::k] = False sieve19[ (k*k)/30::k] = False sieve23[(k*k+28*k)/30::k] = False sieve29[(k*k+10*k)/30::k] = False if sieve17[i]: k=30*i+17; yield k; sieve01[(k*k+ 6*k)/30::k] = False sieve07[(k*k+24*k)/30::k] = False sieve11[(k*k+26*k)/30::k] = False sieve13[(k*k+12*k)/30::k] = False sieve17[(k*k+14*k)/30::k] = False sieve19[ (k*k)/30::k] = False sieve23[(k*k+ 2*k)/30::k] = False sieve29[(k*k+20*k)/30::k] = False if sieve19[i]: k=30*i+19; yield k; sieve01[ (k*k)/30::k] = False sieve07[(k*k+24*k)/30::k] = False sieve11[(k*k+10*k)/30::k] = False sieve13[(k*k+18*k)/30::k] = False sieve17[(k*k+ 4*k)/30::k] = False sieve19[(k*k+12*k)/30::k] = False sieve23[(k*k+28*k)/30::k] = False sieve29[(k*k+22*k)/30::k] = False if sieve23[i]: k=30*i+23; yield k; sieve01[(k*k+24*k)/30::k] = False sieve07[(k*k+ 6*k)/30::k] = False sieve11[(k*k+14*k)/30::k] = False sieve13[(k*k+18*k)/30::k] = False sieve17[(k*k+26*k)/30::k] = False sieve19[ (k*k)/30::k] = False sieve23[(k*k+ 8*k)/30::k] = False sieve29[(k*k+20*k)/30::k] = False if sieve29[i]: k=30*i+29; yield k; sieve01[ (k*k)/30::k] = False sieve07[(k*k+24*k)/30::k] = False sieve11[(k*k+20*k)/30::k] = False sieve13[(k*k+18*k)/30::k] = False sieve17[(k*k+14*k)/30::k] = False sieve19[(k*k+12*k)/30::k] = False sieve23[(k*k+ 8*k)/30::k] = False sieve29[(k*k+ 2*k)/30::k] = False for i in xrange(i+1,n/30): if sieve01[i]: yield 30*i+1 if sieve07[i]: yield 30*i+7 if sieve11[i]: yield 30*i+11 if sieve13[i]: yield 30*i+13 if sieve17[i]: yield 30*i+17 if sieve19[i]: yield 30*i+19 if sieve23[i]: yield 30*i+23 if sieve29[i]: yield 30*i+29 i += 1 mod30 = n%30 if mod30 > 1: if sieve01[i]: yield 30*i+1 if mod30 > 7: if sieve07[i]: yield 30*i+7 if mod30 > 11: if sieve11[i]: yield 30*i+11 if mod30 > 13: if sieve13[i]: yield 30*i+13 if mod30 > 17: if sieve17[i]: yield 30*i+17 if mod30 > 19: if sieve19[i]: yield 30*i+19 if mod30 > 23: if sieve23[i]: yield 30*i+23 

Ps:如果您对如何加快此代码有任何build议,请从更改操作顺序到预计算任何内容,请发表评论。

这是我刚才写的一个版本。 与您的速度比较可能会很有趣。 它没有对空间问题做任何事情,但是(事实上,它们可能比你的版本更糟糕)。

 from math import sqrt def basicSieve(n): """Given a positive integer n, generate the primes < n.""" s = [1]*n for p in xrange(2, 1+int(sqrt(n-1))): if s[p]: a = p*p s[a::p] = [0] * -((an)//p) for p in xrange(2, n): if s[p]: yield p 

我有更快的版本,使用轮子,但他们更复杂。 原始来源是在这里 。

好的,这是使用轮子的版本。 wheelSieve是主要的入口点。

 from math import sqrt from bisect import bisect_left def basicSieve(n): """Given a positive integer n, generate the primes < n.""" s = [1]*n for p in xrange(2, 1+int(sqrt(n-1))): if s[p]: a = p*p s[a::p] = [0] * -((an)//p) for p in xrange(2, n): if s[p]: yield p class Wheel(object): """Class representing a wheel. Attributes: primelimit -> wheel covers primes < primelimit. For example, given a primelimit of 6 the wheel primes are 2, 3, and 5. primes -> list of primes less than primelimit size -> product of the primes in primes; the modulus of the wheel units -> list of units modulo size phi -> number of units """ def __init__(self, primelimit): self.primelimit = primelimit self.primes = list(basicSieve(primelimit)) # compute the size of the wheel size = 1 for p in self.primes: size *= p self.size = size # find the units by sieving units = [1] * self.size for p in self.primes: units[::p] = [0]*(self.size//p) self.units = [i for i in xrange(self.size) if units[i]] # number of units self.phi = len(self.units) def to_index(self, n): """Compute alpha(n), where alpha is an order preserving map from the set of units modulo self.size to the nonnegative integers. If n is not a unit, the index of the first unit greater than n is given.""" return bisect_left(self.units, n % self.size) + n // self.size * self.phi def from_index(self, i): """Inverse of to_index.""" return self.units[i % self.phi] + i // self.phi * self.size def wheelSieveInner(n, wheel): """Given a positive integer n and a wheel, find the wheel indices of all primes that are less than n, and that are units modulo the wheel modulus. """ # renaming to avoid the overhead of attribute lookups U = wheel.units wS = wheel.size # inverse of unit map UI = dict((u, i) for i, u in enumerate(U)) nU = len(wheel.units) sqroot = 1+int(sqrt(n-1)) # ceiling of square root of n # corresponding index (index of next unit up) sqrti = bisect_left(U, sqroot % wS) + sqroot//wS*nU upper = bisect_left(U, n % wS) + n//wS*nU ind2 = bisect_left(U, 2 % wS) + 2//wS*nU s = [1]*upper for i in xrange(ind2, sqrti): if s[i]: q = i//nU u = U[i%nU] p = q*wS+u u2 = u*u aq, au = (p+u)*q+u2//wS, u2%wS wp = p * nU for v in U: # eliminate entries corresponding to integers # congruent to p*v modulo p*wS uvr = u*v%wS m = aq + (au > uvr) bot = (m + (q*v + u*v//wS - m) % p) * nU + UI[uvr] s[bot::wp] = [0]*-((bot-upper)//wp) return s def wheelSieve(n, wheel=Wheel(10)): """Given a positive integer n, generate the list of primes <= n.""" n += 1 wS = wheel.size U = wheel.units nU = len(wheel.units) s = wheelSieveInner(n, wheel) return (wheel.primes[:bisect_left(wheel.primes, n)] + [p//nU*wS + U[p%nU] for p in xrange(bisect_left(U, 2 % wS) + 2//wS*nU, len(s)) if s[p]]) 

至于轮子是什么:好吧,你知道(除了2),所有的素数都是奇数,所以大部分的筛子都会漏掉所有的偶数。 同样的,你可以进一步注意到所有的素数(2和3除外)与1或者5模6(== 2 * 3)是一致的,所以你只能存储筛子中这些数字的条目。 下一步要注意的是,所有的素数(2,3和5除外)与1,7,11,13,17,19,23,29(模30)中的一个一致(这里是30 == 2 * 3 * 5)等等。

使用bitstring可以提高速度的一个方面是在排除当前数字的倍数时使用“set”方法。

所以重要的部分变成了

 for i in range(3, limit, 2): if flags[i]: yield i if i <= sub_limit: flags.set(1, range(i*3, limit, i*2)) 

在我的机器上,这个速度比原来快了3倍。

我的另一个想法是使用bitstring来表示只有奇数。 然后你可以使用flags.findall([0])find未设置的位,它返回一个发生器。 不知道这是否会更快(查找位模式不是非常容易做到有效)。

[全面披露:我写了bitstring模块,所以我在这里有一些自豪感!]


作为一个比较,我也采取了bitstring方法的胆量,以便它以相同的方式做,但没有范围检查等我认为这给了一个合理的下限为纯数十亿元素的Python(没有改变algorithm,我认为这是作弊!)

 def prime_pure(limit=1000000): yield 2 flags = bytearray((limit + 7) // 8) sub_limit = int(limit**0.5) for i in xrange(3, limit, 2): byte, bit = divmod(i, 8) if not flags[byte] & (128 >> bit): yield i if i <= sub_limit: for j in xrange(i*3, limit, i*2): byte, bit = divmod(j, 8) flags[byte] |= (128 >> bit) 

在我的机器上,这对于一百万个元素运行约0.62秒,这意味着它大约是四分之一的比特数答案的速度。 我认为这对纯Python来说是相当合理的。

Python的整数types可以是任意大小,所以你不应该需要一个聪明的比特串库来表示一组比特,只是一个数字。

这是代码。 它轻松处理了1,000,000的限制,甚至可以处理10,000,000而不会抱怨太多:

 def multiples_of(n, step, limit): bits = 1 << n old_bits = bits max = 1 << limit while old_bits < max: old_bits = bits bits += bits << step step *= 2 return old_bits def prime_numbers(limit=10000000): '''Prime number generator. Yields the series 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ... using Sieve of Eratosthenes. ''' yield 2 sub_limit = int(limit**0.5) flags = ((1 << (limit - 2)) - 1) << 2 # Step through all the odd numbers for i in xrange(3, limit, 2): if not (flags & (1 << i)): continue yield i # Exclude further multiples of the current prime number if i <= sub_limit: flags &= ~multiples_of(i * 3, i << 1, limit) 

multiples_of函数避免了单独设置每个倍数的成本。 它设置最初的位,将它移动到初始步( i << 1 )并将结果添加到自身。 然后它会加倍这个步骤,以便下一次移动复制两个位,等等直到达到极限。 这将数字的所有倍数设置为O(log(限制))时间的限制。

One option you may want to look at is just compiling a C/C++ module so you have direct access to the bit-twiddling features. AFAIK you could write something of that nature and only return the results on completion of the calculations performed in C/C++. Now that I'm typing this you may also look at numpy which does store arrays as native C for speed. I don't know if that will be any faster than the bitstring module, though!

Another really stupid option, but that can be of help if you really need a large list of primes number available very fast. Say, if you need them as a tool to answer project Euler's problems (if the problem is just optimizing your code as a mind game it's irrelevant).

Use any slow solution to generate list and save it to a python source file, says primes.py that would just contain:

 primes = [ a list of a million primes numbers here ] 

Now to use them you just have to do import primes and you get them with minimal memory footprint at the speed of disk IO. Complexity is number of primes 🙂

Even if you used a poorly optimized solution to generate this list, it will only be done once and it does not matter much.

You could probably make it even faster using pickle/unpickle, but you get the idea…

You could use a segmented Sieve of Eratosthenes. The memory used for each segment is reused for the next segment.

Here is some Python3 code that uses less memory than the bitarray/bitstring solutions posted to date and about 1/8 the memory of Robert William Hanks's primesgen(), while running faster than primesgen() (marginally faster at 1,000,000, using 37KB of memory, 3x faster than primesgen() at 1,000,000,000 using under 34MB). Increasing the chunk size (variable chunk in the code) uses more memory but speeds up the program, up to a limit — I picked a value so that its contribution to memory is under about 10% of sieve's n//30 bytes. It's not as memory-efficient as Will Ness's infinite generator ( https://stackoverflow.com/a/19391111/5439078 ) because it records (and returns at the end, in compressed form) all calculated primes.

This should work correctly as long as the square root calculation comes out accurate (about 2**51 if Python uses 64-bit doubles). However you should not use this program to find primes that large!

(I didn't recalculate the offsets, just took them from code of Robert William Hanks. Thanks Robert!)

 import numpy as np def prime_30_gen(n): """ Input n, int-like. Yields all primes < n as Python ints""" wheel = np.array([2,3,5]) yield from wheel[wheel < n].tolist() powers = 1 << np.arange(8, dtype='u1') odds = np.array([1, 7, 11, 13, 17, 19, 23, 29], dtype='i8') offsets=np.array([[0,6,10,12,16,18,22,28],[6,24,16,12,4,0,22,10], [0,6,20,12,26,18,2,8], [24,6,4,18,16,0,28,10], [6,24,26,12,14,0,2,20], [0,24,10,18,4,12,28,22], [24,6,14,18,26,0,8,20], [0,24,20,18,14,12,8,2]], dtype='i8') offsets = {d:f for d,f in zip(odds, offsets)} sieve = 255 * np.ones((n + 28) // 30, dtype='u1') if n % 30 > 1: sieve[-1] >>= 8 - odds.searchsorted(n % 30) sieve[0] &= ~1 for i in range((int((n - 1)**0.5) + 29) // 30): for odd in odds[(sieve[i] & powers).nonzero()[0]]: k = i * 30 + odd yield int(k) for clear_bit, off in zip(~powers, offsets[odd]): sieve[(k * (k + off)) // 30 :: k] &= clear_bit chunk = min(1 + (n >> 13), 1<<15) for i in range(i + 1, len(sieve), chunk): a = (sieve[i : i + chunk, np.newaxis] & powers) a = np.flatnonzero(a.astype('bool')) + i * 8 a = (a // 8 * 30 + odds[a & 7]).tolist() yield from a return sieve 

Side note: If you want real speed and have the 2GB of RAM required for the list of primes to 10**9, then you should use pyprimesieve (on https://pypi.python.org/ , using primesieve http://primesieve.org/ ).