Python 3.x: Sieve of Eratosthenes implementations and comparison

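Besides a simple implementation of the Sieve of Eratosthenes with time complexity O(N log N), I also tried to implement a modification with time complexity O(N). Both produce the expected results, but somehow the former takes much less time than the latter, and I can't figure out why. I would really appreciate some advice on this.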

Implementation 1:

def build_sieve_eratosthenes(num):
    ## Creates sieve of size (num+1) to correct for 0-indexing.
    primes_sieve = [1] * (num+1)
    primes_sieve[0] = primes_sieve[1] = 0
    for p in range(2, num):
        if primes_sieve[p] == 1:
            for mul in range(2*p, num+1, p):
                primes_sieve[mul] = 0
    return primes_sieve
Implementation 2:

def build_sieve_eratosthenes_linear(num):
    ## Creates sieve of size (num+1) to correct for 0-indexing.
    primes_sieve = [1] * (num+1)
    primes_sieve[0] = primes_sieve[1] = 0

    ## Builds a list of size (num+1) recording the smallest prime factor of each number.
    SPF = [1] * (num+1)

    ## Builds a list of all primes seen so far with pos indicator of position where to insert the next prime.
    ## Uses a large fixed memory allocation scheme to avoid the usage of append list operation.
    primes = [0] * num
    pos = 0

    for p in range(2, num+1):
        if primes_sieve[p] == 1:
            primes[pos] = p
            pos = pos + 1
            ## Smallest prime factor of a prime is a prime itself.
            SPF[p] = p
        for i in range(0, pos):
            if p * primes[i] <= num and primes[i] <= SPF[p]:
                primes_sieve[p*primes[i]] = 0
                SPF[p * primes[i]] = primes[i]
            else:
                break
    return primes_sieve

test_num = 2000000
Results:

import time

start2 = time.time()
sum_2 = find_sum_of_primes_upto_num_with_sieve(build_sieve_eratosthenes, test_num)
end2 = time.time()
print("Sum of primes obtained: ", sum_2)
print("Time taken by checking primality of each number is %f sec" % (end2 - start2))
Sum of primes obtained:  142913828922
Time taken by checking primality of each number is 0.647822 sec

start3 = time.time()
sum_3 = find_sum_of_primes_upto_num_with_sieve(build_sieve_eratosthenes_linear, test_num)
end3 = time.time()
print("Sum of primes obtained: ", sum_3)
print("Time taken by checking primality of each number is %f sec" % (end3 - start3))
Sum of primes obtained:  142913828922
Time taken by checking primality of each number is 1.561308 sec
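
The helper find_sum_of_primes_upto_num_with_sieve is called above but never shown. A minimal sketch of what it might look like, assuming it simply builds the sieve and adds up every index still flagged as prime; the body, the stand-in simple_sieve and the timed_sum wrapper are all assumptions for illustration, not the asker's code:

```python
import time
from math import isqrt


def find_sum_of_primes_upto_num_with_sieve(build_sieve, num):
    """Assumed behaviour: build the sieve, then sum the indices flagged as prime."""
    sieve = build_sieve(num)
    return sum(i for i, is_prime in enumerate(sieve) if is_prime)


def simple_sieve(num):
    """Stand-in builder (hypothetical) so this block runs on its own; expects num >= 1."""
    flags = [1] * (num + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(num) + 1):
        if flags[p]:
            for mul in range(p * p, num + 1, p):
                flags[mul] = 0
    return flags


def timed_sum(build_sieve, num):
    """Return (sum of primes, elapsed seconds) for one sieve builder."""
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    total = find_sum_of_primes_upto_num_with_sieve(build_sieve, num)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    total, elapsed = timed_sum(simple_sieve, 1000)
    print("Sum of primes obtained: ", total)
    print("Elapsed: %f sec" % elapsed)
```

Note that a helper written this way times the sieve construction and the summation together, so the reported figures are not purely the cost of the sieve.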

I put a simple iteration counter into each routine and ran them over powers of 10 from 10^3 to 10^7.

build_sieve_eratosthenes:

    1000 has     1958 iterations in sieve
   10000 has    23071 iterations in sieve
  100000 has   256808 iterations in sieve
 1000000 has  2775210 iterations in sieve
10000000 has 29465738 iterations in sieve
build_sieve_eratosthenes_linear:

    1000 has      831 iterations in sieve_linear
   10000 has     8770 iterations in sieve_linear
  100000 has    90407 iterations in sieve_linear
 1000000 has   921501 iterations in sieve_linear
10000000 has  9335420 iterations in sieve_linear
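
The counter placement is not shown, so the following is only a sketch of one placement that is consistent with the figures above: one increment per crossing-off in the plain sieve, and one increment per executed "if" branch in the linear version. The names count_simple and count_linear are hypothetical, and the linear variant uses list.append instead of the preallocated primes array for brevity:

```python
def count_simple(num):
    """Count inner-loop iterations (crossings-off) of the plain sieve."""
    sieve = [1] * (num + 1)
    sieve[0] = sieve[1] = 0
    iterations = 0
    for p in range(2, num):
        if sieve[p] == 1:
            for mul in range(2 * p, num + 1, p):
                sieve[mul] = 0
                iterations += 1
    return iterations


def count_linear(num):
    """Count how often the linear variant clears an entry."""
    sieve = [1] * (num + 1)
    sieve[0] = sieve[1] = 0
    spf = [1] * (num + 1)  # smallest prime factor of each number
    primes = []
    iterations = 0
    for p in range(2, num + 1):
        if sieve[p] == 1:
            primes.append(p)
            spf[p] = p
        for q in primes:
            if p * q <= num and q <= spf[p]:
                sieve[p * q] = 0
                spf[p * q] = q
                iterations += 1
            else:
                break
    return iterations


if __name__ == "__main__":
    for exponent in range(3, 6):
        n = 10 ** exponent
        print("%8d has %8d iterations in sieve" % (n, count_simple(n)))
        print("%8d has %8d iterations in sieve_linear" % (n, count_linear(n)))
```

With the counters placed this way, num = 1000 gives 1958 and 831, which matches the first row of each table.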
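Your linear function is not linear: note the inner loop that runs up to pos times, and pos is a count of the primes found so far, not a constant.

The linear version grows more slowly than the "normal" function and performs distinctly fewer iterations overall. However, each iteration costs more, which is why you see the "inverted" times. In the linear function, each number found and each "removal" is more expensive; the slower growth has not caught up at your limit of 2*10^6, nor at my limit of 10^7. If you find it worthwhile, you can extrapolate for about a day to get a better handle on the right times... but the core "problem" is the slower processing of each number.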

For details, here is the full output:

1000 has 1958 iterations in sieve
Sum of primes obtained:  76127
Time taken by checking primality of each number is 0.000904 sec
10000 has 23071 iterations in sieve
Sum of primes obtained:  5736396
Time taken by checking primality of each number is 0.008270 sec
100000 has 256808 iterations in sieve
Sum of primes obtained:  454396537
Time taken by checking primality of each number is 0.067962 sec
1000000 has 2775210 iterations in sieve
Sum of primes obtained:  37550402023
Time taken by checking primality of each number is 0.428727 sec
10000000 has 29465738 iterations in sieve
Sum of primes obtained:  3203324994356
Time taken by checking primality of each number is 5.761439 sec
1000 has 831 iterations in sieve_linear
Sum of primes obtained:  76127
Time taken by checking primality of each number is 0.001069 sec
10000 has 8770 iterations in sieve_linear
Sum of primes obtained:  5736396
Time taken by checking primality of each number is 0.010398 sec
100000 has 90407 iterations in sieve_linear
Sum of primes obtained:  454396537
Time taken by checking primality of each number is 0.107276 sec
1000000 has 921501 iterations in sieve_linear
Sum of primes obtained:  37550402023
Time taken by checking primality of each number is 1.087080 sec
10000000 has 9335420 iterations in sieve_linear
Sum of primes obtained:  3203324994356
Time taken by checking primality of each number is 11.008726 sec
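
To put a number on "each iteration costs more", the reported times can be divided by the reported iteration counts. This is a rough sketch only: the figures are copied from the output above, the names are hypothetical, and the times presumably include the summation step as well as the sieve itself.

```python
# (limit, iterations, seconds) copied from the output above.
SIMPLE = [
    (10**3, 1958, 0.000904),
    (10**4, 23071, 0.008270),
    (10**5, 256808, 0.067962),
    (10**6, 2775210, 0.428727),
    (10**7, 29465738, 5.761439),
]
LINEAR = [
    (10**3, 831, 0.001069),
    (10**4, 8770, 0.010398),
    (10**5, 90407, 0.107276),
    (10**6, 921501, 1.087080),
    (10**7, 9335420, 11.008726),
]


def microseconds_per_iteration(rows):
    """Map each limit to elapsed time divided by counted iterations, in microseconds."""
    return {limit: seconds / iterations * 1e6 for limit, iterations, seconds in rows}


simple_cost = microseconds_per_iteration(SIMPLE)
linear_cost = microseconds_per_iteration(LINEAR)

if __name__ == "__main__":
    for limit in simple_cost:
        print("%8d: sieve %.3f us/iter, sieve_linear %.3f us/iter"
              % (limit, simple_cost[limit], linear_cost[limit]))
```

At 10^7 this works out to roughly 0.2 microseconds per counted iteration for the plain sieve against roughly 1.2 for the linear one, so about a third of the iterations at about six times the cost each.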

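You may want to consider writing your idea in algorithm format. I don't know Python, but I have some interest in prime-related work.

Your implementation does not look like O(n); you have an inner loop.

Please refer to this to understand why it is O(n).

I saw that. The author claims it is O(n), but that is not certain. It takes more memory (at least 4 times that of the Sieve of Eratosthenes), which will cause more delay.

What is "Sum of primes obtained: 142913828922"? The total number of primes? What is the highest prime you found with this code?

Thanks a lot for this simple but insightful idea of adding a counter and logging the expense of each iteration. Also, I like your suggestion about raising the limit. But one limitation here is that I also build arrays with memory allocated up to that limit, and it might not be possible to allocate memory for an ever-increasing limit.

First of all, note that you only need to find the primes less than or equal to N to identify the primes less than or equal to N^2. Second, if you need to identify large primes, beyond your ability to allocate a sieve, then you need to research methods for doing so. A sieve is not an efficient way to find primes past a certain point. That point depends on your hardware and on how long you are willing to wait. I switch over somewhere between 10^8 and 10^15, depending on the application. I also maintain files of the primes I have generated, so I can read those in for general use or to bootstrap a new application.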
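
One common way to use the first point (primes up to N are enough to identify primes up to N^2) while keeping memory bounded is a segmented sieve. The comment does not name this technique, so the following is only a sketch under that assumption; base_primes, segmented_primes and the segment_size parameter are hypothetical names chosen for illustration:

```python
from math import isqrt


def base_primes(limit):
    """Plain sieve returning the list of primes <= limit."""
    if limit < 2:
        return []
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for mul in range(p * p, limit + 1, p):
                flags[mul] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def segmented_primes(limit, segment_size=1 << 16):
    """Yield the primes <= limit, sieving one fixed-size window at a time."""
    root = isqrt(limit)
    small = base_primes(root)  # primes <= sqrt(limit) suffice to sieve up to limit
    yield from small
    low = root + 1
    while low <= limit:
        high = min(low + segment_size - 1, limit)
        flags = [True] * (high - low + 1)
        for p in small:
            if p * p > high:
                break
            # First multiple of p inside [low, high], never below p*p.
            start = max(p * p, ((low + p - 1) // p) * p)
            for mul in range(start, high + 1, p):
                flags[mul - low] = False
        for offset, is_prime in enumerate(flags):
            if is_prime:
                yield low + offset
        low = high + 1


if __name__ == "__main__":
    print(sum(segmented_primes(2000000)))
```

Only the base primes and one window of segment_size flags are held at a time, so memory no longer grows with the limit itself, at the price of some extra bookkeeping per window.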