Python: performance of finding divisors
Benchmarking these functions:
import math

def divisors_optimized(number):
    # Note: range(1, square_root) excludes square_root itself, so a divisor
    # equal to int(sqrt(number)) is only reported for perfect squares.
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        if number % divisor == 0:
            yield divisor
            yield number / divisor  # true division: yields a float
    if square_root ** 2 == number:
        yield square_root

def number_of_divisors_optimized(number):
    count = 0
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        if number % divisor == 0:
            count += 2
    if square_root ** 2 == number:
        count += 1
    return count
You can see that the basic structure of both is identical.
Benchmark code:
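As a quick sanity check that the two structures really are equivalent, the two functions can be cross-checked against each other (a minimal self-contained sketch, repeating the definitions above):

```python
import math

def divisors_optimized(number):
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        if number % divisor == 0:
            yield divisor
            yield number / divisor
    if square_root ** 2 == number:
        yield square_root

def number_of_divisors_optimized(number):
    count = 0
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        if number % divisor == 0:
            count += 2
    if square_root ** 2 == number:
        count += 1
    return count

# The two implementations should always agree on the count.
for n in (9999999, 100, 97, 1):
    assert len(list(divisors_optimized(n))) == number_of_divisors_optimized(n)
```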
import time

number = 9999999
for i in range(10):
    print(f"iteration {i}:")
    start = time.time()
    result = list(utils.divisors_optimized(number))
    end = time.time()
    print(f'len(divisors_optimized) took {end - start} seconds and found {len(result)} divisors.')
    start = time.time()
    result = utils.number_of_divisors_optimized(number)
    end = time.time()
    print(f'number_of_divisors_optimized took {end - start} seconds and found {result} divisors.')
    print()
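As an aside, `time.time()` has limited resolution for sub-millisecond measurements; the standard-library `timeit` module is usually more reliable for timings this small. A minimal sketch (repeating the counting function inline rather than importing it from `utils`):

```python
import math
import timeit

def number_of_divisors_optimized(number):
    count = 0
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        if number % divisor == 0:
            count += 2
    if square_root ** 2 == number:
        count += 1
    return count

# timeit.repeat runs the statement `number` times per repeat and reports the
# total; taking the minimum is the usual way to suppress scheduling jitter.
times = timeit.repeat(
    "number_of_divisors_optimized(9999999)",
    globals=globals(),
    repeat=5,
    number=100,
)
print(f"best: {min(times) / 100:.6f} s per call")
```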
Output:
iteration 0:
len(divisors_optimized) took 0.00019598007202148438 seconds and found 12 divisors.
number_of_divisors_optimized took 0.0001919269561767578 seconds and found 12 divisors.
iteration 1:
len(divisors_optimized) took 0.00019121170043945312 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020599365234375 seconds and found 12 divisors.
iteration 2:
len(divisors_optimized) took 0.000179290771484375 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00019049644470214844 seconds and found 12 divisors.
iteration 3:
len(divisors_optimized) took 0.00019025802612304688 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020170211791992188 seconds and found 12 divisors.
iteration 4:
len(divisors_optimized) took 0.0001785755157470703 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00017905235290527344 seconds and found 12 divisors.
iteration 5:
len(divisors_optimized) took 0.00022721290588378906 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020170211791992188 seconds and found 12 divisors.
iteration 6:
len(divisors_optimized) took 0.0001919269561767578 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00018930435180664062 seconds and found 12 divisors.
iteration 7:
len(divisors_optimized) took 0.00017881393432617188 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00017905235290527344 seconds and found 12 divisors.
iteration 8:
len(divisors_optimized) took 0.00017976760864257812 seconds and found 12 divisors.
number_of_divisors_optimized took 0.0001785755157470703 seconds and found 12 divisors.
iteration 9:
len(divisors_optimized) took 0.00024819374084472656 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020766258239746094 seconds and found 12 divisors.
You can see that the execution times are very close, with each function coming out ahead on some iterations.
Can someone explain to me why creating a list from the generator and retrieving its length is almost as fast as counting while iterating? I mean, shouldn't the memory allocation (list()) be much more expensive than an addition?
I'm using Python 3.6.3.

You're testing far more than you're producing. The cost of the generator machinery and the ints produced in the "found divisors" case is dwarfed by the total work being done. You're performing over 3000 trial divisions; twelve yields vs. twelve additions is a trivial change on top of that kind of work. Replace the additions/yields with pass (doing nothing at all) and you'll find it still runs in (roughly) the same time:
def ignore_divisors_optimized(number):
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        if number % divisor == 0:
            pass
    if square_root ** 2 == number:
        pass
Micro-benchmarking with IPython's %timeit magic:
>>> %timeit -r5 number_of_divisors_optimized(9999999)
266 µs ± 1.85 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %timeit -r5 list(divisors_optimized(9999999))
267 µs ± 1.29 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %timeit -r5 ignore_divisors_optimized(9999999)
267 µs ± 1.43 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
The fact that number_of_divisors_optimized came out a microsecond faster is not relevant (the jitter on repeated tests was higher than a microsecond); they're all basically the same speed, because more than 99% of the work is the looping and the trial division, not what's done when the test passes.
This is an example of the 90/10 rule of optimization: roughly 90% of the time is spent in 10% of the code (in this case, the trial division itself); 10% is spent in the other 90% of the code. You're optimizing a small part of the 90% of the code that only accounts for 10% of the runtime, and it doesn't help, because the vast majority of the time is spent on the `if number % divisor == 0:` line. If you remove that test and just loop over the range, the runtime drops to ~78 µs in my local microbenchmarks, which means the test occupies nearly 200 µs of the runtime, more than twice what all the other code put together requires.
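For illustration, the "no test" baseline described above might look like this (a sketch; the ~78 µs figure is machine-dependent):

```python
import math

def loop_only(number):
    # Same loop bounds, but without the trial-division test, to measure
    # how much of the runtime the `%` check itself accounts for.
    square_root = int(math.sqrt(number))
    for divisor in range(1, square_root):
        pass
    if square_root ** 2 == number:
        pass

loop_only(9999999)  # runs ~3161 loop iterations doing no work
```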
If you wanted to optimize this, you'd need to look at approaches that either speed up the trial division line itself (which basically means using a different Python interpreter, or using Cython to compile it down to C), or ways to run that line fewer times (e.g. precompute the possible prime factors up to some bound, so that for any given input you can avoid testing non-prime candidates, and then generate/count the composite divisors from the known prime factors and their multiplicities).
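A sketch of the factorization approach suggested above (not the original poster's code): factor the number by trial division, then count divisors as the product of (multiplicity + 1) over the prime factors.

```python
def count_divisors_via_factorization(number):
    # Factor `number` by trial division, then use the standard identity:
    # if n = p1^a1 * p2^a2 * ..., the divisor count is (a1+1)*(a2+1)*...
    count = 1
    remaining = number
    factor = 2
    while factor * factor <= remaining:
        multiplicity = 0
        while remaining % factor == 0:
            remaining //= factor
            multiplicity += 1
        count *= multiplicity + 1
        factor += 1 if factor == 2 else 2  # skip even candidates after 2
    if remaining > 1:  # one leftover prime factor
        count *= 2
    return count

print(count_divisors_via_factorization(9999999))  # 9999999 = 3^2 * 239 * 4649 -> 12
```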
Creating a list isn't necessarily slow. In fact, it may even be faster. But lists use memory, and that is the main reason people avoid them when possible. Coming from a background in statically typed, mostly-compiled languages, you may tend to think of memory allocation as expensive. Relative to all the other overhead of a JIT-less, dynamic CPython, memory allocation is just a drop in the bucket.
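A rough way to see this (a hedged sketch; absolute numbers vary by machine): compare exhausting a generator into a list against consuming it with a pure-Python counting loop. `list()` runs in C, so building the list is often no slower than the interpreted loop despite the allocation.

```python
import timeit

setup = "gen = lambda: (i for i in range(3000))"

# Building a list from the generator (allocates a 3000-element list).
t_list = min(timeit.repeat("len(list(gen()))",
                           setup=setup, repeat=5, number=1000))

# Consuming the same generator with a pure-Python counting loop.
t_count = min(timeit.repeat("c = 0\nfor _ in gen(): c += 1",
                            setup=setup, repeat=5, number=1000))

print(f"list(): {t_list:.4f}s  counting loop: {t_count:.4f}s")
```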