Optimization of importing modules in Python

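I am reading David Beazley's Python reference book, and he makes this point:

"For example, if you were performing a lot of square root operations, it is faster to use 'from math import sqrt' and 'sqrt(x)' rather than typing 'math.sqrt(x)'."

and:

"For calculations involving heavy use of methods or module lookups, it is almost always better to eliminate the attribute lookup by putting the operation you want to perform into a local variable first."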
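The second quote can be illustrated with a minimal sketch. It uses Python 3 syntax, and the function names with_lookup and with_local as well as the loop sizes are assumptions made for this illustration, not taken from the book. The first function resolves math.sqrt on every iteration; the second binds it to a local name once before the loop.

```python
import math
import timeit

def with_lookup(n=1000):
    """Resolve math.sqrt on every iteration (global lookup plus attribute lookup)."""
    total = 0.0
    for k in range(n):
        total += math.sqrt(k)
    return total

def with_local(n=1000):
    """Bind math.sqrt to a local name once, then call the local inside the loop."""
    sqrt = math.sqrt
    total = 0.0
    for k in range(n):
        total += sqrt(k)
    return total

if __name__ == "__main__":
    for func in (with_lookup, with_local):
        # Best of several repeats; absolute numbers depend on machine and interpreter.
        best = min(timeit.repeat(func, number=200, repeat=5))
        print("%-12s %.4f s" % (func.__name__, best))
```

Both functions compute the same sum, so any timing difference comes only from how the name sqrt is resolved inside the loop.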

I decided to try it out:

first()

second()

The results were:

2.15461492538
1.39850616455

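Optimizations like these probably don't matter to me. But I am curious why the opposite of what Beazley wrote turns out to be true. Note that the difference is roughly 0.75 seconds, which is significant given how trivial the task is.

Why is this happening?

UPDATE:

I am getting the timings like this: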

from timeit import timeit

# timeit runs each statement 1,000,000 times by default
print(timeit('first()', 'from __main__ import first'))
print(timeit('second()', 'from __main__ import second'))

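My guess is that your test is biased: the second implementation benefits from the first one having already loaded the module, or simply from the module having been loaded recently.

How many times did you try it? Did you switch the order, and so on?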
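One way to check the ordering concern is to time each candidate several times, in both orders, and keep the best run. The sketch below uses Python 3 syntax and assumes that first() and second() look like the versions in the test code quoted in a later answer; bench and its parameters are hypothetical helpers introduced only for this illustration.

```python
import timeit

def first():
    from collections import defaultdict
    x = defaultdict(list)

def second():
    import collections
    x = collections.defaultdict(list)

def bench(funcs, number=20000, repeat=3):
    """Return {function name: best time in seconds} for each callable in funcs."""
    results = {}
    for func in funcs:
        # The minimum of several repeats is less sensitive to background noise.
        results[func.__name__] = min(timeit.repeat(func, number=number, repeat=repeat))
    return results

if __name__ == "__main__":
    print("first, then second:", bench([first, second]))
    print("second, then first:", bench([second, first]))
```

If the two orderings give similar numbers for each function, the order of the calls is probably not what drives the difference.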

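first() doesn't save anything, since the module must still be accessed in order to import the name.

Also, you don't give your timing method, but given the function names it seems that first() performs the initial import, which always takes longer than subsequent imports because the module has to be compiled and executed.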
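The difference between an initial import and a later one can be observed directly. This is a minimal sketch in Python 3 syntax; timed_import is a hypothetical helper, and colorsys is just a small standard-library module picked for the demonstration. It is removed from sys.modules first so that the next import really has to load and execute the module.

```python
import importlib
import sys
import time

def timed_import(name):
    """Import a module by name and return (module, elapsed seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start

# Drop any cached copy so the first timed import has to load and run the module.
sys.modules.pop("colorsys", None)
cold_module, cold = timed_import("colorsys")
# The second import finds the module in sys.modules and returns it directly.
warm_module, warm = timed_import("colorsys")

print("first import : %.6f s" % cold)
print("second import: %.6f s" % warm)
print("same module object:", cold_module is warm_module)
```

The first import is usually the slower one, although for a module this small the gap may be tiny; the point is that later imports reuse the object cached in sys.modules.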

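from collections import defaultdict and import collections should be outside the iterated timing loops, since you won't be executing them repeatedly.

I guess that the from syntax has to do more work than the import syntax.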
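The guess about the from syntax can be examined by looking at the bytecode the two statements compile to. This is a CPython-specific sketch in Python 3 syntax; opnames is a hypothetical helper, and the exact instruction lists vary between interpreter versions.

```python
import dis

def opnames(source):
    """Compile a statement and return the names of its bytecode instructions."""
    code = compile(source, "<demo>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]

plain = opnames("import collections")
from_form = opnames("from collections import defaultdict")

print("import collections                  ->", plain)
print("from collections import defaultdict ->", from_form)
```

Both forms contain IMPORT_NAME, which locates the module, while the from form additionally contains IMPORT_FROM, which fetches the requested name from the module each time the statement runs.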

With this test code:

#!/usr/bin/env python

import timeit

from collections import defaultdict
import collections

def first():
    from collections import defaultdict
    x = defaultdict(list)

def firstwithout():
    x = defaultdict(list)

def second():
    import collections
    x = collections.defaultdict(list)

def secondwithout():
    x = collections.defaultdict(list)

# Each call below runs its statement 1,000,000 times (the timeit default).
print("first with import %s" % timeit.timeit('first()', 'from __main__ import first'))
print("second with import %s" % timeit.timeit('second()', 'from __main__ import second'))

print("first without import %s" % timeit.timeit('firstwithout()', 'from __main__ import firstwithout'))
print("second without import %s" % timeit.timeit('secondwithout()', 'from __main__ import secondwithout'))

I get these results:

first with import 1.61359190941
second with import 1.02904295921
first without import 0.344709157944
second without import 0.449721097946

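This shows the cost of the repeated imports.

I also get similar ratios between first(.) and second(.); the only difference is that the timings are at the microsecond level.

I don't think your timings measure anything useful. Try to come up with better test cases.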

Update:
FWIW, here are some tests that support David Beazley's point:

import math
from math import sqrt

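def first(n=1000):
    for k in xrange(n):  # xrange is Python 2; use range on Python 3
        x = math.sqrt(9)  # looks up math, then its sqrt attribute, every iteration

def second(n=1000):
    for k in xrange(n):
        x = sqrt(9)  # a single lookup of sqrt every iteration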

In []: %timeit first()
1000 loops, best of 3: 266 us per loop
In []: %timeit second()
1000 loops, best of 3: 221 us per loop
In []: 266./ 221
Out[]: 1.2036199095022624

So first() is about 20% slower than second().
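The ratio measured above fits how the two spellings are resolved in CPython. The sketch below, in Python 3 syntax, restates the two functions with range instead of xrange and prints the names each one has to look up at run time; reading co_names is a simplification chosen for this illustration, and the underlying bytecode differs between interpreter versions.

```python
import math
from math import sqrt

def first(n=1000):
    for k in range(n):
        x = math.sqrt(9)

def second(n=1000):
    for k in range(n):
        x = sqrt(9)

# co_names lists the global and attribute names the compiled function refers to.
print("first  looks up:", first.__code__.co_names)
print("second looks up:", second.__code__.co_names)
```

first() has to find math and then fetch its sqrt attribute on every iteration, while second() only has to find sqrt, which is one lookup fewer per call.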

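Write your code as usual, importing a module and referencing its functions and constants as module.attribute. Then either apply the constant binding to your functions individually, or bind all the modules in your program with the following bind_all_modules function: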

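# bind_all below needs these names; ClassType exists only in Python 2.
# _make_constants, which bind_all calls, is not defined in this excerpt.
from types import FunctionType, ClassType

def bind_all_modules():
    from sys import modules
    from types import ModuleType
    # Apply constant binding to every module that is currently loaded.
    for name, module in modules.iteritems():
        if isinstance(module, ModuleType):
            bind_all(module)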

def bind_all(mc, builtin_only=False, stoplist=[],  verbose=False):
    """Recursively apply constant binding to functions in a module or class.

    Use as the last line of the module (after everything is defined, but
    before test code).  In modules that need modifiable globals, set
    builtin_only to True.

    """
    try:
        d = vars(mc)
    except TypeError:
        return
    for k, v in d.items():
        if type(v) is FunctionType:
            newv = _make_constants(v, builtin_only, stoplist,  verbose)
            try: setattr(mc, k, newv)
            except AttributeError: pass
        elif type(v) in (type, ClassType):
            bind_all(v, builtin_only, stoplist, verbose)
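For comparison, a similar effect can be approximated by hand for a single function by resolving the lookup once, when the function is defined, for example through a default argument. This is a different and much simpler technique than whatever _make_constants does internally (which is not shown here). The sketch uses Python 3 syntax, and the function names are made up for illustration.

```python
import math

def hypot_plain(a, b):
    # math and its sqrt attribute are looked up on every call.
    return math.sqrt(a * a + b * b)

def hypot_bound(a, b, _sqrt=math.sqrt):
    # _sqrt was resolved once, when the def statement ran; inside the
    # function it is an ordinary local variable.
    return _sqrt(a * a + b * b)

print(hypot_plain(3, 4), hypot_bound(3, 4))
```

The trade-off is that the extra parameter becomes part of the function's signature, so a caller can override it by accident, and a later rebinding of math.sqrt is not seen by hypot_bound.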

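There is also the matter of how efficiently the source code can be read and understood. Here is a real-world example (taken from existing code):

Original: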

import math

def midpoint(p1, p2):
   lat1, lat2 = math.radians(p1[0]), math.radians(p2[0])
   lon1, lon2 = math.radians(p1[1]), math.radians(p2[1])
   dlon = lon2 - lon1
   dx = math.cos(lat2) * math.cos(dlon)
   dy = math.cos(lat2) * math.sin(dlon)
   lat3 = math.atan2(math.sin(lat1) + math.sin(lat2), math.sqrt((math.cos(lat1) + dx) * (math.cos(lat1) + dx) + dy * dy))
   lon3 = lon1 + math.atan2(dy, math.cos(lat1) + dx)
   return(math.degrees(lat3), math.degrees(lon3))

Alternative:

from math import radians, degrees, sin, cos, atan2, sqrt

def midpoint(p1, p2):
   lat1, lat2 = radians(p1[0]), radians(p2[0])
   lon1, lon2 = radians(p1[1]), radians(p2[1])
   dlon = lon2 - lon1
   dx = cos(lat2) * cos(dlon)
   dy = cos(lat2) * sin(dlon)
   lat3 = atan2(sin(lat1) + sin(lat2), sqrt((cos(lat1) + dx) * (cos(lat1) + dx) + dy * dy))
   lon3 = lon1 + atan2(dy, cos(lat1) + dx)
   return(degrees(lat3), degrees(lon3))

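How did you get the timings?

How did you measure these times?

I have updated the question with the timing method.

@David: No. Why would I? There is nothing more to it, apart from from timeit import timeit. What else is there in this program? You are very welcome to try it on your own computer.

Because around here we often get benchmarking questions that leave out crucial details because the OP did not think they mattered.

If I put them outside, it would defeat the whole purpose of my question!

Ah, I see what you mean.

@A: You seem to have misunderstood what Beazley wrote. He is not talking about the difference between repeated imports, and there is no point arguing about timing differences caused by repeated imports. If you move the imports outside the loop, what he says is true, although in this small example first is only slightly faster.

@Ned: Hmm, OK. Which style is preferred, then?

@A: According to the Python style guide (PEP 8), either import x or from x import y is generally fine for imports. As he wrote, there may be a reason to prefer the latter when dealing with calls in heavy loops. Some people dislike the latter because it can muddle the namespace. What is generally frowned upon is from x import *. The main point is that you don't want to import repeatedly for no reason. As PEP 8 says: "Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants."

@Ignacio: I tried what you said, but no, the second method is still faster. I tried putting second() above first() so that it would load the module. So I had second() perform the initial import.

Yes, no. You have to call second() first, because nothing happens until it is actually called.

@Ignacio: I isolated the calls, and the results are still the same.

"Isolated the calls"? What exactly does that mean?

I have tested it, and the order doesn't matter.

Can you tell me how you tested it?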
