Python speed test shows strange behavior: the time is multiplied by 100 in one case and only by 10 in the other

python performance unit-testing

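I am running a speed test with three functions: readFile, prepDict, and test. test is simply prepDict(readFile). I then run each of them many times with the timeit module.

When I increase the number of loops by a factor of 10, prepDict takes roughly 100 times longer, but test, which uses prepDict, takes only 10 times longer.

Here are the functions and the tests: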

def readFile(filepath):
    tempDict = {}
    file = open(filepath,'rb')
    for line in file:
        split = line.split('\t')
        tempDict[split[1]] = split[2]
    return tempDict

def prepDict(tempDict):
    for key in tempDict.keys():
        tempDict[key+'a'] = tempDict[key].upper()
        del tempDict[key]
    return tempDict

def test():
    prepDict(readFile('two.txt'))

if __name__=='__main__':
    from timeit import Timer
    t = Timer(lambda: readFile('two.txt'))
    print 'readFile(10000): ' + str(t.timeit(number=10000))

    tempDict = readFile('two.txt')
    t = Timer(lambda: prepDict(tempDict))
    print 'prepDict (10000): ' + str(t.timeit(number=10000))

    t = Timer(lambda: test())
    print 'prepDict(readFile) (10000): ' + str(t.timeit(number=10000))

    t = Timer(lambda: readFile('two.txt'))
    print 'readFile(100000): ' + str(t.timeit(number=100000))

    tempDict = readFile('two.txt')
    t = Timer(lambda: prepDict(tempDict))
    print 'prepDict (100000): ' + str(t.timeit(number=100000))

    t = Timer(lambda: test())
    print 'prepDict(readFile) (100000): ' + str(t.timeit(number=100000))

The results I get are as follows:

readFile(10000): 0.61602914474
prepDict (10000): 0.200615847469
prepDict(readFile) (10000): 0.609288647286
readFile(100000): 5.91858320729
prepDict (100000): 18.8842101717
prepDict(readFile) (100000): 6.45040039665

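I get similar results if I run it several times. Why does prepDict slow down by a factor of about 100, while prepDict(readFile) slows down by only a factor of 10, even though it uses the prepDict function?

two.txt is a tab-delimited file containing the following data points: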

Item    Title   Hello2
Item    Desc    Testing1232
Item    Release 2011-02-03

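Your calls to prepDict do not happen in isolation. Each call to prepDict modifies tempDict: the keys get a little longer every time. So after 10**5 calls to prepDict, the keys in tempDict are rather long strings. You can see this (copiously) if you put a print statement in prepDict: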

def prepDict(tempDict):
    for key in tempDict.keys():
        tempDict[key+'a'] = tempDict[key].upper()
        del tempDict[key]
    print(tempDict)
    return tempDict
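
For a less noisy view of the same effect, the key length can be measured instead of printed. The following is a minimal Python 3 sketch, not the original code: the helper longest_key_after and the in-memory sample dict are assumptions modelled on two.txt, and list(...) is added because Python 3 does not allow a dict to change size while its keys view is being iterated.

```python
# Python 3 sketch; the sample dict is an assumption modelled on two.txt.
def prepDict(tempDict):
    # list(...) snapshots the keys so the dict can be modified inside the loop
    for key in list(tempDict.keys()):
        tempDict[key + 'a'] = tempDict[key].upper()
        del tempDict[key]
    return tempDict


def longest_key_after(calls):
    """Call prepDict `calls` times on one shared dict; return the longest key length."""
    shared = {'Title': 'Hello2', 'Desc': 'Testing1232', 'Release': '2011-02-03'}
    for _ in range(calls):
        prepDict(shared)
    return max(len(key) for key in shared)


if __name__ == '__main__':
    for calls in (0, 10, 1000):
        print(calls, longest_key_after(calls))
```

Every call appends one character to every key, so the longest key grows by exactly one character per call when the same dict is reused.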

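The way to fix this is to make sure that each call to prepDict (or, more generally, each statement you are timing) does not affect the next call (or statement) being timed. abarnert has already shown the solution:

prepDict(tempDict.copy())

By the way, you can use a for loop to reduce the code repetition: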

import timeit
import collections    

if __name__=='__main__':
    Ns = [10**4, 10**5]
    timing = collections.defaultdict(list)
    for N in Ns:
        timing['readFile'].append(timeit.timeit(
            "readFile('two.txt')",
            "from __main__ import readFile",
            number = N))
        timing['prepDict'].append(timeit.timeit(
            "prepDict(tempDict.copy())",
            "from __main__ import readFile, prepDict; tempDict = readFile('two.txt')",
            number = N))
        timing['test'].append(timeit.timeit(
            "test()",
            "from __main__ import test",
            number = N))

    print('{k:10}: {N[0]:7} {N[1]:7} {r}'.format(k='key', N=Ns, r='ratio'))
    for key, t in timing.iteritems():
        print('{k:10}: {t[0]:0.5f} {t[1]:0.5f} {r:>5.2f}'.format(k=key, t=t, r=t[1]/t[0]))

which yields timings such as

key       :   10000  100000 ratio
test      : 0.11320 1.12601  9.95
prepDict  : 0.01604 0.16167 10.08
readFile  : 0.08977 0.91053 10.14

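The problem here is that your prepDict function expands its input. Each time you call it in sequence, it has more data to deal with. That data grows linearly, so the 10000th run takes about 10000 times as long as the first.*

When you call test, it creates a new dict every time, so the time per call is constant.

You can see this very easily by changing the prepDict test to run on a fresh copy of the dict each time: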

t = Timer(lambda: prepDict(tempDict.copy()))

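By the way, the time for your prepDict is not actually growing exponentially with number**, only quadratically. In general, when something grows superlinearly and you want to estimate the algorithmic cost, you really need more than two data points.

* This is not quite true: the per-call time only starts growing linearly once the time taken by the string and hashing operations (which grows linearly) begins to outweigh the time taken by all the other operations (which are constant).

** You did not mention exponential growth here, but you did mention it elsewhere, so you may have made the same unwarranted assumption in your actual problem.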
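
To make the quadratic claim concrete, here is a small Python sketch of a simplified cost model. It is an assumption for illustration, not a measurement: it counts only the key characters handled, for a single key that starts at start_len characters and gains one per call, and it ignores the constant per-call overhead mentioned in the first footnote. The name total_key_chars is hypothetical.

```python
def total_key_chars(calls, start_len=5):
    """Sum the key lengths handled over `calls` successive calls on one shared dict.

    Simplified model: one key that starts at `start_len` characters and
    grows by one character per call.
    """
    return sum(start_len + i for i in range(calls))


if __name__ == '__main__':
    previous = None
    for calls in (10**3, 10**4, 10**5):
        work = total_key_chars(calls)
        # Ratio to the previous data point, shown from the second point onward
        ratio = '' if previous is None else ' (x{:.1f})'.format(work / previous)
        print('{:>7} calls: {:>12} key characters{}'.format(calls, work, ratio))
        previous = work
```

In this model the total is 5N + N(N-1)/2 for N calls, so ten times as many calls means close to a hundred times as much work. Using three values of N rather than two shows that the ratio stays near 100 at each step, which is what distinguishes quadratic growth from exponential growth.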

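This happens because, when you test prepDict on its own, all of your calls to prepDict reuse the same tempDict. Since prepDict loops over every item in the dictionary it is given and essentially just makes each string key one character longer, you end up with a set of very long keys. Because the string concatenation keeps reading and recreating bigger and bigger strings, the function gets slower as the run goes on.

There is no such problem in test, because the dictionary is reinitialized on every call.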

Haha, thanks! Silly mistake. The problem I ran into in my actual work