Python: why is a function containing a list comprehension faster than the bare list comprehension statement?


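I recently came across this behaviour and I'm a bit confused about why it happens. My initial assumption was that some kind of optimization takes place when a function is called that doesn't happen when a statement is run.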

For example, let's start with a simple case:

somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]
Suppose we have a list of strings like "somestring" above, and a list of topics such as "sometopics".

We want to check which of the "sometopics" appear in "somestring" and, importantly, return those topics in a new list.

With a list comprehension statement we can do this for a single string as follows:

result = [element for element in sometopics if element in somestring]
However, on my machine, calling the following function is 20-30% faster than running the statement above:

def comparelistoftopicstokw(mystring, somelistoftopics):
    result = [element for element in somelistoftopics if element in mystring]
    return result
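Since the question describes a list of strings, here is a minimal sketch of applying the same function to each string in such a list. The second string is a hypothetical example added only to show a string with no matching topics.

```python
somestring = "climate change is a big problem. However emissions are still rising"
sometopics = ["climate", "change", "problem", "big", "rising"]

def comparelistoftopicstokw(mystring, somelistoftopics):
    # Keep each topic that occurs as a substring of mystring, in topic-list order.
    return [element for element in somelistoftopics if element in mystring]

# Hypothetical second string, used only to illustrate an empty result.
strings = [somestring, "pad thai is a good recipe"]
matches = [comparelistoftopicstokw(s, sometopics) for s in strings]
print(matches)
# → [['climate', 'change', 'problem', 'big', 'rising'], []]
```

Note that `in` on strings is a substring test, so a topic such as "big" would also match inside a word like "bigger".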
Why does this happen?

Are functions always faster than the equivalent statement or list of statements?

EDIT:

See the following minimal reproducible notebook example:

import pandas as pd, numpy as np

columns_df = pd.DataFrame({"Keyword":['fish soup','katsu','soup']}) # Compute a Pandas dataframe to write into 500kcolumns
somestring="pad thai is a good recipe. It is cooked with chicken or lamb or beef"
sometopics=["chicken","pad thai","recipe","lamb","beef"]
print(len(sometopics))
somebigtopics=sometopics*100000


def extractsubstrings(inputstring, alistofpossibletopics):
    # deliberately naive (and obviously slow) explicit for loop
    topicslist = []
    print(inputstring)
    for topic in alistofpossibletopics:
        if str(topic) in inputstring:
            topicslist.append(str(topic))
    return topicslist

%%time
def listcompinlists(mystring,bigtopic):
    res = [ele for ele in bigtopic if(ele in mystring)] 
    return res

%%time
res = [ele for ele in somebigtopics if(ele in somestring)] 

%%time
x=extractsubstrings(somestring,somebigtopics)

%%time
funcres=listcompinlists(somestring,somebigtopics)
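On my machine (Ubuntu 18.04, Python 3.6), for the case above the list comprehension statement runs in 22-24 ms, while the function runs in 18-21 ms. That is not a huge difference, but if you have, say, 10 million rows to process, it adds up to a considerable amount of time.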

TL;DR performance comparison:

extractsubstrings: Wall time: 122 ms
list comprehension statement: Wall time: 24.5 ms
listcompinlists: Wall time: 18.6 ms

I can't reproduce what you describe. Can you provide any measurements that back up your claim?

I wrote this measurement to compare the execution times:

import time

N = 1000000

def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result
   
somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]

start = time.time()
for _ in range(N):
    result = [element for element in sometopics if(element in somestring)]
end = time.time()
print(f'Time using list comprehension: {end - start}')
   
start = time.time()
for _ in range(N):
    result = comparelistoftopicstokw(somestring, sometopics)
end = time.time()
print(f'Time using function: {end - start}')
Output:
So in my case, the list comprehension is faster on average.
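A less noisy way to run the same comparison is the standard-library `timeit` module: it uses a high-resolution clock (`time.perf_counter` by default), switches off garbage collection while timing, and its documentation recommends looking at the minimum of several repeats rather than the mean. The sketch below assumes the same data as above; `best_of`, `namespace`, and the `number`/`repeat` values are arbitrary choices for illustration. Because `timeit` wraps each statement in a function body, both statements read the names in `namespace` as globals, which mirrors the module-level code in the question.

```python
import timeit

somestring = "climate change is a big problem. However emissions are still rising"
sometopics = ["climate", "change", "problem", "big", "rising"]

def comparelistoftopicstokw(mystring, somelistoftopics):
    return [element for element in somelistoftopics if element in mystring]

# Names the timed statements may refer to; timeit resolves them in this dict.
namespace = {
    "somestring": somestring,
    "sometopics": sometopics,
    "comparelistoftopicstokw": comparelistoftopicstokw,
}

def best_of(stmt, number=20_000, repeat=5):
    """Smallest per-run time in seconds over `repeat` batches of `number` runs."""
    totals = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(totals) / number

stmt_time = best_of("[element for element in sometopics if element in somestring]")
func_time = best_of("comparelistoftopicstokw(somestring, sometopics)")
print(f"statement: {stmt_time * 1e9:.0f} ns per run")
print(f"function:  {func_time * 1e9:.0f} ns per run")
```

The absolute numbers depend on the machine and Python version, so only the relative difference between the two lines is meaningful.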

I can't answer your question, but I ran a small test that calls its premise into question.

As the output shows, the results are quite random: in some rounds one variant is faster on average, in others the reverse.

import time
import statistics

somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]


def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result

for i in range(10):
    print(f"Average time to execute 1 iteration (100000 iterations). Round {i + 1}:")
    time1average = []
    for _ in range(100000):
        start1 = time.time()
        result = [element for element in sometopics if(element in somestring)]
        time1average.append(time.time() - start1)
        
    print(statistics.mean(time1average))
    
    time2average = []
    for _ in range(100000):
        start2 = time.time()
        comparelistoftopicstokw(somestring,sometopics)
        time2average.append(time.time() - start2)
    
    print(statistics.mean(time2average))
    print("")
Output:

Average time to execute 1 iteration (100000 iterations). Round 1:
3.879823684692383e-06
5.041525363922119e-06

Average time to execute 1 iteration (100000 iterations). Round 2:
4.478754997253418e-06
5.097501277923584e-06

Average time to execute 1 iteration (100000 iterations). Round 3:
3.9185094833374025e-06
4.177823066711426e-06

Average time to execute 1 iteration (100000 iterations). Round 4:
4.212841987609863e-06
4.6886253356933596e-06

Average time to execute 1 iteration (100000 iterations). Round 5:
3.580739498138428e-06
3.840360641479492e-06

Average time to execute 1 iteration (100000 iterations). Round 6:
3.070487976074219e-06
4.423313140869141e-06

Average time to execute 1 iteration (100000 iterations). Round 7:
3.0085206031799318e-06
3.401658535003662e-06

Average time to execute 1 iteration (100000 iterations). Round 8:
2.937157154083252e-06
4.46035623550415e-06

Average time to execute 1 iteration (100000 iterations). Round 9:
3.5696911811828613e-06
3.5602593421936035e-06

Average time to execute 1 iteration (100000 iterations). Round 10:
2.7422666549682615e-06
3.158261775970459e-06
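One way to summarize the ten rounds is to count, per round, which variant had the lower mean. The sketch below simply copies the numbers printed above; in each pair the first value is the bare list comprehension and the second is the function call, following the order of the two `print` calls in the loop.

```python
# Per-round mean times in seconds, copied from the output above:
# (list comprehension statement, function call)
rounds = [
    (3.879823684692383e-06, 5.041525363922119e-06),
    (4.478754997253418e-06, 5.097501277923584e-06),
    (3.9185094833374025e-06, 4.177823066711426e-06),
    (4.212841987609863e-06, 4.6886253356933596e-06),
    (3.580739498138428e-06, 3.840360641479492e-06),
    (3.070487976074219e-06, 4.423313140869141e-06),
    (3.0085206031799318e-06, 3.401658535003662e-06),
    (2.937157154083252e-06, 4.46035623550415e-06),
    (3.5696911811828613e-06, 3.5602593421936035e-06),
    (2.7422666549682615e-06, 3.158261775970459e-06),
]

# Count the rounds in which the statement's mean was strictly lower.
statement_faster = sum(1 for stmt, func in rounds if stmt < func)
print(f"statement faster in {statement_faster} of {len(rounds)} rounds")
# → statement faster in 9 of 10 rounds
```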

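In which environment, and how, do you measure the execution time? Without knowing your method and the actual numbers you get, this can't be answered.

Global variables are slower to access than local variables, and "mystring" is looked up once for every "element"; that may be what is happening in your case. But it's hard to tell without seeing any numbers.

See the minimal example above. This is a Python 3.6 environment, run through Anaconda Navigator on a Linux machine (Ubuntu 18.04).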
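The point about global lookups can be checked with the standard-library `dis` module. The sketch below compiles the module-level statement and the function from the notebook example and collects which `LOAD_*` opcodes read `somestring` and `mystring`; the helper `load_opcodes` is a hypothetical name introduced for this illustration. The exact opcode names depend on the Python version: on 3.6 one would expect `LOAD_GLOBAL` for `somestring` inside the comprehension and `LOAD_DEREF` for `mystring` (a closure variable of the nested comprehension), while since Python 3.12 comprehensions are inlined (PEP 709) and the function's parameter is read with a `LOAD_FAST`-style opcode. In every version the statement's lookup of `somestring` remains a global/name lookup.

```python
import dis

# The module-level statement from the notebook, compiled as a notebook cell would be.
MODULE_SRC = "res = [ele for ele in somebigtopics if ele in somestring]"

def listcompinlists(mystring, bigtopic):
    res = [ele for ele in bigtopic if ele in mystring]
    return res

def load_opcodes(code, name):
    """Return the LOAD_* opcode names that read `name` in `code` or nested code objects."""
    ops = set()
    for ins in dis.get_instructions(code):
        # Newer versions fuse two loads into one instruction whose argval is a tuple.
        names = ins.argval if isinstance(ins.argval, tuple) else (ins.argval,)
        if ins.opname.startswith("LOAD") and name in names:
            ops.add(ins.opname)
    for const in code.co_consts:
        if isinstance(const, type(code)):  # nested code object, e.g. a <listcomp>
            ops |= load_opcodes(const, name)
    return ops

module_code = compile(MODULE_SRC, "<cell>", "exec")
print("statement:", sorted(load_opcodes(module_code, "somestring")))
print("function: ", sorted(load_opcodes(listcompinlists.__code__, "mystring")))
```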