Python 3.x: How to reduce program execution time by replacing for loops in pandas
Tags: python-3.x, pandas

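I have 12,000 CSV files, each with 6,000 rows. My code uses a for loop, and I think that is why its execution time has gone up. Does anyone know how to rewrite this code with pandas operations to reduce the execution time?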

for i in range(len(df)):
    if (((df['EOG_Start_model'].values[i] - df['EOG_Min_model'].values[i])
         < (df['EOG_start_farm'].values[i] - df['EOG_Min_Farm'].values[i]))
        & ((df['EOG_Max_model'].values[i] - df['EOG_Min_model'].values[i])
           < (df['EOG_Max_Farm'].values[i] - df['EOG_Min_Farm'].values[i]))
        & (df['Avg'].values[i] > 2)):
        # print('EOG')
        df['EOG_flag'].values[i] = 1

    if (((df['EOG_Max_model'].values[i] - df['EOG_Min_model'].values[i])
         < (df['EOG_Max_Farm'].values[i] - df['EOG_Min_Farm'].values[i]))
        & (df['Avg'].values[i] > 2)):
        # print('gust')
        df['Gust_flag'].values[i] = 1

You can use a vectorized solution: create the boolean masks separately, chain them together with &, and set the values with np.where:

x = df['EOG_start_farm'].values - df['EOG_Min_Farm'].values
m1 = (df['EOG_Start_model'].values - df['EOG_Min_model'].values) < x
m2 = (df['EOG_Max_model'].values - df['EOG_Min_model'].values) < x
m3 = df['Avg'].values > 2
m23 = m2 & m3

df['EOG_flag'] = np.where(m1 & m2 & m3, 1, df['EOG_flag'].values)
df['Gust_flag'] = np.where(m2 & m3, 1, df['Gust_flag'].values)

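One detail worth checking: the loop in the question compares the Max-model range against EOG_Max_Farm - EOG_Min_Farm, whereas m2 in the answer reuses x (built from EOG_start_farm). If the goal is to reproduce the loop exactly, a sketch along the following lines may be closer. The data here is made up for illustration (Avg is scaled so that Avg > 2 actually occurs), and the names start_farm and max_farm are not from the original post.

```python
import numpy as np
import pandas as pd

# Small made-up frame: column names follow the question, values are invented.
rng = np.random.default_rng(0)
cols = ['EOG_Start_model', 'EOG_Min_model', 'EOG_start_farm', 'EOG_Min_Farm',
        'EOG_Max_model', 'EOG_Max_Farm', 'Avg']
df = pd.DataFrame(rng.random((1000, len(cols))), columns=cols)
df['Avg'] = df['Avg'] * 4   # spread Avg over [0, 4) so that Avg > 2 can be true
df['EOG_flag'] = 0
df['Gust_flag'] = 0

# Two separate right-hand sides, mirroring the two comparisons in the loop.
start_farm = df['EOG_start_farm'].values - df['EOG_Min_Farm'].values
max_farm = df['EOG_Max_Farm'].values - df['EOG_Min_Farm'].values

m1 = (df['EOG_Start_model'].values - df['EOG_Min_model'].values) < start_farm
m2 = (df['EOG_Max_model'].values - df['EOG_Min_model'].values) < max_farm
m3 = df['Avg'].values > 2

# Set the flag to 1 where the masks hold, keep the existing value elsewhere.
df['EOG_flag'] = np.where(m1 & m2 & m3, 1, df['EOG_flag'].values)
df['Gust_flag'] = np.where(m2 & m3, 1, df['Gust_flag'].values)
```

Because the masks are plain NumPy boolean arrays, & combines them element-wise, which is what replaces the per-row if statements.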
Performance

import numpy as np
import pandas as pd

np.random.seed(2019)

N = 6000
c = ['EOG_Start_model','EOG_Min_model','EOG_start_farm','EOG_Min_Farm','EOG_Max_model',
     'EOG_Max_Farm','Avg','EOG_flag','Gust_flag']
df = pd.DataFrame(np.random.rand(N, 9), columns=c)
print (df)

In [91]: %%timeit
    ...: x = df['EOG_start_farm'].values-df['EOG_Min_Farm'].values
    ...: m1 = (df['EOG_Start_model'].values-df['EOG_Min_model'].values) < x
    ...: m2 = (df['EOG_Max_model'].values-df['EOG_Min_model'].values) < x
    ...: m3 = df['Avg'].values > 2
    ...: m23 = m2 & m3
    ...: 
    ...: df['EOG_flag'] = np.where(m1 & m2 & m3, 1, df['EOG_flag'].values)
    ...: df['Gust_flag'] = np.where(m2 & m3, 1, df['Gust_flag'].values)
    ...: 
597 µs ± 6.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [93]: %%timeit
    ...: for i in range(len(df)):
    ...:     if ((df['EOG_Start_model'].values[i]-df['EOG_Min_model'].values[i])<(df['EOG_start_farm'].values[i]-df['EOG_Min_Farm'].values[i])) &((df['EOG_Max_model'].values[i]-df['EOG_Min_model'].values[i])<(df['EOG_Max_Farm'].values[i]-df['EOG_Min_Farm'].values[i]))&((df['Avg'].values[i]>2)):
    ...:           #print('EOG')
    ...:           df['EOG_flag'].values[i]=1
    ...: 
    ...:     if ((df['EOG_Max_model'].values[i]-df['EOG_Min_model'].values[i])<(df['EOG_Max_Farm'].values[i]-df['EOG_Min_Farm'].values[i]))&((df['Avg'].values[i]>2)):
    ...:             #print('gust')
    ...:             df['Gust_flag'].values[i]=1
231 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This code gives the same result as the for loop, but there is no noticeable change in execution time; it takes almost the same time.

@Nickel - OK, I added some tests.

@Nickel - It is 387 times faster than the original solution.
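
Since the question involves 12,000 CSV files, the masks could be wrapped in a function and applied once per file. The following is only a sketch under stated assumptions: the helper names flag_frame and process_folder, the folder layout, and the demo files are hypothetical, and the comparisons follow the loop in the question (Max range against EOG_Max_Farm - EOG_Min_Farm).

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def flag_frame(df):
    """Set EOG_flag and Gust_flag with boolean masks instead of a row loop."""
    model_min = df['EOG_Min_model'].values
    farm_min = df['EOG_Min_Farm'].values
    m1 = (df['EOG_Start_model'].values - model_min) < (df['EOG_start_farm'].values - farm_min)
    m2 = (df['EOG_Max_model'].values - model_min) < (df['EOG_Max_Farm'].values - farm_min)
    m3 = df['Avg'].values > 2
    df['EOG_flag'] = np.where(m1 & m2 & m3, 1, df['EOG_flag'].values)
    df['Gust_flag'] = np.where(m2 & m3, 1, df['Gust_flag'].values)
    return df


def process_folder(folder):
    """Yield (file name, flagged frame) for every CSV in a folder (assumed layout)."""
    for path in sorted(Path(folder).glob('*.csv')):
        yield path.name, flag_frame(pd.read_csv(path))


# Demo on two tiny made-up files written to a temporary folder.
cols = ['EOG_Start_model', 'EOG_Min_model', 'EOG_start_farm', 'EOG_Min_Farm',
        'EOG_Max_model', 'EOG_Max_Farm', 'Avg', 'EOG_flag', 'Gust_flag']
with tempfile.TemporaryDirectory() as tmp:
    rng = np.random.default_rng(1)
    for k in range(2):
        demo = pd.DataFrame(rng.random((5, len(cols))) * 3, columns=cols)
        demo[['EOG_flag', 'Gust_flag']] = 0   # flags start unset
        demo.to_csv(Path(tmp) / f'file_{k}.csv', index=False)
    results = dict(process_folder(tmp))       # consume the generator while the files exist
```

Reading the files is likely to dominate the total run time once the row loop is gone, so the per-file function is kept small and free of side effects other than the two flag columns.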