Python 3.x: How to reduce program execution time by replacing for loops with pandas
I have 12,000 CSV files, each with 6,000 rows. My code uses a for loop, and I think that is why its execution time has increased. Does anyone know how to rewrite this code with pandas to reduce the execution time?
for i in range(len(df)):
    if (((df['EOG_Start_model'].values[i] - df['EOG_Min_model'].values[i]) < (df['EOG_start_farm'].values[i] - df['EOG_Min_Farm'].values[i]))
            & ((df['EOG_Max_model'].values[i] - df['EOG_Min_model'].values[i]) < (df['EOG_Max_Farm'].values[i] - df['EOG_Min_Farm'].values[i]))
            & (df['Avg'].values[i] > 2)):
        # print('EOG')
        df['EOG_flag'].values[i] = 1
    if (((df['EOG_Max_model'].values[i] - df['EOG_Min_model'].values[i]) < (df['EOG_Max_Farm'].values[i] - df['EOG_Min_Farm'].values[i]))
            & (df['Avg'].values[i] > 2)):
        # print('gust')
        df['Gust_flag'].values[i] = 1
You can use a vectorized solution instead: create boolean masks, chain them together with &, and set the values with numpy.where:
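import numpy as np

# Row-wise differences, computed once for whole columns
x = df['EOG_start_farm'].values - df['EOG_Min_Farm'].values
y = df['EOG_Max_Farm'].values - df['EOG_Min_Farm'].values
m1 = (df['EOG_Start_model'].values - df['EOG_Min_model'].values) < x
# The loop compares the Max_model range against the Max_Farm range, so use y here
m2 = (df['EOG_Max_model'].values - df['EOG_Min_model'].values) < y
m3 = df['Avg'].values > 2
m23 = m2 & m3

# Set the flag to 1 where the mask holds, otherwise keep the existing value
df['EOG_flag'] = np.where(m1 & m23, 1, df['EOG_flag'].values)
df['Gust_flag'] = np.where(m23, 1, df['Gust_flag'].values)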
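One way to check that a vectorized version really matches the row-by-row loop is to run both on the same random data and compare the resulting flag columns. The sketch below is only an illustration: the helper names make_frame, flag_loop and flag_vectorized are hypothetical, and Avg is scaled to the range 0 to 4 (an assumption) so that the Avg > 2 condition is actually exercised. Both helpers mirror the loop's conditions, including the comparison of the Max_model range against EOG_Max_Farm - EOG_Min_Farm.

```python
import numpy as np
import pandas as pd

COLS = ['EOG_Start_model', 'EOG_Min_model', 'EOG_start_farm', 'EOG_Min_Farm',
        'EOG_Max_model', 'EOG_Max_Farm', 'Avg', 'EOG_flag', 'Gust_flag']


def make_frame(n=500, seed=2019):
    """Random test data (hypothetical); Avg is scaled so that Avg > 2 sometimes holds."""
    rng = np.random.RandomState(seed)
    df = pd.DataFrame(rng.rand(n, len(COLS)), columns=COLS)
    df['Avg'] = df['Avg'] * 4
    df['EOG_flag'] = 0
    df['Gust_flag'] = 0
    return df


def flag_loop(df):
    """Row-by-row reference with the same conditions as the loop in the question."""
    out = df.copy()
    a = {c: out[c].to_numpy() for c in COLS}
    eog = a['EOG_flag'].copy()
    gust = a['Gust_flag'].copy()
    for i in range(len(out)):
        start_ok = ((a['EOG_Start_model'][i] - a['EOG_Min_model'][i])
                    < (a['EOG_start_farm'][i] - a['EOG_Min_Farm'][i]))
        max_ok = ((a['EOG_Max_model'][i] - a['EOG_Min_model'][i])
                  < (a['EOG_Max_Farm'][i] - a['EOG_Min_Farm'][i]))
        avg_ok = a['Avg'][i] > 2
        if start_ok and max_ok and avg_ok:
            eog[i] = 1
        if max_ok and avg_ok:
            gust[i] = 1
    out['EOG_flag'] = eog
    out['Gust_flag'] = gust
    return out


def flag_vectorized(df):
    """Same conditions expressed as boolean masks over whole columns."""
    out = df.copy()
    start_ok = ((out['EOG_Start_model'].to_numpy() - out['EOG_Min_model'].to_numpy())
                < (out['EOG_start_farm'].to_numpy() - out['EOG_Min_Farm'].to_numpy()))
    max_ok = ((out['EOG_Max_model'].to_numpy() - out['EOG_Min_model'].to_numpy())
              < (out['EOG_Max_Farm'].to_numpy() - out['EOG_Min_Farm'].to_numpy()))
    avg_ok = out['Avg'].to_numpy() > 2
    out['EOG_flag'] = np.where(start_ok & max_ok & avg_ok, 1, out['EOG_flag'].to_numpy())
    out['Gust_flag'] = np.where(max_ok & avg_ok, 1, out['Gust_flag'].to_numpy())
    return out


df = make_frame()
by_loop = flag_loop(df)
by_mask = flag_vectorized(df)
same = (np.array_equal(by_loop['EOG_flag'].to_numpy(), by_mask['EOG_flag'].to_numpy())
        and np.array_equal(by_loop['Gust_flag'].to_numpy(), by_mask['Gust_flag'].to_numpy()))
print("loop and vectorized results match:", same)
```

Because every EOG condition also includes the gust condition, the number of EOG flags should never exceed the number of gust flags, which gives a second quick sanity check.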
Performance:
import numpy as np
import pandas as pd

np.random.seed(2019)
N = 6000
c = ['EOG_Start_model','EOG_Min_model','EOG_start_farm','EOG_Min_Farm','EOG_Max_model',
     'EOG_Max_Farm','Avg','EOG_flag','Gust_flag']
df = pd.DataFrame(np.random.rand(N, 9), columns=c)
print(df)
In [91]: %%timeit
...: x = df['EOG_start_farm'].values-df['EOG_Min_Farm'].values
...: m1 = (df['EOG_Start_model'].values-df['EOG_Min_model'].values) < x
...: m2 = (df['EOG_Max_model'].values-df['EOG_Min_model'].values) < x
...: m3 = df['Avg'].values > 2
...: m23 = m2 & m3
...:
...: df['EOG_flag'] = np.where(m1 & m2 & m3, 1, df['EOG_flag'].values)
...: df['Gust_flag'] = np.where(m2 & m3, 1, df['Gust_flag'].values)
...:
597 µs ± 6.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [93]: %%timeit
...: for i in range(len(df)):
...:     if ((df['EOG_Start_model'].values[i]-df['EOG_Min_model'].values[i])<(df['EOG_start_farm'].values[i]-df['EOG_Min_Farm'].values[i])) &((df['EOG_Max_model'].values[i]-df['EOG_Min_model'].values[i])<(df['EOG_Max_Farm'].values[i]-df['EOG_Min_Farm'].values[i]))&((df['Avg'].values[i]>2)):
...:         #print('EOG')
...:         df['EOG_flag'].values[i]=1
...:
...:     if ((df['EOG_Max_model'].values[i]-df['EOG_Min_model'].values[i])<(df['EOG_Max_Farm'].values[i]-df['EOG_Min_Farm'].values[i]))&((df['Avg'].values[i]>2)):
...:         #print('gust')
...:         df['Gust_flag'].values[i]=1
231 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
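Outside IPython, a similar comparison can be made with the standard-library timeit module. This is a rough sketch rather than a rigorous benchmark: the function names run_vectorized and run_loop and the repeat counts are illustrative choices, the loop collects its flags in local arrays before writing them back (a small departure from the question's in-place writes), and absolute timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(2019)
N = 6000
cols = ['EOG_Start_model', 'EOG_Min_model', 'EOG_start_farm', 'EOG_Min_Farm',
        'EOG_Max_model', 'EOG_Max_Farm', 'Avg', 'EOG_flag', 'Gust_flag']
df = pd.DataFrame(np.random.rand(N, 9), columns=cols)


def run_vectorized(df):
    """Whole-column comparisons: each mask is computed in one pass."""
    x = df['EOG_start_farm'].values - df['EOG_Min_Farm'].values
    y = df['EOG_Max_Farm'].values - df['EOG_Min_Farm'].values
    m1 = (df['EOG_Start_model'].values - df['EOG_Min_model'].values) < x
    m2 = (df['EOG_Max_model'].values - df['EOG_Min_model'].values) < y
    m23 = m2 & (df['Avg'].values > 2)
    df['EOG_flag'] = np.where(m1 & m23, 1, df['EOG_flag'].values)
    df['Gust_flag'] = np.where(m23, 1, df['Gust_flag'].values)


def run_loop(df):
    """Row-by-row version using the same per-element column lookups as the question."""
    eog = df['EOG_flag'].values.copy()
    gust = df['Gust_flag'].values.copy()
    for i in range(len(df)):
        start_ok = ((df['EOG_Start_model'].values[i] - df['EOG_Min_model'].values[i])
                    < (df['EOG_start_farm'].values[i] - df['EOG_Min_Farm'].values[i]))
        max_ok = ((df['EOG_Max_model'].values[i] - df['EOG_Min_model'].values[i])
                  < (df['EOG_Max_Farm'].values[i] - df['EOG_Min_Farm'].values[i]))
        avg_ok = df['Avg'].values[i] > 2
        if start_ok and max_ok and avg_ok:
            eog[i] = 1
        if max_ok and avg_ok:
            gust[i] = 1
    df['EOG_flag'] = eog
    df['Gust_flag'] = gust


# Best of 3 batches of 10 calls for the fast version; a single call for the slow one.
t_vec = min(timeit.repeat(lambda: run_vectorized(df), number=10, repeat=3)) / 10
t_loop = timeit.timeit(lambda: run_loop(df), number=1)
print(f"vectorized: {t_vec * 1e3:.3f} ms per call")
print(f"loop:       {t_loop * 1e3:.3f} ms per call")
```

Most of the loop's cost comes from repeating the column lookup df[...] for every row, which the vectorized version does only once per column.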
This code gives the same result as the for loop, but there is no noticeable change in execution time; it takes almost the same time.
@Nickel - OK, I added some tests.
@Nickel - It is 387 times faster than the original solution.
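Since the question involves 12,000 CSV files, the flagging step can be wrapped in a function and applied to each file in turn. The sketch below rests on several assumptions that are not in the original post: all files share the column names used above, they sit in one folder, results are written to a separate output folder, and missing flag columns start at 0. The names add_flags and flag_folder are hypothetical. If the total run time barely changes after vectorizing, it may be worth timing the read and write steps separately, since file I/O could account for much of the time; that is a guess, not something measured here.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def add_flags(df):
    """Return a copy of df with EOG_flag and Gust_flag set to 1 where the conditions hold."""
    out = df.copy()
    # Assumption: files without flag columns start with all flags at 0.
    for col in ('EOG_flag', 'Gust_flag'):
        if col not in out.columns:
            out[col] = 0
    x = out['EOG_start_farm'].to_numpy() - out['EOG_Min_Farm'].to_numpy()
    y = out['EOG_Max_Farm'].to_numpy() - out['EOG_Min_Farm'].to_numpy()
    m1 = (out['EOG_Start_model'].to_numpy() - out['EOG_Min_model'].to_numpy()) < x
    m2 = (out['EOG_Max_model'].to_numpy() - out['EOG_Min_model'].to_numpy()) < y
    m23 = m2 & (out['Avg'].to_numpy() > 2)
    out['EOG_flag'] = np.where(m1 & m23, 1, out['EOG_flag'].to_numpy())
    out['Gust_flag'] = np.where(m23, 1, out['Gust_flag'].to_numpy())
    return out


def flag_folder(src_dir, dst_dir, pattern='*.csv'):
    """Flag every CSV matching pattern in src_dir, write results to dst_dir, return the file count."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob(pattern)):
        df = pd.read_csv(path)
        add_flags(df).to_csv(dst / path.name, index=False)
        count += 1
    return count
```

A call such as flag_folder('data/raw', 'data/flagged'), with placeholder folder names, would process the files one by one and leave the inputs untouched.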