What should I replace loops and nested if statements with to speed up Python code?
How can I avoid for loops and nested if statements and write more Pythonic code?

At first glance this may look like a "please do all my work for me" question. I can assure you it is not. I'm trying to learn some real Python and would like to discover ways to speed up my code, based on a reproducible example and a predefined function.

I calculate the returns from following specific signals in financial markets using lots of for loops and nested if statements. I've made several attempts, but I get nowhere with vectorization, comprehensions, or other more Pythonic tools of the trade. So far I've been okay with that, but I'm finally starting to feel the pain of functions that are too slow at scale.

I have a data frame with two indicators and a specific event. The first two snippets are included to show the process step by step. At the end I've included some predefined settings and the whole thing as a complete, working function.

In[1]
# Settings
import numpy as np
import pandas as pd
import datetime
np.random.seed(12345678)
Observations = 10
# Data frame values:
# Two indicators with values between 0 and 9
# and one Event which does or does not occur, with values 0 or 1
df = pd.DataFrame(np.random.randint(0, 10, size=(Observations, 2)),
                  columns=['IndicatorA', 'IndicatorB'])
df['Event'] = np.random.randint(0, 2, size=Observations)
# Data frame index (pd.datetime was removed in pandas 2.0,
# so the datetime module is used directly):
datelist = pd.date_range(datetime.datetime.today().strftime('%Y-%m-%d'),
                         periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])
# Placeholder for signals based on the existing values
# in the data frame
df['Signal'] = 0
print(df)
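The data frame is indexed by date. The signals I'm looking for are determined by the interaction of these indicators and the event. The signal is calculated as follows (expanding on the snippet above):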
In[2]
# Column positions, so single cells can be read and written with .iat
# (.ix was removed in pandas 1.0)
col_a = df.columns.get_loc('IndicatorA')
col_b = df.columns.get_loc('IndicatorB')
col_event = df.columns.get_loc('Event')
col_signal = df.columns.get_loc('Signal')
for i in range(len(df)):
    if i == 0:
        # First signal is always zero
        df.iat[i, col_signal] = 0
    else:
        # Signal is 1 if Indicator A is above a certain level
        if df.iat[i, col_a] > 5:
            df.iat[i, col_signal] = 1
        else:
            # Signal is 1 if the previous row's Indicator B is above a certain level
            # AND a certain event occurs. Use `and` for scalar tests: `&` binds
            # tighter than `>`. Note: Event is 0 or 1, so Event > 1 is never true.
            if df.iat[i - 1, col_b] > 5 and df.iat[i, col_event] > 1:
                df.iat[i, col_signal] = 1
            else:
                df.iat[i, col_signal] = 0
print(df['Signal'])
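A note on the compound condition: a test written as `x > 5 & y > 1` (the form used in the original loop) does not mean "x > 5 and y > 1". In Python, `&` binds more tightly than comparison operators, so it parses as the chained comparison `x > (5 & y) > 1`. For an event value of 0 or 1, `5 & y` equals `y`, so the last link `y > 1` is always false. The sketch below uses plain integers as stand-ins for the DataFrame cells; the function names are illustrative only.

```python
def original_condition(b_prev: int, event: int) -> bool:
    # Parsed as the chained comparison b_prev > (5 & event) > 1
    return b_prev > 5 & event > 1


def parenthesized_condition(b_prev: int, event: int) -> bool:
    # Two separate tests joined with `and`, as the loop's comment describes
    return (b_prev > 5) and (event > 1)


# The unparenthesized form never fires for indicator values 0-9 and events 0/1
fired = [
    (b, e)
    for b in range(10)
    for e in (0, 1)
    if original_condition(b, e)
]
print(fired)  # []
```

Even with the parentheses, `event > 1` cannot hold for 0/1 values; `event == 1` is presumably what "a certain event occurs" means, but that is an assumption about the intent.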
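Below is the whole thing defined as a function. Note that the function returns the mean of the signals rather than the signal column itself. This way the console isn't cluttered when the code runs, and we can use %time in IPython to test the efficiency of the code.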
# Settings
import numpy as np
import pandas as pd
import datetime
# The whole thing defined as a function
def fxSlow(Observations):
    np.random.seed(12345678)
    df = pd.DataFrame(np.random.randint(0, 10, size=(Observations, 2)),
                      columns=['IndicatorA', 'IndicatorB'])
    df['Event'] = np.random.randint(0, 2, size=Observations)
    datelist = pd.date_range(datetime.datetime.today().strftime('%Y-%m-%d'),
                             periods=Observations).tolist()
    df['Signal'] = 0
    df['Dates'] = datelist
    df = df.set_index(['Dates'])
    # Column positions for scalar access with .iat (.ix no longer exists)
    col_a = df.columns.get_loc('IndicatorA')
    col_b = df.columns.get_loc('IndicatorB')
    col_event = df.columns.get_loc('Event')
    col_signal = df.columns.get_loc('Signal')
    for i in range(len(df)):
        if i == 0:
            # First signal is always zero
            df.iat[i, col_signal] = 0
        else:
            # Signal is 1 if Indicator A is above a certain level
            if df.iat[i, col_a] > 5:
                df.iat[i, col_signal] = 1
            else:
                # Signal is 1 if the previous Indicator B is above a certain level
                # AND a certain event occurs (Event > 1 is never true for 0/1 values)
                if df.iat[i - 1, col_b] > 5 and df.iat[i, col_event] > 1:
                    df.iat[i, col_signal] = 1
                else:
                    df.iat[i, col_signal] = 0
    return np.mean(df['Signal'])
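If a loop genuinely cannot be avoided (for example, a rule where a signal depends on the previous signal rather than only on the inputs), a common speed-up is to pull the columns out once as NumPy arrays, loop over those, and write the result back in a single assignment, instead of reading and writing individual DataFrame cells. A minimal sketch of the same rules as `fxSlow`, assuming a DataFrame with the columns above; `signal_loop_numpy` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def signal_loop_numpy(df: pd.DataFrame) -> np.ndarray:
    # Extract plain NumPy arrays once; indexing these inside a Python loop
    # avoids the per-call overhead of DataFrame cell access.
    a = df['IndicatorA'].to_numpy()
    b = df['IndicatorB'].to_numpy()
    event = df['Event'].to_numpy()
    signal = np.zeros(len(df), dtype=np.int64)  # first signal stays 0
    for i in range(1, len(df)):
        if a[i] > 5:
            signal[i] = 1
        elif b[i - 1] > 5 and event[i] > 1:
            # Same event test as fxSlow (never true for 0/1 events)
            signal[i] = 1
    return signal


df = pd.DataFrame({'IndicatorA': [7, 7, 2, 9],
                   'IndicatorB': [8, 1, 6, 0],
                   'Event': [1, 0, 1, 1]})
df['Signal'] = signal_loop_numpy(df)
print(df['Signal'].tolist())  # [0, 1, 0, 1]
```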
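Below you can see the results of running the function with different numbers of observations / data frame sizes:
So, how can I make this more Pythonic?
As a bonus question, what causes the error when I increase the number of observations to 100000?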
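On the bonus question, one likely cause (an assumption, since the error message isn't shown): pandas stores timestamps as 64-bit nanosecond counts, which caps dates at `pd.Timestamp.max` (2262-04-11). A daily `date_range` of 100000 periods spans roughly 274 years, so starting from a date in the 2010s it runs past that limit, while 10000 days (about 27 years) fits. A small check, with `fits_in_timestamp_range` as a hypothetical helper and a fixed start date standing in for "today":

```python
import datetime

import pandas as pd


def fits_in_timestamp_range(start: datetime.date, periods: int) -> bool:
    # Last date of a daily range with `periods` entries
    last = start + datetime.timedelta(days=periods - 1)
    # Latest calendar date a nanosecond-resolution Timestamp can represent
    return last <= pd.Timestamp.max.date()


start = datetime.date(2017, 1, 1)
print(fits_in_timestamp_range(start, 10_000))   # True
print(fits_in_timestamp_range(start, 100_000))  # False
```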
Can you try something like this?
def fxSlow2(Observations):
    np.random.seed(12345678)
    df = pd.DataFrame(np.random.randint(0, 10, size=(Observations, 2)),
                      columns=['IndicatorA', 'IndicatorB'])
    df['Event'] = np.random.randint(0, 2, size=Observations)
    datelist = pd.date_range(datetime.datetime.today().strftime('%Y-%m-%d'),
                             periods=Observations).tolist()
    df['Signal'] = 0
    df['Dates'] = datelist
    df = df.set_index(['Dates'])
    # np.where evaluates each condition on whole columns at once.
    # shift(1) moves values down one row, so row i sees IndicatorB
    # from row i - 1, matching the loop's df.ix[i - 1, 'IndicatorB'].
    df['Signal'] = np.where(df['IndicatorA'] > 5,
                            1,
                            np.where((df['IndicatorB'].shift(1) > 5) & (df['Event'] > 1),
                                     1,
                                     0))
    # First signal is always zero
    df.loc[df.index[0], 'Signal'] = 0
    return np.mean(df['Signal'])
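Because both branches of the nested `np.where` produce 1, the rule is really a logical OR of two boolean masks; `np.select` (a different NumPy function, swapped in here) is another readable option when there are several conditions with different outcomes. The sketch below checks both forms against a plain loop on random data. The function names are hypothetical, and it assumes `Event == 1` as the event test, since the `Event > 1` test above can never be true for 0/1 values.

```python
import numpy as np
import pandas as pd


def signal_reference(a, b, event):
    # Plain loop over NumPy arrays, used only as a correctness reference
    out = np.zeros(len(a), dtype=np.int64)
    for i in range(1, len(a)):
        if a[i] > 5 or (b[i - 1] > 5 and event[i] == 1):
            out[i] = 1
    return out


def signal_mask(df: pd.DataFrame) -> pd.Series:
    # Two boolean masks OR-ed together; shift(1) aligns row i with row i - 1
    # (the NaN produced in the first row compares as False)
    mask = (df['IndicatorA'] > 5) | ((df['IndicatorB'].shift(1) > 5) & (df['Event'] == 1))
    signal = mask.astype(np.int64)
    signal.iloc[0] = 0  # first signal is always zero
    return signal


def signal_select(df: pd.DataFrame) -> np.ndarray:
    # np.select takes the choice of the first condition that holds, else default
    conditions = [df['IndicatorA'] > 5,
                  (df['IndicatorB'].shift(1) > 5) & (df['Event'] == 1)]
    signal = np.select(conditions, [1, 1], default=0)
    signal[0] = 0
    return signal


rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({'IndicatorA': rng.integers(0, 10, n),
                   'IndicatorB': rng.integers(0, 10, n),
                   'Event': rng.integers(0, 2, n)})
expected = signal_reference(df['IndicatorA'].to_numpy(),
                            df['IndicatorB'].to_numpy(),
                            df['Event'].to_numpy())
print((signal_mask(df).to_numpy() == expected).all())  # True
print((signal_select(df) == expected).all())           # True
```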
%time fxSlow2(100)
Wall time: 10 ms
Out[208]: 0.43
%time fxSlow2(1000)
Wall time: 15 ms
Out[209]: 0.414
%time fxSlow2(10000)
Wall time: 61 ms
Out[210]: 0.4058
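The timings above come from IPython's `%time`, which runs the statement once. Outside IPython, the standard-library `timeit` module gives a comparable measurement and repeats it to reduce noise. A minimal sketch, where `best_of` and `stand_in` are hypothetical names; `stand_in` is a placeholder workload so the snippet runs on its own, and `fxSlow` or `fxSlow2` can be passed in its place once defined.

```python
import timeit


def best_of(func, *args, repeat: int = 5, number: int = 1) -> float:
    # Run func(*args) `number` times per trial, `repeat` trials,
    # and return the fastest trial in seconds per call
    times = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(times) / number


def stand_in(n: int) -> int:
    # Placeholder workload standing in for fxSlow / fxSlow2
    return sum(range(n))


for n in (100, 1_000, 10_000):
    print(n, f"{best_of(stand_in, n) * 1e3:.3f} ms")
```

Taking the minimum of several trials is a common choice because slower trials usually reflect interference from other processes rather than the code itself.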
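"Would like to discuss" isn't really a suitable question for this site.
Thanks for the feedback! What I really need is concrete suggestions for improvement. Would it be better to edit that part of the question, or is this not the right place for this question at all?
I've addressed that.
Personally I think this is fine, and more focused than the usual discussion-type questions, so I won't vote to close.
Your suggestion looks great! Really looking forward to testing it! Back to work.
This is great! Your suggestion will really improve my workflow. Now I have quite a bit of code to clean up...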