What should replace loops and nested if statements to speed up Python code?


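How can I avoid for loops and nested if statements and be more Pythonic?

At first glance this may look like a "please do all my work for me" question. I can assure you it isn't. I'm trying to learn some real Python, and I'd like to discover ways to speed up my code based on a reproducible example and a predefined function.

I calculate returns from following certain signals in financial markets, using lots of for loops and nested if statements. I've made several attempts, but I'm getting nowhere with vectorization, comprehensions, or other more Pythonic tools of the trade. So far I've been able to live with it, but I'm finally starting to feel the pain of using functions that are too slow at scale.

I have a data frame with two indicators and one particular event. The first two snippets are included to show the process step by step. I've included some predefined settings and, at the very end, the whole thing as a function.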

In[1]

# Settings
import numpy as np
import pandas as pd
import datetime
np.random.seed(12345678)

Observations = 10

# Data frame values:
# Two indicators with values between 0 and 9 (randint excludes the upper bound)
# and one Event which does or does not occur with values 0 or 1
df = pd.DataFrame(np.random.randint(0,10,size=(Observations, 2)),
                  columns=['IndicatorA', 'IndicatorB'] )
df['Event'] = np.random.randint(0,2,size=(Observations, 1))

# Data frame index:
# (pd.datetime was removed from pandas; use the datetime module instead)
datelist = pd.date_range(datetime.datetime.today().strftime('%Y-%m-%d'),
                         periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])

# Placeholder for signals based on the existing values
# in the data frame
df['Signal'] = 0

print(df)

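The data frame is indexed by dates. The signal I'm looking for is determined by the interplay of these indicators and the event. The signal is calculated as follows (expanding on the snippet above):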

In[2]

# Continues from the settings snippet above (df, np and pd already defined)
i = 0
for signals in df['Signal']:
    if i == 0:
        # First signal is always zero
        df.loc[df.index[i], 'Signal'] = 0
    else:
        # Signal is 1 if Indicator A is above a certain level
        if df.loc[df.index[i], 'IndicatorA'] > 5:
            df.loc[df.index[i], 'Signal'] = 1
        else:
            # Signal is 1 if the previous Indicator B is above a certain level
            # AND a certain event occurs. Each comparison is parenthesized
            # because & binds tighter than >. Note that Event only takes the
            # values 0 or 1, so Event > 1 is never true as written.
            if ((df.loc[df.index[i - 1], 'IndicatorB'] > 5)
                    and (df.loc[df.index[i], 'Event'] > 1)):
                df.loc[df.index[i], 'Signal'] = 1
            else:
                df.loc[df.index[i], 'Signal'] = 0
    i = i + 1

print(df['Signal'])

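Below is the whole thing defined as a function. Note that the function returns the mean of the signals rather than the signal column itself. That way the console doesn't get cluttered when the code runs, and we can use %time in IPython to test the efficiency of the code.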

# Settings
import numpy as np
import pandas as pd
import datetime

# The whole thing defined as a function

def fxSlow(Observations):

    np.random.seed(12345678)

    df = pd.DataFrame(np.random.randint(0,10,size=(Observations, 2)),
                        columns=['IndicatorA', 'IndicatorB'] )
    df['Event'] = np.random.randint(0,2,size=(Observations, 1))

    # pd.datetime was removed from pandas; use the datetime module instead
    datelist = pd.date_range(datetime.datetime.today().strftime('%Y-%m-%d'),
                periods=Observations).tolist()
    df['Signal'] = 0

    df['Dates'] = datelist
    df = df.set_index(['Dates'])

    i = 0
    for signals in df['Signal']:
        if i == 0:
            # First signal is always zero
            df.loc[df.index[i], 'Signal'] = 0
        else:
            # Signal is 1 if Indicator A is above a certain level
            if df.loc[df.index[i], 'IndicatorA'] > 5:
                df.loc[df.index[i], 'Signal'] = 1
            else:
                # Signal is 1 if the previous Indicator B is above a certain
                # level AND a certain event occurs. Event is only ever 0 or 1,
                # so Event > 1 is never true as written.
                if ((df.loc[df.index[i - 1], 'IndicatorB'] > 5)
                        and (df.loc[df.index[i], 'Event'] > 1)):
                    df.loc[df.index[i], 'Signal'] = 1
                else:
                    df.loc[df.index[i], 'Signal'] = 0
        i = i + 1

    return np.mean(df['Signal'])

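Below you can see the results of running the function with different numbers of observations (data frame sizes):

So, how can I become more Pythonic?

As a bonus question, what causes the error when I increase the number of observations to 100000?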


Can you try something like this?

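def fxSlow2(Observations):

    np.random.seed(12345678)

    df = pd.DataFrame(np.random.randint(0,10,size=(Observations, 2)),
                        columns=['IndicatorA', 'IndicatorB'] )
    df['Event'] = np.random.randint(0,2,size=(Observations, 1))

    # pd.datetime was removed from pandas; use the datetime module instead
    datelist = pd.date_range(datetime.datetime.today().strftime('%Y-%m-%d'),
                periods=Observations).tolist()
    df['Signal'] = 0

    df['Dates'] = datelist
    df = df.set_index(['Dates'])

    # Nested np.where mirrors the nested if/else of the loop version.
    # shift(1) aligns the previous row's IndicatorB with the current row,
    # matching row i - 1 in the loop (shift(-1) would look at the next row).
    df['Signal'] = (np.where(df.IndicatorA > 5,
          1,
          np.where( (df.shift(1).IndicatorB > 5) & (df.Event > 1),
                    1,
                    0)
          )
    )

    # First signal is always zero
    df.loc[df.index[0],'Signal'] = 0

    return np.mean(df['Signal'])
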
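
One way to gain confidence in a vectorized rewrite like this is to run it against a plain loop on the same data and compare the results element by element. The following is only a sketch under assumptions: the helper names `make_frame`, `signal_loop` and `signal_vectorized` are hypothetical, the start date is fixed at an arbitrary value so the example is reproducible, and the `Event > 1` threshold is copied unchanged from the question.

```python
import numpy as np
import pandas as pd

def make_frame(n, seed=12345678):
    # Hypothetical helper: same column layout as the question's data frame.
    rng = np.random.RandomState(seed)
    df = pd.DataFrame(rng.randint(0, 10, size=(n, 2)),
                      columns=['IndicatorA', 'IndicatorB'])
    df['Event'] = rng.randint(0, 2, size=n)
    # Assumption: a fixed start date instead of today's date.
    df.index = pd.date_range('2017-01-01', periods=n)
    return df

def signal_loop(df):
    # Row-by-row reference implementation of the rules in the question.
    a = df['IndicatorA'].to_numpy()
    b = df['IndicatorB'].to_numpy()
    e = df['Event'].to_numpy()
    out = np.zeros(len(df), dtype=int)  # first signal stays zero
    for i in range(1, len(df)):
        if a[i] > 5:
            out[i] = 1
        elif b[i - 1] > 5 and e[i] > 1:
            out[i] = 1
    return out

def signal_vectorized(df):
    # Same rules expressed as boolean masks over whole columns.
    cond_a = df['IndicatorA'] > 5
    # shift(1) brings the previous row's IndicatorB onto the current row;
    # the first row becomes NaN, and NaN > 5 evaluates to False.
    cond_b = (df['IndicatorB'].shift(1) > 5) & (df['Event'] > 1)
    out = np.where(cond_a | cond_b, 1, 0)
    out[0] = 0  # first signal is always zero
    return out

df = make_frame(1000)
print(np.array_equal(signal_loop(df), signal_vectorized(df)))
```

Combining the two conditions with `|` gives the same result as the nested `np.where`, because both branches assign 1.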
%time fxSlow2(100)

Wall time: 10 ms

Out[208]: 0.43

%time fxSlow2(1000)

Wall time: 15 ms

Out[209]: 0.414

%time fxSlow2(10000)

Wall time: 61 ms

Out[210]: 0.4058

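
The bonus question about 100000 observations is not addressed above. One plausible cause, stated here as an assumption and not as a confirmed diagnosis: with daily frequency, 100000 periods span roughly 274 years, which runs past the largest date that a nanosecond-resolution pandas timestamp can represent. The sketch below only compares the two dates; it does not reproduce the original error message.

```python
import datetime
import pandas as pd

# Assumption: the failing call is pd.date_range(<today>, periods=100000)
# with the default daily frequency, as in the functions above.
start = datetime.date.today()
last_needed = start + datetime.timedelta(days=100000 - 1)

# Upper bound of nanosecond-resolution timestamps in pandas.
limit = pd.Timestamp.max.date()

print("last date needed:", last_needed)
print("largest pandas timestamp:", limit)
print("out of range:", last_needed > limit)
```

If this is indeed the cause, a plain integer index, or a finer frequency that covers a shorter time span, would avoid the limit.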

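"Would like to discuss" isn't really a suitable kind of question for this site.

Thanks for the feedback! What I really need are specific suggestions for improvement. Would it be better to edit that part of the question, or is this simply not the right place for it?

I've fixed that. Personally I think it's fine, and more focused than the usual discussion-type question, so I won't vote to close.

Away from work right now. Your suggestion looks great! Really looking forward to testing it!

Back at work. This is great! Your suggestion will really improve my workflow. Now I have quite a lot of code to clean up...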