Pandas: how to get the row indices that satisfy a condition in a vectorized way?

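I have a time-series DataFrame containing market prices and order information. Each entry has a corresponding stop loss. For each entry order, I want to find the index of the bar on which its stop loss is triggered. A stop loss is triggered when the market price >= the stop loss, and I want to record which entry order the triggered stop belongs to. Each entry is identified by the index of its entry bar: for example, the order entered at bar 1 with entry price 99 is entry order 1, the one at bar 2 with entry price 98 is entry order 2, the one at bar 5 with entry price 103 is entry order 5, and so on.

The original DataFrame looks like this: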

    entry  price  index  entryprice  stoploss  
0       0    100      0         NaN       NaN   
1       1     99      1        99.0     102.0    
2       1     98      2        98.0     101.0    
3       0    100      3         NaN       NaN    
4       0    101      4         NaN       NaN   
5       1    103      5       103.0     106.0   
6       0    105      6         NaN       NaN    
7       0    104      7         NaN       NaN   
8       0    106      8         NaN       NaN   
9       1    103      9       103.0     106.0   
10      0    100     10         NaN       NaN    
11      0    104     11         NaN       NaN    
12      0    108     12         NaN       NaN    
13      0    110     13         NaN       NaN     

The code that builds it is:

import pandas as pd

df = pd.DataFrame(
    {'price':[100,99,98,100,101,103,105,104,106,103,100,104,108,110],
     'entry': [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],})
df['index'] = df.index
df['entryprice'] = df['price'].where(df.entry==1)
df['stoploss'] = df['entryprice'] + 3
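
To find where each order's stop loss is triggered, I used apply. I define an external variable, stoplist, that records every stop loss not yet triggered together with the index of its entry order. I then pass each row of df to a function that compares the market price with the stop losses in stoplist. Whenever the condition is met, the entry order index is assigned to that row and the stop is removed from stoplist. The code is as follows: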

import pandas as pd

def Stop(row, stoplist):
    # stoplist holds (entry index, stop loss) pairs for stops not yet triggered
    output = None
    # iterate backwards so that pop(i) does not shift the items still to be checked
    for i in range(len(stoplist)-1, -1, -1):
        (ix, stop) = stoplist[i]
        if row['price'] >= stop:
            output = ix
            stoplist.pop(i)

    # register a stop only for entry rows; `!= None` is always True for NaN
    if pd.notna(row['stoploss']):
        stoplist.append( (row['index'], row['stoploss']) )

    return output

df = pd.DataFrame(
    {'price':[100,99,98,100,101,103,105,104,106,103,100,104,108,110],
     'entry': [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],})
df['index'] = df.index
df['entryprice'] = df['price'].where(df.entry==1)
df['stoploss'] = df['entryprice'] + 3
stoplist = []
df['stopix'] = df.apply(lambda row: Stop(row, stoplist), axis=1)
print(df)

The final output is:

    entry  price  index  entryprice  stoploss  stopix
0       0    100      0         NaN       NaN     NaN
1       1     99      1        99.0     102.0     NaN
2       1     98      2        98.0     101.0     NaN
3       0    100      3         NaN       NaN     NaN
4       0    101      4         NaN       NaN     2.0
5       1    103      5       103.0     106.0     1.0
6       0    105      6         NaN       NaN     NaN
7       0    104      7         NaN       NaN     NaN
8       0    106      8         NaN       NaN     5.0
9       1    103      9       103.0     106.0     NaN
10      0    100     10         NaN       NaN     NaN
11      0    104     11         NaN       NaN     NaN
12      0    108     12         NaN       NaN     9.0
13      0    110     13         NaN       NaN     NaN
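
The last column, stopix, is what I want. The only problem with this solution is that apply is not very efficient. Is there a vectorized way to do this? Any other solution that improves performance would also help, because efficiency is crucial for me.

Thanks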

Here is my take:

import numpy as np

# label each block of rows that starts at an entry
blocks = df.stoploss.notna().cumsum()

# mark rows where the price is at or above the most recent stop loss
higher = df['stoploss'].ffill().le(df.price)

# group the flags by entry block
g = higher.groupby(blocks)

# index of the entry row (the first row) of each block
idx = g.transform('idxmin')

# keep the entry index only on the first triggered row of each block;
# `& higher` prevents later non-triggered rows (cumsum still 1) from matching
df['stopix'] = np.where(g.cumsum().eq(1) & higher, idx, np.nan)

Output:

    entry  price  index  entryprice  stoploss  stopix
0       0    100      0         NaN       NaN     NaN
1       1     99      1        99.0     102.0     NaN
2       1     98      2        98.0     101.0     NaN
3       0    100      3         NaN       NaN     NaN
4       0    101      4         NaN       NaN     2.0
5       1    103      5       103.0     106.0     NaN
6       0    105      6         NaN       NaN     NaN
7       0    104      7         NaN       NaN     NaN
8       0    106      8         NaN       NaN     5.0
9       1    103      9       103.0     106.0     NaN
10      0    100     10         NaN       NaN     NaN
11      0    104     11         NaN       NaN     NaN
12      0    108     12         NaN       NaN     9.0
13      0    110     13         NaN       NaN     NaN
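
Note that this output leaves row 5 as NaN, whereas the expected output in the question has 1.0 there: grouping by entry block only follows the most recent stop, so entry 1 is no longer tracked once entry 2 opens. If every open stop has to be followed, one possible alternative is to compare all stops against all prices at once with NumPy broadcasting. The following is only a sketch under some assumptions: the DataFrame uses the default integer index, a stop can first trigger on the bar after its entry (as in the apply version), and when two stops trigger on the same bar the earlier entry is reported. The names entry_pos, stops, hit, triggered and first_hit are illustrative, not from the original posts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'price': [100, 99, 98, 100, 101, 103, 105, 104, 106, 103, 100, 104, 108, 110],
    'entry': [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
})
df['entryprice'] = df['price'].where(df['entry'] == 1)
df['stoploss'] = df['entryprice'] + 3

price = df['price'].to_numpy()
# positions of the entry bars and their stop levels
entry_pos = np.flatnonzero(df['stoploss'].notna().to_numpy())
stops = df['stoploss'].to_numpy()[entry_pos]

# hit[i, j] is True when bar j comes after entry i and price[j] >= stop of entry i
bars = np.arange(len(df))
hit = (price[None, :] >= stops[:, None]) & (bars[None, :] > entry_pos[:, None])

triggered = hit.any(axis=1)      # entries whose stop is ever hit
first_hit = hit.argmax(axis=1)   # first True per row (only valid where triggered)

# map each trigger bar to its entry; on ties keep the earliest entry
first = pd.Series(entry_pos[triggered], index=first_hit[triggered])
df['stopix'] = first.groupby(level=0).min().reindex(df.index)
print(df)
```

On the sample data this gives stopix values 2, 1, 5 and 9 on rows 4, 5, 8 and 12, which matches the expected output in the question. The trade-off is memory: the hit matrix has one row per entry and one column per bar, so for long series with many entries it may be necessary to process the entries in chunks.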