Pandas: how to get the indices of rows that satisfy a condition in a vectorized way?


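I have a timeseries dataframe containing market prices and order information. Every entry has a corresponding stop-loss. For each entry order, I want to find the index of the bar at which its stop-loss is triggered. A stop is triggered when the market price >= the stop-loss, and I want to record which entry order the triggered stop belongs to. Each entry is identified by the index of its entry bar. For example, the order entered at bar 1 with entry price 99 is recorded as entry order 1, the entry price of 98 at bar 2 is entry order 2, the entry price of 103 at bar 5 is entry order 5, and so on.
The original dataframe looks like this: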

        entry  price  index  entryprice  stoploss
    0       0    100      0         NaN       NaN
    1       1     99      1        99.0     102.0
    2       1     98      2        98.0     101.0
    3       0    100      3         NaN       NaN
    4       0    101      4         NaN       NaN
    5       1    103      5       103.0     106.0
    6       0    105      6         NaN       NaN
    7       0    104      7         NaN       NaN
    8       0    106      8         NaN       NaN
    9       1    103      9       103.0     106.0
    10      0    100     10         NaN       NaN
    11      0    104     11         NaN       NaN
    12      0    108     12         NaN       NaN
    13      0    110     13         NaN       NaN

The code that builds it is:

    import pandas as pd

    df = pd.DataFrame(
        {'price': [100, 99, 98, 100, 101, 103, 105, 104, 106, 103, 100, 104, 108, 110],
         'entry': [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]})
    df['index'] = df.index
    df['entryprice'] = df['price'].where(df.entry == 1)
    df['stoploss'] = df['entryprice'] + 3
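
To find where the stop-loss of each order is triggered, I used apply. I defined an external variable, stoplist, which records every stop-loss that has not been triggered yet together with the index of its entry order. I then pass each row of df to the function and compare the market price with the stop-losses in stoplist. Whenever the condition is met, the entry order index is assigned to that row and the stop is removed from stoplist. The code is as follows: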


    import pandas as pd

    def Stop(row, stoplist):
        output = None
        # walk the open stops backwards so that pop(i) does not shift unvisited items
        for i in range(len(stoplist) - 1, -1, -1):
            (ix, stop) = stoplist[i]
            if row['price'] >= stop:
                output = ix
                stoplist.pop(i)
        # register a new stop only on entry rows; `NaN != None` is always True,
        # so the missing value has to be tested with pd.notna
        if pd.notna(row['stoploss']):
            stoplist.append((row['index'], row['stoploss']))
        return output

    df = pd.DataFrame(
        {'price': [100, 99, 98, 100, 101, 103, 105, 104, 106, 103, 100, 104, 108, 110],
         'entry': [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]})
    df['index'] = df.index
    df['entryprice'] = df['price'].where(df.entry == 1)
    df['stoploss'] = df['entryprice'] + 3

    stoplist = []
    df['stopix'] = df.apply(lambda row: Stop(row, stoplist), axis=1)
    print(df)

The final output is:

        entry  price  index  entryprice  stoploss  stopix
    0       0    100      0         NaN       NaN     NaN
    1       1     99      1        99.0     102.0     NaN
    2       1     98      2        98.0     101.0     NaN
    3       0    100      3         NaN       NaN     NaN
    4       0    101      4         NaN       NaN     2.0
    5       1    103      5       103.0     106.0     1.0
    6       0    105      6         NaN       NaN     NaN
    7       0    104      7         NaN       NaN     NaN
    8       0    106      8         NaN       NaN     5.0
    9       1    103      9       103.0     106.0     NaN
    10      0    100     10         NaN       NaN     NaN
    11      0    104     11         NaN       NaN     NaN
    12      0    108     12         NaN       NaN     9.0
    13      0    110     13         NaN       NaN     NaN
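
The last column, stopix, is what I want. The only problem with this solution is that apply is not very efficient. Is there a vectorized way to do this? Any other solution that improves performance would also help, because efficiency is critical for me.
Thanks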
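
As a point of comparison, the bookkeeping described in the question can also be run over plain NumPy arrays instead of `df.apply(axis=1)`, which builds a Series for every row. The sketch below is not vectorized and its speed has not been measured here; the function name `stop_indices` is made up for illustration, and it assumes the default RangeIndex so that positions and index labels coincide.

```python
import numpy as np
import pandas as pd


def stop_indices(price, stoploss):
    """Per bar, the entry position whose stop was hit there (NaN if none)."""
    out = np.full(len(price), np.nan)
    open_stops = []  # (entry position, stop level) pairs not yet triggered
    for i in range(len(price)):
        still_open = []
        for ix, stop in open_stops:
            if price[i] >= stop:
                # If several stops trigger on one bar, keep the earliest entry,
                # which is what the backward loop in the question ends up with.
                if np.isnan(out[i]):
                    out[i] = ix
            else:
                still_open.append((ix, stop))
        open_stops = still_open
        # Only entry bars carry a stop level; the others are NaN.
        if not np.isnan(stoploss[i]):
            open_stops.append((i, stoploss[i]))
    return out


df = pd.DataFrame({
    "price": [100, 99, 98, 100, 101, 103, 105, 104, 106, 103, 100, 104, 108, 110],
    "entry": [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
})
df["stoploss"] = df["price"].where(df["entry"] == 1) + 3
df["stopix"] = stop_indices(df["price"].to_numpy(), df["stoploss"].to_numpy())
print(df)
```

On the sample data this yields the same stopix column as the apply version, and the list of open stops only ever holds real entries rather than one item per row.
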
Here's my take:

    import numpy as np

    # label each block of rows by the entry that starts it
    blocks = df.stoploss.notna().cumsum()

    # mark where the price is higher than or equal to the most recent stop-loss
    higher = df['stoploss'].ffill().le(df.price)

    # group the flags by entry block
    g = higher.groupby(blocks)

    # index of the first False in each block, i.e. where the entry occurs
    idx = g.transform('idxmin')

    # keep the entry index only on the first bar of each block that reaches the stop;
    # `& higher` excludes later bars where the running count is still 1
    df['stopix'] = np.where(g.cumsum().eq(1) & higher, idx, np.nan)
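
Output (row 5 is NaN here rather than the 1.0 in the question's result, because only the most recent entry's stop-loss is tracked in each block):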

        entry  price  index  entryprice  stoploss  stopix
    0       0    100      0         NaN       NaN     NaN
    1       1     99      1        99.0     102.0     NaN
    2       1     98      2        98.0     101.0     NaN
    3       0    100      3         NaN       NaN     NaN
    4       0    101      4         NaN       NaN     2.0
    5       1    103      5       103.0     106.0     NaN
    6       0    105      6         NaN       NaN     NaN
    7       0    104      7         NaN       NaN     NaN
    8       0    106      8         NaN       NaN     5.0
    9       1    103      9       103.0     106.0     NaN
    10      0    100     10         NaN       NaN     NaN
    11      0    104     11         NaN       NaN     NaN
    12      0    108     12         NaN       NaN     9.0
    13      0    110     13         NaN       NaN     NaN
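
If stops from earlier entries have to stay open after a new entry arrives, as in the question's expected output, one possible alternative is to compare every stop against every bar with NumPy broadcasting. This is only a sketch under some assumptions: the frame has the default RangeIndex (so positions equal labels), the names `entry_pos`, `hit` and `first_hit` are illustrative, and the boolean matrix takes entries × bars memory, which may be too much for long histories with many entries.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [100, 99, 98, 100, 101, 103, 105, 104, 106, 103, 100, 104, 108, 110],
    "entry": [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
})
df["stoploss"] = df["price"].where(df["entry"] == 1) + 3

price = df["price"].to_numpy()
stoploss = df["stoploss"].to_numpy()

# Positions of the entry bars and their stop levels.
entry_pos = np.flatnonzero(~np.isnan(stoploss))
stops = stoploss[entry_pos]

# hit[i, j] is True when bar j comes after entry i and price[j] >= stop of entry i.
bars = np.arange(len(df))
hit = (price[None, :] >= stops[:, None]) & (bars[None, :] > entry_pos[:, None])

triggered = hit.any(axis=1)    # entries whose stop is ever reached
first_hit = hit.argmax(axis=1)  # first bar that reaches it (only valid where triggered)

stopix = np.full(len(df), np.nan)
# Assign in reverse entry order: with repeated target bars the last assignment
# remains, so the earliest entry is kept, matching the loop in the question.
stopix[first_hit[triggered][::-1]] = entry_pos[triggered][::-1]
df["stopix"] = stopix
print(df)
```

On the sample data this reproduces the stopix column from the question, including the 1.0 at row 5. Like the apply version, it can record only one entry per bar, so an order whose stop triggers on the same bar as an earlier order's stop is not reported.
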