Pandas: how can I get the row indices that satisfy a condition in a vectorized way?
I have a time-series DataFrame containing market prices and order information. Every entry order has a corresponding stop-loss. I want to find, for each entry order, the index of the bar at which its stop-loss is triggered. A stop-loss is triggered when the market price >= the stop-loss, and I want to record which entry order the triggered stop-loss belongs to. Each entry order is identified by the index of its entry bar. For example, the order entered at bar 1 at a price of 99 is recorded as entry order 1, the order entered at bar 2 at 98 is entry order 2, the order entered at bar 5 at 103 is entry order 5, and so on. The original DataFrame looks like this:
    entry  price  index  entryprice  stoploss
0       0    100      0         NaN       NaN
1       1     99      1        99.0     102.0
2       1     98      2        98.0     101.0
3       0    100      3         NaN       NaN
4       0    101      4         NaN       NaN
5       1    103      5       103.0     106.0
6       0    105      6         NaN       NaN
7       0    104      7         NaN       NaN
8       0    106      8         NaN       NaN
9       1    103      9       103.0     106.0
10      0    100     10         NaN       NaN
11      0    104     11         NaN       NaN
12      0    108     12         NaN       NaN
13      0    110     13         NaN       NaN
The code that builds it is:
import pandas as pd
df = pd.DataFrame(
{'price':[100,99,98,100,101,103,105,104,106,103,100,104,108,110],
'entry': [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],})
df['index'] = df.index
df['entryprice'] = df['price'].where(df.entry==1)
df['stoploss'] = df['entryprice'] + 3
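To find where each order's stop-loss is triggered, I used apply. I defined an external variable, stoplist, which holds every stop-loss that has not been triggered yet together with the index of its entry order. I then pass each row of df to a function that compares the market price with the stop-losses in stoplist. Whenever the condition is met, the entry order index is assigned to that row and the stop is removed from stoplist.
The code is as follows: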
import pandas as pd

def Stop(row, stoplist):
    output = None
    # walk the open stops backwards so pop(i) does not shift unvisited items
    for i in range(len(stoplist) - 1, -1, -1):
        ix, stop = stoplist[i]
        if row['price'] >= stop:
            output = ix
            stoplist.pop(i)
    # NaN != None is always True, so test for a real stop-loss explicitly
    if pd.notna(row['stoploss']):
        stoplist.append((row['index'], row['stoploss']))
    return output

df = pd.DataFrame(
    {'price': [100, 99, 98, 100, 101, 103, 105, 104, 106, 103, 100, 104, 108, 110],
     'entry': [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]})
df['index'] = df.index
df['entryprice'] = df['price'].where(df.entry == 1)
df['stoploss'] = df['entryprice'] + 3
stoplist = []
df['stopix'] = df.apply(lambda row: Stop(row, stoplist), axis=1)
print(df)
The final output is:
    entry  price  index  entryprice  stoploss  stopix
0       0    100      0         NaN       NaN     NaN
1       1     99      1        99.0     102.0     NaN
2       1     98      2        98.0     101.0     NaN
3       0    100      3         NaN       NaN     NaN
4       0    101      4         NaN       NaN     2.0
5       1    103      5       103.0     106.0     1.0
6       0    105      6         NaN       NaN     NaN
7       0    104      7         NaN       NaN     NaN
8       0    106      8         NaN       NaN     5.0
9       1    103      9       103.0     106.0     NaN
10      0    100     10         NaN       NaN     NaN
11      0    104     11         NaN       NaN     NaN
12      0    108     12         NaN       NaN     9.0
13      0    110     13         NaN       NaN     NaN
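The last column, stopix, is what I want. The only problem with this solution is that apply is not very efficient. Is there a vectorized way to do this? Any other solution that improves performance would also help, because efficiency is critical for me.
Thanks.

Here is my take: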
import numpy as np

# mark the blocks, each one starting at an entry
blocks = df.stoploss.notna().cumsum()
# mark where the price is higher than or equal to the most recent stop-loss
higher = df['stoploss'].ffill().le(df.price)
# group higher by entries
g = higher.groupby(blocks)
# where the entry occurs in each group (the first False, i.e. the entry row)
idx = g.transform('idxmin')
# keep the entry index only at the first bar where higher is True
df['stopix'] = np.where(g.cumsum().eq(1), idx, np.nan)
Output:
    entry  price  index  entryprice  stoploss  stopix
0       0    100      0         NaN       NaN     NaN
1       1     99      1        99.0     102.0     NaN
2       1     98      2        98.0     101.0     NaN
3       0    100      3         NaN       NaN     NaN
4       0    101      4         NaN       NaN     2.0
5       1    103      5       103.0     106.0     NaN
6       0    105      6         NaN       NaN     NaN
7       0    104      7         NaN       NaN     NaN
8       0    106      8         NaN       NaN     5.0
9       1    103      9       103.0     106.0     NaN
10      0    100     10         NaN       NaN     NaN
11      0    104     11         NaN       NaN     NaN
12      0    108     12         NaN       NaN     9.0
13      0    110     13         NaN       NaN     NaN
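
Note that row 5 is NaN here, while the apply-based result in the question has 1.0 there: the grouped approach only follows the most recent entry, so order 1 is dropped as soon as order 2 opens. If several orders can be open at once, one possible alternative is NumPy broadcasting. The following is only a sketch, not part of the answer above. It assumes a default RangeIndex (so positions equal index labels), assumes that when two stops trigger on the same bar the earlier order is the one reported (which is what the loop in the question ends up doing), and builds an entries x bars boolean matrix, so memory grows with the product of the two. The names entry_pos, triggered, hit and first_bar are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'price': [100, 99, 98, 100, 101, 103, 105, 104, 106, 103, 100, 104, 108, 110],
    'entry': [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
})
df['stoploss'] = df['price'].where(df['entry'] == 1) + 3

price = df['price'].to_numpy()
# positions of the entry bars and their stop levels
entry_pos = np.flatnonzero(df['stoploss'].notna().to_numpy())
stops = df['stoploss'].to_numpy()[entry_pos]

# triggered[e, t] is True when bar t comes after entry e and price[t] >= stop of e
bars = np.arange(len(df))
triggered = (price[None, :] >= stops[:, None]) & (bars[None, :] > entry_pos[:, None])

hit = triggered.any(axis=1)           # entries whose stop is ever reached
first_bar = triggered.argmax(axis=1)  # first True per entry (only meaningful where hit)

stopix = np.full(len(df), np.nan)
# assign in reverse entry order: with repeated positions the last assignment
# wins, so the earliest entry is kept when two stops trigger on the same bar
stopix[first_bar[hit][::-1]] = entry_pos[hit][::-1]
df['stopix'] = stopix
print(df)
```

On the sample data this yields 2 at bar 4, 1 at bar 5, 5 at bar 8 and 9 at bar 12, the same as the stopix column in the question. For long series with many entries the full matrix may not fit in memory, in which case processing the entries in chunks would be one way to bound it.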