Python DataFrame: using the previous non-NaN value of the same column in a mask
I have the following DataFrame:
Trajectory Direction Resulting_Direction
STRAIGHT NORTH NORTH
STRAIGHT NaN NORTH
LEFT NaN WEST
LEFT NaN WEST
LEFT NaN WEST
STRAIGHT NaN WEST
STRAIGHT NaN WEST
RIGHT NaN NORTH
RIGHT NaN NORTH
RIGHT NaN NORTH
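My goal is to have the direction change when three identical trajectory values in a row are encountered. So, in this example, my new column would be Resulting_Direction (assume it was not in the df originally).
Currently I do this with row-by-row if statements. However, that is painfully slow and inefficient. I would like to use a mask to set the resulting direction on the rows where it turns, and then use fillna(method="ffill"). This is my attempt: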
df.loc[:,'direction'] = np.NaN
df.loc[df.index == 0, "direction"] = "WEST"
# mask is for finding when a signal hasnt changed in three seconds, but now has
mask = (df.trajectory != df.trajectory.shift(1)) & (df.trajectory == df.trajectory.shift(-1)) & (df.trajectory == df.trajectory.shift(-2))
df.loc[(mask) & (df['trajectory'] == 'LEFT') & (df['direction'].dropna().shift() == "WEST"),'direction'] = 'SOUTH'
df.loc[(mask) & (df['trajectory'] == 'LEFT') & (df['direction'].dropna().shift() == "SOUTH"),'direction'] = 'EAST'
df.loc[(mask) & (df['trajectory'] == 'LEFT') & (df['direction'].dropna().shift() == "EAST"),'direction'] = 'NORTH'
df.loc[(mask) & (df['trajectory'] == 'LEFT') & (df['direction'].dropna().shift() == "NORTH"),'direction'] = 'WEST'
df.loc[(mask) & (df['trajectory'] == 'RIGHT') & (df['direction'].dropna().shift() == "WEST"),'direction'] = 'NORTH'
df.loc[(mask) & (df['trajectory'] == 'RIGHT') & (df['direction'].dropna().shift() == "SOUTH"),'direction'] = 'WEST'
df.loc[(mask) & (df['trajectory'] == 'RIGHT') & (df['direction'].dropna().shift() == "EAST"),'direction'] = 'SOUTH'
df.loc[(mask) & (df['trajectory'] == 'RIGHT') & (df['direction'].dropna().shift() == "NORTH"),'direction'] = 'EAST'
df.loc[:,'direction'] = df.direction.fillna(method="ffill")
print(df[['trajectory','direction']])
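I believe my problem is in df['direction'].dropna().shift(). How can I find the previous non-NaN value in the same column?

IIUC, the problem is to detect where the direction changes, assuming a change happens at the start of 3 consecutive turn commands: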
thresh = 3
# mark the consecutive direction commands
blocks = df.Trajectory.ne(df.Trajectory.shift()).cumsum()
# group by blocks
groups = df.groupby(blocks)
# enumerate each block
df['mask'] = groups.cumcount()
# shift up by thresh-1 within each block, so the first row of a full run carries thresh-1
# mod thresh then splits each long block into sub-blocks of length thresh
df['mask'] = groups['mask'].shift(1-thresh) % thresh
# map each turn command to a numeric step (quarter turns, clockwise positive)
changes = {'LEFT': -1,'RIGHT':1}
# all the directions
directions = ['NORTH', 'EAST', 'SOUTH', 'WEST']
# re-key the directions so that offset 0 is the starting direction
start = df['Direction'].iloc[0]
start_idx = directions.index(start)
directions = {(k - start_idx) % 4: v for k, v in enumerate(directions)}
# update direction changes
direction_changes = (df.Trajectory
    .where(df['mask'].eq(2))  # keep only the rows where a change happens
    .map(changes)             # replace each turn command with its numeric step
    .fillna(0)                # rows with no direction change contribute 0
)
# mod 4 for the 4 direction
# and map
df['Resulting_Direction'] = (direction_changes.cumsum() % 4).map(directions)
Output:
Trajectory Direction Resulting_Direction mask
0 STRAIGHT NORTH NORTH NaN
1 STRAIGHT NaN NORTH NaN
2 LEFT NaN WEST 2.0
3 LEFT NaN WEST NaN
4 LEFT NaN WEST NaN
5 STRAIGHT NaN WEST NaN
6 STRAIGHT NaN WEST NaN
7 RIGHT NaN NORTH 2.0
8 RIGHT NaN NORTH NaN
9 RIGHT NaN NORTH NaN
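
On the narrower question of finding the previous non-NaN value in the same column, a minimal sketch follows. It uses a small hypothetical Series `s` rather than the DataFrame above, and contrasts `dropna().shift()` with `ffill().shift()`. This only illustrates the lookup itself. In the original attempt each new direction also depends on directions assigned earlier in the same pass, which is presumably why the answer accumulates turns with cumsum instead.

```python
import numpy as np
import pandas as pd

# Hypothetical example column, not taken from the question's data.
s = pd.Series(["WEST", np.nan, np.nan, "SOUTH", np.nan], name="direction")

# dropna() keeps only the non-NaN rows, so the shifted result has entries
# only at index labels 0 and 3 and says nothing about rows 1, 2 and 4.
sparse = s.dropna().shift()

# ffill() carries the last valid value forward; shift() then moves it down
# one row, so each row sees the last non-NaN value strictly before it.
prev_valid = s.ffill().shift()

print(sparse.index.tolist())
print(prev_valid.tolist())
```

The first row has no earlier value, so `prev_valid` is NaN there.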
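df.Direction.last_valid_index()?
@QuangHoang How would I use that in a loc mask if it only returns the index and not the value?
I'm a bit confused about why rows 3, 4, 5 and 6 are WEST?
Think of it as a car heading north that then turns left. Now it is heading west. Then it turns right and heads north again. In my logic, I want to see 3 RIGHT/LEFT values in a row to avoid noise. This is slow 1-second data. However, I am more interested in the coding problem; this is just my example.
So does your data always have at most 3 consecutive equal directions?
This is close, but the resulting direction should only change where the trajectory changes, not every time there are three identical trajectories in a row. So if we have 6 LEFTs in a row after going straight, we should see the direction change only once, not twice.
If that is the case, then you only need to remove the % thresh at the end of df['mask'] = groups['mask'].shift… Then each block will have exactly one value of 2.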
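
The "once per block" behaviour discussed in the last two comments could be sketched as below. This is a variant under stated assumptions, not the answer's exact code: the names `resulting_direction`, `COMPASS` and `TURNS` are made up for illustration, and it tests the run length with `transform("size")` instead of the shifted cumcount.

```python
import pandas as pd

COMPASS = ["NORTH", "EAST", "SOUTH", "WEST"]  # clockwise order (illustrative name)
TURNS = {"LEFT": -1, "RIGHT": 1}              # quarter turns (illustrative name)


def resulting_direction(trajectory, start="NORTH", thresh=3):
    """Turn once at the first row of each run of >= thresh equal commands."""
    # Label runs of identical consecutive commands.
    blocks = trajectory.ne(trajectory.shift()).cumsum()
    grouped = trajectory.groupby(blocks)
    pos = grouped.cumcount()          # position of each row inside its run
    size = grouped.transform("size")  # length of the run the row belongs to
    # A turn fires only on the first row of a sufficiently long run.
    fires = pos.eq(0) & size.ge(thresh)
    # STRAIGHT is not in TURNS, so it maps to NaN and becomes a step of 0.
    steps = trajectory.where(fires).map(TURNS).fillna(0).astype(int)
    idx = (COMPASS.index(start) + steps.cumsum()) % 4
    return idx.map(dict(enumerate(COMPASS)))


sample = pd.Series(["STRAIGHT"] * 2 + ["LEFT"] * 3
                   + ["STRAIGHT"] * 2 + ["RIGHT"] * 3)
print(resulting_direction(sample).tolist())

# Six LEFTs in a row after going straight: only one turn is applied.
long_run = pd.Series(["STRAIGHT"] + ["LEFT"] * 6)
print(resulting_direction(long_run).tolist())
```

Passing the start direction as an argument keeps the function independent of the `Direction` column. With the question's data it would be called as `resulting_direction(df['Trajectory'], start=df['Direction'].iloc[0])`.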