Python Dataframe: mask using the previous non-NaN value in the same column

python pandas dataframe

I have the following dataframe:

Trajectory Direction Resulting_Direction
STRAIGHT   NORTH     NORTH
STRAIGHT   NaN       NORTH
LEFT       NaN       WEST
LEFT       NaN       WEST
LEFT       NaN       WEST
STRAIGHT   NaN       WEST
STRAIGHT   NaN       WEST
RIGHT      NaN       NORTH
RIGHT      NaN       NORTH
RIGHT      NaN       NORTH

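My goal is to have the direction change when three identical turn trajectories are encountered in a row. So in this example, my new column would be Resulting_Direction (assume it is not in the df initially).

Currently I do this by running if statements row by row. However, that is painfully slow and inefficient. I would like to use a mask to set the resulting direction on the rows where a turn happens, and then use fillna(method="ffill"). Here is my attempt: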

df.loc[:,'direction'] = np.NaN
df.loc[df.index == 0, "direction"] = "WEST"
# mask is for finding when a signal hasnt changed in three seconds, but now has
mask = (df.trajectory != df.trajectory.shift(1)) & (df.trajectory == df.trajectory.shift(-1)) & (df.trajectory == df.trajectory.shift(-2))
df.loc[(mask) & (df['trajectory'] == 'LEFT') & (df['direction'].dropna().shift() == "WEST"),'direction'] = 'SOUTH'
df.loc[(mask) & (df['trajectory'] == 'LEFT') & (df['direction'].dropna().shift() == "SOUTH"),'direction'] = 'EAST'
df.loc[(mask) & (df['trajectory'] == 'LEFT') & (df['direction'].dropna().shift() == "EAST"),'direction'] = 'NORTH'
df.loc[(mask) & (df['trajectory'] == 'LEFT') & (df['direction'].dropna().shift() == "NORTH"),'direction'] = 'WEST'
df.loc[(mask) & (df['trajectory'] == 'RIGHT') & (df['direction'].dropna().shift() == "WEST"),'direction'] = 'NORTH'
df.loc[(mask) & (df['trajectory'] == 'RIGHT') & (df['direction'].dropna().shift() == "SOUTH"),'direction'] = 'WEST'
df.loc[(mask) & (df['trajectory'] == 'RIGHT') & (df['direction'].dropna().shift() == "EAST"),'direction'] = 'SOUTH'
df.loc[(mask) & (df['trajectory'] == 'RIGHT') & (df['direction'].dropna().shift() == "NORTH"),'direction'] = 'EAST'
df.loc[:,'direction'] = df.direction.fillna(method="ffill")
print(df[['trajectory','direction']])

I believe my problem lies in df['direction'].dropna().shift(). How do I find the previous non-NaN value in the same column?
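One way to read "the previous non-NaN value" for every row is to forward-fill first and shift afterwards. The sketch below uses a small made-up Series (not the asker's data) to contrast this with `dropna().shift()`, which drops rows and so no longer lines up with the full index. Note that this only reads values already in the column; values assigned by a later `df.loc[...] = ...` line do not feed back into masks computed earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical column for illustration: a few known values separated by NaN
s = pd.Series(["WEST", np.nan, np.nan, "SOUTH", np.nan, "EAST"], name="direction")

# dropna() discards the NaN rows, so the shifted result only has labels 0, 3 and 5;
# when combined with a full-length boolean mask, the other rows have no value at all
prev_sparse = s.dropna().shift()

# ffill() carries each value forward over the NaN rows, and shift() moves it down
# one row, so every row sees the last non-NaN value strictly before it
prev_valid = s.ffill().shift()

print(prev_sparse)
print(prev_valid)
```

The first row of `prev_valid` is NaN because nothing precedes it.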

IIUC, the problem is detecting where the direction changes, assuming that is at the beginning of 3 consecutive turn commands:

thresh = 3

# label each run of consecutive identical trajectory commands
blocks = df.Trajectory.ne(df.Trajectory.shift()).cumsum()

# group by blocks
groups = df.groupby(blocks)

# enumerate the rows within each block (0, 1, 2, ...)
df['mask'] = groups.cumcount()

# shift up within each block so that the count thresh-1 lands on the first row of a run,
# then mod thresh to divide each long block into small blocks of length thresh
df['mask'] = groups['mask'].shift(1-thresh) % thresh

# numeric step for each turn command (one quarter turn counter-clockwise / clockwise)
changes = {'LEFT': -1, 'RIGHT': 1}

# all the directions, in clockwise order
directions = ['NORTH', 'EAST', 'SOUTH', 'WEST']

# rotate the lookup so that a cumulative step of 0 maps to the start direction
start = df['Direction'].iloc[0]
start_idx = directions.index(start)
directions = {k: directions[(start_idx + k) % 4] for k in range(4)}

# update direction changes
direction_changes = (df.Trajectory
                     .where(df['mask'].eq(thresh - 1))  # rows where a change happens (mask is 2 for thresh 3)
                     .map(changes)                      # replace the turn command with its numeric step
                     .fillna(0)                         # no direction change counts as 0
                    )
# accumulate the steps, take mod 4 for the 4 directions,
# and map back to direction names
df['Resulting_Direction'] = (direction_changes.cumsum() % 4).map(directions)

Output:

  Trajectory Direction Resulting_Direction  mask
0   STRAIGHT     NORTH               NORTH   NaN
1   STRAIGHT       NaN               NORTH   NaN
2       LEFT       NaN                WEST   2.0
3       LEFT       NaN                WEST   NaN
4       LEFT       NaN                WEST   NaN
5   STRAIGHT       NaN                WEST   NaN
6   STRAIGHT       NaN                WEST   NaN
7      RIGHT       NaN               NORTH   2.0
8      RIGHT       NaN               NORTH   NaN
9      RIGHT       NaN               NORTH   NaN

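df.Direction.last_valid_index()?

@QuangHoang If that only returns the index and not the value, how would I use it in the loc mask?

I'm a bit confused why rows 3, 4, 5 and 6 are WEST?

If it is a car heading north, then it turns left. Now it is heading west. Then it turns right and heads north again. In my logic, I want to see 3 right/left values in a row to avoid noise. This is slow 1-second data. However, I am more interested in the coding problem; this is just my example.

So does your data always max out at 3 consecutive equal directions?

This is close, but the resulting direction should only change where the trajectory changes, not merely whenever there are three identical trajectories in a row. So if we have 6 LEFTs in a row after going straight, we should only see the direction change once, not twice.

If that is the case, then you just need to remove the % thresh at the end of the df['mask'] = groups['mask'].shift... line. Then each block will have exactly one value.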
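The variant suggested in the last comment can be sketched as follows. This is a minimal illustration, not the answerer's exact code: the sample frame (six LEFTs in a row), the starting direction, and the names `pos`, `marker`, `is_turn`, `steps` and `compass` are all assumptions made for the example. Without the modulo, only the first row of a run of at least `thresh` identical commands is flagged, so a long run turns the vehicle once.

```python
import pandas as pd

thresh = 3
start = "NORTH"  # assumed starting direction

# Hypothetical data: a run of six LEFT commands, then a run of three RIGHT commands
df = pd.DataFrame({
    "Trajectory": ["STRAIGHT"] * 2 + ["LEFT"] * 6 + ["STRAIGHT"] * 2 + ["RIGHT"] * 3
})

# Label runs of consecutive identical commands and number the rows inside each run
blocks = df["Trajectory"].ne(df["Trajectory"].shift()).cumsum()
pos = df.groupby(blocks).cumcount()

# Shift up inside each run, with no modulo: the value thresh - 1 can then appear
# only on the first row of a run that is at least thresh rows long
marker = pos.groupby(blocks).shift(1 - thresh)
is_turn = marker.eq(thresh - 1)

# Quarter-turn steps: -1 for LEFT, +1 for RIGHT, 0 everywhere else
steps = df["Trajectory"].where(is_turn).map({"LEFT": -1, "RIGHT": 1}).fillna(0)

# Accumulate the steps and look up the compass direction relative to the start
compass = ["NORTH", "EAST", "SOUTH", "WEST"]
offset = compass.index(start)
df["Resulting_Direction"] = [compass[(offset + int(k)) % 4] for k in steps.cumsum()]

print(df)
```

With this sample, the six LEFT rows produce a single turn to WEST, and the three RIGHT rows turn back to NORTH.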