Python: cumsum when there's mixed column data in pandas


I have a CSV file that looks like:

Timestamp       Surface_Data
8737.37         Maze_A
8737.42         Maze_A
8740.40         Phone_Surface
8743.23         Desktop_Surface
8765.26         Phone_Surface
8765.29         Maze_A
8765.30         Phone_Surface
8765.56         Maze_B
8766.16         Maze_B
8783.74         Maze_A
8793.20         Maze_A
8840.12         Phone_Surface
8840.40         Phone_Surface
8841.40         Maze_B
I want to add a column that counts the changes from Maze_A to Maze_B or from Maze_B to Maze_A. It must look like:

Timestamp       Surface_Data         Maze_Count
8737.37         Maze_A               1
8737.42         Maze_A
8740.40         Phone_Surface
8743.23         Desktop_Surface
8765.26         Phone_Surface
8765.29         Maze_A
8765.30         Phone_Surface
8765.56         Maze_B               2
8766.16         Maze_B
8783.74         Maze_A               3
8793.20         Maze_A
8840.12         Phone_Surface
8840.40         Phone_Surface
8841.40         Maze_B               4
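I tried using cumsum() on the changes in the 'Surface_Data' column, but it counts every change, including the other values I don't care about. So I want something that only increments when it encounters a Maze_A or Maze_B value.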
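To make the problem concrete, here is a minimal sketch of the naive approach. The DataFrame is rebuilt by hand from the sample above, and the name naive is only an illustrative assumption, not something from the original post.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Timestamp": [8737.37, 8737.42, 8740.40, 8743.23, 8765.26, 8765.29, 8765.30,
                  8765.56, 8766.16, 8783.74, 8793.20, 8840.12, 8840.40, 8841.40],
    "Surface_Data": ["Maze_A", "Maze_A", "Phone_Surface", "Desktop_Surface",
                     "Phone_Surface", "Maze_A", "Phone_Surface", "Maze_B", "Maze_B",
                     "Maze_A", "Maze_A", "Phone_Surface", "Phone_Surface", "Maze_B"],
})

s = df["Surface_Data"]
# Naive counter: increments on every change of value, whatever the surface
naive = s.ne(s.shift()).cumsum()
print(naive.iloc[-1])  # 10, although only 4 maze changes are wanted
```

Every switch to or from Phone_Surface or Desktop_Surface bumps the counter, which is why the non-maze rows have to be filtered out before comparing neighbours.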

Relevant methods: shift, where, cumsum.

One attempt:

import numpy as np
# Keep only the maze rows, so changes involving other surfaces are ignored
c = df['Surface_Data'].str.contains('Maze')
s = df.loc[c, 'Surface_Data']
df['Maze_Count'] = s.ne(s.shift()).astype(int).replace(0, np.nan).cumsum()
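For reference, a self-contained sketch of the same idea that can be run as-is. It swaps the astype/replace step for Series.where, which is a variation on the attempt above rather than the answerer's exact code. The names maze, changed and counts are assumptions for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Timestamp": [8737.37, 8737.42, 8740.40, 8743.23, 8765.26, 8765.29, 8765.30,
                  8765.56, 8766.16, 8783.74, 8793.20, 8840.12, 8840.40, 8841.40],
    "Surface_Data": ["Maze_A", "Maze_A", "Phone_Surface", "Desktop_Surface",
                     "Phone_Surface", "Maze_A", "Phone_Surface", "Maze_B", "Maze_B",
                     "Maze_A", "Maze_A", "Phone_Surface", "Phone_Surface", "Maze_B"],
})

# Restrict to the maze rows; the original index labels are preserved
maze = df.loc[df["Surface_Data"].str.contains("Maze"), "Surface_Data"]
# True where a maze row differs from the previous maze row
changed = maze.ne(maze.shift())
# Running count of changes, kept only on the rows where a change happens
counts = changed.cumsum().where(changed)
# Assignment aligns on the index, so non-maze rows end up as NaN
df["Maze_Count"] = counts
print(df)
```

Because the comparison runs on the filtered Series, the Maze_A at 8765.29 is compared with the Maze_A at 8737.42 and does not start a new count.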

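You can also filter the dataframe down to the 'Maze_A' and 'Maze_B' rows, use shift to find the changes, then cumsum and drop_duplicates, and finally assign the result back to the dataframe, relying on pandas' intrinsic index alignment: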

x = df.loc[df['Surface_Data'].isin(['Maze_A','Maze_B']), 'Surface_Data']
df.assign(Maze_count=(x != x.shift()).cumsum().drop_duplicates())
Output:

    Timestamp     Surface_Data  Maze_count
0     8737.37           Maze_A         1.0
1     8737.42           Maze_A         NaN
2     8740.40    Phone_Surface         NaN
3     8743.23  Desktop_Surface         NaN
4     8765.26    Phone_Surface         NaN
5     8765.29           Maze_A         NaN
6     8765.30    Phone_Surface         NaN
7     8765.56           Maze_B         2.0
8     8766.16           Maze_B         NaN
9     8783.74           Maze_A         3.0
10    8793.20           Maze_A         NaN
11    8840.12    Phone_Surface         NaN
12    8840.40    Phone_Surface         NaN
13    8841.40           Maze_B         4.0
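Two details of this output may matter in practice: assign returns a new DataFrame rather than modifying df, and the counts come back as floats because of the NaN gaps. A minimal sketch, assuming whole-number counts with blanks are preferred, that keeps the result and converts it to pandas' nullable Int64 dtype (the name out is an assumption for illustration):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Timestamp": [8737.37, 8737.42, 8740.40, 8743.23, 8765.26, 8765.29, 8765.30,
                  8765.56, 8766.16, 8783.74, 8793.20, 8840.12, 8840.40, 8841.40],
    "Surface_Data": ["Maze_A", "Maze_A", "Phone_Surface", "Desktop_Surface",
                     "Phone_Surface", "Maze_A", "Phone_Surface", "Maze_B", "Maze_B",
                     "Maze_A", "Maze_A", "Phone_Surface", "Phone_Surface", "Maze_B"],
})

x = df.loc[df["Surface_Data"].isin(["Maze_A", "Maze_B"]), "Surface_Data"]
# assign returns a new DataFrame, so the result has to be stored
out = df.assign(Maze_count=(x != x.shift()).cumsum().drop_duplicates())
# Nullable integer dtype: 1, 2, 3, 4 with <NA> in the gaps instead of 1.0 / NaN
out["Maze_count"] = out["Maze_count"].astype("Int64")
print(out)
```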