Python: How do I insert rows into a DataFrame based on column values, with a condition?
Tags: python, pandas, numpy
I have a DataFrame of about 20k rows that looks like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'], \
'Date': ['2018-03-12', '2018-03-14', '2018-03-15', '2018-03-17', '2018-03-13', '2018-03-15', \
'2018-03-18', '2018-03-21', '2018-03-10', '2018-03-13'], \
'Driver': ['Alex', 'Alex', 'Mick', 'Sara', 'Sara', 'Jean', 'Sara', 'Sara', 'Franck','Michel']})
df
Out:
Car_ID Date Driver
0 B332 2018-03-12 Alex
1 B332 2018-03-14 Alex
2 B332 2018-03-15 Mick
3 C315 2018-03-17 Sara
4 C315 2018-03-13 Sara
5 C315 2018-03-15 Jean
6 C315 2018-03-18 Sara
7 C315 2018-03-21 Sara
8 F310 2018-03-10 Franck
9 F310 2018-03-13 Michel
I create a new column for each event in the DataFrame, as follows:
df["Event"] = np.where(df.Car_ID.str.contains('B', case=True, na=False), 'Rent_Car_B', \
np.where(df.Car_ID.str.contains('C', case=True, na=False), 'Rent_Car_C', \
np.where(df.Car_ID.str.contains('F', case=True, na=False), 'Rent_Car_F', df.Car_ID)))
df
Out:
Car_ID Date Driver Event
0 B332 2018-03-12 Alex Rent_Car_B
1 B332 2018-03-14 Alex Rent_Car_B
2 B332 2018-03-15 Mick Rent_Car_B
3 C315 2018-03-17 Sara Rent_Car_C
4 C315 2018-03-13 Sara Rent_Car_C
5 C315 2018-03-15 Jean Rent_Car_C
6 C315 2018-03-18 Sara Rent_Car_C
7 C315 2018-03-21 Sara Rent_Car_C
8 F310 2018-03-10 Franck Rent_Car_F
9 F310 2018-03-13 Michel Rent_Car_F
In my Event column, I want to add a new row for each driver change, like this:
Out:
Car_ID Date Driver Event
0 B332 2018-03-12 Alex Rent_Car_B
1 B332 2018-03-14 Alex Rent_Car_B
2 B332 2018-03-15 Mick Rent_Car_B
3 B332 2018-03-15 Alex to Mick
4 C315 2018-03-17 Sara Rent_Car_C
5 C315 2018-03-13 Sara Rent_Car_C
6 C315 2018-03-15 Jean Rent_Car_C
7 C315 2018-03-15 Sara to Jean
8 C315 2018-03-18 Sara Rent_Car_C
9 C315 2018-03-18 Jean to Sara
10 C315 2018-03-21 Sara Rent_Car_C
11 F310 2018-03-10 Franck Rent_Car_F
12 F310 2018-03-13 Michel Rent_Car_F
13 F310 2018-03-13 Franck to Mike
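I am not sure whether there is a trick to make this work.
I would really appreciate your suggestions.
This is a fairly complicated problem. Here is my take: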
# Add the Driver columns by shifting grouped by the Event
df['new'] = df.groupby('Event')['Driver'].shift(1) + 'to' + df['Driver']
# Split them by 'to'
df['new'] =df['new'].str.split('to').bfill()
# Check if both of them are equal
m = df['new'].str[0] != df['new'].str[1]
# Based on the condition create a new dataframe
new_df = df.loc[m].copy().iloc[:-1]
# Convert the list to the format you desired
new_df['new'] = new_df['new'].str[0] + ' to ' + new_df['new'].str[1]
# Concat new dataframe and old dataframe
mdf = pd.concat([df.drop(columns='new'), new_df.drop(columns=['Driver', 'Event']) \
.rename(columns = {'new':'Event'})])
Car_ID Date Driver Event
0 B332 2018-03-12 Alex Rent_Car_B
1 B332 2018-03-14 Alex Rent_Car_B
2 B332 2018-03-15 Mick Rent_Car_B
3 C315 2018-03-17 Sara Rent_Car_C
4 C315 2018-03-13 Sara Rent_Car_C
5 C315 2018-03-15 Jean Rent_Car_C
6 C315 2018-03-18 Sara Rent_Car_C
7 C315 2018-03-21 Sara Rent_Car_C
8 F310 2018-03-10 Franck Rent_Car_F
9 F310 2018-03-13 Michel Rent_Car_F
2 B332 2018-03-15 NaN Alex to Mick
5 C315 2018-03-15 NaN Sara to Jean
6 C315 2018-03-18 NaN Jean to Sara
8 F310 2018-03-10 NaN Franck to Michel
If you need the rows in order, sort by the index, i.e.
mdf = mdf.sort_index()
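The same idea can be sketched more compactly without joining and re-splitting strings. This is a variation rather than the answer's own code: it assumes the previous driver should be looked up per Car_ID, it places each change row on the date of the new driver (as in the desired output), and it uses a stable sort so that each inserted row lands directly after the original row sharing its index.

```python
import pandas as pd

df = pd.DataFrame({
    'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'],
    'Date': ['2018-03-12', '2018-03-14', '2018-03-15', '2018-03-17', '2018-03-13',
             '2018-03-15', '2018-03-18', '2018-03-21', '2018-03-10', '2018-03-13'],
    'Driver': ['Alex', 'Alex', 'Mick', 'Sara', 'Sara', 'Jean', 'Sara', 'Sara', 'Franck', 'Michel'],
})
df['Event'] = 'Rent_Car_' + df['Car_ID'].str[0]

# Previous driver of the same car (NaN on the first row of each car)
prev = df.groupby('Car_ID')['Driver'].shift(1)

# A change happens where a previous driver exists and differs from the current one
changed = prev.notna() & (prev != df['Driver'])

# Build the extra rows; Driver is left out on purpose, so it becomes NaN after concat
new_df = df.loc[changed, ['Car_ID', 'Date']].copy()
new_df['Event'] = prev[changed] + ' to ' + df.loc[changed, 'Driver']

# mergesort is stable: for equal index labels the original row stays first
mdf = pd.concat([df, new_df]).sort_index(kind='mergesort')
print(mdf)
```

Call `mdf.reset_index(drop=True)` afterwards if you want a clean 0..13 index.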
Use the shift method. First create a column with it, which we will use in the following steps:
df['Driver_shift'] = df['Driver'].shift()
Select the rows where the driver actually changes while the Car_ID stays the same (using a mask):
mask = (df['Driver'] != df['Driver_shift'])&(df['Car_ID'] == df['Car_ID'].shift())
df_change = df[mask]
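Now shift the index by adding 0.5, so that the rows fall into place when we concatenate and sort later, and change the values of the two columns: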
df_change = df_change.set_index(df_change.index+0.5)
df_change.loc[:,'Event'] = df_change['Driver_shift'] + ' to ' + df_change['Driver']
df_change['Driver'] = '' # to replace the value
Now you can concatenate, sort, reset the index, and drop the helper column:
pd.concat([df,df_change]).sort_index().reset_index(drop=True).drop(columns='Driver_shift')
You get:
Car_ID Date Driver Event
0 B332 2018-03-12 Alex Rent_Car_B
1 B332 2018-03-14 Alex Rent_Car_B
2 B332 2018-03-15 Mick Rent_Car_B
3 B332 2018-03-15 Alex to Mick
4 C315 2018-03-17 Sara Rent_Car_C
5 C315 2018-03-13 Sara Rent_Car_C
6 C315 2018-03-15 Jean Rent_Car_C
7 C315 2018-03-15 Sara to Jean
8 C315 2018-03-18 Sara Rent_Car_C
9 C315 2018-03-18 Jean to Sara
10 C315 2018-03-21 Sara Rent_Car_C
11 F310 2018-03-10 Franck Rent_Car_F
12 F310 2018-03-13 Michel Rent_Car_F
13 F310 2018-03-13 Franck to Michel
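The half-step index trick used above can be isolated into a small helper. This is only a sketch: `insert_after` is a hypothetical name, not a pandas function, and it assumes the DataFrame has a unique integer index.

```python
import pandas as pd

def insert_after(df, new_rows):
    """Insert each row of new_rows directly after the df row with the same index label.

    Hypothetical helper; assumes df has a unique integer index.
    """
    shifted = new_rows.copy()
    # 2 -> 2.5, so the new row sorts between labels 2 and 3
    shifted.index = shifted.index + 0.5
    return pd.concat([df, shifted]).sort_index().reset_index(drop=True)

df = pd.DataFrame({'Driver': ['Alex', 'Alex', 'Mick'],
                   'Event': ['Rent_Car_B'] * 3})
extra = pd.DataFrame({'Driver': [''], 'Event': ['Alex to Mick']}, index=[2])

out = insert_after(df, extra)
print(out)
```

Because no two rows share a label after the shift, the result does not depend on the sort algorithm being stable.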
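EDIT: to add a row before each driver and date
df1 = df.copy()
df1.index = df1.index +0.5
df2 = pd.concat([df.drop(columns='Event'),df1]).sort_index().reset_index(drop=True)
df2['Event'] = df2['Event'].fillna(df2['Driver'])
The result is in df2.
You can accomplish this nicely with just a few shifts. You can also use this approach to get the index right and add the rows exactly where you want them. Starting with your DataFrame after Event has been added: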
import pandas as pd
# Modify the index so we can later append to the correct rows
df.index= df.index*2
# Determine when switches occur
mask = (df.Driver != df.Driver.shift(1)) & (df.Car_ID == df.Car_ID.shift(1))
sw_from = df[mask.shift(-1, fill_value=False)].copy()
sw_to = df[mask].copy()
# Make the switching rows have the correct information
sw_to['Event'] = sw_from.Driver.values + ' to ' + sw_to.Driver.values
sw_to['Driver'] = ''
# Modify the switching indices so they get added to the proper position
sw_to.index = sw_to.index+1
# Add them to df
df = pd.concat([df, sw_to]).sort_index().reset_index(drop=True)
Output:
Car_ID Date Driver Event
0 B332 2018-03-12 Alex Rent_Car_B
1 B332 2018-03-14 Alex Rent_Car_B
2 B332 2018-03-15 Mick Rent_Car_B
3 B332 2018-03-15 Alex to Mick
4 C315 2018-03-17 Sara Rent_Car_C
5 C315 2018-03-13 Sara Rent_Car_C
6 C315 2018-03-15 Jean Rent_Car_C
7 C315 2018-03-15 Sara to Jean
8 C315 2018-03-18 Sara Rent_Car_C
9 C315 2018-03-18 Jean to Sara
10 C315 2018-03-21 Sara Rent_Car_C
11 F310 2018-03-10 Franck Rent_Car_F
12 F310 2018-03-13 Michel Rent_Car_F
13 F310 2018-03-13 Franck to Michel
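To see how the two masks in the code above pair up the "from" and "to" drivers, here is a minimal sketch on a four-row frame. The name `from_mask` is introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Car_ID': ['B332', 'B332', 'B332', 'C315'],
                   'Driver': ['Alex', 'Alex', 'Mick', 'Sara']})

# True on the row where a new driver takes over the same car
mask = (df.Driver != df.Driver.shift(1)) & (df.Car_ID == df.Car_ID.shift(1))

# Shifting the mask up by one marks the row of the previous driver;
# fill_value=False keeps the result a plain boolean Series
from_mask = mask.shift(-1, fill_value=False)

pairs = list(zip(df.loc[from_mask, 'Driver'], df.loc[mask, 'Driver']))
print(pairs)  # [('Alex', 'Mick')]
```

The switch from Mick to Sara is not reported because the Car_ID changes on that row.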
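What does it mean that "Alex" appears in both row 0 and row 1?
For the first part, you can do df['Event'] = 'Rent_Car_' + df['Car_ID'].str[0]
@Dillon It means he drove the same car on two days. Thanks for your help, @Dark.
Hi @Ben.T, how can I do the same thing if the latest date is at the top of the DataFrame? Like this:
Car_ID Date Driver Event
0 F310 2018-03-13 Michel Rent_Car_F
1 F310 2018-03-10 Franck Rent_Car_F
2 C315 2018-03-21 Sara Rent_Car_C
3 C315 2018-03-18 Sara Rent_Car_C
4 C315 2018-03-15 Jean Rent_Car_C
5 C315 2018-03-13 Sara Rent_Car_C
6 C315 2018-03-17 Sara Rent_Car_C
7 B332 2018-03-15 Mick Rent_Car_B
8 B332 2018-03-14 Alex Rent_Car_B
9 B332 2018-03-12 Alex Rent_Car_B
@M-M Where do you want the row to be added? For example, for Franck to Michel (or Michel to Franck): after Franck, before Michel, or between the two?
I solved that problem. But I want to add a new row for each unique Car_ID, with the driver's name in the Event column and the date filled in. Something like this:
Car_ID Date Driver Event
0 F310 2018-03-13 Michel Michel
1 F310 2018-03-13 Michel Rent_Car_F
I forgot to tag you :) @Ben.T
@M-M I'm not sure I understand what you want in the end. I see that for Michel you want to add a row with the driver's name in the Event column, but do you want to add one for Franck as well?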
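The suggestion in the comments to build Event from the first character of Car_ID can be checked against the nested np.where version on a small frame. Note that the two are not equivalent in general: str.contains matches the letter anywhere in the ID and the np.where chain falls back to Car_ID when nothing matches, so the shortcut assumes every ID starts with its category letter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Car_ID': ['B332', 'C315', 'F310']})

# Nested np.where version from the question
nested = np.where(df.Car_ID.str.contains('B', na=False), 'Rent_Car_B',
         np.where(df.Car_ID.str.contains('C', na=False), 'Rent_Car_C',
         np.where(df.Car_ID.str.contains('F', na=False), 'Rent_Car_F', df.Car_ID)))

# One-line version: prefix plus the first character of the ID
short = 'Rent_Car_' + df['Car_ID'].str[0]

print(short.tolist() == list(nested))
```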