Python: add a row to a DataFrame depending on whether a row exists (per group ID)?
I have a dataset like this:
g_id event time_left home away
1 "TIP" 00:12:00 8 6
1 "SHOT" 00:11:48 8 6
1 "MISS" 00:11:20 8 6
1 "TOV" 00:11:15 8 6
1 "SHOT" 00:10:40 8 6
2 "REB" 00:11:48 7 3
2 "FOUL" 00:11:35 7 3
2 "FT" 00:11:33 7 3
2 "FT" 00:11:31 7 3
3 "TIP" 00:12:00 5 1
3 "MISS" 00:11:43 5 1
3 "REB" 00:11:42 5 1
3 "SHOT" 00:11:27 5 1
3 "TOV" 00:11:04 5 1
4 "SHOT" 00:11:39 9 4
4 "MISS" 00:11:17 9 4
4 "REB" 00:11:16 9 4
4 "SHOT" 00:10:58 9 4
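My question is somewhat similar to another one, but I would like to know whether this can also be done in pandas. As you may have noticed, the data is grouped by 'g_id'; some sequences start with 'TIP' and others do not. What I want to do is go through the data by 'g_id' and, if a 'g_id' does not start with event = 'TIP', insert a row with 'TIP' in that column and '00:12:00' in the 'time_left' column, carrying over the 'home' and 'away' values from the group's first row. How can I do this? The real dataset has more columns, but basically I just need to insert a new row in which some column values are the same as in the neighboring row and others are assigned new values.

You can iterate over the groups and check whether the first event is TIP. If it is not, shift and pd.concat let you add the new first row and append the last row back: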
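import pandas as pd

# shift() moves each row down one slot; the emptied first row becomes the TIP row,
# and the last row (dropped by the shift) is appended back.
l = [pd.concat((g.shift().fillna({'event': '"TIP"', 'time_left': '00:12:00'}).bfill(),
                g.iloc[[-1]]))
     if 'TIP' not in g['event'].iloc[0] else g
     for _, g in df.groupby('g_id')]
out = pd.concat(l, ignore_index=True)
print(out)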
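As a self-contained sketch of the shift-and-concat idea, the snippet below rebuilds two of the question's groups and applies the same steps through a helper. The helper name add_tip is hypothetical, the event values are assumed to be stored without literal quotes, and the final cast back to integers is an addition, since shift() turns the integer columns into floats.

```python
import pandas as pd

# Sample rows for two groups from the question; g_id 2 lacks a leading TIP.
df = pd.DataFrame({
    'g_id': [1, 1, 2, 2, 2],
    'event': ['TIP', 'SHOT', 'REB', 'FOUL', 'FT'],
    'time_left': ['00:12:00', '00:11:48', '00:11:48', '00:11:35', '00:11:33'],
    'home': [8, 8, 7, 7, 7],
    'away': [6, 6, 3, 3, 3],
})


def add_tip(g):
    """Hypothetical helper: prepend a TIP row when the group does not start with one."""
    if g['event'].iloc[0] == 'TIP':
        return g
    shifted = g.shift()  # first row becomes all-NaN, last row drops off
    shifted = shifted.fillna({'event': 'TIP', 'time_left': '00:12:00'})
    shifted = shifted.bfill()  # copy the remaining columns up from the old first row
    return pd.concat([shifted, g.iloc[[-1]]])


out = pd.concat([add_tip(g) for _, g in df.groupby('g_id')], ignore_index=True)
# shift() introduced NaN, so the integer columns became float; cast them back.
out = out.astype({'g_id': 'int64', 'home': 'int64', 'away': 'int64'})
print(out)
```

Group 2 comes back with a leading TIP row at 00:12:00 and home/away values of 7 and 3. A group consisting of a single row would keep NaN in the carried-over columns, because bfill has no later row to copy from.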
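A slightly longer solution. You can get the unique group IDs with
g_ids = df['g_id'].unique()
For the sample data this returns the array [1, 2, 3, 4]. Then loop over the IDs and add the row wherever a group does not start with TIP: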
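for g_id in g_ids:
    group = df[df['g_id'] == g_id]
    # Check the first event's value; `in` on a Series would test the index labels instead.
    if 'TIP' not in group['event'].iloc[0]:
        insert_index = len(df.index)  # new label; assumes the default 0..n-1 index
        df.loc[insert_index] = group.iloc[0]  # copy the group's first row
        # Use '"TIP"' instead if the event values carry literal quotes.
        df.loc[insert_index, ['event', 'time_left']] = ['TIP', '00:12:00']
# Sort so each inserted row (largest time_left) leads its group;
# assumes time_left holds zero-padded 'HH:MM:SS' strings.
df.sort_values(by=['g_id', 'time_left'], ascending=[True, False], inplace=True)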
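One detail worth keeping in mind when testing for an event inside a loop like this: `in` applied to a pandas Series checks the index labels, not the stored values. The short sketch below illustrates the difference on a made-up Series (the values are illustrative, not taken from the real dataset) and shows a few ways to test the values instead.

```python
import pandas as pd

events = pd.Series(['REB', 'FOUL', 'FT', 'TIP'])

# `in` looks at the index labels (0, 1, 2, 3), not at the stored values.
label_hit = 'TIP' in events   # 'TIP' is a value, not a label
index_hit = 3 in events       # 3 is an index label

# Two ways to test the values instead.
value_hit = 'TIP' in events.tolist()
any_hit = bool(events.eq('TIP').any())

# The question only cares about the first event of a group.
starts_with_tip = events.iloc[0] == 'TIP'

print(label_hit, index_hit, value_hit, any_hit, starts_with_tip)
```

For the question's requirement, the comparison on `events.iloc[0]` is the relevant one, since a group that contains TIP somewhere other than its first row still needs the extra row.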
It gives me a syntax error because of the parentheses, not sure why; I copied the line exactly.
@SmallChimp Please try again now; there was a typo, sorry.