Python: add a row to a DataFrame per group (by group ID) depending on whether a row exists?


python pandas


I have a dataset like this:

g_id    event   time_left  home away
1       "TIP"   00:12:00   8    6
1       "SHOT"  00:11:48   8    6
1       "MISS"  00:11:20   8    6
1       "TOV"   00:11:15   8    6
1       "SHOT"  00:10:40   8    6
2       "REB"   00:11:48   7    3
2       "FOUL"  00:11:35   7    3
2       "FT"    00:11:33   7    3
2       "FT"    00:11:31   7    3
3       "TIP"   00:12:00   5    1
3       "MISS"  00:11:43   5    1
3       "REB"   00:11:42   5    1
3       "SHOT"  00:11:27   5    1
3       "TOV"   00:11:04   5    1 
4       "SHOT"  00:11:39   9    4
4       "MISS"  00:11:17   9    4
4       "REB"   00:11:16   9    4
4       "SHOT"  00:10:58   9    4

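I noticed that my question is somewhat similar to another one, but I would like to know whether this can also be done in pandas. As you may have noticed, the data is grouped by 'g_id'; some sequences start with "TIP" and others do not. What I want is to go through the data by 'g_id' and, whenever a group does not start with event == 'TIP', insert a row at the top of that group with 'TIP' in the 'event' column and '00:12:00' in the 'time_left' column, carrying over the 'home' and 'away' values from the group's first row. How can I do this? The real dataset has many more columns, but essentially I just need to insert a new row in which some column values are copied from the existing first row and others are assigned new values.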

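You can iterate over the groups and check whether the first event is TIP. For groups where it is not, Series.shift and pd.concat let you add the new first row and then append the last row back: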

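import pandas as pd

# shift() frees the first row (and upcasts integer columns to float); fillna sets
# the TIP values, bfill copies the rest, and concat restores the dropped last row.
l = [pd.concat((g.shift().fillna({'event': '"TIP"', 'time_left': '00:12:00'}).bfill(),
                g.iloc[[-1]]))
     if 'TIP' not in g['event'].iloc[0] else g
     for _, g in df.groupby('g_id')]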

out = pd.concat(l,ignore_index=True)
print(out)
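As an alternative to shifting each group, here is a minimal sketch that builds the missing TIP rows in one step and keeps the integer dtypes. It assumes the event values are stored without the literal quote characters and that df has a default integer index in group order; the names first and missing are only illustrative.

```python
import pandas as pd

# Sample data from the question (event strings assumed to be unquoted).
rows = [
    (1, 'TIP', '00:12:00', 8, 6), (1, 'SHOT', '00:11:48', 8, 6),
    (1, 'MISS', '00:11:20', 8, 6), (1, 'TOV', '00:11:15', 8, 6),
    (1, 'SHOT', '00:10:40', 8, 6),
    (2, 'REB', '00:11:48', 7, 3), (2, 'FOUL', '00:11:35', 7, 3),
    (2, 'FT', '00:11:33', 7, 3), (2, 'FT', '00:11:31', 7, 3),
    (3, 'TIP', '00:12:00', 5, 1), (3, 'MISS', '00:11:43', 5, 1),
    (3, 'REB', '00:11:42', 5, 1), (3, 'SHOT', '00:11:27', 5, 1),
    (3, 'TOV', '00:11:04', 5, 1),
    (4, 'SHOT', '00:11:39', 9, 4), (4, 'MISS', '00:11:17', 9, 4),
    (4, 'REB', '00:11:16', 9, 4), (4, 'SHOT', '00:10:58', 9, 4),
]
df = pd.DataFrame(rows, columns=['g_id', 'event', 'time_left', 'home', 'away'])

# First row of every group, keeping its original index label.
first = df.drop_duplicates('g_id')

# For groups that do not start with TIP, copy that first row and overwrite
# only the columns that need new values; home/away are carried over as-is.
missing = first[first['event'] != 'TIP'].assign(event='TIP', time_left='00:12:00')

# Each new row shares its index label with the row it must precede. Putting
# `missing` first and sorting the index with a stable algorithm places it
# directly in front of the original first row of its group.
out = (pd.concat([missing, df])
         .sort_index(kind='mergesort')
         .reset_index(drop=True))
print(out)
```

Because the new rows are copies of real rows, any additional columns in the full dataset are carried over automatically, and only the columns named in assign receive new values.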

A slightly longer solution. You can get the unique group IDs with

    g_ids = df['g_id'].unique()

For this example it returns the array [1, 2, 3, 4].

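    for g_id in g_ids:
        group = df[df['g_id'] == g_id]
        # Test the first event's value; `in` on a Series checks the index, not the values.
        if 'TIP' not in group['event'].iloc[0]:
            insert_index = len(df.index)          # assumes a default RangeIndex
            df.loc[insert_index] = group.iloc[0]  # carries over home, away, etc.
            # Assign (not compare) the new values; use '"TIP"' if your data keeps the quotes.
            df.loc[insert_index, ['event', 'time_left']] = ['TIP', '00:12:00']
    # The clock counts down, so sorting time_left descending puts TIP first in each group.
    df = df.sort_values(by=['g_id', 'time_left'], ascending=[True, False]).reset_index(drop=True)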
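A note on membership tests in a loop like this one: the in operator applied to a pandas Series checks the index labels, not the values, so a test such as 'TIP' not in events is true for every group. The short sketch below shows the difference on a made-up Series; the variable names are illustrative only.

```python
import pandas as pd

# A hypothetical group whose events do not start with TIP.
events = pd.Series(['REB', 'TIP', 'FT'])

# `in` looks at the index labels (0, 1, 2 here), so this is False
# even though 'TIP' is one of the values.
by_label = 'TIP' in events

# To test the values, compare element-wise and reduce.
by_value = events.eq('TIP').any()

# To test only the first event of the group, index it positionally.
first_is_tip = events.iloc[0] == 'TIP'

print(by_label, by_value, first_is_tip)
```

For the question as asked, the positional check on the first event is the relevant one, since a group may contain a TIP later on without starting with it.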

It gives me a syntax error because of the brackets. I am not sure why; I copied the line exactly.

@SmallChimp Please try again now. There was a typo, sorry.
