Python Pandas: insert rows for missing data
I have a dataset; here is a sample:
import pandas as pd

df = pd.DataFrame({"Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
                   "Team": ["ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "SAS", "SAS", "SAS", "SAS"],
                   "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1]})
   Fouls  Seconds_left Team
0      1             5  ATL
1      2            10  ATL
2      3            15  ATL
3      3            25  ATL
4      4            30  ATL
5      5            35  ATL
6      5             5  SAS
7      4            10  SAS
8      1            15  SAS
9      1            30  SAS
Now I want to insert a row for every Seconds_left value that is missing for a team:
Id  Fouls  Seconds_left Team
0       1             5  ATL
1       2            10  ATL
2       3            15  ATL
3     NaN            20  ATL
4       3            25  ATL
5       4            30  ATL
6       5            35  ATL
7       5             5  SAS
8       4            10  SAS
9       1            15  SAS
10    NaN            20  SAS
11    NaN            25  SAS
12      1            30  SAS
13    NaN            35  SAS
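I have tried reindexing and similar approaches, but that obviously does not work because Seconds_left contains duplicates.
Does anyone know how to solve this?
Thanks.

Create a MultiIndex, then reindex and reset the index: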
import pandas as pd
import numpy as np

# Build every (Team, Seconds_left) combination on a 5-second grid
idx = pd.MultiIndex.from_product([df['Team'].unique(),
                                  np.arange(5, df['Seconds_left'].max()+1, 5)],
                                 names=['Team', 'Seconds_left'])
# Reindex against the full grid; combinations absent from df get NaN in Fouls
df.set_index(['Team', 'Seconds_left']).reindex(idx).reset_index()
Out:
   Team  Seconds_left  Fouls
0   ATL             5    1.0
1   ATL            10    2.0
2   ATL            15    3.0
3   ATL            20    NaN
4   ATL            25    3.0
5   ATL            30    4.0
6   ATL            35    5.0
7   SAS             5    5.0
8   SAS            10    4.0
9   SAS            15    1.0
10  SAS            20    NaN
11  SAS            25    NaN
12  SAS            30    1.0
13  SAS            35    NaN
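
To make the difference concrete, here is a minimal, self-contained sketch of why reindexing on Seconds_left alone fails while the (Team, Seconds_left) MultiIndex works. It rebuilds the sample data from the question; the names grid, reindex_failed and filled are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    "Team": ["ATL"] * 6 + ["SAS"] * 4,
    "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

# Assumed grid: every 5 seconds up to the largest observed value
grid = np.arange(5, df["Seconds_left"].max() + 1, 5)

# Seconds_left repeats across teams, so it is not a unique index
# and pandas refuses to reindex on it.
try:
    df.set_index("Seconds_left").reindex(grid)
    reindex_failed = False
except ValueError:
    reindex_failed = True

# (Team, Seconds_left) is unique, so reindexing against the full
# product of teams and grid values succeeds.
idx = pd.MultiIndex.from_product(
    [df["Team"].unique(), grid], names=["Team", "Seconds_left"]
)
filled = df.set_index(["Team", "Seconds_left"]).reindex(idx).reset_index()
print(filled)
```

Because the grid is derived from the overall maximum, both teams end up with the same Seconds_left values, including 35 for SAS.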
An approach using groupby and merge:
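# Full grid of Seconds_left values that every team should have
df_left = pd.DataFrame({'Seconds_left':[5,10,15,20,25,30,35]})
# Right-merge each team's rows onto the grid; unmatched grid values get NaN
df_out = df.groupby('Team', as_index=False).apply(lambda x: x.merge(df_left, how='right', on='Seconds_left'))
# Inserted rows have no Team yet, so forward-fill it from the rows above
df_out['Team'] = df_out['Team'].ffill()
df_out = df_out.reset_index(drop=True).sort_values(by=['Team','Seconds_left'])
print(df_out)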
Output:
    Fouls  Seconds_left Team
0     1.0             5  ATL
1     2.0            10  ATL
2     3.0            15  ATL
6     NaN            20  ATL
3     3.0            25  ATL
4     4.0            30  ATL
5     5.0            35  ATL
7     5.0             5  SAS
8     4.0            10  SAS
9     1.0            15  SAS
11    NaN            20  SAS
12    NaN            25  SAS
10    1.0            30  SAS
13    NaN            35  SAS
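
The same result can probably be reached without groupby and without forward-filling Team. The sketch below is an alternative, not the answer's method: it first builds every (Team, Seconds_left) pair with a cross merge and then left-merges the original data onto that grid. It assumes a pandas version that supports how="cross" in merge (1.2 or later); the names teams, seconds and full are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    "Team": ["ATL"] * 6 + ["SAS"] * 4,
    "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

teams = pd.DataFrame({"Team": df["Team"].unique()})
# Assumed grid: 5, 10, ..., 35, as in the answer above
seconds = pd.DataFrame({"Seconds_left": range(5, 36, 5)})

# Every team paired with every grid value (2 teams x 7 values = 14 rows)
full = teams.merge(seconds, how="cross")

# Attach the observed Fouls; pairs absent from df keep NaN
df_out = full.merge(df, how="left", on=["Team", "Seconds_left"])
print(df_out)
```

Since Team comes from the grid rather than from the merged rows, it is never missing, and the rows are already ordered by team and then by Seconds_left.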
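Thanks, but this only inserts a single row with NaN. The problem is that my data is not continuous, and I want both teams to have the same set of Seconds_left values.

I'm not sure I understand what you are trying to do here. Could you explain the output you want and the logic behind it, using your input?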