Python pandas: returning from groupby
My goal is to take data grouped by a specific column (type) and interpolate the missing values. I achieved that, but I am struggling to get back to the shape the DataFrame had before the interpolation.
import pandas as pd

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None}
]
df = pd.DataFrame(data)
print(df)

# Per type: index by avg_speed, sort, then interpolate along that index
post_interp = df.groupby("type").apply(lambda x: x.set_index(
    'avg_speed').sort_index().interpolate(method='index'))
print(post_interp)
First print:
    type  avg_speed  max_speed
0    Car         30      200.0
1    Car         20      100.0
2    Car         25        NaN
3  Plane        300     2000.0
4  Plane        200     1000.0
5  Plane        250        NaN
Second print:
                  type  max_speed
type  avg_speed
Car   20           Car      100.0
      25           Car      150.0
      30           Car      200.0
Plane 200        Plane     1000.0
      250        Plane     1500.0
      300        Plane     2000.0
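I want to get back to the shape of the DataFrame from the first print, but with the interpolated values.

Add group_keys=False to the groupby call to avoid the duplicated index level, then call reset_index at the end.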
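The code for this first suggestion did not survive on the page, so the following is only a sketch of what it might look like. The helper name interpolate_by_avg_speed is hypothetical, and two details are assumptions rather than part of the original answer: the columns are selected explicitly after groupby so that type stays available inside apply on newer pandas versions, and only max_speed is interpolated so that the text column is never passed to interpolate.

```python
import pandas as pd

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None},
]
df = pd.DataFrame(data)


def interpolate_by_avg_speed(group):
    # Hypothetical helper: index by avg_speed, sort, and fill max_speed
    # by interpolating along that index.
    out = group.set_index("avg_speed").sort_index()
    out["max_speed"] = out["max_speed"].interpolate(method="index")
    return out


post_interp = (
    # group_keys=False keeps "type" from being added as an extra index level
    df.groupby("type", group_keys=False)[["type", "avg_speed", "max_speed"]]
    .apply(interpolate_by_avg_speed)
    # avg_speed is the only index level left, so one reset_index restores it as a column
    .reset_index()
)
print(post_interp)
```

Because no group key is added to the index, a single reset_index is enough here, which is the difference from the double reset_index variant.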
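Another solution, with a double reset_index:
post_interp = (df.groupby("type")
                 .apply(lambda x: x.set_index('avg_speed')
                                   .sort_index()
                                   .interpolate(method='index'))
                 # drop the 'type' index level added by groupby ...
                 .reset_index(level=0, drop=True)
                 # ... then turn avg_speed back into a column
                 .reset_index())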
Or you can create the index before the groupby:
post_interp = (df.set_index('avg_speed')
                 .sort_index()
                 .groupby("type", group_keys=False)
                 .apply(lambda x: x.interpolate(method='index'))
                 .reset_index())
print(post_interp)
   avg_speed   type  max_speed
0         20    Car      100.0
1         25    Car      150.0
2         30    Car      200.0
3        200  Plane     1000.0
4        250  Plane     1500.0
5        300  Plane     2000.0
Finally, if necessary, restore the original column order.
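The call used for this last step is missing from the page. One way to do it, shown here as an assumption and not necessarily what the answer used, is DataFrame.reindex with the original frame's columns. The two small frames below simply reproduce the shapes shown above so that the snippet runs on its own.

```python
import pandas as pd

# Original frame: the column order we want to get back to.
df = pd.DataFrame({
    "type": ["Car", "Car", "Car", "Plane", "Plane", "Plane"],
    "avg_speed": [30, 20, 25, 300, 200, 250],
    "max_speed": [200, 100, None, 2000, 1000, None],
})

# Interpolated result as printed above, with avg_speed moved to the front.
post_interp = pd.DataFrame({
    "avg_speed": [20, 25, 30, 200, 250, 300],
    "type": ["Car", "Car", "Car", "Plane", "Plane", "Plane"],
    "max_speed": [100.0, 150.0, 200.0, 1000.0, 1500.0, 2000.0],
})

# Reorder the columns to match the original frame.
post_interp = post_interp.reindex(columns=df.columns)
print(post_interp)
```

Plain column selection, post_interp[df.columns], would give the same order here.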
Do you want to assign the new max_speed and avg_speed columns to the original DataFrame?
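If the answer to that question is yes, one possible approach (a different technique from the solutions above, offered only as a sketch) is to use groupby with transform and write the interpolated max_speed back into a copy of the original frame, so that the original row order and index are kept. The names ordered, filled and result are illustrative.

```python
import pandas as pd

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None},
]
df = pd.DataFrame(data)

# Sort a copy so each type sees avg_speed in increasing order;
# ordered.index still holds the original row labels.
ordered = df.sort_values(["type", "avg_speed"])

# Interpolate max_speed along avg_speed within each type.
# transform returns the values in the same row order as `ordered`.
filled = (
    ordered.set_index("avg_speed")
    .groupby("type")["max_speed"]
    .transform(lambda s: s.interpolate(method="index"))
)

# Write the values back by original row label, leaving row order untouched.
result = df.copy()
result.loc[ordered.index, "max_speed"] = filled.to_numpy()
print(result)
```

With this variant the rows stay in the order of the first print, and only the missing max_speed entries change.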