Python: unstacking a DataFrame that has multiple observation types for each variable
Given a stacked DataFrame like this, where each variable has three types of observations:
     ID      Variable  Value
0  1056     Run Score     89
1  1056      Run Rank     56
2  1056    Run Decile      8
3  1056    Swim Score     92
4  1056     Swim Rank     64
5  1056   Swim Decile      8
6  1056   Cycle Score     96
7  1056    Cycle Rank     32
8  1056  Cycle Decile      9
How can I unstack it into something like this:
Variable    ID  Decile  Rank  Score  Event
0         1056       8    56     89    Run
0         1056       8    64     92   Swim
0         1056       9    32     96  Cycle
This is how I do it at the moment, but it feels far too complicated:
import pandas as pd

data = [(1056, "Run Score", 89),
        (1056, "Run Rank", 56),
        (1056, "Run Decile", 8),
        (1056, "Swim Score", 92),
        (1056, "Swim Rank", 64),
        (1056, "Swim Decile", 8),
        (1056, "Cycle Score", 96),
        (1056, "Cycle Rank", 32),
        (1056, "Cycle Decile", 9)]
cols = ["ID", "Variable", "Value"]
all_data = pd.DataFrame(data=data, columns=cols)

event_names = ["Run", "Swim", "Cycle"]
event_data_all = []
for event_name in event_names:
    # Keep only the rows that belong to this event
    event_data = all_data.loc[all_data["Variable"].str.startswith(event_name)]
    # aggfunc="sum" replaces pd.np.sum (the pd.np alias was removed in pandas 2.0)
    event_data = event_data.pivot_table(index="ID", columns="Variable", values="Value", aggfunc="sum")
    event_data.reset_index(inplace=True)
    # Strip the event prefix from the column names
    event_data.rename(columns={
        event_name + " Score": "Score",
        event_name + " Rank": "Rank",
        event_name + " Decile": "Decile"
    }, inplace=True)
    event_data["Event"] = event_name
    event_data_all.append(event_data)
all_data_final = pd.concat(event_data_all)
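Is there a better way?

The idea is to create two new columns from Variable and pivot on them. Thanks to @asongtoruin for another solution, which is especially useful if the data needs to be aggregated: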
# Assumes all_data already has the 'Event' and 'b' columns derived from 'Variable'
df = (all_data.pivot_table(index=['ID', 'Event'],
                           columns='b',
                           values='Value',
                           aggfunc='sum')
              .reset_index()
              .rename_axis(None, axis=1))
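The step that creates the two new columns has no code attached above, so here is a minimal sketch of one way to do it, assuming every Variable value has the form "event measure" separated by a single space. The column names Event and b mirror the ones used in the pivot_table call; split_df is a name chosen only for this illustration.

```python
import pandas as pd

data = [(1056, "Run Score", 89), (1056, "Run Rank", 56), (1056, "Run Decile", 8),
        (1056, "Swim Score", 92), (1056, "Swim Rank", 64), (1056, "Swim Decile", 8),
        (1056, "Cycle Score", 96), (1056, "Cycle Rank", 32), (1056, "Cycle Decile", 9)]
all_data = pd.DataFrame(data, columns=["ID", "Variable", "Value"])

# Split "Run Score" into "Run" and "Score"; n=1 splits on the first space only,
# so a measure name that itself contains spaces would stay in one piece.
all_data[["Event", "b"]] = all_data["Variable"].str.split(" ", n=1, expand=True)

# Pivot the measure names into columns, one row per (ID, Event).
split_df = (all_data.pivot_table(index=["ID", "Event"],
                                 columns="b",
                                 values="Value",
                                 aggfunc="sum")
                    .reset_index()
                    .rename_axis(None, axis=1))
print(split_df)
```

This relies on the event name being a single word. If event names can contain spaces, splitting on the first space is no longer safe and matching against a known list of names is the more robust choice.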
Another solution is to extract the event by name with a regular expression:
event_names = ["Run", "Swim", "Cycle"]
# Keep only rows for known events; .copy() avoids SettingWithCopyWarning below
all_data = all_data.loc[all_data["Variable"].str.startswith(tuple(event_names))].copy()
# Raw strings keep \s from being treated as an invalid escape sequence
pat = r'(' + '|'.join(event_names) + r')\s+(.*)'
all_data[['Event', 'b']] = all_data['Variable'].str.extract(pat)
df = (all_data.pivot_table(index=['ID', 'Event'],
                           columns='b',
                           values='Value',
                           aggfunc='sum')
              .reset_index()
              .rename_axis(None, axis=1))
print(df)
     ID  Event  Decile  Rank  Score
0  1056  Cycle       9    32     96
1  1056    Run       8    56     89
2  1056   Swim       8    64     92
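Note that pivot_table sorts the result by its index, so the events come out alphabetically (Cycle, Run, Swim) rather than in the Run, Swim, Cycle order shown in the question. If that order matters, one option is to sort afterwards with a key built from event_names. This is only a sketch: the names order and ordered are illustrative, and sort_values(key=...) needs pandas 1.1 or later.

```python
import pandas as pd

data = [(1056, "Run Score", 89), (1056, "Run Rank", 56), (1056, "Run Decile", 8),
        (1056, "Swim Score", 92), (1056, "Swim Rank", 64), (1056, "Swim Decile", 8),
        (1056, "Cycle Score", 96), (1056, "Cycle Rank", 32), (1056, "Cycle Decile", 9)]
all_data = pd.DataFrame(data, columns=["ID", "Variable", "Value"])

event_names = ["Run", "Swim", "Cycle"]
pat = r'(' + '|'.join(event_names) + r')\s+(.*)'
all_data[["Event", "b"]] = all_data["Variable"].str.extract(pat)

df = (all_data.pivot_table(index=["ID", "Event"], columns="b",
                           values="Value", aggfunc="sum")
              .reset_index()
              .rename_axis(None, axis=1))

# Map each event name to its position in event_names and sort on that position.
order = {name: i for i, name in enumerate(event_names)}
ordered = (df.sort_values("Event", key=lambda s: s.map(order))
             .reset_index(drop=True))

# Optionally put the columns in the order shown in the question.
ordered = ordered[["ID", "Decile", "Rank", "Score", "Event"]]
print(ordered)
```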
Your second line could also use
all_data.pivot_table(index=['ID', 'Event'], columns='Measure', values='Value').reset_index()
Sorry, that should be all_data.pivot_table(index=['ID', 'Event'], columns='b', values='Value').reset_index()
- we used different names for our string split!
That's much better! Thank you very much.
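One thing to keep in mind with the shorter call: when aggfunc is omitted, pivot_table uses its default, the mean. With exactly one row per ID, event and measure, the mean and the sum return the same numbers, but they diverge as soon as there are duplicates. The sketch below uses made-up duplicated rows (they are not from the question's data) purely to show the difference; dup, mean_df and sum_df are illustrative names.

```python
import pandas as pd

# Hypothetical data: "Score" appears twice for the same ID and event.
dup = pd.DataFrame({
    "ID": [1056, 1056, 1056],
    "Event": ["Run", "Run", "Run"],
    "b": ["Score", "Score", "Rank"],
    "Value": [80, 90, 56],
})

# Default aggfunc is the mean: the two Score rows are averaged.
mean_df = dup.pivot_table(index=["ID", "Event"], columns="b",
                          values="Value").reset_index()

# aggfunc="sum" adds the duplicated rows together instead.
sum_df = dup.pivot_table(index=["ID", "Event"], columns="b",
                         values="Value", aggfunc="sum").reset_index()

print(mean_df)
print(sum_df)
```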