Python: unstacking a dataframe with multiple observation types per variable


python pandas


Given a stacked dataframe like this, where each variable has three types of observation:

     ID      Variable  Value
0  1056     Run Score     89
1  1056      Run Rank     56
2  1056    Run Decile      8
3  1056    Swim Score     92
4  1056     Swim Rank     64
5  1056   Swim Decile      8
6  1056   Cycle Score     96
7  1056    Cycle Rank     32
8  1056  Cycle Decile      9

How can I unstack it into this?

Variable    ID  Decile  Rank  Score  Event
0         1056       8    56     89    Run
0         1056       8    64     92   Swim
0         1056       9    32     96  Cycle

This is how I'm doing it now, but it feels overly complicated:

import pandas as pd

data = [(1056, "Run Score", 89),
    (1056, "Run Rank", 56),
    (1056, "Run Decile", 8),
    (1056, "Swim Score", 92),
    (1056, "Swim Rank", 64),
    (1056, "Swim Decile", 8),
    (1056, "Cycle Score", 96),
    (1056, "Cycle Rank", 32),
    (1056, "Cycle Decile", 9)]

cols = ["ID", "Variable", "Value"]

all_data = pd.DataFrame(data=data, columns=cols)

event_names = ["Run", "Swim", "Cycle"]

event_data_all = []

for event_name in event_names:
    event_data = all_data.loc[all_data["Variable"].str.startswith(event_name)]
    event_data = event_data.pivot_table(index="ID", columns="Variable", values="Value", aggfunc="sum")
    event_data.reset_index(inplace=True)
    event_data.rename(columns={
        event_name + " Score": "Score",
        event_name + " Rank": "Rank",
        event_name + " Decile": "Decile"
    }, inplace=True)
    event_data["Event"] = event_name
    event_data_all.append(event_data)

all_data_final = pd.concat(event_data_all)

Is there a better way?

The idea is to create two new columns, Event and b, from Variable and then pivot on them.
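A minimal sketch of that idea, assuming the two columns come from splitting Variable on its first whitespace (str.split with n=1) and that every (ID, Event, b) combination is unique, so plain DataFrame.pivot can be used instead of pivot_table. This is an illustration, not necessarily the answerer's exact code:

```python
import pandas as pd

# Same stacked data as in the question
data = [(1056, "Run Score", 89), (1056, "Run Rank", 56), (1056, "Run Decile", 8),
        (1056, "Swim Score", 92), (1056, "Swim Rank", 64), (1056, "Swim Decile", 8),
        (1056, "Cycle Score", 96), (1056, "Cycle Rank", 32), (1056, "Cycle Decile", 9)]
all_data = pd.DataFrame(data, columns=["ID", "Variable", "Value"])

# Assumption: "Run Score" -> Event="Run", b="Score" by splitting once on whitespace
all_data[["Event", "b"]] = all_data["Variable"].str.split(n=1, expand=True)

# One row per (ID, Event), one column per measure in b.
# pivot() requires each (ID, Event, b) combination to be unique.
df = (all_data.pivot(index=["ID", "Event"], columns="b", values="Value")
      .reset_index()
      .rename_axis(None, axis=1))  # drop the leftover "b" columns name
print(df)
```

If an event name could itself contain a space, splitting on the first whitespace would break, which is where the regex-based variant further down is safer.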

Thanks to @asongtoruin for an alternative, which is especially useful when the data needs to be aggregated:

all_data.pivot_table(index=['ID', 'Event'],
                     columns='b',
                     values='Value',
                     aggfunc='sum').reset_index().rename_axis(None, axis=1)
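Why aggfunc matters: DataFrame.pivot refuses duplicate (index, column) pairs, while pivot_table combines them. The sketch below uses hypothetical data (a second "Run Score" row for the same ID, not from the question) to show the difference:

```python
import pandas as pd

# Hypothetical data: "Run Score" appears twice for the same ID
all_data = pd.DataFrame(
    [(1056, "Run", "Score", 89), (1056, "Run", "Score", 11), (1056, "Run", "Rank", 56)],
    columns=["ID", "Event", "b", "Value"],
)

# pivot() cannot reshape when (ID, Event, b) is duplicated
try:
    all_data.pivot(index=["ID", "Event"], columns="b", values="Value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() collapses the duplicates with aggfunc (here: sum)
df = (all_data.pivot_table(index=["ID", "Event"], columns="b",
                           values="Value", aggfunc="sum")
      .reset_index()
      .rename_axis(None, axis=1))
print(pivot_failed)
print(df)  # Score for Run is 89 + 11 = 100
```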

Another solution uses event_names both to filter the rows and to build the regex pattern:

event_names = ["Run", "Swim", "Cycle"]
# keep only rows whose Variable starts with a known event; .copy() avoids SettingWithCopyWarning
all_data = all_data.loc[all_data["Variable"].str.startswith(tuple(event_names))].copy()
pat = r'(' + '|'.join(event_names) + r')\s+(.*)'
all_data[['Event','b']] = all_data['Variable'].str.extract(pat)

df = (all_data.pivot_table(index=['ID', 'Event'],
                          columns='b',
                          values='Value',
                          aggfunc='sum').reset_index().rename_axis(None, axis=1))
print(df)

     ID  Event  Decile  Rank  Score
0  1056  Cycle       9    32     96
1  1056    Run       8    56     89
2  1056   Swim       8    64     92
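To see what the filter and str.extract steps produce before pivoting, here is a small sketch. The "Overall Time" row is hypothetical, added only to show that rows without a known event prefix are dropped by the startswith filter:

```python
import pandas as pd

event_names = ["Run", "Swim", "Cycle"]
# Hypothetical input: the last row does not start with any known event
all_data = pd.DataFrame(
    [(1056, "Run Score", 89), (1056, "Swim Rank", 64), (1056, "Overall Time", 7)],
    columns=["ID", "Variable", "Value"],
)

# Keep only rows whose Variable starts with a known event name
all_data = all_data.loc[all_data["Variable"].str.startswith(tuple(event_names))].copy()

# Group 1 captures the event name, group 2 everything after the whitespace
pat = r"(" + "|".join(event_names) + r")\s+(.*)"
parts = all_data["Variable"].str.extract(pat)  # DataFrame with columns 0 and 1
print(parts)
```

Because the groups are unnamed, extract returns columns labelled 0 and 1, which is why the answer assigns them to ['Event', 'b'] explicitly.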

Your second line could also be

all_data.pivot_table(index=['ID', 'Event'], columns='Measure', values='Value').reset_index()

Sorry, that should be

all_data.pivot_table(index=['ID', 'Event'], columns='b', values='Value').reset_index()

- We used different names for the string-split column! That's much better! Thanks a lot.
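Leaving out aggfunc, as in the comment above, makes pivot_table fall back to its default aggfunc="mean". The sketch below (assuming the Event/b columns come from a whitespace split) checks that on the question's data, where each (ID, Event, b) occurs exactly once, the default gives the same numbers as aggfunc="sum"; with duplicates the two would differ (mean versus total):

```python
import pandas as pd

data = [(1056, "Run Score", 89), (1056, "Run Rank", 56), (1056, "Run Decile", 8),
        (1056, "Swim Score", 92), (1056, "Swim Rank", 64), (1056, "Swim Decile", 8),
        (1056, "Cycle Score", 96), (1056, "Cycle Rank", 32), (1056, "Cycle Decile", 9)]
all_data = pd.DataFrame(data, columns=["ID", "Variable", "Value"])
# Assumption: split "Run Score" into Event="Run", b="Score"
all_data[["Event", "b"]] = all_data["Variable"].str.split(n=1, expand=True)

# No aggfunc: pivot_table uses its default, aggfunc="mean"
default = all_data.pivot_table(index=["ID", "Event"], columns="b",
                               values="Value").reset_index()
# Explicit sum, as in the answer
summed = all_data.pivot_table(index=["ID", "Event"], columns="b",
                              values="Value", aggfunc="sum").reset_index()

# Each (ID, Event, b) occurs once, so mean and sum agree value for value
cols = ["Decile", "Rank", "Score"]
same = (default[cols] == summed[cols]).all().all()
print(same)
```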
