Python pandas: how to unstack by groups of columns while keeping columns paired


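I need to unstack a contact list (id, relatives, phone numbers...) so that the columns keep a specific order.

Given an index, DataFrame.unstack works by unstacking single columns one at a time, even when it is applied to a pair of columns.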
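To illustrate the behaviour described above, here is a minimal sketch on a made-up frame (the `toy` frame and the helper column `n` are illustrative assumptions, not the question's data). It suggests that unstacking a frame with two value columns lays the result out column by column, instead of keeping each relative's columns side by side:

```python
import pandas as pd

# Hypothetical toy data: two relatives per ID, one role and one phone each.
toy = pd.DataFrame({
    "ID": ["100", "100", "200", "200"],
    "ROLE": ["self", "father", "self", "father"],
    "PHONE": ["111111", "333333", "123456", "987654"],
})

# Number the rows within each ID (1, 2, ...) so they can be unstacked.
toy["n"] = toy.groupby("ID").cumcount() + 1

wide = toy.set_index(["ID", "n"]).unstack()

# All ROLE columns come first, then all PHONE columns.
column_order = [(str(name), int(n)) for name, n in wide.columns]
print(column_order)
```

The wanted layout would instead interleave them as ROLE 1, PHONE 1, ROLE 2, PHONE 2.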

Data have

import pandas as pd

df_have = pd.DataFrame.from_dict({'ID': {0: '100', 1: '100', 2: '100', 3: '100', 4: '100', 5: '200', 6: '200', 7: '200', 8: '200', 9: '200'},
 'ID_RELATIVE': {0: '100', 1: '100', 2: '150', 3: '150', 4: '190', 5: '200', 6: '200', 7: '250', 8: '290', 9: '290'},
 'RELATIVE_ROLE': {0: 'self', 1: 'self', 2: 'father', 3: 'father', 4: 'mother', 5: 'self', 6: 'self', 7: 'father', 8: 'mother', 9: 'mother'},
 'PHONE': {0: '111111', 1: '222222', 2: '333333', 3: '444444', 4: '555555', 5: '123456', 6: '456789', 7: '987654', 8: '778899', 9: '909090'}})

Data want

df_want = pd.DataFrame.from_dict({'ID': {0: '100', 1: '200'},
 'ID_RELATIVE_1': {0: '100', 1: '200'},
 'RELATIVE_ROLE_1': {0: 'self', 1: 'self'},
 'PHONE_1_1': {0: '111111', 1: '123456'},
 'PHONE_1_2': {0: '222222', 1: '456789'},
 'ID_RELATIVE_2': {0: '150', 1: '250'},
 'RELATIVE_ROLE_2': {0: 'father', 1: 'father'},
 'PHONE_2_1': {0: '333333', 1: '987654'},
 'PHONE_2_2': {0: '444444', 1: 'nan'},
 'ID_RELATIVE_3': {0: '190', 1: '290'},
 'RELATIVE_ROLE_3': {0: 'mother', 1: 'mother'},
 'PHONE_3_1': {0: '555555', 1: '778899'},
 'PHONE_3_2': {0: 'nan', 1: '909090'}})

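So, in the end, I need ID as the index, with the other columns unstacked so that they become attributes of ID.

The usual unstack procedure gives the "right" output, but in the wrong shape.

df2 = df_have.groupby(['ID'])[['ID_RELATIVE', 'RELATIVE_ROLE', 'PHONE']].apply(lambda x: x.reset_index(drop=True)).unstack()

This would require reordering the columns and removing some duplicates (by column, not by row), plus a FOR loop. I would like to avoid that approach, since I am looking for a more "elegant" way to get the desired result via groupby/stack/unstack/pivot and so on.

Thanks a lot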

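The solution has two main steps: first group by all columns except PHONE to build the phone pairs, convert the column names to an ordered categorical so that they sort correctly, and then group by ID: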

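c = ['ID','ID_RELATIVE','RELATIVE_ROLE']
# step 1: number the phones within each (ID, relative) group and spread them into PHONE_1, PHONE_2, ...
df = df_have.set_index(c+ [df_have.groupby(c).cumcount().add(1)])['PHONE']
df = df.unstack().add_prefix('PHONE_').reset_index()

# step 2: number the relatives within each ID
df = df.set_index(['ID', df.groupby('ID').cumcount().add(1)])

# ordered categorical column names keep their current order when sorted
df.columns = pd.CategoricalIndex(df.columns, categories=df.columns.tolist(), ordered=True)

# unstack the relative number, then sort so that each relative's columns sit together
df = df.unstack().sort_index(axis=1, level=1)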
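The role of the ordered categorical in the step above can be seen in a small sketch. The `toy` frame and the index level name `n` are made up for illustration. Assuming pandas sorts plain string labels alphabetically and categorical labels in category order, the comparison looks like this:

```python
import pandas as pd

# Hypothetical toy frame: one ID with two relatives, indexed by (ID, n).
toy = pd.DataFrame(
    {"ROLE": ["self", "father"], "PHONE": ["111111", "333333"]},
    index=pd.MultiIndex.from_tuples([("100", 1), ("100", 2)], names=["ID", "n"]),
)

# Plain string column names: within each number the names are sorted
# alphabetically, so PHONE ends up before ROLE.
plain = toy.unstack().sort_index(axis=1, level=1)
plain_order = [(str(name), int(n)) for name, n in plain.columns]

# Ordered categorical column names: within each number the names keep
# the order given in `categories`, so ROLE stays before PHONE.
cat = toy.copy()
cat.columns = pd.CategoricalIndex(
    cat.columns, categories=cat.columns.tolist(), ordered=True
)
ordered = cat.unstack().sort_index(axis=1, level=1)
ordered_order = [(str(name), int(n)) for name, n in ordered.columns]

print(plain_order)
print(ordered_order)
```

In both cases the columns are grouped by the relative number first. Only the categorical version preserves the original left-to-right order of the columns inside each group.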

If you need to change the order of the numbers in the PHONE column names, flatten the column MultiIndex like this:

df.columns = [f'{a.split("_")[0]}_{b}_{a.split("_")[1]}' 
                 if 'PHONE' in a 
                 else f'{a}_{b}' for a, b in df.columns]    
df = df.reset_index()
print (df)
    ID ID_RELATIVE_1 RELATIVE_ROLE_1 PHONE_1_1 PHONE_1_2 ID_RELATIVE_2  \
0  100           100            self    111111    222222           150   
1  200           200            self    123456    456789           250   

  RELATIVE_ROLE_2 PHONE_2_1 PHONE_2_2 ID_RELATIVE_3 RELATIVE_ROLE_3 PHONE_3_1  \
0          father    333333    444444           190          mother    555555   
1          father    987654       NaN           290          mother    778899   

  PHONE_3_2  
0       NaN  
1    909090
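As a possible alternative to the categorical trick, the column order can also be spelled out explicitly after unstacking. This is only a sketch of a different technique, not the method of the answer above. The names `rel`, `wide`, `value_cols`, `numbers` and `flat_name` are hypothetical helpers introduced here, and the data is the question's `df_have`:

```python
import pandas as pd

df_have = pd.DataFrame({
    "ID": ["100"] * 5 + ["200"] * 5,
    "ID_RELATIVE": ["100", "100", "150", "150", "190",
                    "200", "200", "250", "290", "290"],
    "RELATIVE_ROLE": ["self", "self", "father", "father", "mother",
                      "self", "self", "father", "mother", "mother"],
    "PHONE": ["111111", "222222", "333333", "444444", "555555",
              "123456", "456789", "987654", "778899", "909090"],
})

keys = ["ID", "ID_RELATIVE", "RELATIVE_ROLE"]

# One row per relative, phones spread into PHONE_1, PHONE_2, ...
phone_no = df_have.groupby(keys).cumcount() + 1
rel = (df_have.set_index(keys + [phone_no])["PHONE"]
       .unstack()
       .add_prefix("PHONE_")
       .reset_index())

# Number the relatives within each ID and unstack that number.
rel["n"] = rel.groupby("ID").cumcount() + 1
wide = rel.set_index(["ID", "n"]).unstack()

# Build the wanted order by hand: for each relative number,
# take the value columns in their original left-to-right order.
value_cols = [col for col in rel.columns if col not in ("ID", "n")]
numbers = sorted(rel["n"].unique())
wide = wide[[(col, n) for n in numbers for col in value_cols]]

def flat_name(name, n):
    """Turn ('PHONE_2', 1) into 'PHONE_1_2' and ('ID_RELATIVE', 1) into 'ID_RELATIVE_1'."""
    if name.startswith("PHONE_"):
        prefix, k = name.split("_")
        return f"{prefix}_{n}_{k}"
    return f"{name}_{n}"

wide.columns = [flat_name(name, n) for name, n in wide.columns]
wide = wide.reset_index()
print(wide.columns.tolist())
```

This trades the sort for an explicit list of column tuples, which may be easier to adapt when the wanted order is not simply the original one.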
