Python pandas: how to unstack by groups of columns while keeping the columns paired


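I need to unstack a contact list (id, relatives, phone numbers...) so that the columns keep a specific order.

Given an index, DataFrame unstack works by unstacking one column at a time, even when it is applied to two columns.

Data I have: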

import pandas as pd

df_have = pd.DataFrame.from_dict({'ID':{0:'100',1:'100',2:'100',3:'100',4:'100',5:'200',6:'200',7:'200',8:'200',9:'200'},
'ID_RELATIVE':{0:'100',1:'100',2:'150',3:'150',4:'190',5:'200',6:'200',7:'250',8:'290',9:'290'},
'RELATIVE_ROLE':{0:'self',1:'self',2:'father',3:'father',4:'mother',5:'self',6:'self',7:'father',8:'mother',9:'mother'},
'PHONE':{0:'111111',1:'222222',2:'333333',3:'444444',4:'555555',5:'123456',6:'456789',7:'987654',8:'778899',9:'909090'}})
Data I want:

df_want = pd.DataFrame.from_dict({'ID':{0:'100',1:'200'},
'ID_RELATIVE_1':{0:'100',1:'200'},
'RELATIVE_ROLE_1':{0:'self',1:'self'},
'PHONE_1_1':{0:'111111',1:'123456'},
'PHONE_1_2':{0:'222222',1:'456789'},
'ID_RELATIVE_2':{0:'150',1:'250'},
'RELATIVE_ROLE_2':{0:'father',1:'father'},
'PHONE_2_1':{0:'333333',1:'987654'},
'PHONE_2_2':{0:'444444',1:'nan'},
'ID_RELATIVE_3':{0:'190',1:'290'},
'RELATIVE_ROLE_3':{0:'mother',1:'mother'},
'PHONE_3_1':{0:'555555',1:'778899'},
'PHONE_3_2':{0:'nan',1:'909090'}})
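So, in the end, I need ID as the index and the other columns unstacked, so that they become attributes of the ID.

The usual unstacking process gives the "right" output, but in the wrong shape:

df2 = df_have.groupby(['ID'])[['ID_RELATIVE', 'RELATIVE_ROLE', 'PHONE']].apply(lambda x: x.reset_index(drop=True)).unstack()

This then requires reordering the columns, dropping some duplicates (by column, not by row), and a for loop. I would like to avoid that approach, since I am looking for a more "elegant" way to get the desired result using groupby/stack/unstack/pivot and so on.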
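To make the "wrong shape" concrete, here is a minimal sketch on made-up toy data (the frame `toy` and the helper name `pos` are illustrative, not part of the real data). It only shows the column order that a plain unstack produces: all numbered copies of one column come first, then all copies of the next column, instead of each relative's columns sitting side by side.

```python
import pandas as pd

# Hypothetical toy data, much smaller than df_have
toy = pd.DataFrame({
    "ID": ["100", "100", "200"],
    "ID_RELATIVE": ["100", "150", "200"],
    "PHONE": ["111111", "333333", "123456"],
})

# Running counter within each ID: 1, 2 for ID 100 and 1 for ID 200
pos = toy.groupby("ID").cumcount().add(1)

# Unstack the counter level; every original column is expanded on its own
wide = toy.set_index(["ID", pos])[["ID_RELATIVE", "PHONE"]].unstack()

naive_order = wide.columns.tolist()
print(naive_order)
```

The printed column tuples are grouped by original column name, which is why a reordering step is needed afterwards.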


Thanks a lot

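The solution has two main steps. First, group by all columns except PHONE to pair up the phone numbers. Then convert the column names to an ordered categorical so that they sort correctly, and group by ID: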

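c = ['ID','ID_RELATIVE','RELATIVE_ROLE']
# Step 1: number the phones within each (ID, relative) group, then spread them into PHONE_1, PHONE_2, ...
df = df_have.set_index(c + [df_have.groupby(c).cumcount().add(1)])['PHONE']
df = df.unstack().add_prefix('PHONE_').reset_index()

# Step 2: number the relatives within each ID and use that counter as the second index level
df = df.set_index(['ID', df.groupby('ID').cumcount().add(1)])

# An ordered categorical keeps the current column order when the columns are sorted later
df.columns = pd.CategoricalIndex(df.columns, categories=df.columns.tolist(), ordered=True)

# Unstack the relative counter, then sort by it so each relative's columns stay together
df = df.unstack().sort_index(axis=1, level=1)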
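As a rough alternative to the ordered categorical, the column order can also be spelled out explicitly and applied with `reindex`. This is only a sketch under assumptions: the toy frame `rel` stands in for the result of the first step (one row per relative, phones already in `PHONE_1` / `PHONE_2`), and the names `rel`, `value_cols`, `pos`, `numbers` and `order` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the output of step 1: one row per relative
rel = pd.DataFrame({
    "ID": ["100", "100", "200"],
    "ID_RELATIVE": ["100", "150", "200"],
    "RELATIVE_ROLE": ["self", "father", "self"],
    "PHONE_1": ["111111", "333333", "123456"],
    "PHONE_2": ["222222", "444444", None],
})

value_cols = ["ID_RELATIVE", "RELATIVE_ROLE", "PHONE_1", "PHONE_2"]

# Number the relatives within each ID and unstack that counter
pos = rel.groupby("ID").cumcount().add(1)
wide = rel.set_index(["ID", pos])[value_cols].unstack()

# Build the wanted order by hand: for each relative number,
# list the columns in the order given by value_cols
numbers = sorted(wide.columns.get_level_values(1).unique())
order = pd.MultiIndex.from_tuples(
    [(col, n) for n in numbers for col in value_cols]
)
wide = wide.reindex(columns=order)

# Flatten the two column levels into single names such as ID_RELATIVE_1
flat_names = [f"{col}_{n}" for col, n in wide.columns]
print(flat_names)
```

The trade-off is that the order is stated in plain Python rather than derived from a sort, which is more verbose but does not depend on how categorical levels are sorted.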

If you need to change the order of the numbers in the PHONE columns:

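# Flatten the (name, relative number) column pairs; for PHONE_k the relative number goes in the middle
df.columns = [f'{a.split("_")[0]}_{b}_{a.split("_")[1]}'
                 if 'PHONE' in a
                 else f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print(df)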
    ID ID_RELATIVE_1 RELATIVE_ROLE_1 PHONE_1_1 PHONE_1_2 ID_RELATIVE_2  \
0  100           100            self    111111    222222           150   
1  200           200            self    123456    456789           250   

  RELATIVE_ROLE_2 PHONE_2_1 PHONE_2_2 ID_RELATIVE_3 RELATIVE_ROLE_3 PHONE_3_1  \
0          father    333333    444444           190          mother    555555   
1          father    987654       NaN           290          mother    778899   

  PHONE_3_2  
0       NaN  
1    909090  
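The renaming rule in the list comprehension above can be read in isolation as a small function. This is just an illustrative sketch: `flat_name` is a hypothetical helper, and it assumes phone columns are named `PHONE_<k>` with exactly one underscore, as produced by `add_prefix('PHONE_')`.

```python
def flat_name(col: str, n: int) -> str:
    """Build a flat column name from (column name, relative number).

    For phone columns the relative number is placed before the phone
    number, so ('PHONE_2', 1) becomes 'PHONE_1_2'.
    """
    if col.startswith("PHONE"):
        prefix, k = col.split("_")      # 'PHONE_2' -> ('PHONE', '2')
        return f"{prefix}_{n}_{k}"
    return f"{col}_{n}"


# Example pairs in the shape of the unstacked column tuples
pairs = [("ID_RELATIVE", 1), ("RELATIVE_ROLE", 1), ("PHONE_1", 1), ("PHONE_2", 1)]
names = [flat_name(col, n) for col, n in pairs]
print(names)
```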