Python 3.x: Pivoting a DataFrame on categorical variables

I have a DataFrame containing categorical variables:

{'SysID': {0: '00721778',
1: '00721778',
2: '00721778',
3: '00721779',
4: '00721779'},
'SoftwareComponent': {0: 'AA13912',
1: 'AA24120',
2: 'AA21612',
3: 'AA30861',
4: 'AA20635'},
'SoftwareSubcomponent': {0: None,
1: 'AK21431',
2: None,
3: 'AK22116',
4: None}}
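I would like to pivot on the categorical variables, ignoring any null values. Zeros should be used as the filler. The output should look like this: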

{'SysID': {0: '00721778', 1: '00721779'},
'SoftwareCom-AA13912': {0: '1', 1: '0'},
'SoftwareCom-AA24120': {0: '1', 1: '0'},
'SoftwareCom-AA21612': {0: '1', 1: '0'},
'SoftwareCom-AA30861': {0: '0', 1: '1'},
'SoftwareCom-AA20635': {0: '0', 1: '1'},
'SoftwareSub-AK21431': {0: '1', 1: '0'},
'SoftwareSub-AK22116': {0: '0', 1: '1'}}

How can I do this?

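After a bit of cleanup, you can use pd.crosstab. We stack the frame (which ignores all None values) and build the column names ourselves, since you want SoftwareCom and SoftwareSub handled the same way.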

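import pandas as pd

# Long format: one row per (SysID, source column) pair
df = df.set_index('SysID').stack().reset_index(level=1)
# Build labels such as 'SoftwareCom-AA13912' or 'SoftwareSub-AK21431'
df['val'] = df['level_1'].str[0:11] + '-' + df[0]

# Count occurrences per SysID; recent pandas requires keyword arguments in rename_axis
pd.crosstab(df.index, df.val).rename_axis(index='SysID', columns=None).reset_index()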
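This gives one row per SysID and one count column per SoftwareCom-/SoftwareSub- value.

If a value can occur more than once per SysID and you only want 1s and 0s, you can cast the result to bool and then back to int, or simply use .clip:
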
pd.crosstab(df.index, df.val).rename_axis(index='SysID', columns=None).clip(0, 1).reset_index()
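
For the bool-then-int route mentioned above, here is a minimal, self-contained sketch. It uses melt plus dropna instead of stack to get the long format (a swap on my part, not the code above), and it adds one duplicated row to the question's data, which is purely an assumption to produce a count greater than 1. The names long, counts and flags are made up for illustration.

```python
import pandas as pd

# Question's sample data plus one duplicated row (an assumption, added only
# so that one count exceeds 1).
df = pd.DataFrame({
    'SysID': ['00721778', '00721778', '00721778', '00721778', '00721779', '00721779'],
    'SoftwareComponent': ['AA13912', 'AA13912', 'AA24120', 'AA21612', 'AA30861', 'AA20635'],
    'SoftwareSubcomponent': [None, None, 'AK21431', None, 'AK22116', None],
})

# Long format: one row per (SysID, source column, code); missing codes are dropped.
long = df.melt(id_vars='SysID', var_name='source', value_name='code')
long = long.dropna(subset=['code'])
long['val'] = long['source'].str[:11] + '-' + long['code']

# Raw counts per SysID and label.
counts = pd.crosstab(long['SysID'], long['val'])

# Any non-zero count becomes True, then True/False become 1/0.
flags = counts.astype(bool).astype(int)
print(flags)
```

On this data the cast should give the same table as counts.clip(0, 1); which one to use is mostly a matter of taste.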

You can use pd.crosstab() and then rename the DataFrame columns before using pd.concat():
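
The code for this step is not shown here, so the following is only a sketch of how the described approach might look, not the original snippet. It assumes the question's sample data, uses add_prefix for the renaming, and fills with 0 in case a SysID is missing from one of the tables; the names tables and result are hypothetical.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'SysID': ['00721778', '00721778', '00721778', '00721779', '00721779'],
    'SoftwareComponent': ['AA13912', 'AA24120', 'AA21612', 'AA30861', 'AA20635'],
    'SoftwareSubcomponent': [None, 'AK21431', None, 'AK22116', None],
})

tables = []
for col in ['SoftwareComponent', 'SoftwareSubcomponent']:
    # One count table per categorical column; rows with a missing value are skipped.
    tab = pd.crosstab(df['SysID'], df[col])
    # Rename the columns so each label carries its source column name.
    tables.append(tab.add_prefix(col + '-'))

# Join the tables side by side on SysID; a SysID absent from one table gets 0.
result = pd.concat(tables, axis=1).fillna(0).astype(int)
print(result)
```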

Yields:

          SoftwareComponent-AA13912  SoftwareComponent-AA20635  \
SysID                                                            
00721778                          1                          0   
00721779                          0                          1   

          SoftwareComponent-AA21612  SoftwareComponent-AA24120  \
SysID                                                            
00721778                          1                          1   
00721779                          0                          0   

          SoftwareComponent-AA30861  SoftwareSubcomponent-AK21431  \
SysID                                                               
00721778                          0                             1   
00721779                          1                             0   

          SoftwareSubcomponent-AK22116  
SysID                                   
00721778                             0  
00721779                             1 
Using to_dict(), you can get back:

{'SoftwareComponent-AA13912': {'00721778': 1, '00721779': 0}, 'SoftwareComponent-AA20635': {'00721778': 0, '00721779': 1}, 'SoftwareComponent-AA21612': {'00721778': 1, '00721779': 0}, 'SoftwareComponent-AA24120': {'00721778': 1, '00721779': 0}, 'SoftwareComponent-AA30861': {'00721778': 0, '00721779': 1}, 'SoftwareSubcomponent-AK21431': {'00721778': 1, '00721779': 0}, 'SoftwareSubcomponent-AK22116': {'00721778': 0, '00721779': 1}}
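
The dict above is keyed by SysID, whereas the question asks for SysID as its own column, a 0/1 row index, and string values. One possible way to get that shape is sketched below. The small result frame is rebuilt by hand from two columns of the output above, just to keep the example short, and the name as_requested is made up.

```python
import pandas as pd

# Two columns of the indicator table above, rebuilt by hand for brevity.
result = pd.DataFrame(
    {'SoftwareComponent-AA13912': [1, 0], 'SoftwareSubcomponent-AK22116': [0, 1]},
    index=pd.Index(['00721778', '00721779'], name='SysID'),
)

# Move SysID from the index into a column, cast every value to str, then export.
as_requested = result.reset_index().astype(str).to_dict()
print(as_requested)
```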