Python: converting row data into column data (Python, Pandas)

python pandas

I have a dataframe like this:

   user_id category  view  collect
1        1        a     2        3
2        1        b     5        9
3        2        a     8        6
4        3        a     7        3
5        3        b     4        2
6        3        c     3        0
7        4        e     1        4

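How can I turn it into a new dataframe in which each user_id appears only once, and each category's view and collect values appear as separate columns, filled with 0 where there is no data? Like this: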

user_id  a_view  a_collect  b_view  b_collect  c_view  c_collect  d_view  d_collect  e_view  e_collect
1             2          3       5          9       0          0       0          0       0          0
2             8          6       0          0       0          0       0          0       0          0
3             7          3       4          2       3          0       0          0       0          0
4             0          0       0          0       0          0       0          0       1          4
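A minimal sketch that targets exactly the layout requested above, including the empty d columns. It assumes the full category list is a through e (the data itself never contains d, so that list has to be supplied by hand), and it swaps in pivot_table with aggfunc='sum' so that repeated (user_id, category) rows would be summed instead of raising the "duplicate entries" error that pivot gives.

```python
import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 2, 3, 3, 3, 4],
                   'category': ['a', 'b', 'a', 'a', 'b', 'c', 'e'],
                   'view': [2, 5, 8, 7, 4, 3, 1],
                   'collect': [3, 9, 6, 3, 2, 0, 4]})

# Assumption: the complete category list is a..e; 'd' has no rows,
# so it cannot be discovered from the data itself.
all_categories = ['a', 'b', 'c', 'd', 'e']
measures = ['view', 'collect']

# One row per user; columns form a (measure, category) MultiIndex.
# aggfunc='sum' adds up duplicate (user_id, category) rows instead of failing.
wide = df.pivot_table(index='user_id', columns='category',
                      values=measures, aggfunc='sum', fill_value=0)

# Request the columns in the desired order (view, collect per category);
# reindex creates the missing 'd' columns and fills them with 0.
wanted = pd.MultiIndex.from_tuples(
    [(m, cat) for cat in all_categories for m in measures])
wide = wide.reindex(columns=wanted, fill_value=0)

# Flatten ('view', 'a') into 'a_view'.
wide.columns = [f'{cat}_{m}' for m, cat in wide.columns]
print(wide)
```

Because the category list is explicit, the same code keeps a stable column set even when some categories are absent from a particular batch of data.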

You can get the desired result by using the values in user_id as the index and the values in category as a column level:

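import pandas as pd

df = pd.DataFrame({'category': ['a', 'b', 'a', 'a', 'b', 'c', 'e'],
                   'collect': [3, 9, 6, 3, 2, 0, 4],
                   'user_id': [1, 1, 2, 3, 3, 3, 4],
                   'view': [2, 5, 8, 7, 4, 3, 1]})

# user_id becomes the row index; each (value, category) pair becomes a column
result = (df.pivot(index='user_id', columns='category', values=['view', 'collect'])
          .swaplevel(0, 1, axis=1)  # make category the outer column level
          # group columns by category, keeping view before collect
          # (sortlevel has been removed from pandas; sort_index replaces it)
          .sort_index(axis=1, level=0, sort_remaining=False)
          .fillna(0))               # user/category pairs with no row become 0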

This yields:

category    a            b            c            e        
         view collect view collect view collect view collect
user_id                                                     
1         2.0     3.0  5.0     9.0  0.0     0.0  0.0     0.0
2         8.0     6.0  0.0     0.0  0.0     0.0  0.0     0.0
3         7.0     3.0  4.0     2.0  3.0     0.0  0.0     0.0
4         0.0     0.0  0.0     0.0  0.0     0.0  1.0     4.0
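The values above print as floats (2.0 and so on) because pivot inserts NaN for missing user/category pairs, which forces a float dtype before fillna runs. A sketch of the same reshape with set_index and unstack, whose fill_value argument fills the gaps during the reshape itself, so the integer columns should stay integers:

```python
import pandas as pd

df = pd.DataFrame({'category': ['a', 'b', 'a', 'a', 'b', 'c', 'e'],
                   'collect': [3, 9, 6, 3, 2, 0, 4],
                   'user_id': [1, 1, 2, 3, 3, 3, 4],
                   'view': [2, 5, 8, 7, 4, 3, 1]})

wide = (df.set_index(['user_id', 'category'])[['view', 'collect']]
          .unstack('category', fill_value=0)  # gaps become 0, no NaN step
          .swaplevel(0, 1, axis=1)            # category on the outer level
          .sort_index(axis=1))                # group columns by category
print(wide)
```

Note that this sort_index call also sorts the inner level, so within each category collect comes before view.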

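Above, result has a MultiIndex on its columns. In general, I think this is preferable to a flattened single-level index, because it preserves more of the data's structure.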
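To make "preserves more structure" concrete, here is a small sketch that rebuilds a MultiIndex result along the same lines as the answer and then pulls out every view column at once with xs, or both measures for one category with plain indexing. The names views and cat_a are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'category': ['a', 'b', 'a', 'a', 'b', 'c', 'e'],
                   'collect': [3, 9, 6, 3, 2, 0, 4],
                   'user_id': [1, 1, 2, 3, 3, 3, 4],
                   'view': [2, 5, 8, 7, 4, 3, 1]})

result = (df.pivot(index='user_id', columns='category')
          .swaplevel(0, 1, axis=1)
          .sort_index(axis=1)
          .fillna(0))

# All 'view' columns across categories; the columns become a, b, c, e.
views = result.xs('view', axis=1, level=1)

# Both measures (collect, view) for category 'a' only.
cat_a = result['a']

# Total views per user, summed across categories.
total_views = views.sum(axis=1)
print(total_views)
```

With flattened names such as a_view, the same selections would need string matching on the column labels.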

However, the MultiIndex can be flattened into a single-level index:

# join each (category, value) column pair into one name such as 'a_view'
result.columns = ['{}_{}'.format(cat, col) for cat, col in result.columns]
print(result)

This yields:

         a_view  a_collect  b_view  b_collect  c_view  c_collect  e_view  e_collect
user_id
1           2.0        3.0     5.0        9.0     0.0        0.0     0.0        0.0
2           8.0        6.0     0.0        0.0     0.0        0.0     0.0        0.0
3           7.0        3.0     4.0        2.0     3.0        0.0     0.0        0.0
4           0.0        0.0     0.0        0.0     0.0        0.0     1.0        4.0
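Flattening is reversible as long as the separator never appears inside a category name (an assumption here). A small sketch that splits the flat names back into a MultiIndex, using the first two users and the a and e columns from the output above:

```python
import pandas as pd

# First two users, a and e columns, copied from the flattened output above.
flat = pd.DataFrame({'a_view': [2.0, 8.0], 'a_collect': [3.0, 6.0],
                     'e_view': [0.0, 0.0], 'e_collect': [0.0, 0.0]},
                    index=pd.Index([1, 2], name='user_id'))

# Assumption: '_' never occurs in a category name, so split once on the first '_'.
restored = flat.copy()
restored.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split('_', 1)) for name in flat.columns],
    names=['category', None])
print(restored)
```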

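TypeError: swaplevel() takes at least 3 arguments (2 given)

Thanks. On pandas versions older than 0.18.1, change swaplevel(axis=1) to swaplevel(0, 1, axis=1). In 0.18.1 and later, the first two arguments of swaplevel are optional: by default it swaps the last two levels.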
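A quick check of the point in that comment: on a two-level column index, the default swaplevel (i=-2, j=-1) and the explicit swaplevel(0, 1) produce the same columns, so the explicit form is the portable choice. The tiny frame below is made up purely for illustration.

```python
import pandas as pd

# Hypothetical two-level columns: (measure, category).
cols = pd.MultiIndex.from_product([['view', 'collect'], ['a', 'b']])
frame = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

explicit = frame.swaplevel(0, 1, axis=1)  # works on old and new pandas
default = frame.swaplevel(axis=1)         # relies on the i=-2, j=-1 defaults

print(explicit.columns.tolist())
```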