Python: Converting part of a DataFrame to a MultiIndex in pandas

python pandas

I have data in XLS format that looks like this:

+--------+---------+-------------+---------------+---------+
|   ID   |  Branch | Customer ID | Customer Name | Balance |
+--------+---------+-------------+---------------+---------+
| 111111 | Branch1 | 1           | Company A     | 10      |
+--------+---------+-------------+---------------+---------+
| 222222 | Branch2 | 2           | Company B     | 20      |
+--------+---------+-------------+---------------+---------+
| 111111 | Branch1 | 2           | Company B     | 30      |
+--------+---------+-------------+---------------+---------+
| 222222 | Branch2 | 3           | Company C     | 10      |
+--------+---------+-------------+---------------+---------+

I want to process it with pandas. pandas reads it as a single flat sheet, but I would like to use a MultiIndex here, like this:

+--------+---------+-------------+---------------+---------+
|   ID   |  Branch | Customer ID | Customer Name | Balance |
+--------+---------+-------------+---------------+---------+
|        |         | 1           | Company A     | 10      |
+ 111111 + Branch1 +-------------+---------------+---------+
|        |         | 2           | Company B     | 30      |
+--------+---------+-------------+---------------+---------+
|        |         | 2           | Company B     | 20      |
+ 222222 + Branch2 +-------------+---------------+---------+
|        |         | 3           | Company C     | 10      |
+--------+---------+-------------+---------------+---------+

Here `111111` and `Branch1` form the first index level, and `Company A` the second. Is there a built-in way to do this?

If you only need the MultiIndex, use `set_index` followed by `sort_index`.
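
A minimal sketch of that step, assuming the sheet has already been loaded into a DataFrame (built inline here from the sample rows in the question instead of being read from the XLS file; the name `indexed` is only illustrative):

```python
import pandas as pd

# Sample rows from the question, built inline instead of read from the XLS file.
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2"],
    "Customer ID": [1, 2, 2, 3],
    "Customer Name": ["Company A", "Company B", "Company B", "Company C"],
    "Balance": [10, 20, 30, 10],
})

# Move the four key columns into the index, then sort so equal keys sit together.
indexed = df.set_index(["ID", "Branch", "Customer ID", "Customer Name"]).sort_index()
print(indexed)
```

This keeps all four columns as separate index levels, so only `Balance` remains as a regular column.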

But if you need only two levels in the MultiIndex (`a` and `b` in my solution), concatenate the first column with the second and the third with the fourth:
# join ID with Branch, and Customer ID with Customer Name, into two string keys
df['a'] = df.ID.astype(str) + '_' + df.Branch
df['b'] = df['Customer ID'].astype(str) + '_' + df['Customer Name']
# delete original columns
df.drop(['ID','Branch', 'Customer ID','Customer Name'], axis=1, inplace=True)

# use the two keys as a sorted two-level index
df.set_index(['a','b'], inplace=True)
df.sort_index(inplace=True)
print(df)
                            Balance
a              b                   
111111_Branch1 1_Company A       10
               2_Company B       30
222222_Branch2 2_Company B       20
               3_Company C       10
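
Once the two-level index exists, rows can be looked up by those concatenated keys. The following is a small usage sketch under the same sample data; the names `keys`, `two_level`, `branch1` and `balance` are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2"],
    "Customer ID": [1, 2, 2, 3],
    "Customer Name": ["Company A", "Company B", "Company B", "Company C"],
    "Balance": [10, 20, 30, 10],
})

# Same concatenation as above, written without modifying df in place.
keys = pd.DataFrame({
    "a": df["ID"].astype(str) + "_" + df["Branch"],
    "b": df["Customer ID"].astype(str) + "_" + df["Customer Name"],
    "Balance": df["Balance"],
})
two_level = keys.set_index(["a", "b"]).sort_index()

# All customers of one branch: selecting on the first level drops that level.
branch1 = two_level.loc["111111_Branch1"]

# A single cell: pass the full (a, b) key as a tuple plus the column name.
balance = two_level.loc[("222222_Branch2", "3_Company C"), "Balance"]
print(branch1)
print(balance)
```

The trade-off of concatenating is that the keys become strings, so the numeric `ID` and `Customer ID` values are no longer directly available for numeric comparisons.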

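If you need to aggregate the last column by the preceding columns, use `groupby` with `mean`, as in the `groupby` snippet below. If the columns themselves carry a MultiIndex, `set_index` needs tuples, as in the `MultiIndex.from_arrays` and `set_index` snippets below.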

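# starting again from the original flat DataFrame:
# group on the four key columns and average Balance within each group
df = df.groupby(['ID','Branch', 'Customer ID','Customer Name'])['Balance'].mean().to_frame()
print(df)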
                                          Balance
ID     Branch  Customer ID Customer Name         
111111 Branch1 1           Company A           10
               2           Company B           30
222222 Branch2 2           Company B           20
               3           Company C           10
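
With the sample data every key combination occurs once, so the `groupby` result looks the same as a plain `set_index`. The sketch below adds one hypothetical duplicate row (not present in the question) to show where the two differ: `groupby` collapses repeated keys into a single aggregated row.

```python
import pandas as pd

# Sample rows from the question plus one hypothetical duplicate key (last row).
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222, 111111],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2", "Branch1"],
    "Customer ID": [1, 2, 2, 3, 1],
    "Customer Name": ["Company A", "Company B", "Company B", "Company C", "Company A"],
    "Balance": [10, 20, 30, 10, 20],
})

key_cols = ["ID", "Branch", "Customer ID", "Customer Name"]

# The group keys become the MultiIndex levels; Balance is averaged per group.
averaged = df.groupby(key_cols)["Balance"].mean().to_frame()
print(averaged)
```

Here the two `Company A` rows of `Branch1` (balances 10 and 20) end up as one row with the mean 15.0, and the resulting index is unique.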

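# again from the original flat DataFrame (needs `import pandas as pd`):
# add a top column level a, a, b, b, c above the existing column names
df.columns = pd.MultiIndex.from_arrays([['a'] * 2 + ['b']* 2 + ['c'], df.columns])
print(df)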
        a                    b                     c
       ID   Branch Customer ID Customer Name Balance
0  111111  Branch1           1     Company A      10
1  222222  Branch2           2     Company B      20
2  111111  Branch1           2     Company B      30
3  222222  Branch2           3     Company C      10

# each column is now addressed by a (top, bottom) tuple
df.set_index([('a','ID'), ('a','Branch'), 
              ('b','Customer ID'), ('b','Customer Name')], inplace=True)
df.sort_index(inplace=True)
print(df)
                                                              c
                                                        Balance
(a, ID) (a, Branch) (b, Customer ID) (b, Customer Name)        
111111  Branch1     1                Company A               10
                    2                Company B               30
222222  Branch2     2                Company B               20
                    3                Company C               10
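
As the output above shows, the index level names end up being the tuples themselves. A possible follow-up, sketched here as an optional step that is not part of the original answer, is to keep only the inner label of each tuple as the level name (the names `key_cols` and `indexed` are illustrative):

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "ID": [111111, 222222, 111111, 222222],
    "Branch": ["Branch1", "Branch2", "Branch1", "Branch2"],
    "Customer ID": [1, 2, 2, 3],
    "Customer Name": ["Company A", "Company B", "Company B", "Company C"],
    "Balance": [10, 20, 30, 10],
})

# Add an outer column level (a, a, b, b, c) above the original column names.
df.columns = pd.MultiIndex.from_arrays([["a"] * 2 + ["b"] * 2 + ["c"], df.columns])

# With MultiIndex columns, set_index takes one (top, bottom) tuple per column.
key_cols = [("a", "ID"), ("a", "Branch"), ("b", "Customer ID"), ("b", "Customer Name")]
indexed = df.set_index(key_cols).sort_index()

# Optional: replace the tuple level names with their inner labels.
indexed.index.names = [name[1] for name in indexed.index.names]
print(indexed)
```

The remaining column is still addressed by its tuple, for example `indexed[("c", "Balance")]`.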