Pivoting data in a pandas MultiIndex DataFrame


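I have a MultiIndex DataFrame that looks like the one below (this is only part of it). The years run from 2007 to 2015, and every year has the same places.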

                Jan   Feb   Mar   Apr   May  June  July   Aug  Sept   Oct  \
Year Place                                                                     
2007 Johore       1.26  1.07  1.21  1.27  1.33  1.28  1.67  1.88  1.89  1.86   
     Kedah        1.20  1.27  1.50  1.38  1.38  1.52  1.84  2.09  2.08  2.02   
     Kelantan     0.92  0.90  1.01  1.10  1.07  0.87  0.93  1.02  1.08  1.17   
     Malacca      1.62  1.45  1.64  1.52  1.50  1.40  1.75  1.80  2.03  2.14   
     N. Sembilan  0.98  0.94  1.11  1.07  1.10  1.16  1.46  1.58  1.61  1.71   

                   Nov   Dec  
Year Place                    
2007 Johore       1.95  1.72  
     Kedah        1.79  1.39  
     Kelantan     1.29  0.97  
     Malacca      2.44  2.13  
     N. Sembilan  1.75  1.58

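I want to pivot the data into a single-index DataFrame whose index is the month (for example Jan 2007, Feb 2007) and whose columns are the different places.

I tried this with 'Pahang' as an example: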

In [14]:

Pahang=df.xs('Pahang',level='Place')
In [15]:

Pahang.unstack().unstack().unstack()
Out[15]:
Year      
2007  Jan     1.19
      Feb     1.01
      Mar     1.13
      Apr     1.19
      May     1.24
      June    1.17
      July    1.43
      Aug     1.59
      Sept    1.63
      Oct     1.64
      Nov     1.82
      Dec     1.31
2008  Jan     1.57
      Feb     1.36
      Mar     1.56
...
2014  Oct     1.87
      Nov     1.74
      Dec     1.09
2015  Jan     0.93
      Feb     1.02
      Mar     1.28
      Apr     1.51
      May      NaN
      June     NaN
      July     NaN
      Aug      NaN
      Sept     NaN
      Oct      NaN
      Nov      NaN
      Dec      NaN
Length: 108, dtype: float64

That gives me the Pahang column I want. Is there a faster way to do this for all places at once, instead of handling one place at a time?

Thanks

You should use pandas.pivot. For example:
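The example that originally followed is missing from this copy of the answer. As a rough sketch of what a pivot-based approach could look like (not the answerer's own code), the frame can be melted to long format and then pivoted with Place as the columns. The miniature data, the `value` column name, and the `long` and `wide` variable names are made up for illustration, and passing a list as `index` assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: (Year, Place) rows, month columns.
index = pd.MultiIndex.from_product(
    [[2007, 2008], ["Johore", "Kedah"]], names=["Year", "Place"]
)
df = pd.DataFrame(
    [[1.26, 1.07], [1.20, 1.27], [1.57, 1.36], [1.40, 1.31]],
    index=index,
    columns=["Jan", "Feb"],
)

# Long format: one row per (Year, Place, Month) observation.
long = df.reset_index().melt(
    id_vars=["Year", "Place"], var_name="Month", value_name="value"
)

# An ordered categorical is used so that months are ordered by calendar
# position (the original column order) instead of alphabetically.
long["Month"] = pd.Categorical(
    long["Month"], categories=list(df.columns), ordered=True
)

# One row per (Year, Month), one column per place.
wide = long.pivot(index=["Year", "Month"], columns="Place", values="value")
print(wide)
```

pivot raises an error if an (index, columns) combination occurs more than once; pivot_table with an aggregation function is the usual fallback in that case.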

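Instead of using unstack, you can swap the index levels with the following method:

If your MultiIndex has several levels and you need to perform several swaps, there is another method that is very useful.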
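The method names are missing from this copy of the answer. Assuming they refer to pandas' swaplevel and reorder_levels (a guess, not something the text confirms), a minimal sketch on made-up data might look like this. The `Month` level name and all variable names are illustrative.

```python
import pandas as pd

# Hypothetical miniature of the question's frame.
index = pd.MultiIndex.from_product(
    [[2007, 2008], ["Johore", "Kedah"]], names=["Year", "Place"]
)
df = pd.DataFrame(
    [[1.26, 1.07], [1.20, 1.27], [1.57, 1.36], [1.40, 1.31]],
    index=index,
    columns=["Jan", "Feb"],
)
df.columns.name = "Month"

# Move the month columns into the row index: (Year, Place, Month).
stacked = df.stack()

# swaplevel exchanges exactly two levels: (Year, Month, Place).
swapped = stacked.swaplevel("Place", "Month")

# reorder_levels rearranges any number of levels in one call.
reordered = stacked.reorder_levels(["Place", "Year", "Month"])

# Swapping only reorders labels; unstack is still what moves Place to the columns.
wide = swapped.unstack("Place")
print(wide)
```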

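My idea is roughly the same as @HappyLeapSecond's, but I'll add it because it is not exactly the same and is more general (it works for all rows, not just specific ones).

First, I'll use a slightly different sample dataset. Note that I'm posting it without the MultiIndex, because a single-level index is easier to copy and paste into pandas.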

   year     place   Jan   Feb   Mar   Apr   May  June  July   Aug
0  2007    Johore  1.26  1.07  1.21  1.27  1.33  1.28  1.67  1.88
1  2007     Kedah  1.20  1.27  1.50  1.38  1.38  1.52  1.84  2.09
2  2007  Kelantan  0.92  0.90  1.01  1.10  1.07  0.87  0.93  1.02
3  2007   Malacca  1.62  1.45  1.64  1.52  1.50  1.40  1.75  1.80
4  2008    Johore  1.26  1.07  1.21  1.27  1.33  1.28  1.67  1.88
5  2008     Kedah  1.20  1.27  1.50  1.38  1.38  1.52  1.84  2.09
6  2008  Kelantan  0.92  0.90  1.01  1.10  1.07  0.87  0.93  1.02
7  2008   Malacca  1.62  1.45  1.64  1.52  1.50  1.40  1.75  1.80

Then set the index so that it is comparable to the one in the question:

df = df.reset_index(drop=True).set_index(['year','place'])

                Jan   Feb   Mar   Apr   May  June  July   Aug
year place                                                   
2007 Johore    1.26  1.07  1.21  1.27  1.33  1.28  1.67  1.88
     Kedah     1.20  1.27  1.50  1.38  1.38  1.52  1.84  2.09
     Kelantan  0.92  0.90  1.01  1.10  1.07  0.87  0.93  1.02
     Malacca   1.62  1.45  1.64  1.52  1.50  1.40  1.75  1.80
2008 Johore    1.26  1.07  1.21  1.27  1.33  1.28  1.67  1.88
     Kedah     1.20  1.27  1.50  1.38  1.38  1.52  1.84  2.09
     Kelantan  0.92  0.90  1.01  1.10  1.07  0.87  0.93  1.02
     Malacca   1.62  1.45  1.64  1.52  1.50  1.40  1.75  1.80

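Then comes some swapping, sorting, and so on. The main "problem" with the data is that the row axis starts with the year while the column axis holds the months. So what you need to do is move the year index from the rows to the columns. That is done with unstack(level='year'). The rest is basically just cleanup.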

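# sortlevel() was removed in pandas 1.0; sort_index(level=0) is its replacement.
# sort_remaining=False sorts by year only, leaving the months in column order.
df.unstack(level='year').swaplevel(0,1,axis=1).T.sort_index(level=0, sort_remaining=False)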

place      Johore  Kedah  Kelantan  Malacca
year                                       
2007 Jan     1.26   1.20      0.92     1.62
     Feb     1.07   1.27      0.90     1.45
     Mar     1.21   1.50      1.01     1.64
     Apr     1.27   1.38      1.10     1.52
     May     1.33   1.38      1.07     1.50
     June    1.28   1.52      0.87     1.40
     July    1.67   1.84      0.93     1.75
     Aug     1.88   2.09      1.02     1.80
2008 Jan     1.26   1.20      0.92     1.62
     Feb     1.07   1.27      0.90     1.45
     Mar     1.21   1.50      1.01     1.64
     Apr     1.27   1.38      1.10     1.52
     May     1.33   1.38      1.07     1.50
     June    1.28   1.52      0.87     1.40
     July    1.67   1.84      0.93     1.75
     Aug     1.88   2.09      1.02     1.80

Edit to add: the last line can be simplified using @JianxunLi's solution:

df.stack().unstack(level='place')

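That is a better way to get year/month on one index and place on the other, but I'll leave this answer here for now in case seeing another approach and explanation is helpful.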
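As a quick check that the longer chain and the stack/unstack shortcut agree, here is a small sketch on made-up numbers. It uses sort_index in place of the older sortlevel call, and compares the two results after sorting so that row order does not affect the comparison.

```python
import pandas as pd

# Hypothetical miniature with the same layout as the sample data in this answer.
df = pd.DataFrame(
    {
        "year": [2007, 2007, 2008, 2008],
        "place": ["Johore", "Kedah", "Johore", "Kedah"],
        "Jan": [1.26, 1.20, 1.57, 1.40],
        "Feb": [1.07, 1.27, 1.36, 1.31],
    }
).set_index(["year", "place"])

# Longer route: move year to the columns, swap levels, transpose, sort by year.
a = df.unstack(level="year").swaplevel(0, 1, axis=1).T.sort_index(level=0)

# Shortcut: move months to the rows, then move place to the columns.
b = df.stack().unstack(level="place")

# Compare labels and values, ignoring any difference in row order.
same = a.sort_index().equals(b.sort_index())
print(same)
```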

You can reshape the data for all places at once and then select just one of them:
import pandas as pd
import numpy as np

# your data
# ===================================
multi_index = pd.MultiIndex.from_product([np.arange(2007,2016,1), 'A B C D E'.split()], names=['Year', 'Place'])
df = pd.DataFrame( np.random.randn(45,12), columns='Jan Feb Mar Apr May June July Aug Sept Oct Nov Dec'.split(), index=multi_index)

df


               Jan     Feb     Mar   ...       Oct     Nov     Dec
Year Place                           ...                          
2007 A     -0.1512  0.7274 -0.3218   ...    1.2547 -1.8408  1.2585
     B      0.0856 -1.0458 -1.1428   ...    1.0194  1.1958  0.4905
     C     -1.2021 -0.6989 -0.0486   ...   -0.8053 -0.4929  1.6475
     D     -1.9948 -0.3465  1.3036   ...   -0.2490  0.6285 -0.0568
     E      0.0928 -1.3905  0.7203   ...   -0.1138  2.9552 -0.0272
2008 A     -1.2595  1.3072  0.6121   ...   -1.4275  0.8769  2.0671
     B      0.3611 -0.4187 -2.9609   ...   -1.2944  1.2752 -0.0947
     C      1.6492  0.0340 -0.9743   ...    0.0550  1.4135  0.8862
     D      0.9034 -0.2957  0.2152   ...    1.0947 -0.2405  0.0367
     E      0.9566  1.1927  0.0852   ...    0.7396  0.8240 -1.6628
...            ...     ...     ...   ...       ...     ...     ...
2014 A      0.7478 -0.8905  0.6238   ...   -1.0907 -0.2919  0.3261
     B      3.6764 -0.0601  1.2751   ...    0.3294 -1.3375 -1.5087
     C      2.3460 -0.4181  0.0607   ...   -0.8270  0.0536 -0.4353
     D      0.9733 -0.6863  0.5278   ...   -1.8206  0.4788  1.1438
     E     -0.3514  2.4570 -0.8567   ...    1.3434 -1.5634 -0.9984
2015 A      1.2849 -1.0657 -0.1173   ...   -0.1733  0.0441  0.0922
     B      0.5802 -0.5912  1.1193   ...   -0.1296 -0.6374 -1.7727
     C     -0.5026 -1.3111 -0.5499   ...    0.7308  1.2570  0.8733
     D     -1.6482 -0.2213  0.3336   ...   -1.3141 -2.0377 -1.1468
     E     -2.0796 -0.2808 -1.4079   ...   -0.3052  0.7999  0.3516

[45 rows x 12 columns]

# processing
# ==================================
res = df.stack().unstack(level='Place')

Place           A       B       C       D       E
Year                                             
2007 Jan  -0.1512  0.0856 -1.2021 -1.9948  0.0928
     Feb   0.7274 -1.0458 -0.6989 -0.3465 -1.3905
     Mar  -0.3218 -1.1428 -0.0486  1.3036  0.7203
     Apr  -1.4641  2.0384  0.6518  0.8756 -1.4627
     May  -0.8896 -1.6627  0.6990  0.2008  0.7423
     June -0.5339 -0.6629  0.1121  0.3618  1.3838
     July -0.4851  0.6544  0.5251  0.3394 -0.7016
     Aug  -1.2445  0.9671 -1.0684 -0.4776 -0.2936
     Sept  1.1330 -0.7543  1.6029  0.5543  0.3234
     Oct   1.2547  1.0194 -0.8053 -0.2490 -0.1138
...           ...     ...     ...     ...     ...
2015 Mar  -0.1173  1.1193 -0.5499  0.3336 -1.4079
     Apr  -1.0528  0.2421  0.3419 -2.1137 -0.2836
     May  -1.0709 -0.1794 -0.2682 -0.3226  0.8654
     June -1.4538 -0.7313  0.3177 -1.4008  1.1357
     July -1.6210 -0.3815 -0.9876  0.1019  1.7450
     Aug   0.5692  0.7679  1.1893 -0.9612  0.0903
     Sept  0.2371  0.6740  0.9204 -0.2909 -0.8197
     Oct  -0.1733 -0.1296  0.7308 -1.3141 -0.3052
     Nov   0.0441 -0.6374  1.2570 -2.0377  0.7999
     Dec   0.0922 -1.7727  0.8733 -1.1468  0.3516

[108 rows x 5 columns]


# select one place
res['A']

Year      
2007  Jan    -0.1512
      Feb     0.7274
      Mar    -0.3218
      Apr    -1.4641
      May    -0.8896
      June   -0.5339
      July   -0.4851
      Aug    -1.2445
      Sept    1.1330
      Oct     1.2547
               ...  
2015  Mar    -0.1173
      Apr    -1.0528
      May    -1.0709
      June   -1.4538
      July   -1.6210
      Aug     0.5692
      Sept    0.2371
      Oct    -0.1733
      Nov     0.0441
      Dec     0.0922
Name: A, dtype: float64
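The result above still has a two-level (Year, month) index, while the question asks for a single index with labels such as "Jan 2007". One possible way to flatten it, sketched here on made-up data with an illustrative `flat` variable, is to join each index tuple into one string:

```python
import pandas as pd

# Hypothetical miniature of the frame used in this answer.
index = pd.MultiIndex.from_product([[2007, 2008], ["A", "B"]], names=["Year", "Place"])
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    index=index,
    columns=["Jan", "Feb"],
)

res = df.stack().unstack(level="Place")

# Iterating a MultiIndex yields (year, month) tuples; join each into one label.
flat = res.copy()
flat.index = [f"{month} {year}" for year, month in res.index]
print(flat)
```

String labels do not sort chronologically, so if later steps need date-aware sorting or resampling, converting the labels to real dates would be a better fit.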

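You did a great job formatting this post compared with your previous one. :-) To make it even better, include a reproducible example. That can be a code snippet that generates artificial data with the same structure as the real dataset, or a CSV string produced with df.to_csv(), so that others can recreate your dataset quickly and with minimal effort. :-)

This answer is a bit vague on the details, but pivot or pivot_table would be another good way to solve this.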
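The df.to_csv() suggestion can be sketched as a round trip: dump the frame to a CSV string that can be pasted into a post, then rebuild the MultiIndex when reading it back. The data and the `csv_text` and `restored` names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical miniature with the same structure as the question's data.
index = pd.MultiIndex.from_product([[2007, 2008], ["A", "B"]], names=["Year", "Place"])
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    index=index,
    columns=["Jan", "Feb"],
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv()
print(csv_text)

# A reader can rebuild the frame, including the two index levels, from that text.
restored = pd.read_csv(io.StringIO(csv_text), index_col=["Year", "Place"])
print(restored)
```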