Python 从dataframe中的列创建多索引_Python_Pandas_Indexing

Python 从dataframe中的列创建多索引

python pandas indexing

Python 从dataframe中的列创建多索引,python,pandas,indexing,Python,Pandas,Indexing,我已经将数据导入到dataframe中，如下所示 VMGI US Equity VMGI Open VMGI High VMGI Low VMGI Px_last VMGI Volume SPOM US Equity SPOM Open SPOM High SPOM Low SPOM Px_last SPOM Volume Date 12/31/2012 12/31/2012 0.009 0.011 0.009

我已经将数据导入到dataframe中，如下所示

           VMGI US Equity   VMGI Open   VMGI High  VMGI Low  VMGI Px_last   VMGI Volume SPOM US Equity  SPOM Open   SPOM High   SPOM Low   SPOM Px_last SPOM Volume
Date
12/31/2012  12/31/2012      0.009       0.011      0.009         0.009      105726      12/31/2012      0.4575      0.4575      0.2925      0.3975       8890
1/1/2013    1/1/2013        0.009       0.011      0.009         0.009      105726      1/1/2013        0.4575      0.4575     0.2925      0.3975      8890
1/2/2013    1/2/2013        0.009       0.01       0.008         0.01       188150      1/2/2013        0.3975      0.3975     0.3225      0.3225      3400
1/3/2013    1/3/2013        0.011       0.018      0.011         0.015       169890     1/3/2013        0.34        0.3738     0.28        0.29       48933
1/4/2013    1/4/2013        0.015       0.018      0.014         0.018       33500      1/4/2013        0.36        0.4        0.3175      0.3175      3610

每第6列是一个新的股票。行数继续增加到1340行。我想在一个多索引中重新组织（我想）来创建这样的数据，因为我想为每个股票添加额外的列。我能够用以下代码获得股票名称

index2 =index1[0::6]     >>> which results in an object as follows (the first column for each stock) 
Index(['VMGI US Equity', 'SPOM US Equity', 'OPTL US Equity', 'FRHV US Equity', etc....

最终，我希望数据框架看起来像一个包含每只股票的指数

VMGI US Equity  VMGI US Equity  VMGI Open   VMGI High   VMGI Low    VMGI Px_last    VMGI Volume
                 12/31/2012       0.009      0.011      0.009       0.009            105726
                 1/1/2013         0.009      0.011      0.009       0.009            105726
                 1/2/2013         0.009      0.01       0.008       0.01             188150
                 1/3/2013         0.011      0.018      0.011       0.015          169890
                 1/4/2013         0.015      0.018      0.014       0.018          33500
SPOM US Equity  SPOM US Equity  SPOM Open   SPOM High   SPOM Low    SPOM Px_last    SPOM Volume
                12/31/2012       0.4575     0.4575      0.2925      0.3975          8890
                1/1/2013         0.4575     0.4575      0.2925      0.3975          8890

我已尝试设置索引，但出现以下错误

df2.index = df_clean_penny1.set_index(index2)
ValueError: Length mismatch: Expected axis has 1340 elements, new values have 65 elements

在其他文章中，我也尝试了MultiIndex.From_arrays（），但也无法使其工作。欢迎提供任何帮助/指导

您可以对

pd.Index

对象使用

str

访问器，并使用

split

和

expand=True

参数创建

pd.MultiIndex

df.columns = df.columns.str.split(' ', 1, expand=True)

然后可以堆叠刚刚创建的列索引的第一级

df.stack(0)

                   High     Low    Open  Px_last   US Equity  Volume
Date                                                                
12/31/2012 SPOM  0.4575  0.2925  0.4575   0.3975  12/31/2012    8890
           VMGI  0.0110  0.0090  0.0090   0.0090  12/31/2012  105726
1/1/2013   SPOM  0.4575  0.2925  0.4575   0.3975    1/1/2013    8890
           VMGI  0.0110  0.0090  0.0090   0.0090    1/1/2013  105726
1/2/2013   SPOM  0.3975  0.3225  0.3975   0.3225    1/2/2013    3400
           VMGI  0.0100  0.0080  0.0090   0.0100    1/2/2013  188150
1/3/2013   SPOM  0.3738  0.2800  0.3400   0.2900    1/3/2013   48933
           VMGI  0.0180  0.0110  0.0110   0.0150    1/3/2013  169890
1/4/2013   SPOM  0.4000  0.3175  0.3600   0.3175    1/4/2013    3610
           VMGI  0.0180  0.0140  0.0150   0.0180    1/4/2013   33500

在不就地编辑

列

对象的情况下，可以使用

设置轴

方法。从0.21版开始，现在接受允许流水线的

inplace=False

参数

df.set_axis(df.columns.str.split(' ', 1, expand=True), 1, 0).stack(0)

                   High     Low    Open  Px_last   US Equity  Volume
Date                                                                
12/31/2012 SPOM  0.4575  0.2925  0.4575   0.3975  12/31/2012    8890
           VMGI  0.0110  0.0090  0.0090   0.0090  12/31/2012  105726
1/1/2013   SPOM  0.4575  0.2925  0.4575   0.3975    1/1/2013    8890
           VMGI  0.0110  0.0090  0.0090   0.0090    1/1/2013  105726
1/2/2013   SPOM  0.3975  0.3225  0.3975   0.3225    1/2/2013    3400
           VMGI  0.0100  0.0080  0.0090   0.0100    1/2/2013  188150
1/3/2013   SPOM  0.3738  0.2800  0.3400   0.2900    1/3/2013   48933
           VMGI  0.0180  0.0110  0.0110   0.0150    1/3/2013  169890
1/4/2013   SPOM  0.4000  0.3175  0.3600   0.3175    1/4/2013    3610
           VMGI  0.0180  0.0140  0.0150   0.0180    1/4/2013   33500

更进一步，我们可以交换索引和排序的级别以改进布局

df.set_axis(df.columns.str.split(' ', 1, expand=True), 1, 0).stack(0) \
    .swaplevel(0, 1).sort_index().reindex(df.index, level=1)

                   High     Low    Open  Px_last   US Equity  Volume
     Date                                                           
SPOM 12/31/2012  0.4575  0.2925  0.4575   0.3975  12/31/2012    8890
     1/1/2013    0.4575  0.2925  0.4575   0.3975    1/1/2013    8890
     1/2/2013    0.3975  0.3225  0.3975   0.3225    1/2/2013    3400
     1/3/2013    0.3738  0.2800  0.3400   0.2900    1/3/2013   48933
     1/4/2013    0.4000  0.3175  0.3600   0.3175    1/4/2013    3610
VMGI 12/31/2012  0.0110  0.0090  0.0090   0.0090  12/31/2012  105726
     1/1/2013    0.0110  0.0090  0.0090   0.0090    1/1/2013  105726
     1/2/2013    0.0100  0.0080  0.0090   0.0100    1/2/2013  188150
     1/3/2013    0.0180  0.0110  0.0110   0.0150    1/3/2013  169890
     1/4/2013    0.0180  0.0140  0.0150   0.0180    1/4/2013   33500

严格地说，

reindex

的最后一位并不是必需的。但我担心我可能会重新安排约会。所以我把它们放回原位

让我猜猜，彭博社？这是如此干净和优雅。我会试试看。泰。是的，彭博社（非开发者/工程师之一），你可以从question@piRSquared是否有一种方法可以最终将股票作为多个指数的第一级，然后将完整的日期序列与每个单独的股票组合起来。所以我可以用某种方式从DF中“调用/索引”单个股票？@JWestwood我相信我一直在做的就是这样。查看我的更新，并让我知道这是否回答了问题。我需要一些时间来了解您所做的工作，但它似乎工作得很好。标记为正确的。泰