
Python: strategies for dealing with a MultiIndex

A library hands me a pandas DataFrame with a MultiIndex on the columns. It looks like this:

xf.index
DatetimeIndex(['2011-03-31', '2011-04-01', '2011-04-04', '2011-04-05',
               '2011-04-06', '2011-04-07', '2011-04-08', '2011-04-11',
               '2011-04-12', '2011-04-13',
               ...
               '2017-10-19', '2017-10-20', '2017-10-23', '2017-10-24',
               '2017-10-25', '2017-10-26', '2017-10-27', '2017-10-30',
               '2017-10-31', '2017-11-01'],
              dtype='datetime64[ns]', name=u'date', length=1702, freq=None)

xf.columns

MultiIndex(levels=[[u'jan', u'feb', u'mar'], [u'PRICE', u'AMOUNT', u'NAME', u'STYLE']],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]])
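The basic idea is that for jan, feb and mar, a few fields of information (PRICE, AMOUNT, NAME, STYLE) are evaluated every day.

I am really not good at manipulating this MultiIndex.

The things I need to do are:

  • Modify an existing second-level column, e.g. change all the 'NAME' to lowercase.

  • Add a new column, e.g. 'MODIFIED_NAME', which would
    apply to all of jan, feb and mar.

I don't know whether I should try to flatten the whole column index to one level, so that there is a 'month' column with the values 'jan', 'feb', 'mar', followed by the other existing second-level columns (PRICE, AMOUNT, NAME, STYLE). I don't need the MultiIndex.

How would I collapse the DataFrame like that?


Or is there a way to modify and add columns underneath the hierarchical index?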

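I think the simplest way is to get ordinary single-level columns by reshaping: stack moves the first column level (the month) into the index, so the MultiIndex ends up in the index instead of the columns.

Then modify the columns: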

df.columns = df.columns.str.lower()
df['new_col'] = 1
Finally, reshape back with unstack, swaplevel and sort_index.

Sample:

import numpy as np
import pandas as pd

i = pd.DatetimeIndex(['2011-03-31', '2011-04-01', '2011-04-04', '2011-04-05',
               '2011-04-06', '2011-04-07', '2011-04-08', '2011-04-11',
               '2011-04-12', '2011-04-13'])
cols = pd.MultiIndex.from_product([[u'jan', u'feb'],[u'PRICE', u'AMOUNT', u'NAME']])
df = pd.DataFrame(np.random.randint(10, size=(len(i), 6)),index=i, columns=cols)

print (df)
             jan               feb            
           PRICE AMOUNT NAME PRICE AMOUNT NAME
2011-03-31     2      7    3     6      0    5
2011-04-01     6      2    5     0      4    2
2011-04-04     9      0    7     2      7    9
2011-04-05     5      3    5     7      9    9
2011-04-06     1      4    4     1      6    3
2011-04-07     1      7    4     9      6    7
2011-04-08     6      1    7     4      4    2
2011-04-11     7      5    6     8      0    3
2011-04-12     3      3    9     2      4    0
2011-04-13     0      0    1     9      0    3


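Another solution is to create a new MultiIndex for the columns, and to concat a new DataFrame holding the new columns to the original one: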

a = df.columns.get_level_values(0)
b = df.columns.get_level_values(1).str.lower()
df.columns = pd.MultiIndex.from_arrays([a,b])

mux = pd.MultiIndex.from_product([a.unique(),['new']])
df1 = pd.DataFrame(1, columns=mux, index=df.index)
print (df1)
           jan feb
           new new
2011-03-31   1   1
2011-04-01   1   1
2011-04-04   1   1
2011-04-05   1   1
2011-04-06   1   1
2011-04-07   1   1
2011-04-08   1   1
2011-04-11   1   1
2011-04-12   1   1
2011-04-13   1   1

df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
              feb                   jan               
           amount name new price amount name new price
2011-03-31      0    5   1     6      7    3   1     2
2011-04-01      4    2   1     0      2    5   1     6
2011-04-04      7    9   1     2      0    7   1     9
2011-04-05      9    9   1     7      3    5   1     5
2011-04-06      6    3   1     1      4    4   1     1
2011-04-07      6    7   1     9      7    4   1     1
2011-04-08      4    2   1     4      1    7   1     6
2011-04-11      0    3   1     8      5    6   1     7
2011-04-12      4    0   1     2      3    9   1     3
2011-04-13      0    3   1     9      0    1   1     0
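When only a column or two needs to change, the columns can also be addressed directly as (month, field) tuples, without any reshaping. The following is a minimal sketch under assumptions that are not in the original: the data is made up, NAME is assumed to hold strings, and 'modified_name' is assumed to be the lower-cased NAME. It uses DataFrame.rename with level=1 to lower-case only the second-level labels, then assigns one derived column per month.

```python
import pandas as pd

# Hypothetical sample data; the real frame comes from the library.
idx = pd.date_range("2011-03-31", periods=2, name="date")
cols = pd.MultiIndex.from_product([["jan", "feb"], ["PRICE", "NAME"]])
df = pd.DataFrame(
    [[1, "Foo", 2, "Bar"], [3, "Baz", 4, "Qux"]], index=idx, columns=cols
)

# Lower-case only the second-level labels; the month level is untouched.
df = df.rename(columns=str.lower, level=1)

# Add one derived column per month by assigning to a (month, field) tuple.
# "modified_name" is an illustrative name, not one from the original data.
for month in df.columns.get_level_values(0).unique():
    df[(month, "modified_name")] = df[(month, "name")].str.lower()

# New columns are appended at the end, so sort to regroup them by month.
df = df.sort_index(axis=1)
print(df)
```

This keeps the MultiIndex in place throughout, which may be preferable when the rest of the code already relies on the (month, field) layout.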
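For comparison, the complete run of the first (stack-based) approach, starting again from the original sample df:

df = df.stack(0)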
print (df)
                AMOUNT  NAME  PRICE
2011-03-31 feb       0     5      6
           jan       7     3      2
2011-04-01 feb       4     2      0
           jan       2     5      6
2011-04-04 feb       7     9      2
           jan       0     7      9
2011-04-05 feb       9     9      7
           jan       3     5      5
2011-04-06 feb       6     3      1
           jan       4     4      1
2011-04-07 feb       6     7      9
           jan       7     4      1
2011-04-08 feb       4     2      4
           jan       1     7      6
2011-04-11 feb       0     3      8
           jan       5     6      7
2011-04-12 feb       4     0      2
           jan       3     9      3
2011-04-13 feb       0     3      9
           jan       0     1      0
df.columns = df.columns.str.lower()
df['new'] = 1

df = df.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
print (df)
              feb                   jan               
           amount name new price amount name new price
2011-03-31      0    5   1     6      7    3   1     2
2011-04-01      4    2   1     0      2    5   1     6
2011-04-04      7    9   1     2      0    7   1     9
2011-04-05      9    9   1     7      3    5   1     5
2011-04-06      6    3   1     1      4    4   1     1
2011-04-07      6    7   1     9      7    4   1     1
2011-04-08      4    2   1     4      1    7   1     6
2011-04-11      0    3   1     8      5    6   1     7
2011-04-12      4    0   1     2      3    9   1     3
2011-04-13      0    3   1     9      0    1   1     0
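
If the MultiIndex is not needed at all, as the question suggests, the stacked form can be turned into a flat table with an ordinary month column. This is a minimal sketch with made-up data; the level names "month" and "field" are assumptions chosen to match the wording of the question, not names present in the original frame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frame comes from the library.
idx = pd.date_range("2011-03-31", periods=2, name="date")
cols = pd.MultiIndex.from_product([["jan", "feb"], ["PRICE", "AMOUNT"]])
df = pd.DataFrame(np.arange(8).reshape(2, 4), index=idx, columns=cols)

# Name the column levels so the stacked index level is called "month".
df.columns = df.columns.set_names(["month", "field"])

# Move the month level into the index, then turn the whole index
# (date and month) into ordinary columns.
flat = df.stack("month").reset_index()
flat.columns.name = None  # drop the leftover "field" label on the columns
print(flat)
```

The result has one row per (date, month) pair, so plain single-level column operations such as flat["NAME"].str.lower() apply directly.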