Python: a strategy for handling a MultiIndex
A library handed me a pandas DataFrame with a MultiIndex. It looks like this:
xf.index
DatetimeIndex(['2011-03-31', '2011-04-01', '2011-04-04', '2011-04-05',
'2011-04-06', '2011-04-07', '2011-04-08', '2011-04-11',
'2011-04-12', '2011-04-13',
...
'2017-10-19', '2017-10-20', '2017-10-23', '2017-10-24',
'2017-10-25', '2017-10-26', '2017-10-27', '2017-10-30',
'2017-10-31', '2017-11-01'],
dtype='datetime64[ns]', name=u'date', length=1702, freq=None)
xf.columns
MultiIndex(levels=[[u'jan', u'feb', u'mar'], [u'PRICE', u'AMOUNT', u'NAME', u'STYLE']],
labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]])
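The basic idea is that, for each of jan, feb and mar, several fields (PRICE, AMOUNT, NAME, STYLE) are evaluated every day.
I'm really not good at manipulating this MultiIndex.
The things I need to do are:
- Modify an existing second-level column, e.g. change all 'NAME' to lowercase.
- Add a new column, e.g. 'modified_NAME'. This would apply to all of jan, feb and mar.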
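Or is there a way to modify and add columns under a hierarchical index?

I think the simplest approach is to reshape with stack, so that the fields become ordinary single-level columns and the first column level moves into the index as a MultiIndex: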
Then modify the columns:
df.columns = df.columns.str.lower()
df['new_col'] = 1
The final reshape back is done with unstack, followed by swaplevel and sort_index.
Sample:
import numpy as np
import pandas as pd
i = pd.DatetimeIndex(['2011-03-31', '2011-04-01', '2011-04-04', '2011-04-05',
'2011-04-06', '2011-04-07', '2011-04-08', '2011-04-11',
'2011-04-12', '2011-04-13'])
cols = pd.MultiIndex.from_product([[u'jan', u'feb'],[u'PRICE', u'AMOUNT', u'NAME']])
df = pd.DataFrame(np.random.randint(10, size=(len(i), 6)),index=i, columns=cols)
print (df)
jan feb
PRICE AMOUNT NAME PRICE AMOUNT NAME
2011-03-31 2 7 3 6 0 5
2011-04-01 6 2 5 0 4 2
2011-04-04 9 0 7 2 7 9
2011-04-05 5 3 5 7 9 9
2011-04-06 1 4 4 1 6 3
2011-04-07 1 7 4 9 6 7
2011-04-08 6 1 7 4 4 2
2011-04-11 7 5 6 8 0 3
2011-04-12 3 3 9 2 4 0
2011-04-13 0 0 1 9 0 3
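Since the question mentions being unsure how to work with MultiIndex columns at all, here is a minimal sketch of the basic selections. The small `demo` frame and its values are made up for illustration (they are not the data from the question); only the column layout mirrors the sample.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sample frame: 3 days, 2 months, 3 fields.
idx = pd.date_range("2011-03-31", periods=3, freq="D")
cols = pd.MultiIndex.from_product([["jan", "feb"], ["PRICE", "AMOUNT", "NAME"]])
demo = pd.DataFrame(np.arange(18).reshape(3, 6), index=idx, columns=cols)

# All fields of one month: the result has ordinary single-level columns.
jan = demo["jan"]

# One field of one month: a tuple key returns a Series.
jan_price = demo[("jan", "PRICE")]

# One field across every month: cross-section on the second column level.
prices = demo.xs("PRICE", axis=1, level=1)

print(list(jan.columns))
print(list(prices.columns))
```

Selecting with a tuple key is also how a single column under a given month can be assigned to, which is what the approaches in this answer rely on.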
Another solution is to create a new MultiIndex for the columns, then concat a DataFrame holding the new columns to the original:
a = df.columns.get_level_values(0)
b = df.columns.get_level_values(1).str.lower()
df.columns = pd.MultiIndex.from_arrays([a,b])
mux = pd.MultiIndex.from_product([a.unique(),['new']])
df1 = pd.DataFrame(1, columns=mux, index=df.index)
print (df1)
jan feb
new new
2011-03-31 1 1
2011-04-01 1 1
2011-04-04 1 1
2011-04-05 1 1
2011-04-06 1 1
2011-04-07 1 1
2011-04-08 1 1
2011-04-11 1 1
2011-04-12 1 1
2011-04-13 1 1
df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
feb jan
amount name new price amount name new price
2011-03-31 0 5 1 6 7 3 1 2
2011-04-01 4 2 1 0 2 5 1 6
2011-04-04 7 9 1 2 0 7 1 9
2011-04-05 9 9 1 7 3 5 1 5
2011-04-06 6 3 1 1 4 4 1 1
2011-04-07 6 7 1 9 7 4 1 1
2011-04-08 4 2 1 4 1 7 1 6
2011-04-11 0 3 1 8 5 6 1 7
2011-04-12 4 0 1 2 3 9 1 3
2011-04-13 0 3 1 9 0 1 1 0
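As a possible variation on this second approach, the second level can also be lowercased with `DataFrame.rename` and its `level` argument, and a derived column can be added per month with tuple keys. This is only a sketch: the `demo` frame, the column name `modified_name` and the `+ 100` rule are placeholders, not something taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sample frame: 3 days, 2 months, 3 fields.
idx = pd.date_range("2011-03-31", periods=3, freq="D")
cols = pd.MultiIndex.from_product([["jan", "feb"], ["PRICE", "AMOUNT", "NAME"]])
demo = pd.DataFrame(np.arange(18).reshape(3, 6), index=idx, columns=cols)

# Lowercase only the second column level; returns a new frame, demo is unchanged.
renamed = demo.rename(columns=str.lower, level=1)

# Add one derived column per month by assigning with a (month, field) tuple.
# The "+ 100" rule is a placeholder for whatever transformation is needed.
for month in renamed.columns.get_level_values(0).unique():
    renamed[(month, "modified_name")] = renamed[(month, "name")] + 100

# New columns are appended at the end, so sort to group them under their month.
renamed = renamed.sort_index(axis=1)
print(renamed)
```

The loop is explicit rather than vectorised, which is usually fine when the first level only has a handful of values such as months.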
Complete run of the first (stack-based) approach, starting again from the original sample df:
df = df.stack(0)
print (df)
AMOUNT NAME PRICE
2011-03-31 feb 0 5 6
jan 7 3 2
2011-04-01 feb 4 2 0
jan 2 5 6
2011-04-04 feb 7 9 2
jan 0 7 9
2011-04-05 feb 9 9 7
jan 3 5 5
2011-04-06 feb 6 3 1
jan 4 4 1
2011-04-07 feb 6 7 9
jan 7 4 1
2011-04-08 feb 4 2 4
jan 1 7 6
2011-04-11 feb 0 3 8
jan 5 6 7
2011-04-12 feb 4 0 2
jan 3 9 3
2011-04-13 feb 0 3 9
jan 0 1 0
df.columns = df.columns.str.lower()
df['new'] = 1
df = df.unstack().swaplevel(0,1,1).sort_index(axis=1)
print (df)
feb jan
amount name new price amount name new price
2011-03-31 0 5 1 6 7 3 1 2
2011-04-01 4 2 1 0 2 5 1 6
2011-04-04 7 9 1 2 0 7 1 9
2011-04-05 9 9 1 7 3 5 1 5
2011-04-06 6 3 1 1 4 4 1 1
2011-04-07 6 7 1 9 7 4 1 1
2011-04-08 4 2 1 4 1 7 1 6
2011-04-11 0 3 1 8 5 6 1 7
2011-04-12 4 0 1 2 3 9 1 3
2011-04-13 0 3 1 9 0 1 1 0
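The stack-based approach becomes more useful when the new column is computed from existing fields rather than being a constant, because one expression on the stacked frame covers every month at once. A minimal sketch, assuming a made-up `demo` frame and a hypothetical `total` column defined as price times amount:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sample frame: 3 days, 2 months, 3 fields.
idx = pd.date_range("2011-03-31", periods=3, freq="D")
cols = pd.MultiIndex.from_product([["jan", "feb"], ["PRICE", "AMOUNT", "NAME"]])
demo = pd.DataFrame(np.arange(18).reshape(3, 6), index=idx, columns=cols)

# Move the month level into the index: one row per (date, month).
tall = demo.stack(0)
tall.columns = tall.columns.str.lower()

# A single expression now applies to jan and feb alike.
tall["total"] = tall["price"] * tall["amount"]

# Move the month back to the columns and restore the (month, field) order.
wide = tall.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
print(wide)
```

Depending on the pandas version, `stack` may emit a FutureWarning about its upcoming implementation; the sorted final result should be the same either way.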