Python: melting MultiIndex columns
I have a csv file in the following format:
| a | b | 2018 | 2018 | 2019 | 2019 |
| | | jan | feb | jan | feb |
---------------------------------------
| a1 | b1 | 0 | 1 | 2 | 3 |
| a1 | b2 | 4 | 5 | 6 | 7 |
| a2 | b1 | 8 | 9 | 10 | 11 |
| a2 | b2 | 12 | 13 | 14 | 15 |
I want to read it into a pandas DataFrame and then melt it into the following format:
| a | b | year | month | value |
----------------------------------
| a1 | b1 | 2018 | jan | 0 |
| a1 | b1 | 2018 | feb | 1 |
| a1 | b1 | 2019 | jan | 2 |
| a1 | b1 | 2019 | feb | 3 |
| a1 | b2 | 2018 | jan | 4 |
| a1 | b2 | 2018 | feb | 5 |
| a1 | b2 | 2019 | jan | 6 |
| a1 | b2 | 2019 | feb | 7 |
| a2 | b1 | 2018 | jan | 8 |
| a2 | b1 | 2018 | feb | 9 |
| a2 | b1 | 2019 | jan | 10 |
| a2 | b1 | 2019 | feb | 11 |
| a2 | b2 | 2018 | jan | 12 |
| a2 | b2 | 2018 | feb | 13 |
| a2 | b2 | 2019 | jan | 14 |
| a2 | b2 | 2019 | feb | 15 |
How can I achieve this?
For a plain DataFrame, this should work:
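import pandas as pd

# Sample data with the two header rows already joined into one label per column
df = pd.DataFrame({
    'a': ['a1', 'a1', 'a2', 'a2'],
    'b': ['b1', 'b2', 'b1', 'b2'],
    '2018 jan': [0, 4, 8, 12],
    '2018 feb': [1, 5, 9, 13],
    '2019 jan': [2, 6, 10, 14],
    '2019 feb': [3, 7, 11, 15],
})
# Unpivot the date columns into rows
df = df.melt(id_vars=['a', 'b'], var_name='date', value_name='value')
# Split the 'year month' label into separate year and month columns
df[['year', 'month']] = df['date'].str.split(' ', expand=True)
# Drop the combined label and put the columns in the requested order
df = df[['a', 'b', 'year', 'month', 'value']]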
Output:
     a   b  year month  value
0   a1  b1  2018   jan      0
1   a1  b2  2018   jan      4
2   a2  b1  2018   jan      8
3   a2  b2  2018   jan     12
4   a1  b1  2018   feb      1
5   a1  b2  2018   feb      5
6   a2  b1  2018   feb      9
7   a2  b2  2018   feb     13
8   a1  b1  2019   jan      2
9   a1  b2  2019   jan      6
10  a2  b1  2019   jan     10
11  a2  b2  2019   jan     14
12  a1  b1  2019   feb      3
13  a1  b2  2019   feb      7
14  a2  b1  2019   feb     11
15  a2  b2  2019   feb     15
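The table requested in the question is ordered by a, b, year and month, while melt returns the rows one source column at a time. Below is a minimal sketch of one way to get the requested ordering. It assumes the month labels are lowercase three-letter abbreviations; the names `wide`, `long`, `month_order` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

wide = pd.DataFrame({
    'a': ['a1', 'a1', 'a2', 'a2'],
    'b': ['b1', 'b2', 'b1', 'b2'],
    '2018 jan': [0, 4, 8, 12],
    '2018 feb': [1, 5, 9, 13],
    '2019 jan': [2, 6, 10, 14],
    '2019 feb': [3, 7, 11, 15],
})
long = wide.melt(id_vars=['a', 'b'], var_name='date', value_name='value')
long[['year', 'month']] = long['date'].str.split(' ', expand=True)

# Assumption: months should sort in calendar order, not alphabetically
month_order = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
               'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
long['month'] = pd.Categorical(long['month'], categories=month_order, ordered=True)

# Select the requested columns, then sort by the id columns, year and month
result = (
    long[['a', 'b', 'year', 'month', 'value']]
    .sort_values(['a', 'b', 'year', 'month'])
    .reset_index(drop=True)
)
print(result)
```

Without the ordered categorical, a plain string sort would place feb before jan.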
If the columns are a MultiIndex, as you mentioned in the comments, you can flatten them into a plain DataFrame first:
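df = pd.read_csv('file.csv', header=[0, 1])
# Join the two header levels into a single label per column
df.columns = [' '.join(col).strip() for col in df.columns.values]
# Remove the 'Unnamed' placeholders pandas adds where the second header row is empty
df = df.rename(columns={'a Unnamed: 0_level_1': 'a', 'b Unnamed: 1_level_1': 'b'})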
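The flattening step can be tried without a file on disk. The sketch below is a hedged variant, not the answer's exact code: it assumes the csv text matches the sample in the question, reads it from an in-memory string (`csv_text` is a hypothetical stand-in for 'file.csv'), and drops the placeholder labels generically instead of naming each one.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'file.csv', built from the sample in the question
csv_text = """a,b,2018,2018,2019,2019
,,jan,feb,jan,feb
a1,b1,0,1,2,3
a1,b2,4,5,6,7
a2,b1,8,9,10,11
a2,b2,12,13,14,15
"""
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Keep only the first-level label where the second header row was empty,
# otherwise join both levels with a space
df.columns = [
    top if str(sub).startswith('Unnamed') else f'{top} {sub}'
    for top, sub in df.columns
]
print(df.columns.tolist())
```

After this step the frame has single-level columns, so the plain melt shown earlier applies unchanged.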
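@KOB My answer should work for any csv file with a two-row header in which some columns are labelled only in the first row and the others are labelled in both rows. For your question, this code places all the headers as required.
Reading the csv creates a DataFrame with MultiIndex columns, which can be melted directly: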
import pandas as pd

df_multiidx = pd.read_csv('two_levels_header_file.csv', header=[0, 1])
# Columns with an empty second header row get an 'Unnamed: ...' second-level label
id_vars = [idv for idv in df_multiidx.columns if 'Unnamed' in idv[1]]
value_vars = [valv for valv in df_multiidx.columns if 'Unnamed' not in valv[1]]
df_multiidx = df_multiidx.melt(id_vars=id_vars, value_vars=value_vars, var_name=['year', 'month'])
# rename returns a new frame, so the result must be assigned back
df_multiidx = df_multiidx.rename(columns={col_ren: col_ren[0] for col_ren in id_vars})
Output:
     a   b  year month  value
0   a1  b1  2018   jan      0
1   a1  b2  2018   jan      4
2   a2  b1  2018   jan      8
3   a2  b2  2018   jan     12
4   a1  b1  2018   feb      1
5   a1  b2  2018   feb      5
6   a2  b1  2018   feb      9
7   a2  b2  2018   feb     13
8   a1  b1  2019   jan      2
9   a1  b2  2019   jan      6
10  a2  b1  2019   jan     10
11  a2  b2  2019   jan     14
12  a1  b1  2019   feb      3
13  a1  b2  2019   feb      7
14  a2  b1  2019   feb     11
15  a2  b2  2019   feb     15
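Since this approach is meant to apply to any two-row header, it can be wrapped in a small function. This is only a sketch: `melt_two_row_header` is a hypothetical helper name, the in-memory `csv_text` is assumed to match the sample in the question, and the detection of id columns relies on the 'Unnamed' placeholder labels that read_csv produces.

```python
import io
import pandas as pd

def melt_two_row_header(df, var_name=('year', 'month'), value_name='value'):
    """Melt a frame whose columns came from read_csv(..., header=[0, 1])."""
    # Id columns are those whose second-level label is an 'Unnamed' placeholder
    id_vars = [col for col in df.columns if str(col[1]).startswith('Unnamed')]
    value_vars = [col for col in df.columns if col not in id_vars]
    long = df.melt(id_vars=id_vars, value_vars=value_vars,
                   var_name=list(var_name), value_name=value_name)
    # Replace the tuple labels of the id columns with their first-level name
    return long.rename(columns={col: col[0] for col in id_vars})

# Hypothetical stand-in for the csv file, built from the sample in the question
csv_text = """a,b,2018,2018,2019,2019
,,jan,feb,jan,feb
a1,b1,0,1,2,3
a1,b2,4,5,6,7
a2,b1,8,9,10,11
a2,b2,12,13,14,15
"""
result = melt_two_row_header(pd.read_csv(io.StringIO(csv_text), header=[0, 1]))
print(result)
```

Note that the year values come from the header row and are therefore strings; convert them explicitly if numeric years are needed.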
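I just have a csv file whose first two rows are the same as in the sample I gave, i.e. "2018" and "jan" are in different rows. I can use read_csv(…, header=[0,1], …) to read these two rows in as a MultiIndex header, but the first two columns are then also given MultiIndex labels, such as ('a', 'Unnamed: 0_level_1').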
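To see the labels described in this comment, the column index can be inspected right after reading. A minimal sketch, assuming csv text that matches the sample in the question (`csv_text`, `labels` and `id_like` are illustrative names):

```python
import io
import pandas as pd

# Hypothetical stand-in for the csv file, built from the sample in the question
csv_text = """a,b,2018,2018,2019,2019
,,jan,feb,jan,feb
a1,b1,0,1,2,3
a1,b2,4,5,6,7
a2,b1,8,9,10,11
a2,b2,12,13,14,15
"""
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Every column label is a (first row, second row) tuple
labels = df.columns.tolist()
# Labels whose second element is a placeholder mark the id columns
id_like = [label for label in labels if str(label[1]).startswith('Unnamed')]

print(labels)
print(id_like)
```

The placeholder labels are what both answers rely on to tell the id columns apart from the value columns.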