Python: put values from a MultiIndex into a list and into multiple columns
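Using .extractall(), I have a MultiIndex:
If you look at index 12, it shows that this entry has two matches, "Canvas" and "Calf Leather". How can I convert this MultiIndex into columns that show all matched attributes? I would like to know how to do this in two ways. Here is the first result I want: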
material
0 Leather
1 Leather
2 Leather
3 Leather
4 NaN
5 Leather
6 Leather
7 Canvas
8 Canvas
9 Canvas
10 Canvas
11 Leather
12 Canvas, Calf Leather
13 NaN
14 NaN
15 Leather
16 Leather
17 Leather
18 Leather
19 NaN
20 Leather
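Since the original data is not shown, here is a minimal sketch of how .extractall() ends up producing a MultiIndex in the first place. The sample strings and the regular expression are assumptions made for illustration only, not the asker's actual data or pattern.

```python
import pandas as pd

# Hypothetical product descriptions (the question does not show the real data).
s = pd.Series([
    "Black Leather boots",
    "plain sneakers",
    "Canvas upper with Calf Leather trim",
])

# One unnamed capture group gives a single result column labelled 0.
# The longer alternative comes first so "Calf Leather" is not cut down to "Leather".
pattern = r"(Calf Leather|Leather|Canvas)"
matches = s.str.extractall(pattern)

# The index has two levels: the original row label and the match number.
# Rows with no match (row 1 here) do not appear at all.
print(matches)
```

Row 2 contributes two entries, (2, 0) and (2, 1), which is the same situation as index 12 in the question.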
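It takes all the results at each level of the MultiIndex and turns them into a list. You will notice that I am only looking at column 0 of the original MultiIndex, which is where all the results from .extractall are aggregated. Here is the second result I would like to create: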
material material1
0 Leather NaN
1 Leather NaN
2 Leather NaN
3 Leather NaN
4 NaN NaN
5 Leather NaN
6 Leather NaN
7 Canvas NaN
8 Canvas NaN
9 Canvas NaN
10 Canvas NaN
11 Leather NaN
12 Canvas Calf Leather
13 NaN NaN
14 NaN NaN
15 Leather NaN
16 Leather NaN
17 Leather NaN
18 Leather NaN
19 NaN NaN
20 Leather NaN
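For the second result, there would be as many additional columns as the maximum number of matches in the .extractall MultiIndex.
I am happy to clarify anything that is unclear. Thank you all!
I think you can use dropna to remove the columns that are all NaN, then groupby with an aggregate join: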
df1 = df.dropna(axis=1, how='all').groupby(level=0).agg(lambda x: ', '.join(x.dropna()))
# replace empty strings with None
df1 = df1.replace({'': None})
print (df1)
0 1 3
0 Leather None Leather
1 Leather None Leather
2 Leather None Leather
3 Leather None Leather
5 Leather None Leather
6 Leather None Leather
7 Canvas Canvas None
8 Canvas Canvas None
9 Canvas Canvas None
10 Canvas Canvas None
11 Leather None Leather
12 Canvas, Calf Leather Canvas None
15 Leather None Leather
16 Leather None Leather
17 Leather None Leather
18 Leather None Leather
20 Leather None Leather
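The same groupby-and-join idea can be tried on a small, self-contained frame. This is only a sketch: the index, the values, and the output name material are made up to mimic the shape of an .extractall() result, and only column 0 is aggregated, as the question asks.

```python
import pandas as pd

# Hypothetical stand-in for the .extractall() result: a (row, match) MultiIndex,
# column 0 holding the matched text and column 1 entirely empty.
idx = pd.MultiIndex.from_tuples(
    [(0, 0), (2, 0), (2, 1), (3, 0)], names=[None, "match"]
)
df = pd.DataFrame(
    {0: ["Leather", "Canvas", "Calf Leather", "Leather"],
     1: [None, None, None, None]},
    index=idx,
)

joined = (
    df.dropna(axis=1, how="all")          # drop column 1, which has no values at all
      .groupby(level=0)[0]                # group the matches by original row label
      .agg(lambda x: ", ".join(x))        # one comma-separated string per row
      .rename("material")
)
print(joined)
```

Row 2 becomes the single string "Canvas, Calf Leather", which corresponds to the first desired result.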
For the multi-column result, use unstack, then flatten the resulting MultiIndex in the columns with a list comprehension:
df2 = df.dropna(axis=1, how='all').unstack()
df2.columns = ['mat{}_{}'.format(x[0], x[1]) for x in df2.columns]
print (df2)
mat0_0 mat0_1 mat1_0 mat1_1 mat3_0 mat3_1
0 Leather None NaN None Leather None
1 Leather None NaN None Leather None
2 Leather None NaN None Leather None
3 Leather None NaN None Leather None
5 Leather None NaN None Leather None
6 Leather None NaN None Leather None
7 Canvas None Canvas None NaN None
8 Canvas None Canvas None NaN None
9 Canvas None Canvas None NaN None
10 Canvas None Canvas None NaN None
11 Leather None NaN None Leather None
12 Canvas Calf Leather Canvas NaN NaN NaN
15 Leather None NaN None Leather None
16 Leather None NaN None Leather None
17 Leather None NaN None Leather None
18 Leather None NaN None Leather None
20 Leather None NaN None Leather None
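If only column 0 matters, as in the question, unstacking that single column gives one output column per match number. The sketch below uses made-up data, and the naming scheme material, material1 is an assumption taken from the asker's desired header rather than something the answer specifies.

```python
import pandas as pd

# Hypothetical column 0 of an .extractall() result, indexed by (row, match).
idx = pd.MultiIndex.from_tuples(
    [(0, 0), (2, 0), (2, 1), (3, 0)], names=[None, "match"]
)
s = pd.Series(["Leather", "Canvas", "Calf Leather", "Leather"], index=idx)

# Move the match level into the columns: one column per match number.
wide = s.unstack("match")

# Name the columns material, material1, ... to mirror the desired output.
wide.columns = ["material" if m == 0 else f"material{m}" for m in wide.columns]
print(wide)
```

The number of columns follows the largest match count in the data, so a row with three matches would add a material2 column.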
This looks really good! The only thing missing is filling in the rows that have no results. So, in both solutions above, rows 4, 13, 19 and all others like them should have "None" as their values in column 0 and column 1 respectively.
I think you need to add reindex by range, e.g. df1 = df.dropna(axis=1, how='all').groupby(level=0).agg(lambda x: ', '.join(x.dropna())).reindex(range(len(df.index)))
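A small sketch of the reindex step follows. One thing worth checking in your own data: len(df.index) counts the rows of the MultiIndex (one per match), which is not necessarily the number of rows in the source column. The variant below instead reindexes on the index of the source Series, which is an assumption about what is available; the sample strings and pattern are also hypothetical.

```python
import pandas as pd

# Hypothetical source column; rows 1 and 4 contain no material keyword.
s = pd.Series([
    "Leather bag",
    "plain tote",
    "Canvas and Calf Leather",
    "Leather belt",
    "n/a",
])
matches = s.str.extractall(r"(Calf Leather|Leather|Canvas)")

# Join all matches per original row; unmatched rows are absent at this point.
joined = matches.groupby(level=0)[0].agg(lambda x: ", ".join(x))

# Reindex on the source index so unmatched rows come back as NaN.
filled = joined.reindex(s.index).rename("material")
print(filled)
```

The same reindex call can be applied to the unstacked frame to restore the missing rows there as well.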