Python: putting values from a MultiIndex into a list and into multiple columns


Using .extractall(), I have a MultiIndex:
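
The original output is not reproduced here. As a minimal sketch of how such a frame arises, assuming made-up product descriptions and a hypothetical single-group pattern (neither comes from the question), `Series.str.extractall` returns one row per match, indexed by the original row label plus a `match` level:

```python
import re

import pandas as pd

# Hypothetical product descriptions; not the asker's real data.
s = pd.Series([
    "Brown leather boots",
    "Plain cotton shirt",
    "Canvas tote with calf leather trim",
])

# One capture group -> one column (0). Every match gets its own row,
# so the result is indexed by (original row label, match number).
pattern = r"(calf leather|leather|canvas)"
df = s.str.extractall(pattern, flags=re.IGNORECASE)
print(df)
```

Row 1 has no match, so it does not appear in the result at all, while row 2 appears twice. That is the shape the rest of the question is trying to flatten.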

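If you look at index 12, it shows that this entry has two matches, "Canvas" and "Calf Leather". How can I convert this MultiIndex into a column that shows all matched attributes? I'd like to know how to do this in two ways. Here is the first result I want: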

                material

0                Leather
1                Leather
2                Leather
3                Leather
4                    NaN
5                Leather
6                Leather
7                 Canvas
8                 Canvas
9                 Canvas
10                Canvas
11               Leather
12  Canvas, Calf Leather
13                   NaN
14                   NaN
15               Leather
16               Leather
17               Leather
18               Leather
19                   NaN
20               Leather
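It takes all the results at each level of the MultiIndex and turns them into a list. You'll notice that I only care about column 0 of the original MultiIndex, which is where .extractall aggregates all the results. Here is the second result I'd like to create: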

                 material     material1 

0                Leather     NaN
1                Leather     NaN
2                Leather     NaN
3                Leather     NaN
4                   NaN      NaN
5                Leather     NaN
6                Leather     NaN
7                 Canvas     NaN
8                 Canvas     NaN
9                 Canvas     NaN
10                Canvas     NaN
11               Leather     NaN
12                Canvas  Calf Leather
13                   NaN     NaN
14                   NaN     NaN
15               Leather     NaN
16               Leather     NaN
17               Leather     NaN
18               Leather     NaN
19                   NaN     NaN
20               Leather     NaN
For the second result, there would be as many additional columns as the maximum number of matches in the .extractall MultiIndex.

I'm happy to clarify anything that's unclear. Thanks, everyone!

I think you can use dropna to remove the columns that contain only NaNs, and then groupby with an aggregate join:

df1 = df.dropna(axis=1, how='all').groupby(level=0).agg(lambda x: ', '.join(x.dropna()))
# replace empty strings with None
df1 = df1.replace({'': None})
print(df1)
                       0       1        3

0                Leather    None  Leather
1                Leather    None  Leather
2                Leather    None  Leather
3                Leather    None  Leather
5                Leather    None  Leather
6                Leather    None  Leather
7                 Canvas  Canvas     None
8                 Canvas  Canvas     None
9                 Canvas  Canvas     None
10                Canvas  Canvas     None
11               Leather    None  Leather
12  Canvas, Calf Leather  Canvas     None
15               Leather    None  Leather
16               Leather    None  Leather
17               Leather    None  Leather
18               Leather    None  Leather
20               Leather    None  Leather
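
Since `df` is not defined in the snippet above, here is a self-contained sketch of the same dropna / groupby / join steps. The small frame is hypothetical and only mimics the shape of an `.extractall()` result (an outer row label plus a `match` level):

```python
import pandas as pd

# Hypothetical stand-in for an .extractall() result: row 0 has one match,
# row 2 has two matches, and column 1 never matched anything.
idx = pd.MultiIndex.from_tuples([(0, 0), (2, 0), (2, 1)], names=[None, "match"])
df = pd.DataFrame(
    {0: ["Leather", "Canvas", "Calf Leather"], 1: [None, None, None]},
    index=idx,
)

joined = (
    df.dropna(axis=1, how="all")             # drop columns that are entirely missing
      .groupby(level=0)                      # one group per original row label
      .agg(lambda x: ", ".join(x.dropna()))  # join that row's matches into one string
)
print(joined)
```

Each original row label ends up with a single comma-separated string, which corresponds to the first layout requested in the question.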
For columns of the same length, use unstack, and then flatten the MultiIndex in the columns with a list comprehension:

df2 = df.dropna(axis=1, how='all').unstack()
# flatten each (column, match) pair into a name such as mat0_1
df2.columns = ['mat{}_{}'.format(x[0], x[1]) for x in df2.columns]
print(df2)
     mat0_0        mat0_1  mat1_0 mat1_1   mat3_0 mat3_1

0   Leather          None     NaN   None  Leather   None
1   Leather          None     NaN   None  Leather   None
2   Leather          None     NaN   None  Leather   None
3   Leather          None     NaN   None  Leather   None
5   Leather          None     NaN   None  Leather   None
6   Leather          None     NaN   None  Leather   None
7    Canvas          None  Canvas   None      NaN   None
8    Canvas          None  Canvas   None      NaN   None
9    Canvas          None  Canvas   None      NaN   None
10   Canvas          None  Canvas   None      NaN   None
11  Leather          None     NaN   None  Leather   None
12   Canvas  Calf Leather  Canvas    NaN      NaN    NaN
15  Leather          None     NaN   None  Leather   None
16  Leather          None     NaN   None  Leather   None
17  Leather          None     NaN   None  Leather   None
18  Leather          None     NaN   None  Leather   None
20  Leather          None     NaN   None  Leather   None
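
The unstack step can be tried on its own with a small hypothetical frame (again only mimicking the shape of an `.extractall()` result). This is a sketch under that assumption, not the asker's data:

```python
import pandas as pd

# Hypothetical stand-in for an .extractall() result with a single column.
idx = pd.MultiIndex.from_tuples([(0, 0), (2, 0), (2, 1)], names=[None, "match"])
df = pd.DataFrame({0: ["Leather", "Canvas", "Calf Leather"]}, index=idx)

# unstack() moves the innermost index level ("match") into the columns,
# giving one column per (capture group, match number) pair.
wide = df.unstack()

# Flatten the two-level column labels into plain strings.
wide.columns = [f"mat{group}_{match}" for group, match in wide.columns]
print(wide)
```

The number of `mat0_*` columns equals the largest number of matches found in any single row, which is the second layout requested in the question.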

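Looks really good! The only thing missing is filling in the rows that have no results. So in both solutions above, rows 4, 13, 19, and any others like them, should have "None" as their value in columns 0 and 1 respectively.

I think you need to add reindex by range, e.g.
df1 = df.dropna(axis=1, how='all').groupby(level=0).agg(lambda x: ', '.join(x.dropna())).reindex(range(len(df.index)))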
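
A minimal sketch of the reindex idea on hypothetical data follows. One caveat worth checking against your own frame: `len(df.index)` counts matches, not source rows, so the two can differ. The sketch therefore uses the length of the original data, with `n_rows` as an assumed stand-in for it:

```python
import pandas as pd

# Hypothetical: the Series originally passed to .extractall() had 4 rows,
# but only rows 0 and 2 produced matches.
n_rows = 4
idx = pd.MultiIndex.from_tuples([(0, 0), (2, 0), (2, 1)], names=[None, "match"])
df = pd.DataFrame({0: ["Leather", "Canvas", "Calf Leather"]}, index=idx)

joined = df.groupby(level=0).agg(lambda x: ", ".join(x.dropna()))

# reindex adds the row labels that had no match and fills them with NaN.
filled = joined.reindex(range(n_rows))
print(filled)
```

If the original data does not use a default 0..n-1 index, passing that data's own index to `reindex` instead of a range should serve the same purpose.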