Flattening a Series of lists in pandas using agg
I have a number of MultiIndex columns, each holding a list of tuples, and I want to flatten them (the lists, not the tuples), but I am struggling to get it working. Here is what I have:
import numpy as np
import pandas as pd

df = pd.DataFrame([[[(1,'a')],[(6,'b')],np.nan,np.nan],[[(5,'d'),(10,'e')],np.nan,np.nan,[(8,'c')]]])
df.columns = pd.MultiIndex.from_tuples([('a', 0), ('a', 1), ('b', 0), ('b', 1)])
>>> df
                   a              b
                   0         1    0         1
0           [(1, a)]  [(6, b)]  NaN       NaN
1  [(5, d), (10, e)]       NaN  NaN  [(8, c)]
Desired result:
>>> df
                        a              b
0        [(1, a), (6, b)]     [NaN, NaN]
1  [(5, d), (10, e), NaN]  [NaN, (8, c)]
How can I do this? Based on this, I tried the following:
>>> df.stack(level=1).groupby(level=[0]).agg(lambda x: np.array(list(x)).flatten())
a b
0 a b
1 a b
>>> df.stack(level=1).groupby(level=[0]).agg(lambda x: np.concatenate(list(x)))
...
Exception: Must produce aggregated value
Here is one way to do it:
# taken from https://stackoverflow.com/questions/12472338/flattening-a-list-recursively
def flatten(S):
    # recursively flatten nested lists; tuples are left untouched
    if S == []:
        return S
    if isinstance(S[0], list):
        return flatten(S[0]) + flatten(S[1:])
    return S[:1] + flatten(S[1:])

# reshape the data to get the desired structure
df2 = (df
       .unstack()
       .reset_index()
       .drop(columns='level_1')  # the positional axis argument is not accepted by recent pandas
       .groupby(['level_0', 'level_2'])[0]
       .apply(list).apply(flatten).unstack().T)
df2.index.name = None
df2.columns.name = None
print(df2)
                        a              b
0        [(1, a), (6, b)]     [nan, nan]
1  [(5, d), (10, e), nan]  [nan, (8, c)]
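One thing to keep in mind about the recursive `flatten` above: it recurses once per element, so a list with more than roughly a thousand items can hit Python's default recursion limit. The sketch below is a possible iterative replacement, not part of the original answer; the name `flatten_iter` is made up for illustration. It flattens nested lists to any depth and leaves tuples and NaN values as they are.

```python
def flatten_iter(items):
    """Flatten arbitrarily nested lists without recursion (tuples stay intact)."""
    out = []
    stack = [iter(items)]          # stack of iterators, innermost list on top
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                stack.append(iter(item))   # descend into the nested list
                break
            out.append(item)
        else:
            stack.pop()            # current list exhausted, resume its parent
    return out


print(flatten_iter([[(1, 'a')], [(6, 'b')]]))
print(flatten_iter([[1, [2, 3]], 4]))
```

It can be dropped in wherever `flatten` is used, for example `.apply(list).apply(flatten_iter)`.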
Found a one-liner:
>>> df.stack(level=1).groupby(level=0).agg(list).applymap(flatten)
                        a              b
0        [(1, a), (6, b)]     [nan, nan]
1  [(5, d), (10, e), nan]  [nan, (8, c)]
where flatten is the custom function provided by @YOLO.
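For anyone who wants to run the one-liner end to end, here is a self-contained sketch. Two things in it are my own assumptions rather than part of the answers: the helper `flatten_one_level` (a hypothetical name) only flattens one level, which is all this data needs, and the element-wise step uses `Series.map` per column instead of `applymap`, since `applymap` is deprecated in newer pandas releases.

```python
import numpy as np
import pandas as pd


def flatten_one_level(cells):
    """Concatenate list cells; keep non-list cells (such as NaN) as single items."""
    out = []
    for cell in cells:
        if isinstance(cell, list):
            out.extend(cell)
        else:
            out.append(cell)
    return out


df = pd.DataFrame([[[(1, 'a')], [(6, 'b')], np.nan, np.nan],
                   [[(5, 'd'), (10, 'e')], np.nan, np.nan, [(8, 'c')]]])
df.columns = pd.MultiIndex.from_tuples([('a', 0), ('a', 1), ('b', 0), ('b', 1)])

# Move the inner column level into the index, then collect each row's cells per outer column.
grouped = df.stack(level=1).groupby(level=0).agg(list)

# Flatten every collected cell, one column at a time.
result = grouped.apply(lambda col: col.map(flatten_one_level))
print(result)
```

The shape of the pipeline is unchanged from the one-liner: `stack`, `groupby(level=0)`, `agg(list)`, then a flatten of each cell.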
A few of the steps are a bit long, but the logic works. I posted a shorter answer above. +1, thanks!