Python pandas: expand a cell containing a list of dictionaries into rows, with each dictionary key as a column
I have a DataFrame like this:
   col1                                  col2  col3
0  "[{'key1':'val1'}, {'key1':'val2'}]"  a     g
1  "[{'key1':'val3'}, {'key1':'val4'}]"  b     h
2  "[{'key1':'val5'}, {'key1':'val6'}]"  c     i
I want to turn it into this:
   col2  col3  key1
0  a     g     val1
1  a     g     val2
2  b     h     val3
3  b     h     val4
4  c     i     val5
5  c     i     val6
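This is slightly simplified. The dictionaries in col1 have more keys, and the DataFrame has more than two other columns.
I have seen similar solutions in other posts, but they all assume col1 already holds a regular list rather than a string. I am still fairly new to pandas and don't know how to adapt them to my case. Any help is appreciated. Thanks.
Update: I found a solution.
First, I convert the strings to lists of dictionaries: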
df['col1'] = df['col1'].apply(json.loads)
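One caveat on this step: json.loads only accepts valid JSON, which requires double-quoted strings. The sample cells in the question use single quotes, so if the real data looks like that, ast.literal_eval from the standard library may be a better fit. A minimal sketch, assuming the cells are either JSON or Python-literal strings (the helper name parse_cell is hypothetical, not part of the original solution):

```python
import ast
import json

import pandas as pd


def parse_cell(cell):
    """Parse a string cell into a list of dicts (hypothetical helper)."""
    try:
        # Works when the cell is valid JSON (double-quoted keys and values).
        return json.loads(cell)
    except json.JSONDecodeError:
        # Fall back to Python literal syntax, e.g. "[{'key1': 'val1'}]".
        return ast.literal_eval(cell)


df = pd.DataFrame({
    "col1": ["[{'key1':'val1'}, {'key1':'val2'}]",
             '[{"key1": "val3"}, {"key1": "val4"}]'],
    "col2": ["a", "b"],
})
df["col1"] = df["col1"].apply(parse_cell)
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for untrusted cell contents.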
Then I explode the column so that each dictionary gets its own row:
res = df.explode('col1')
Then I create one column for each key in the dictionaries:
res[['key1','key2','key3']] = res['col1'].apply(lambda x: self._explode_dict(x))
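Here is my _explode_dict(row) method. Its purpose is to avoid the error I ran into when an empty dictionary is passed to pd.Series:
def _explode_dict(self, row):
    # A non-empty dict becomes a Series with one entry per key.
    if isinstance(row, dict) and row:
        return pd.Series(row)
    # Anything else (empty dict, NaN, ...) gets blank placeholder values.
    return pd.Series({
        'key1': '',
        'key2': '',
        'key3': '',
    })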
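For reference, the steps above can be put together into one standalone script on the sample data. This is only a sketch under a few assumptions: it parses with ast.literal_eval because the sample cells use single quotes, and it swaps the row-wise apply(pd.Series) for building a DataFrame directly from the list of dictionaries, which is a different technique from the one in the update and tends to be faster on large frames.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "col1": ["[{'key1':'val1'}, {'key1':'val2'}]",
             "[{'key1':'val3'}, {'key1':'val4'}]",
             "[{'key1':'val5'}, {'key1':'val6'}]"],
    "col2": ["a", "b", "c"],
    "col3": ["g", "h", "i"],
})

# Parse each string into a list of dicts (the sample uses single quotes).
df["col1"] = df["col1"].apply(ast.literal_eval)

# One row per dictionary; reset the index so it is unique again.
res = df.explode("col1").reset_index(drop=True)

# Build one column per key. Non-dict cells (e.g. NaN from an empty list)
# are replaced by empty dicts, which produce a row of NaN.
expanded = pd.DataFrame(
    [d if isinstance(d, dict) else {} for d in res["col1"]],
    index=res.index,
)

# Drop the original column and attach the expanded key columns.
res = res.drop(columns="col1").join(expanded)
print(res)
```

Dictionaries with differing keys are handled too: any key missing from a given dictionary shows up as NaN in that row.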
df = df.explode('col1').reset_index(drop=True)
and then
df.col1 = df.col1.str.get('key1')
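A runnable sketch of this shorter approach on part of the sample data, assuming col1 has already been parsed into lists of dictionaries. Note that str.get extracts a single key per call, so it would have to be repeated for each key of interest; here the result is written to a new key1 column instead of overwriting col1.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [[{"key1": "val1"}, {"key1": "val2"}],
             [{"key1": "val3"}, {"key1": "val4"}]],
    "col2": ["a", "b"],
    "col3": ["g", "h"],
})

# One row per dictionary, with a fresh unique index.
df = df.explode("col1").reset_index(drop=True)

# .str.get also works element-wise on dictionaries, not just strings.
df["key1"] = df["col1"].str.get("key1")
df = df.drop(columns="col1")
```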