Splitting a list of dictionaries inside columns into separate columns in Python

python python-3.x pandas dictionary

I have a DataFrame like this:

import pandas as pd

data = {'col_1': [1, 2],
        'col_2': [[{'KEY': 'A', 'VALUE': 'a'}], [{'KEY': 'B', 'VALUE': 'b'}]],
        'col_3': [[{'KEY': 'C', 'VALUE': 'c'}], [{'KEY': 'A', 'VALUE': 'a'}]]}
df = pd.DataFrame.from_dict(data)

    col_1   col_2                           col_3
0   1       [{'KEY': 'A', 'VALUE': 'a'}]    [{'KEY': 'C', 'VALUE': 'c'}]
1   2       [{'KEY': 'B', 'VALUE': 'b'}]    [{'KEY': 'A', 'VALUE': 'a'}]

I want to convert the list of dictionaries in each column so that I get the following output:

    col_1   col_2_KEY   col_2_VALUE     col_3_KEY   col_3_VALUE
0   1       A           a               C           c
1   2       B           b               A           a

Edit 1:

A column value may be empty (an empty list):

data = {'col_1': [1, 2],
        'col_2': [[{'KEY': 'A', 'VALUE': 'a'}], [{'KEY': 'B', 'VALUE': 'b'}]],
        'col_3': [[], [{'KEY': 'A', 'VALUE': 'a'}]]}
df = pd.DataFrame.from_dict(data)

    col_1   col_2                           col_3
0   1       [{'KEY': 'A', 'VALUE': 'a'}]    []
1   2       [{'KEY': 'B', 'VALUE': 'b'}]    [{'KEY': 'A', 'VALUE': 'a'}]

Expected output:

    col_1   col_2_KEY   col_2_VALUE     col_3_KEY   col_3_VALUE
0   1       A           a               <blank>     <blank> 
1   2       B           b               A           a

Use a list comprehension with dict.get to pull the dictionary values out of each column:

cols = ['col_2','col_3']
for col in cols:
    df[col+'_KEY'] = [d[0].get('KEY') for d in df[col]]
    df[col+'_VALUE'] = [d[0].get('VALUE') for d in df[col]]

df.drop(cols, axis=1, inplace=True)

print(df)
   col_1 col_2_KEY col_2_VALUE col_3_KEY col_3_VALUE
0      1         A           a         C           c
1      2         B           b         A           a

Update, to handle empty lists:

cols = ['col_2','col_3']
for col in cols:
    df[col+'_KEY'] = [d[0].get('KEY') if d else '' for d in df[col]]
    df[col+'_VALUE'] = [d[0].get('VALUE') if d else '' for d in df[col]]

df.drop(cols, axis=1, inplace=True)

print(df)
   col_1 col_2_KEY col_2_VALUE col_3_KEY col_3_VALUE
0      1         A           a                      
1      2         B           b         A           a
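The question mentions that a cell may be null, not only an empty list. As a minimal sketch under that assumption, the same list comprehension can be routed through a small helper that also tolerates `None` or `NaN` cells. The helper name `first_dict_field` and the three-row sample data are hypothetical and are not part of the original answer.

```python
import pandas as pd

def first_dict_field(cell, field, default=''):
    """Return cell[0][field] when cell is a non-empty list of dicts, else default."""
    # Anything that is not a non-empty list (None, NaN, []) falls back to default.
    if isinstance(cell, list) and cell:
        return cell[0].get(field, default)
    return default

# Hypothetical sample: one None cell and one empty list.
data = {'col_1': [1, 2, 3],
        'col_2': [[{'KEY': 'A', 'VALUE': 'a'}], [{'KEY': 'B', 'VALUE': 'b'}], None],
        'col_3': [[], [{'KEY': 'A', 'VALUE': 'a'}], [{'KEY': 'C', 'VALUE': 'c'}]]}
df = pd.DataFrame(data)

cols = ['col_2', 'col_3']
for col in cols:
    for field in ('KEY', 'VALUE'):
        df[f'{col}_{field}'] = [first_dict_field(cell, field) for cell in df[col]]

df = df.drop(columns=cols)
print(df)
```

Passing `default=None` instead of `''` would leave missing entries as null values rather than empty strings, which may be easier to filter on later.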

You can use a helper function:

def splitter(item):
    try:
        d = item[0]
        return (d["KEY"], d["VALUE"])
    except IndexError:
        return (None, None)


for i in [2, 3]:
    # apply() returns one (KEY, VALUE) tuple per row; zip(*...) regroups them into all KEYs and all VALUEs
    df["col_{}_KEY".format(i)], df["col_{}_VALUE".format(i)] = zip(*df["col_{}".format(i)].apply(splitter))
    df.drop("col_{}".format(i), axis=1, inplace=True)

This yields:

   col_1 col_2_KEY col_2_VALUE col_3_KEY col_3_VALUE
0      1         A           a         C           c
1      2         B           b         A           a
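Since `apply` with a function like `splitter` returns one `(KEY, VALUE)` tuple per row, another way to spread those tuples into two columns is to build a DataFrame from them. The following is only a sketch of that idea. The three-row sample data and the names `pairs`, `expanded` and `result` are assumptions made for illustration.

```python
import pandas as pd

def splitter(item):
    """Return (KEY, VALUE) from the first dict in item, or (None, None) if item is empty."""
    try:
        d = item[0]
        return (d["KEY"], d["VALUE"])
    except IndexError:
        return (None, None)

# Hypothetical sample with three rows, the last one holding an empty list.
df = pd.DataFrame({
    'col_1': [1, 2, 3],
    'col_2': [[{'KEY': 'A', 'VALUE': 'a'}], [{'KEY': 'B', 'VALUE': 'b'}], []],
})

# One (KEY, VALUE) tuple per row; tolist() turns the Series into a plain list of tuples.
pairs = df['col_2'].apply(splitter).tolist()
# Each tuple becomes a row, so the column layout does not depend on the row count.
expanded = pd.DataFrame(pairs, index=df.index, columns=['col_2_KEY', 'col_2_VALUE'])

result = df.drop(columns='col_2').join(expanded)
print(result)
```

Reusing `df.index` for `expanded` keeps the join aligned even when the original index is not the default 0..n-1 range.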

You can try:

df = pd.concat([df.drop(['col_2','col_3'], axis=1)
                , df['col_2'].apply(lambda x:pd.Series(x[0] if len(x)>0 else {})).rename(columns={'KEY':'col_2_KEY','VALUE':'col_2_VALUE'})
                , df['col_3'].apply(lambda x:pd.Series(x[0] if len(x)>0 else {})).rename(columns={'KEY':'col_3_KEY','VALUE':'col_3_VALUE'})
                ], axis=1)
print(df)

   col_1 col_2_KEY col_2_VALUE col_3_KEY col_3_VALUE
0      1         A           a         C           c
1      2         B           b         A           a
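The `concat` approach repeats the same expression for every column. A possible generalization, sketched here with the empty-list data from the question's edit, loops over the columns and uses `add_prefix` instead of a hand-written `rename` mapping. The names `parts`, `records` and `result` are illustrative only.

```python
import pandas as pd

data = {'col_1': [1, 2],
        'col_2': [[{'KEY': 'A', 'VALUE': 'a'}], [{'KEY': 'B', 'VALUE': 'b'}]],
        'col_3': [[], [{'KEY': 'A', 'VALUE': 'a'}]]}
df = pd.DataFrame(data)

cols = ['col_2', 'col_3']
parts = [df.drop(columns=cols)]
for col in cols:
    # First dict of each list, or an empty dict when the list is empty.
    records = [cell[0] if cell else {} for cell in df[col]]
    # Each dict becomes a row and missing keys become NaN; prefix with the source column name.
    parts.append(pd.DataFrame(records, index=df.index).add_prefix(col + '_'))

result = pd.concat(parts, axis=1)
print(result)
```

One caveat of this sketch: if every list in a column is empty, no `KEY` or `VALUE` columns are created for that column, so the expected column names may need to be added explicitly in that case.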

If I have a column with an empty list, the solution will fail, right? Check my edit.

@Hardikgupta I added it to the answer.

@Hardikgupta: Yes, it would. But I added a try/except block to the splitter function, which should account for that.