Python: extract strings by their original index and insert them as multiple rows


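I have put a sample dataset (df), the expected output (df2) and my code so far below. I have a df in which some rows of column i2 contain a JSON-formatted list. Each list needs to be exploded out of the row it came from and re-inserted into the df as separate rows, but its text has to go into a different column (i1). I also need to extract the unique identifier (the "id_2" value) from each entry and put it into the id_2 column.

In my code so far, I use pd.json_normalize to parse the JSON-like data, then prepend the original string from column i1 to each extracted string (this should be clearer if you look below), and then re-insert the rows by index. But I have to specify the indexes by hand, which is not good. I would like it to depend less on manually entered indexes, in case the number of these nested cells grows in the future or the index changes in some way.

Any suggestions are welcome. Many thanks.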

Sample data

import pandas as pd

df = pd.DataFrame(data={'id': [1, 2, 3, 4, 5], 'id_2': ['a','b','c','d','e'], 'i1': ['How old are you?','Over the last month have you felt','Do you live alone?','In the last week have you had','When did you last visit a doctor?'], 'i2': [0,0,0,0,0]})
df['i2'] = df['i2'].astype('object')

a = [{'id': 'b1', 'item': 'happy?', 'id_2': 'hj59'}, {'id': 'b2', 'item': 'sad?', 'id_2': 'dgb'}, {'id': 'b3', 'item': 'angry?', 'id_2':'kj9'}, {'id': 'b4', 'item': 'frustrated?','id_2':'lp7'}]
b = [{'id': 'c1', 'item': 'trouble sleeping?'}, {'id': 'c2', 'item': 'changes in appetite?'}, {'id': 'c3', 'item': 'mood swings?'}, {'id': 'c4', 'item': 'trouble relaxing?'}]

df.at[1, 'i2'] = a 
df.at[3, 'i2'] = b 

Expected output

df2 = pd.DataFrame(data={'id': [1,2,2,2,2,3,4,4,4,4,5], 
                         'id_2': ['a','hj59','dgb','kj9','lp7','c','d','d','d','d','e'],
                         'i1': ['How old are you?',
                                'Over the last month have you felt happy?',
                                'Over the last month have you felt sad?',
                                'Over the last month have you felt angry?',
                                'Over the last month have you felt frustrated?',
                                'Do you live alone?',
                                'In the last week have you had trouble sleeping?',
                                'In the last week have you had changes in appetite?',
                                'In the last week have you had mood swings?',
                                'In the last week have you had trouble relaxing?',
                                'When did you last visit a doctor?'], 
                         'i2': [0,1,1,1,1,0,1,1,1,1,0]})

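My ugly code so far

# Rows whose i2 cell holds a nested list instead of the placeholder 0
s = df[df.i2 != 0]

n = {}
for i in range(len(s)):
    # Flatten each nested list into its own frame and build the combined question text
    n[i] = pd.json_normalize(s.loc[s.index[i]]['i2']).reset_index(drop=False)
    n[i]['i1'] = s.iloc[i].i1 + ' ' + n[i]['item']

def insert_row(i, d1, d2):
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
    return pd.concat([d1.iloc[:i], d2])

for i in n:
    if i == 0:
        x = insert_row(s.iloc[i].name, df, n[i])
    elif i == 1:
        # Hand-written offsets: this is the part that depends on knowing the indexes in advance
        x = insert_row(s.iloc[i].name+1+n[i]['index'].count()+1, x, n[i])
        y = pd.concat([x, df.iloc[s.iloc[i].name+1:]])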

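Explode the dataframe on column i2, then use the str accessor to retrieve the value associated with the key item from column i2. After that, use loc indexing to update the values in column i2 to 1 and to concatenate the strings in i1 with the retrieved item values.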

df2 = df.explode('i2', ignore_index=True)
s = df2['i2'].str['item']
df2.loc[s.notna(), 'i2'] =  1
df2.loc[s.notna(), 'i1'] += ' ' + s
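
To see what explode and the str accessor each do in isolation, here is a minimal sketch on a toy frame. The frame toy, its column q and its values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy data: one scalar cell and one cell holding a list of dicts
toy = pd.DataFrame({'q': ['plain', 'nested'],
                    'i2': [0, [{'item': 'x'}, {'item': 'y'}]]})

# explode gives every list element its own row and repeats the other columns;
# scalar cells pass through unchanged
exploded = toy.explode('i2', ignore_index=True)

# .str['item'] looks the key up in each dict; cells that are not dicts become NaN
items = exploded['i2'].str['item']

print(exploded['q'].tolist())
print(items.tolist())
```

The NaN produced for the non-dict rows is what makes s.notna() usable as a mask for "this row came from a nested list", so no row positions have to be written by hand.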


Output of the answer's code (df2)

    id                                                  i1 i2
0    1                                    How old are you?  0
1    2            Over the last month have you felt happy?  1
2    2              Over the last month have you felt sad?  1
3    2            Over the last month have you felt angry?  1
4    2       Over the last month have you felt frustrated?  1
5    3                                  Do you live alone?  0
6    4     In the last week have you had trouble sleeping?  1
7    4  In the last week have you had changes in appetite?  1
8    4          In the last week have you had mood swings?  1
9    4     In the last week have you had trouble relaxing?  1
10   5                   When did you last visit a doctor?  0

Comments

Simple and awesome, thank you very much! That's great, very elegant. I posted an update, because I realised that for some of the nested cells there is one more item I need to extract and put into another column. I have been trying to pull it out, but I can't manage it. Maybe you can help?

Oh, it's late. It's just something like this, isn't it? d = df2['i2'].str['id_2']

Yes! You can use the str accessor to retrieve any value in the dictionary.
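
Putting the id_2 follow-up from the comments together with the answer, one possible end-to-end sketch looks like this. It uses a shortened copy of the question's sample data, the names out, item, nested_id and nested are chosen here for illustration, and the fallback to the parent row's id_2 is an assumption based on the expected output (where the rows under id 4 keep 'd').

```python
import pandas as pd

# Shortened copy of the question's sample data (three rows instead of five)
df = pd.DataFrame({
    'id': [1, 2, 3],
    'id_2': ['a', 'b', 'c'],
    'i1': ['How old are you?',
           'Over the last month have you felt',
           'In the last week have you had'],
    'i2': [0, 0, 0],
})
df['i2'] = df['i2'].astype('object')
df.at[1, 'i2'] = [{'id': 'b1', 'item': 'happy?', 'id_2': 'hj59'},
                  {'id': 'b2', 'item': 'sad?', 'id_2': 'dgb'}]
df.at[2, 'i2'] = [{'id': 'c1', 'item': 'trouble sleeping?'},
                  {'id': 'c2', 'item': 'mood swings?'}]

out = df.explode('i2', ignore_index=True)

# Read both values before i2 is overwritten with the 0/1 flag
item = out['i2'].str['item']        # NaN where the cell was not a dict
nested_id = out['i2'].str['id_2']   # missing where the dict has no 'id_2' key

nested = item.notna()
out.loc[nested, 'i1'] += ' ' + item
out.loc[nested, 'i2'] = 1

# Assumption: keep the parent row's id_2 when the nested dict does not supply one
out['id_2'] = nested_id.fillna(out['id_2'])

print(out)
```

Because every step is driven by the notna mask rather than by row positions, adding more nested cells or changing the index should not require touching the code.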