Python pandas: add the unique ID from a column of the main df to the processed dfs stored in a list of dataframes

Tags: python, json, pandas, loops

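I have a single df in which each row contains JSON strings that need to be read and normalized.

I can read the JSON and normalize the columns by storing each row as a new dataframe in a list, which I have done with the code below.

However, I need to attach the original unique id from the original df (i.e. 'id': ['9clpa', 'g659am']), and that is lost in my current code.

The expected output is a list of dataframes, one per id, that contain the exploded JSON information plus an additional column holding the id (repeated on every row of the final df).

I hope this makes sense. Any suggestions are very welcome. Thanks a lot!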

DataFrame

import pandas as pd
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
Current code

out={}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
Expected output

pd.DataFrame(data={'id': ['9clpa','9clpa'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"}']})
pd.DataFrame(data={'id': ['g659am','g659am'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"}']})
You can add a lambda function that assigns the value of 'id' to the new df.
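The snippet that went with this suggestion is not shown here, so what follows is only a sketch of one way a lambda could carry the id across, using DataFrame.assign. The shortened sample JSON, the use of json.loads for parsing, and the names questions and row_id are assumptions made for illustration.

```python
import json
import pandas as pd

# Small stand-in for the question's df (shortened JSON, same shape).
df = pd.DataFrame({
    "id": ["9clpa", "g659am"],
    "i2": [
        '{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}',
        '{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}',
    ],
})

out = {}
for i in range(len(df)):
    questions = json.loads(df["i2"].iloc[i])["q"]  # list of dicts stored under "q"
    row_id = df["id"].iloc[i]
    # assign() accepts a callable; the lambda returns the scalar id, which
    # pandas repeats on every row of the normalized frame.
    out[i] = pd.json_normalize(questions).assign(id=lambda d, v=row_id: v)

print(out[0])
```

With assign, the new column is appended at the end, so the column order here is qi, answers, id.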

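Edit: you can also choose the position of the 'id' column, for example making it the first column, by specifying where it should appear when the dataframe is built.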
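One way to control the position, sketched here under the assumption that DataFrame.insert is acceptable for the task (the frame named part and its contents are made up for illustration):

```python
import pandas as pd

# Hypothetical normalized frame for one id (same columns as in the question).
part = pd.DataFrame({
    "qi": ["01", "02"],
    "answers": [
        [{"answer": "M", "value": "1"}],
        [{"answer": "F", "value": "2"}],
    ],
})

# insert(loc, column, value) modifies the frame in place; loc=0 makes "id"
# the first column, and the scalar value is repeated on every row.
part.insert(0, "id", "9clpa")

print(part.columns.tolist())
```

Unlike plain column assignment, insert raises an error if the column already exists, so it is best called once per freshly normalized frame.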

Output dataframe:


You can use .json_normalize (see the pandas documentation).
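As a rough illustration of what json_normalize does with data shaped like the question's, the sketch below normalizes a hand-written list of question dicts (the variable names and the shortened data are assumptions). It also shows the record_path and meta parameters, which go one level deeper into the nested answers list.

```python
import pandas as pd

# Hypothetical, shortened version of the list stored under "q".
questions = [
    {"qi": "01", "answers": [{"answer": "M", "value": "1"},
                             {"answer": "F", "value": "2"}]},
    {"qi": "02", "answers": [{"answer": "A", "value": "3"}]},
]

# One row per question; the nested "answers" list is kept as a list object.
top = pd.json_normalize(questions)

# One row per answer; meta="qi" copies the parent question's "qi" onto each row.
deep = pd.json_normalize(questions, record_path="answers", meta="qi")

print(top)
print(deep)
```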

You simply missed assigning id to the dataframe after normalizing the column:

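out = {}
for i in range(len(df)):
    # Parse the JSON string in row i; the result has the columns "t" and "q"
    out[i] = pd.read_json(df.i2[i])
    # Expand the dicts in "q" into the columns "qi" and "answers"
    out[i] = pd.json_normalize(out[i].q)
    # Re-attach the id of the source row (the scalar is repeated on every row)
    out[i]['id'] = df.id[i]
    # Put "id" first
    out[i] = out[i].loc[:, ['id','qi','answers']]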
Output:

>>> out[0]
      id  qi                                                                                                                                                                                                                     answers
0  9clpa  01                                                                                                [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'G', 'value': '3'}, {'answer': 'V', 'value': '4'}]
1  9clpa  02  [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'A', 'value': '3'}, {'answer': 'B', 'value': '4'}, {'answer': 'G', 'value': '5'}, {'answer': 'NC', 'value': '6'}, {'answer': 'O', 'value': '7'}]
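pandas 2.1 and later emit a deprecation warning when pd.read_json is given a literal JSON string rather than a path or file-like object. A hedged variant of the loop that sidesteps this by parsing with the standard-library json.loads is sketched below. The shortened sample data and the names row_id, raw and part are illustrative assumptions, not part of the original answer.

```python
import json
import pandas as pd

# Small stand-in for the question's df (shortened JSON, same shape).
df = pd.DataFrame({
    "id": ["9clpa", "g659am"],
    "i2": [
        '{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}',
        '{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}',
    ],
})

out = {}
for i, (row_id, raw) in enumerate(zip(df["id"], df["i2"])):
    # json.loads returns a plain dict; "q" holds the list of question dicts.
    part = pd.json_normalize(json.loads(raw)["q"])
    part["id"] = row_id                      # scalar is repeated on every row
    out[i] = part[["id", "qi", "answers"]]   # put "id" first

print(out[0])
```

Iterating with zip over the two columns avoids positional lookups such as df.i2[i], which depend on the index being 0..n-1.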

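Don't you think that, instead of having multiple dataframes, it would be better to expand the first dataframe into multiple rows? Your df would then have 4 rows: 2 with ID '9c…' and 2 with ID 'g6…'.

@BeChillerToo thanks for your comment. I don't think it matters, because in the second command I apply pd.json_normalize to only one column, and that is where I lose the id column from the original df. But it might indeed make things simpler in other ways.

Can you paste the raw JSON data?

As a general rule in pandas: if you are iterating over rows, you are doing it wrong.

I know, but this solves the problem. The answer is meant to show what was missing in the approach.

@AyushiRanjan thanks! It works now. Do you know whether I can specify where the new id column is added, i.e. as the first column?

I have edited my answer to show how to specify the position of the 'id' column. Have a look and see if it helps!

I know that iterating over the rows of a dataframe is usually not the right way to do things, but it solves the OP's problem, and the answer shows what was missing in his approach that kept him from getting the expected result.

Link-only answers are generally discouraged.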
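The single-dataframe idea raised in the comments could look roughly like the sketch below. It parses each JSON string, carries the id along in the same record, and lets json_normalize expand "q" with record_path while copying "id" and "t" through meta. The shortened sample data and the names records, flat and per_id are assumptions for illustration.

```python
import json
import pandas as pd

# Small stand-in for the question's df (shortened JSON, same shape).
df = pd.DataFrame({
    "id": ["9clpa", "g659am"],
    "i2": [
        '{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}',
        '{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}',
    ],
})

# Parse each JSON string and keep the row's id in the same dict.
records = [{"id": row_id, **json.loads(raw)}
           for row_id, raw in zip(df["id"], df["i2"])]

# record_path="q" yields one output row per question; meta copies "id" and "t"
# from the enclosing record onto each of those rows.
flat = pd.json_normalize(records, record_path="q", meta=["id", "t"])

# If separate frames per id are still wanted, split the flat frame afterwards.
per_id = {key: grp.reset_index(drop=True) for key, grp in flat.groupby("id")}

print(flat)
```

This still parses the strings in a Python-level comprehension, but the normalization and the id bookkeeping happen in a single json_normalize call rather than inside a row loop.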