Nested JSON, list of JSON strings to DataFrame
I want to load the following JSON string into a DataFrame:
jsonstr = {
"id": "12345",
"ename": "A4.txt",
"Zoom1": {
"Zoom1_res": [
{
"code": "A1",
"x": 3211,
"y": 677,
"part": "11",
"lace": "29",
"name": "COVER"
},
{
"code": "A4",
"x": 3492,
"y": 1109,
"part": "10",
"lace": "19",
"name": "ARMOUR"
}
]
},
"iSize": {
"width": 4608,
"height": 3456
},
"Action": {
"AA": {
"detect": [
{
"class": "aa",
"prob": 0.92,
"Box": {
"x0": 4406,
"y0": 670,
"x1": 4558,
"y1": 760
}
},
{
"class": "aa",
"prob": 0.92,
"Box": {
"x0": 3762,
"y0": 655,
"x1": 3913,
"y1": 747
}
}
]
}
}
}
Using read_json as follows:
df = pd.read_json(jsonstr)
returns
id ename Zoom1 \
Zoom1_res 12345 A4.txt [{'code': 'A1', 'x': 3211, 'y': 677, 'part': '...
width 12345 A4.txt NaN
height 12345 A4.txt NaN
AA 12345 A4.txt NaN
iSize Action
Zoom1_res NaN NaN
width 4608.0 NaN
height 3456.0 NaN
AA NaN {'detect': [{'class': 'aa', 'prob': 0.92, 'Box...
and a further attempt returns the error
AttributeError: 'float' object has no attribute 'values'
So I thought that
from ast import literal_eval
pd.json_normalize(df['Action'].apply(lambda x: literal_eval(x)["detect"]).explode())
might solve the problem, but the column contains NaN values, so even this does not work.
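What I really want is:
Ideally: id, ename, code, x, y, x0, y0, x1, y1
All the other data is of no value to me.
Thanks for any insight.
Your JSON is nested at several levels, so:
1. Create the sub-DataFrames
2. Shift the data over the NaN values
As I understand it, gBOX and BOX are the same attribute, so you can merge them this way and then use the result to get the data you need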
df3 = df1.apply(lambda x: pd.Series(x.dropna().values), axis=1)
df3.columns = ['class','prob','x0','y0','x1','y1','id','ename']
3. Select the columns you need from your data
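Step 2 above can be sketched on a small example. The data shown in the question uses "Box" for both detections, so the "gBox" key below is an assumption made only to reproduce the situation the answer describes, where two differently named boxes leave NaN gaps after flattening. This is an illustration, not necessarily the asker's real data.

```python
import pandas as pd

# Hypothetical records: the first uses "gBox", the second "Box" (assumed key names).
data = {
    "id": "12345",
    "ename": "A4.txt",
    "Action": {"AA": {"detect": [
        {"class": "aa", "prob": 0.92,
         "gBox": {"x0": 4406, "y0": 670, "x1": 4558, "y1": 760}},
        {"class": "aa", "prob": 0.92,
         "Box": {"x0": 3762, "y0": 655, "x1": 3913, "y1": 747}},
    ]}},
}

# Flattening yields separate gBox.* and Box.* columns, each half filled with NaN.
df1 = pd.json_normalize(data, record_path=["Action", "AA", "detect"],
                        meta=["id", "ename"])

# Drop the NaN cells in every row so the remaining values shift left and line up.
df3 = df1.apply(lambda row: pd.Series(row.dropna().values), axis=1)
df3.columns = ["class", "prob", "x0", "y0", "x1", "y1", "id", "ename"]
print(df3)
```

This only works when every row keeps the same number of non-NaN values in the same order. If that does not hold, the values land under the wrong column names.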
That is really great! One remark though:
, left_index=True, right_index=True
did not work for me. Thanks a lot.
Cool, edited now:
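import pandas as pd

# 1. One sub-DataFrame per nested list, carrying id and ename along as metadata
df1 = pd.json_normalize(jsonstr, record_path=['Action','AA','detect'], meta=['id','ename'])
df2 = pd.json_normalize(jsonstr, record_path=['Zoom1','Zoom1_res'], meta=['id','ename'])
# 2. Shift the values left over any NaN gaps, then name the columns
df3 = df1.apply(lambda x: pd.Series(x.dropna().values), axis=1)
df3.columns = ['class','prob','x0','y0','x1','y1','id','ename']
# 3. Merge on the shared keys and keep id, ename, code, x, y, x0, y0, x1, y1
df4 = pd.merge(df3, df2, on=['id','ename'])
df4 = df4.iloc[:,[6,7,8,9,10,2,3,4,5]]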
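For reference, here is a self-contained sketch of the same approach that selects columns by name instead of by position. It uses a trimmed copy of the question's data. The names boxes, zoom, wanted, crossed and paired are illustrative only. The second variant assumes that the i-th Zoom1_res entry belongs with the i-th detection, which the question does not state.

```python
import pandas as pd

# Copy of the question's data (both detections use "Box").
jsonstr = {
    "id": "12345",
    "ename": "A4.txt",
    "Zoom1": {"Zoom1_res": [
        {"code": "A1", "x": 3211, "y": 677, "part": "11", "lace": "29", "name": "COVER"},
        {"code": "A4", "x": 3492, "y": 1109, "part": "10", "lace": "19", "name": "ARMOUR"},
    ]},
    "iSize": {"width": 4608, "height": 3456},
    "Action": {"AA": {"detect": [
        {"class": "aa", "prob": 0.92, "Box": {"x0": 4406, "y0": 670, "x1": 4558, "y1": 760}},
        {"class": "aa", "prob": 0.92, "Box": {"x0": 3762, "y0": 655, "x1": 3913, "y1": 747}},
    ]}},
}

boxes = pd.json_normalize(jsonstr, record_path=["Action", "AA", "detect"],
                          meta=["id", "ename"])
# Strip the "Box." prefix so the columns are simply x0, y0, x1, y1.
boxes.columns = [c.split(".")[-1] for c in boxes.columns]

zoom = pd.json_normalize(jsonstr, record_path=["Zoom1", "Zoom1_res"],
                         meta=["id", "ename"])

wanted = ["id", "ename", "code", "x", "y", "x0", "y0", "x1", "y1"]

# Variant 1: merge on the keys, as in the answer above.
# Every zoom row is combined with every box row that shares id and ename.
crossed = pd.merge(zoom, boxes, on=["id", "ename"])[wanted]

# Variant 2 (assumption: rows correspond by position): join on the row index.
paired = zoom.join(boxes.drop(columns=["id", "ename"]))[wanted]

print(crossed)
print(paired)
```

With this data, merging on id and ename produces four rows (two Zoom1_res entries times two detections), while the index-based join produces two. Which one is appropriate depends on how the two lists relate in the real data.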