Python pandas: reading nested JSON with NaN entries
I'm trying to read a JSON with nested dictionaries. The problem is that some of the nested lists/dicts are NaN (missing entirely), so if I call the normalize function I get a KeyError, because the key only exists in some of the top-level elements of the dictionary.

Here is my data:
q
Out[235]:
[{u'Code': u'GE',
u'datetime': u'2011-11-14T19:30:03-05:00[US/Eastern]'},
{u'Code': u'PP',
u'datetime': u'2012-21-14T18:50-05:00[US/Eastern]'},
{u'Code': u'IO',
u'Summary': [{u'prod': u'book',
u'num': 81.04,
u'devil': 17},
{u'prod': u'game',
u'num': 191.5,
u'devil': 10},
{u'prod': u'desk',
u'num': 55.5,
u'devil': -6},
{u'angel': u'ipo',
u'num': 503.0,
u'devil': 1}],
u'datetime': u'2013-10-14T16:30-05:00[US/Eastern]'},
{u'Code': u'BI',
u'datetime': u'2014-11-14T12:30-05:00[US/Eastern]'},
{u'Code': u'EZ',
u'datetime': u'2015-12-14T10:00-05:00[US/Eastern]'},
{u'Code': u'JC',
u'datetime': u'2016-10-14T08:30:01-05:00[US/Eastern]'},
{u'Code': u'WX',
u'Summary': [{u'angel': u'yut',
u'num': 0,
u'prod': u'read',
u'devil': 0.0},
{u'angel': u'fgf',
u'prod': u'fart',
u'devil': 0.0},
{u'prod': u'red',
u'num': 673,
u'angel': u'deft',
u'devil': 0},
{ u'devil': 0,
u'prod': u'dog'},
{u'angel': u'hut',
u'devil': 99}],
u'datetime': u'2017-10-13T05:00:02-05:00[US/Eastern]'}]
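A quick way to confirm which records actually contain the nested key is to split the list by membership. This is a minimal sketch on a trimmed copy of the sample above (only the record shapes matter here, not the full data):

```python
# Trimmed copy of the sample records: only some have a 'Summary' list.
q = [
    {'Code': 'GE', 'datetime': '2011-11-14T19:30:03-05:00[US/Eastern]'},
    {'Code': 'IO',
     'Summary': [{'prod': 'book', 'num': 81.04, 'devil': 17}],
     'datetime': '2013-10-14T16:30-05:00[US/Eastern]'},
    {'Code': 'BI', 'datetime': '2014-11-14T12:30-05:00[US/Eastern]'},
]

# Split the records by whether the nested key is present.
with_summary = [rec for rec in q if 'Summary' in rec]
without_summary = [rec for rec in q if 'Summary' not in rec]

print([rec['Code'] for rec in with_summary])     # ['IO']
print([rec['Code'] for rec in without_summary])  # ['GE', 'BI']
```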
I can view it in a DataFrame like this:
pd.DataFrame(q)
Out[229]:
Code Summary datetime
0 GE NaN 2011-11-14T19:30:03-05:00[US/Eastern]
1 PP NaN 2012-21-14T18:50-05:00[US/Eastern]
2 IO [{u'prod': u'book', u'num': 81.04, u... 2013-10-14T16:30-05:00[US/Eastern]
3 BI NaN 2014-11-14T12:30-05:00[US/Eastern]
4 EZ NaN 2015-12-14T10:00-05:00[US/Eastern]
5 JC NaN 2016-10-14T08:30:01-05:00[US/Eastern]
6 WX [{u'angel': u'yut', u'num': 0, u'pr... 2017-10-13T05:00:02-05:00[US/Eastern]
As mentioned, running pd.io.json.json_normalize(q, 'Summary', ['Code', 'datetime'])
results in KeyError: 'Summary'.
How can I get around this? Ideally I would just like NaN cell values wherever 'Summary' does not exist. IIUC:
In [94]: (json_normalize([x for x in q if x.get('Summary')],
'Summary',
['Code', 'datetime'])
...: .append(pd.DataFrame([x for x in q if not x.get('Summary')])))
...:
Out[94]:
Code angel datetime devil num prod
0 IO NaN 2013-10-14T16:30-05:00[US/Eastern] 17.0 81.04 book
1 IO NaN 2013-10-14T16:30-05:00[US/Eastern] 10.0 191.50 game
2 IO NaN 2013-10-14T16:30-05:00[US/Eastern] -6.0 55.50 desk
3 IO ipo 2013-10-14T16:30-05:00[US/Eastern] 1.0 503.00 NaN
4 WX yut 2017-10-13T05:00:02-05:00[US/Eastern] 0.0 0.00 read
5 WX fgf 2017-10-13T05:00:02-05:00[US/Eastern] 0.0 NaN fart
6 WX deft 2017-10-13T05:00:02-05:00[US/Eastern] 0.0 673.00 red
7 WX NaN 2017-10-13T05:00:02-05:00[US/Eastern] 0.0 NaN dog
8 WX hut 2017-10-13T05:00:02-05:00[US/Eastern] 99.0 NaN NaN
0 GE NaN 2011-11-14T19:30:03-05:00[US/Eastern] NaN NaN NaN
1 PP NaN 2012-21-14T18:50-05:00[US/Eastern] NaN NaN NaN
2 BI NaN 2014-11-14T12:30-05:00[US/Eastern] NaN NaN NaN
3 EZ NaN 2015-12-14T10:00-05:00[US/Eastern] NaN NaN NaN
4 JC NaN 2016-10-14T08:30:01-05:00[US/Eastern] NaN NaN NaN
Or using pd.concat():
In [95]: pd.concat([json_normalize([x for x in q if x.get('Summary')],
...: 'Summary',
...: ['Code', 'datetime']),
...: pd.DataFrame([x for x in q if not x.get('Summary')])],
...: ignore_index=True)
...:
Out[95]:
Code angel datetime devil num prod
0 IO NaN 2013-10-14T16:30-05:00[US/Eastern] 17.0 81.04 book
1 IO NaN 2013-10-14T16:30-05:00[US/Eastern] 10.0 191.50 game
2 IO NaN 2013-10-14T16:30-05:00[US/Eastern] -6.0 55.50 desk
3 IO ipo 2013-10-14T16:30-05:00[US/Eastern] 1.0 503.00 NaN
4 WX yut 2017-10-13T05:00:02-05:00[US/Eastern] 0.0 0.00 read
5 WX fgf 2017-10-13T05:00:02-05:00[US/Eastern] 0.0 NaN fart
6 WX deft 2017-10-13T05:00:02-05:00[US/Eastern] 0.0 673.00 red
7 WX NaN 2017-10-13T05:00:02-05:00[US/Eastern] 0.0 NaN dog
8 WX hut 2017-10-13T05:00:02-05:00[US/Eastern] 99.0 NaN NaN
9 GE NaN 2011-11-14T19:30:03-05:00[US/Eastern] NaN NaN NaN
10 PP NaN 2012-21-14T18:50-05:00[US/Eastern] NaN NaN NaN
11 BI NaN 2014-11-14T12:30-05:00[US/Eastern] NaN NaN NaN
12 EZ NaN 2015-12-14T10:00-05:00[US/Eastern] NaN NaN NaN
13 JC NaN 2016-10-14T08:30:01-05:00[US/Eastern] NaN NaN NaN
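As a side note: on newer pandas versions (1.0+) json_normalize is exposed as pd.json_normalize, and DataFrame.append was removed in pandas 2.0, so the pd.concat variant is the one that keeps working. A self-contained sketch of the same split-then-concat idea, on a trimmed copy of the data (assuming a recent pandas is installed):

```python
import pandas as pd

# Trimmed copy of the question's data: one record with a nested
# 'Summary' list, two without.
q = [
    {'Code': 'GE', 'datetime': '2011-11-14T19:30:03-05:00[US/Eastern]'},
    {'Code': 'IO',
     'Summary': [{'prod': 'book', 'num': 81.04, 'devil': 17},
                 {'angel': 'ipo', 'num': 503.0, 'devil': 1}],
     'datetime': '2013-10-14T16:30-05:00[US/Eastern]'},
    {'Code': 'BI', 'datetime': '2014-11-14T12:30-05:00[US/Eastern]'},
]

# Normalize only the records that have a 'Summary', then stack the
# plain records below them; ignore_index renumbers the rows 0..n-1.
out = pd.concat(
    [pd.json_normalize([x for x in q if x.get('Summary')],
                       'Summary', ['Code', 'datetime']),
     pd.DataFrame([x for x in q if not x.get('Summary')])],
    ignore_index=True,
)

print(out[['Code', 'prod', 'devil']])
```

Records without 'Summary' simply end up with NaN in the summary-derived columns, which is the behavior asked for in the question.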