Python: cannot create a DataFrame from a nested JSON file
I am trying to retrieve the val1 and val2 values from the following nested JSON file in order to build a pandas DataFrame with two columns, val1 and val2:
{
    'start': '2015-10-01 00:00',
    'end': '2015-10-01 01:00',
    'records': {
        'val1': [1, 2, 3, 4, 5],
        'val2': [0.1, 0.5, 0.2, 0.1, 0.0],
        'val3': 'abc'
    }
}
This is what I did:
import json
from pandas.io.json import json_normalize

with open(json_file) as data_file:
    data = json.load(data_file)
df = json_normalize(data, 'records', ['val1', 'val2'], record_prefix='records_', errors='ignore')
However, I get the following output:
  records_0  val1  val2
0      val1   NaN   NaN
1      val2   NaN   NaN
2      val3   NaN   NaN
Expected output:
val1  val2
   1   0.1
   2   0.5
   3   0.2
   4   0.1
   5   0.0
Put the JSON into a variable, or load it with json.load, and then use json_normalize:
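import pandas as pd

# Named "data" rather than "json" so the json module is not shadowed
data = {'start': '2015-10-01 00:00', 'end': '2015-10-01 01:00',
        'records': {'val1': [1, 2, 3, 4, 5], 'val2': [0.1, 0.5, 0.2, 0.1, 0.0], 'val3': 'abc'}}
df = pd.json_normalize(data)
# Keep only the last segment of each flattened name, e.g. "records.val1" -> "val1"
df.columns = df.columns.map(lambda x: x.split(".")[-1])
# Drop every column except val1 and val2
df = df[['val1', 'val2']]
# Note: the result is a single row, and each cell still holds the whole list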
If you want only two columns left, drop the other columns and keep the ones you need.
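Calling json_normalize on this dictionary returns a single row in which the val1 and val2 cells each hold an entire list. The following is a minimal sketch of one way to expand that row into the five expected rows. It assumes pandas 1.3 or later (multi-column explode), and the names data and flat are illustrative rather than part of the answer above.

```python
import pandas as pd

# Same sample data as in the question, held in memory for the sketch
data = {
    "start": "2015-10-01 00:00",
    "end": "2015-10-01 01:00",
    "records": {
        "val1": [1, 2, 3, 4, 5],
        "val2": [0.1, 0.5, 0.2, 0.1, 0.0],
        "val3": "abc",
    },
}

# Flatten the nested dict; nested keys become "records.val1" and so on
flat = pd.json_normalize(data)
flat = flat[["records.val1", "records.val2"]]
flat.columns = ["val1", "val2"]

# Turn the two list-valued cells into one row per list element
df = flat.explode(["val1", "val2"], ignore_index=True)

# explode leaves object dtype behind, so restore numeric dtypes
df = df.astype({"val1": "int64", "val2": "float64"})
print(df)
```

Exploding both columns in one call requires the two lists to have the same length, which holds for this sample.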
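You can define the list ['val1', 'val2'], initialize an empty DataFrame with those columns, and fill it with, for example, a for loop:
import json
import pandas as pd

cols = ['val1', 'val2']
df = pd.DataFrame(columns=cols)
with open('myfile.json') as data_file:
    data = json.load(data_file)
# Copy each wanted list from the nested "records" dict into its column
for col in cols:
    df[col] = data['records'][col]
df
   val1  val2
0     1   0.1
1     2   0.5
2     3   0.2
3     4   0.1
4     5   0.0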
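The same selection can also be written without the loop. The sketch below builds the DataFrame in one step from a dict comprehension. It uses an in-memory dictionary in place of myfile.json so that it runs on its own, and the name wanted is an illustrative choice.

```python
import pandas as pd

# Stand-in for the result of json.load(data_file)
data = {
    "start": "2015-10-01 00:00",
    "end": "2015-10-01 01:00",
    "records": {
        "val1": [1, 2, 3, 4, 5],
        "val2": [0.1, 0.5, 0.2, 0.1, 0.0],
        "val3": "abc",
    },
}

wanted = ["val1", "val2"]

# Pick only the wanted keys out of "records" and let pandas build the columns
df = pd.DataFrame({name: data["records"][name] for name in wanted})
print(df)
```

Building the frame from complete columns lets pandas infer integer and float dtypes directly.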
You can pull out the values you want systematically:
import pandas as pd

js = {'start': '2015-10-01 00:00',
      'end': '2015-10-01 01:00',
      'records': {'val1': [1, 2, 3, 4, 5],
                  'val2': [0.1, 0.5, 0.2, 0.1, 0.0],
                  'val3': 'abc'}}
# Normalize each list separately, then join the two single-column frames
(pd.json_normalize(js["records"], "val1")
   .rename(columns={0: "val1"})
   .join(pd.json_normalize(js["records"], "val2"))
   .rename(columns={0: "val2"})
)
   val1  val2
0     1   0.1
1     2   0.5
2     3   0.2
3     4   0.1
4     5   0.0
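The function expects an array of dictionaries.
json_normalize is not a good fit for your example, because the method assumes that the base container is an array of records. You can use a different approach:
import json
import pandas

with open(json_file) as data_file:
    data = json.load(data_file)
# val3 is a scalar and is repeated on every row, so select only val1 and val2
pandas.DataFrame.from_dict(data=data["records"])[["val1", "val2"]]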
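As a self-contained illustration of this approach, the sketch below passes the records dictionary to DataFrame.from_dict and then keeps the two wanted columns. It uses an in-memory dictionary instead of reading json_file, and the names records and full are assumptions made for the example.

```python
import pandas as pd

# Stand-in for data["records"] after json.load
records = {
    "val1": [1, 2, 3, 4, 5],
    "val2": [0.1, 0.5, 0.2, 0.1, 0.0],
    "val3": "abc",
}

# Lists become columns; the scalar "abc" is repeated on every row
full = pd.DataFrame.from_dict(records)

# Keep only the two columns the question asks for
df = full[["val1", "val2"]]
print(df)
```

Because val3 is a scalar, pandas broadcasts it to the length of the list columns, which is why the column selection is needed.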