Python: Unable to create a DataFrame from a nested JSON file

I am trying to retrieve the val1 and val2 values from the following nested JSON file in order to build a pandas DataFrame with two columns, val1 and val2:

{
 'start': '2015-10-01 00:00',
 'end': '2015-10-01 01:00',
 'records': 
     {
        'val1': 
             [
                1,
                2,
                3,
                4,
                5
             ],
         'val2':
             [
                0.1,
                0.5,
                0.2,
                0.1,
                0.0
             ],
         'val3': 'abc'
      }
}
This is what I did:

import json
from pandas.io.json import json_normalize

with open(json_file) as data_file:    
    data = json.load(data_file)  

df = json_normalize(data, 'records', ['val1', 'val2'], record_prefix='records_', errors='ignore')
However, I get the following output:

    records_0   val1  val2
0   val1        NaN   NaN
1   val2        NaN   NaN
2   val3        NaN   NaN
Expected output:

val1   val2
1      0.1
2      0.5
3      0.2
4      0.1
5      0.0

Put the JSON into a variable, or load it with json.load, and then use json_normalize:

import pandas as pd

# Named "data" rather than "json" so the json module is not shadowed
data = {'start': '2015-10-01 00:00','end': '2015-10-01 01:00','records': {'val1': [1,2,3,4,5],'val2':[0.1,0.5,0.2,0.1,0.0],'val3': 'abc'}}

df = pd.json_normalize(data)

# Strip the "records." prefix from the flattened column names
df.columns = df.columns.map(lambda x: x.split(".")[-1])
# Drop every column except val1 and val2
for column in df.columns:
    if column != 'val1' and column != 'val2':
        df = df.drop([column], axis=1)

If you only want two columns left, simply drop the others and decide which ones to keep.
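Note that after this kind of flattening, val1 and val2 each sit in a single cell that holds a whole list, so the frame has one row rather than five. A minimal sketch of one way to expand those lists into rows, assuming the sample data from the question is held in memory (the names `data`, `flat`, and `df` are chosen here for illustration):

```python
import pandas as pd

# Same sample data as in the question, held in memory instead of a file.
data = {
    "start": "2015-10-01 00:00",
    "end": "2015-10-01 01:00",
    "records": {
        "val1": [1, 2, 3, 4, 5],
        "val2": [0.1, 0.5, 0.2, 0.1, 0.0],
        "val3": "abc",
    },
}

# One row; nested keys become dotted column names such as "records.val1".
flat = pd.json_normalize(data)
flat.columns = [name.split(".")[-1] for name in flat.columns]

# Each val1/val2 cell holds a list; explode every column into one row per item.
df = flat[["val1", "val2"]].apply(pd.Series.explode).reset_index(drop=True)
print(df)
```

The exploded columns keep the object dtype, so a follow-up such as `df.astype({"val1": int, "val2": float})` may be useful if numeric columns are needed.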

You can define a list ['val1','val2'], initialize a DataFrame with those columns, and then fill it element by element, for example with a for loop:

import json
import pandas as pd

l=['val1','val2']
df = pd.DataFrame(columns=l)
with open('myfile.json') as data_file:    
    data = json.load(data_file) 

for i in l:
    df[i]=data['records'][i]

df

   val1  val2
0     1   0.1
1     2   0.5
2     3   0.2
3     4   0.1
4     5   0.0
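The same selection can probably be written without the explicit loop by building a dict of just the wanted keys and passing it to the DataFrame constructor. A short sketch, assuming the parsed JSON is already available as a dict called `data` (defined inline here so the example runs on its own):

```python
import pandas as pd

# Sample data from the question, already parsed into a Python dict.
data = {
    "start": "2015-10-01 00:00",
    "end": "2015-10-01 01:00",
    "records": {
        "val1": [1, 2, 3, 4, 5],
        "val2": [0.1, 0.5, 0.2, 0.1, 0.0],
        "val3": "abc",
    },
}

columns = ["val1", "val2"]

# Pick only the wanted lists out of "records"; each list becomes a column.
df = pd.DataFrame({name: data["records"][name] for name in columns})
print(df)
```

This relies on val1 and val2 having the same length, as they do in the question's data.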

You can systematically pull out just the parts you want:

import pandas as pd

js = {'start': '2015-10-01 00:00',
 'end': '2015-10-01 01:00',
 'records': {'val1': [1, 2, 3, 4, 5],
  'val2': [0.1, 0.5, 0.2, 0.1, 0.0],
  'val3': 'abc'}}

(pd.json_normalize(js["records"],"val1")
 .rename(columns={0:"val1"})
 .join(pd.json_normalize(js["records"],"val2"))
 .rename(columns={0:"val2"})
)

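   val1  val2
0     1   0.1
1     2   0.5
2     3   0.2
3     4   0.1
4     5   0.0

The json_normalize function expects an array of dictionaries.

For your example, json_normalize is not a good fit, because the method assumes that the base container is an array.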

You can use a different approach:

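import json
import pandas

with open(json_file) as data_file:
    data = json.load(data_file)

# val3 is a scalar, so it is broadcast to every row; keep only val1 and val2
df = pandas.DataFrame.from_dict(data["records"])[["val1", "val2"]]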
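To illustrate the point that json_normalize expects an array of dictionaries, the sketch below reshapes the column-oriented "records" object into a list of one dict per row before normalizing. The `rows` variable and the inline `data` dict are assumptions made for this example, not part of the original answer:

```python
import pandas as pd

# Sample data from the question, already parsed into a Python dict.
data = {
    "start": "2015-10-01 00:00",
    "end": "2015-10-01 01:00",
    "records": {
        "val1": [1, 2, 3, 4, 5],
        "val2": [0.1, 0.5, 0.2, 0.1, 0.0],
        "val3": "abc",
    },
}

records = data["records"]

# Turn the two parallel lists into a list of row dicts,
# which is the record-oriented shape json_normalize works with.
rows = [{"val1": a, "val2": b} for a, b in zip(records["val1"], records["val2"])]

df = pd.json_normalize(rows)
print(df)
```

For data this simple, the direct DataFrame construction is shorter; the reshaping mainly shows what input layout json_normalize is designed for.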