
Python: load JSON from a URL

Tags: python, json, pandas, google-cloud-platform

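I am preparing learning materials for my students. For convenience, I would like to access the data from a URL rather than asking them to download it in advance. In this case, I am trying to access the Quick, Draw! dataset from Google.

Here is a working example that reads the data from a copy stored on my computer, with the results shown in the comments: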

import pandas as pd
import os
import json
from glob import glob

# Convert top row to one dict
top_row_dict = lambda in_df: list(in_df.head(1).T.to_dict().values())[0]
# Load file from computer
base_dir = os.path.join('input', 'quickdraw_simplified')
obj_files = glob(os.path.join(base_dir, '*.ndjson'))
print(obj_files[0])
# input\quickdraw_simplified\full_simplified_bird.ndjson

c_json = pd.read_json(obj_files[0], lines = True, chunksize = 1)
# <pandas.io.json._json.JsonReader at 0x158ae631f10>

f_row = next(c_json)
# word  countrycode     timestamp   recognized  key_id  drawing
# 0     bird    US  2017-03-09 00:28:55.637750+00:00    True    4926006882205696    [[[0, 11, 23, 50, 72, 96, 97, 132, 158, 224, 2...

f_dict = top_row_dict(f_row)
# {'word': 'bird',
#  'countrycode': 'US',
#  'timestamp': Timestamp('2017-03-09 00:28:55.637750+0000', tz='UTC'),
#  'recognized': True,
#  'key_id': 4926006882205696,
#  'drawing': [[[0, 11, 23, 50, 72, 96, 97, 132, 158, 224, 255],
#    [22, 9, 2, 0, 26, 45, 71, 40, 27, 10, 9]]]}
However, when I try the same approach with the URL, it fails:

import pandas as pd
import json

top_row_dict = lambda in_df: list(in_df.head(1).T.to_dict().values())[0]

url = 'https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/simplified/bird.ndjson'
# Load dataset
c_json = pd.read_json(url, lines = True, chunksize = 1)
# <pandas.io.json._json.JsonReader at 0x24980a20a90>
f_row = next(c_json)
# __
f_dict = top_row_dict(f_row)
# IndexError: list index out of range
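A quick way to see what the URL actually returns is to test whether the first line of the response parses as a JSON object, which is what `pd.read_json(..., lines=True)` expects. The sketch below is only a diagnostic aid; the helper name `first_line_is_json_object` is hypothetical and not part of pandas.

```python
import json


def first_line_is_json_object(payload: bytes) -> bool:
    """Return True if the first non-empty line of `payload` parses as a JSON object."""
    for raw_line in payload.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip leading blank lines
        try:
            return isinstance(json.loads(line), dict)
        except ValueError:
            # Covers both invalid JSON and undecodable bytes.
            return False
    return False


# Possible use against a live URL (needs network access, so left commented out):
# from urllib.request import urlopen
# with urlopen(url) as response:
#     print(first_line_is_json_object(response.readline()))
```

If this prints `False` for a given URL, the server is probably sending something other than NDJSON, for example an HTML page.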

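The URL you are trying to use requires a login, because it links to the Cloud Console.

However, the dataset is stored in a publicly accessible Google Cloud Storage bucket.

This means you can use the `google.cloud` storage package to load the file directly from the bucket.

For example: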

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('quickdraw_dataset')
blob = bucket.get_blob('full/simplified/bird.ndjson')
c_json = pd.read_json(blob, lines=True, chunksize=1)
...
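Because the bucket is public, another option is to skip the client library and give pandas a direct download URL. The sketch below rewrites the Cloud Console link from the question into the `https://storage.googleapis.com/<bucket>/<object>` form that Cloud Storage uses for publicly readable objects. It assumes that this object is readable that way. The helper names `console_to_public_url` and `read_first_record` are hypothetical.

```python
from urllib.parse import urlparse

CONSOLE_PREFIX = "/storage/browser/"


def console_to_public_url(console_url):
    """Turn a Cloud Console browser link into a direct storage.googleapis.com URL."""
    path = urlparse(console_url).path
    if not path.startswith(CONSOLE_PREFIX):
        raise ValueError(f"not a Cloud Console storage link: {console_url}")
    # The first path segment after the prefix is the bucket, the rest is the object name.
    bucket, _, blob_name = path[len(CONSOLE_PREFIX):].partition("/")
    if not bucket or not blob_name:
        raise ValueError(f"missing bucket or object name: {console_url}")
    return f"https://storage.googleapis.com/{bucket}/{blob_name}"


def read_first_record(url):
    """Read the first NDJSON record from `url` and return it as a dict."""
    # Imported here so the URL helper above works without pandas installed.
    import pandas as pd

    reader = pd.read_json(url, lines=True, chunksize=1)
    first_chunk = next(reader)
    return first_chunk.iloc[0].to_dict()


# Possible use (needs network access, so left commented out):
# url = console_to_public_url(
#     "https://console.cloud.google.com/storage/browser/"
#     "quickdraw_dataset/full/simplified/bird.ndjson"
# )
# f_dict = read_first_record(url)
```

As far as I know, pandas downloads the whole file before it yields the first chunk when given a URL, so `chunksize=1` limits parsing work rather than network traffic.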

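Unfortunately, it still asks for credentials. :( DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see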
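The error in this comment arises because `storage.Client()` looks up default credentials when it is created. For a publicly readable bucket, one possible workaround is an anonymous client. The sketch below assumes a `google-cloud-storage` version that provides `Client.create_anonymous_client()` and `Blob.open()`. The helper names `iter_ndjson` and `open_public_blob` are hypothetical.

```python
import json
from itertools import islice


def iter_ndjson(text_stream, limit=None):
    """Return an iterator of dicts, one per non-empty line of an NDJSON text stream."""
    records = (json.loads(line) for line in text_stream if line.strip())
    return islice(records, limit)


def open_public_blob(bucket_name, blob_name):
    """Open a publicly readable object as a text stream, without credentials."""
    # Imported here so iter_ndjson stays usable without the client library.
    from google.cloud import storage

    client = storage.Client.create_anonymous_client()
    # bucket() and blob() only build local references and make no API request.
    blob = client.bucket(bucket_name).blob(blob_name)
    return blob.open("r", encoding="utf-8")


# Possible use (needs network access and google-cloud-storage, so left commented out):
# with open_public_blob("quickdraw_dataset", "full/simplified/bird.ndjson") as stream:
#     f_dict = next(iter_ndjson(stream, limit=1))
```

Reading line by line from the stream avoids loading the whole file just to show students the first record.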