Python: How to load nested JSON data into BigQuery (Python, Json, Google Cloud Platform, Google Bigquery) - Fatal编程技术网

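I am trying to load JSON data from an API into a BigQuery table on GCP, but I have run into a problem: the JSON data seems to be missing a square bracket, so the insert fails with the error "Repeated record with name trip_update added outside of an array." I don't know how to fix this. Here is a sample of the data: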

{
    "header": {
        "gtfs_realtime_version": "1.0",
        "timestamp": 1607630971
    },
    "entity": [
        {
            "id": "65.5.17-120-cm1-1.18.O",
            "trip_update": {
                "trip": {
                    "trip_id": "65.5.17-120-cm1-1.18.O",
                    "start_time": "18:00:00",
                    "start_date": "20201210",
                    "schedule_relationship": "SCHEDULED",
                    "route_id": "17-120-cm1-1"
                },
                "stop_time_update": [
                    {
                        "stop_sequence": 1,
                        "departure": {
                            "delay": 0
                        },
                        "stop_id": "8220B1351201",
                        "schedule_relationship": "SCHEDULED"
                    },
                    {
                        "stop_sequence": 23,
                        "arrival": {
                            "delay": 2340
                        },
                        "departure": {
                            "delay": 2340
                        },
                        "stop_id": "8260B1025301",
                        "schedule_relationship": "SCHEDULED"
                    }
                ]
            }
        }
    ]
}
Below are the schema and the code (a function that follows Google's guide); both are reproduced further down this page.


Your schema definition is wrong: trip_update is not a repeated struct but a nullable record (or a non-nullable one, but not repeated).
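A minimal sketch of the schema this answer describes, assuming the structure of the sample feed in the question: single JSON objects (header, trip_update, trip, arrival, departure) become NULLABLE records, and JSON arrays (entity, stop_time_update) become REPEATED records. In the sample, stop_time_update sits inside trip_update, so it is nested there rather than next to it as in the question's schema, and numeric values are typed INTEGER to match the sample. `SCHEMA` and `mode_errors` are hypothetical names; the helper walks a record and reports the paths where the JSON shape disagrees with a field's mode, which is the kind of mismatch behind "Repeated record ... added outside of an array".

```python
# Hypothetical corrected schema in BigQuery's JSON schema-file format.
# A field without "mode" defaults to NULLABLE.
SCHEMA = [
    {"name": "header", "type": "RECORD", "mode": "NULLABLE", "fields": [
        {"name": "gtfs_realtime_version", "type": "STRING"},
        {"name": "timestamp", "type": "INTEGER"},
    ]},
    {"name": "entity", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "id", "type": "STRING"},
        {"name": "trip_update", "type": "RECORD", "mode": "NULLABLE", "fields": [
            {"name": "trip", "type": "RECORD", "mode": "NULLABLE", "fields": [
                {"name": "trip_id", "type": "STRING"},
                {"name": "start_time", "type": "STRING"},
                {"name": "start_date", "type": "STRING"},
                {"name": "schedule_relationship", "type": "STRING"},
                {"name": "route_id", "type": "STRING"},
            ]},
            {"name": "stop_time_update", "type": "RECORD", "mode": "REPEATED", "fields": [
                {"name": "stop_sequence", "type": "INTEGER"},
                {"name": "arrival", "type": "RECORD", "mode": "NULLABLE",
                 "fields": [{"name": "delay", "type": "INTEGER"}]},
                {"name": "departure", "type": "RECORD", "mode": "NULLABLE",
                 "fields": [{"name": "delay", "type": "INTEGER"}]},
                {"name": "stop_id", "type": "STRING"},
                {"name": "schedule_relationship", "type": "STRING"},
            ]},
        ]},
    ]},
]


def mode_errors(value, fields, path=""):
    """Return the dotted paths where a REPEATED field is not a JSON array,
    or a NULLABLE/REQUIRED field is one."""
    errors = []
    for field in fields:
        name = field["name"]
        child = value.get(name)
        if child is None:  # missing or null values are not checked here
            continue
        where = path + name
        repeated = field.get("mode", "NULLABLE") == "REPEATED"
        if repeated != isinstance(child, list):
            errors.append(where)
            continue
        if field["type"] == "RECORD":
            # Recurse into every element of a repeated record, or the single object.
            for item in (child if repeated else [child]):
                errors.extend(mode_errors(item, field["fields"], where + "."))
    return errors
```

Run against the sample feed, `mode_errors(feed, SCHEMA)` should return an empty list, while a schema that declares trip_update as REPEATED reports `entity.trip_update`. Saved with `json.dump(SCHEMA, f)`, a list in this shape has the same layout as a schema file for `bq load --schema`.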

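One limitation of loading JSON into BigQuery is that it does not support maps or dictionaries in JSON.
I think the "trip_update" and "trip" fields must contain an array of values (indicated by square brackets), just like "stop_time_update".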
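If the schema keeps trip_update and trip as REPEATED, as this answer suggests, the raw API payload can be reshaped in code before it is inserted instead of changing what the API returns. A minimal sketch, assuming the feed has already been parsed into a dict (for example by `json.loads` in `_insert_into_bigquery`); `wrap_nested_objects` is a hypothetical helper, not part of any Google library.

```python
def wrap_nested_objects(feed):
    """Return a copy of `feed` in which each entity's trip_update object, and
    the trip object inside it, are wrapped in one-element lists.

    Only the nesting changes; no values are modified and the input is not mutated.
    """
    wrapped = dict(feed)
    entities = []
    for entity in feed.get("entity", []):
        entity = dict(entity)  # shallow copy so the caller's dict is untouched
        update = entity.get("trip_update")
        if isinstance(update, dict):
            update = dict(update)
            if isinstance(update.get("trip"), dict):
                update["trip"] = [update["trip"]]
            entity["trip_update"] = [update]
        entities.append(entity)
    wrapped["entity"] = entities
    return wrapped
```

The question's schema also marks arrival and departure as REPEATED, so those objects would need the same wrapping under that schema, and it declares stop_time_update next to trip_update while the sample nests it inside, so the schema would still need that change.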

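I'm not sure this alone is enough to load your data correctly.
Your sample row also has many line breaks in the middle of the JSON row. When you load data from JSON files, the rows must be newline-delimited: BigQuery expects newline-delimited JSON files to contain one record per line (the parser tries to interpret each line as a separate JSON row).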
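The newline-delimited requirement applies to load jobs from files (for example `bq load --source_format=NEWLINE_DELIMITED_JSON`); the question's code streams parsed dicts with `insert_rows_json`, so the file layout matters mainly if you switch to loading files. A minimal sketch, assuming each record you want as a row is already a Python dict: `json.dumps` without `indent` keeps every record on a single line. `to_ndjson` is a hypothetical helper name.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of JSON-compatible dicts as newline-delimited JSON.

    json.dumps without `indent` emits no literal line breaks (newlines inside
    strings are escaped as \\n), so each record occupies exactly one line.
    """
    return "".join(json.dumps(record, separators=(",", ":")) + "\n"
                   for record in records)
```

For the sample feed, `to_ndjson([feed])` writes the whole API response as one row; passing `feed["entity"]` instead would write one row per entity, which then needs a schema describing a single entity.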

Here is what your JSON data file should look like (the example is reproduced, truncated, at the end of this page).

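Yes, the JSON data is missing a square bracket after trip_update, but that is the raw format returned by the public API (). So I am looking for a solution that can read the data in the format it is given.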
    [
        { "name":"header",
           "type": "record",
           "fields": [
                {   "name":"gtfs_realtime_version",
                    "type": "string",
                    "description": "version of speed specification"
                },
                { "name": "timestamp",
                    "type": "integer",
                    "description": "The moment where this dataset was generated on server e.g. 1593102976"
                }
            ]

        },
        {"name":"entity",
            "type": "record",
            "mode": "REPEATED",
            "description": "Multiple entities can be included in the feed",
            "fields": [
                {"name":"id",
                    "type": "string",
                    "description": "unique identifier for the entity"
                },
                {"name": "trip_update",
                     "type": "struct",
                     "mode": "REPEATED",
                    "description": "Data about the realtime departure delays of a trip. At least one of the fields trip_update, vehicle, or alert must be provided - all these fields cannot be empty.",
                    "fields": [
                         { "name":"trip",
                            "type": "record",
                            "mode": "REPEATED",
                            "fields": [
                                {"name": "trip_id",
                                    "type": "string",
                                    "description": "selects which GTFS entity (trip) will be affected"
                                },
                                { "name":"start_time",
                                    "type": "string",
                                    "description": "The initially scheduled start time of this trip instance 13:30:00"
                                },
                                { "name":"start_date",
                                    "type": "string",
                                    "description": "The start date of this trip instance in YYYYMMDD format. Whether start_date is required depends on the type of trip: e.g. 20200625"
                                },
                                { "name":"schedule_relationship",
                                    "type": "string",
                                    "description": "The relation between this trip and the static schedule e.g. SCHEDULED"
                                },
                                { "name":"route_id",
                                    "type": "string",
                                    "description": "The route_id from the GTFS feed that this selector refers to e.g. 10-263-e16-1"
                                }
                            ]
                        }
                    ]
                },
                { "name":"stop_time_update",
                    "type": "record",
                    "mode": "REPEATED",
                    "description": "Updates to StopTimes for the trip (both future, i.e., predictions, and in some cases, past ones, i.e., those that already happened). The updates must be sorted by stop_sequence, and apply for all the following stops of the trip up to the next specified stop_time_update. At least one stop_time_update must be provided for the trip unless the trip.schedule_relationship is CANCELED - if the trip is canceled, no stop_time_updates need to be provided.",
                    "fields": [
                        {"name":"stop_sequence",
                            "type": "string",
                            "description": "Must be the same as in stop_times.txt in the corresponding GTFS feed e.g 3"
                        },
                        { "name":"arrival",
                            "type": "record",
                            "mode": "REPEATED",
                            "fields": [
                                { "name":"delay",
                                    "type": "string",
                                    "description": "Delay (in seconds) can be positive (meaning that the vehicle is late) or negative (meaning that the vehicle is ahead of schedule). Delay of 0 means that the vehicle is exactly on time e.g 5"
                                }
                            ]
                        },
                        { "name": "departure",
                            "type": "record",
                            "mode": "REPEATED",
                            "fields": [
                                { "name":"delay",
                                    "type": "integer"
                                }
                            ]
                        },
                        {  "name":"stop_id",
                            "type": "string",
                            "description": "Must be the same as in stops.txt in the corresponding GTFS feed e.g. 8430B2552301"
                        },
                        {"name":"schedule_relationship",
                            "type": "string",
                            "description": "The relation between this StopTime and the static schedule e.g. SCHEDULED , SKIPPED or NO_DATA"
                        }
                    ]
                }
            ]
        }
    ]
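import json

from google.api_core import retry

# CS (a storage.Client), BQ (a bigquery.Client), BQ_DATASET, BQ_TABLE and
# BigQueryError are assumed to be defined at module level, as in Google's guide.


def _insert_into_bigquery(bucket_name, file_name):
    blob = CS.get_bucket(bucket_name).blob(file_name)
    row = json.loads(blob.download_as_bytes())
    # Fully qualified table ID; Client.dataset() is deprecated
    table = f"{BQ.project}.{BQ_DATASET}.{BQ_TABLE}"
    # insert_rows_json expects a sequence of row dicts, so wrap the single object
    errors = BQ.insert_rows_json(table,
                                 json_rows=[row],
                                 ignore_unknown_values=True,
                                 retry=retry.Retry(deadline=30))
    if errors:
        raise BigQueryError(errors)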
                    {"name": "trip_update",
                     "type": "record",
                     "mode": "NULLABLE",
"trip_update": [
    {
    "trip": [
        {
            "trip_id