How to relationalize JSON that contains arrays (Amazon Web Services, Amazon S3, PySpark, AWS Glue)


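I am using AWS Glue to read data files on S3 that contain JSON. Each record is a JSON document whose data sits inside an array. I tried the relationalize function, but it does not work on the array. It does work on nested JSON, but that is not the format of the input data.

Is there a way to relationalize JSON that contains arrays?

Input data: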

{
    "ID":"1234",
    "territory":"US",
    "imgList":[
        {
            "type":"box",
            "locale":"en-US",
            "url":"boxart/url.jpg"
        },
        {
            "type":"square",
            "locale":"en-US",
            "url":"square/url.jpg"
        }
    ]
}
Code:

Output:

+----+----------+--------+
|ID  |territory |imgList |
+----+----------+--------+
|1234|       US |       1|
+----+----------+--------+
Expected output:

+----+----------+-------------+---------------+---------------+
|ID  |territory |imgList.type |imgList.locale |imgList.url    |
+----+----------+-------------+---------------+---------------+
|1234|       US |       box   |         en-US |boxart/url.jpg |
+----+----------+-------------+---------------+---------------+
|1234|       US |       square|         en-US |square/url.jpg |
+----+----------+-------------+---------------+---------------+

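Relationalize creates a separate dynamic frame for each array in the JSON document. So you only need to select that frame and join it with the root table: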

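from awsglue.transforms import Join, Relationalize

# Relationalize returns a DynamicFrameCollection: the root table plus one table per array
dfc = Relationalize.apply(frame = datasource0, staging_path = glue_temp_storage, name = "root", transformation_ctx = "dfc")
root_df = dfc.select('root')
imgList_df = dfc.select('root_imgList')

# In the root table, imgList holds the id that links to the child table's id column
df = Join.apply(root_df, imgList_df, 'imgList', 'id')
df.toDF().show()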
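
To see why the join on 'imgList' and 'id' gives one row per image, here is a minimal plain-Python sketch of the same idea, run on the sample record from the question. It does not use Glue or Spark. The helpers `split_arrays` and `join_rows` are hypothetical names made up for this illustration, the id value 1 is arbitrary, and the column names Glue actually produces for the child table may differ from the ones used here.

```python
import json

# Sample record from the question (commas added so it parses as JSON).
RECORD = json.loads("""
{
    "ID": "1234",
    "territory": "US",
    "imgList": [
        {"type": "box", "locale": "en-US", "url": "boxart/url.jpg"},
        {"type": "square", "locale": "en-US", "url": "square/url.jpg"}
    ]
}
""")


def split_arrays(record, array_field, array_id):
    """Split one record into a root row and one child row per array element.

    Hypothetical helper: it only imitates the root/child table layout and is
    not how Glue implements Relationalize.
    """
    root = {k: v for k, v in record.items() if k != array_field}
    root[array_field] = array_id  # the array is replaced by a reference id
    children = []
    for index, item in enumerate(record[array_field]):
        child = {"id": array_id, "index": index}
        for key, value in item.items():
            child[f"{array_field}.{key}"] = value
        children.append(child)
    return root, children


def join_rows(root_rows, child_rows, root_key, child_key):
    """Inner-join root rows to child rows where root_key equals child_key."""
    joined = []
    for root in root_rows:
        for child in child_rows:
            if root[root_key] == child[child_key]:
                joined.append({**root, **child})
    return joined


root, children = split_arrays(RECORD, "imgList", array_id=1)
rows = join_rows([root], children, "imgList", "id")
for row in rows:
    print(row["ID"], row["territory"], row["imgList.type"],
          row["imgList.locale"], row["imgList.url"])
```

Each array element becomes its own child row, and the join copies the root columns (ID, territory) onto every one of them, which is the shape of the expected output in the question.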
