
Error while running a Spark job using Python | Nested JSON with dynamic fields

Tags: apache-spark, pyspark, apache-spark-sql, aws-glue

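I have an input nested JSON file on Amazon S3. It contains a list of JSON objects, and each object has a dynamic field, `Extension`, which is sometimes a list and sometimes a map (in the sample below it is the nested `gleif:Geocoding` value that is an object in the first record and an array in the second). I need to ignore this field and create a schema from the remaining fields. So far I have not been able to do this: when I apply flattening to the DataFrame records, I get an error.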

Once I have the correct data, I need to load it into AWS Elasticsearch so that it can be queried.

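My question: is there any way to ignore the dynamic field and create the DataFrame from the relevant fields only? In Jackson we can put `@JsonIgnore` on a field so that it is not read during serialization/deserialization.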

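I tried creating a new DataFrame with only the three fields, but the result is a single row:

```python
# .show() prints the rows and returns None, so keep the DataFrame and show it separately
ndf = df.select("Records.LEI", "Records.Entity", "Records.Registration")
ndf.show(truncate=False)
```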
结果:

+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|LEI                                             |Entity                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |Registration                                                                                                                                                                                                                                                                                                       |
+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[[001GPB6A9XPE8XJICC14], [004L5FPTUREIWK9T2N63]]|[[[FUND], [ACTIVE], [, [Boston], [US], [245 Summer Street], [02210], [US-MA]], [, [BOSTON], [US], [245 SUMMER STREET], [02110], [US-MA]], [[8888], [OTHER]], [US-MA], [FIDELITY ADVISOR SERIES I - Fidelity Advisor Leveraged Company Stock Fund], [[S000005113], [RA000665]]], [, [ACTIVE], [[888 7th Avenue], [New York], [US], [22nd Floor], [10106], [US-NY]], [[[2711 Centerville Road], [Suite 400]], [Wilmington], [US], [C/O Corporation Service Company], [19808], [US-DE]], [[T91T], [LIMITED PARTNERSHIP]], [US-DE], [Hutchin Hill Capital, LP], [[4386463], [RA000602]]]]|[[[2012-11-29 22:03:00], [2020-06-03 20:03:00], [EVK05KS7XY1DEII3R011], [2021-05-29 13:20:00], [ISSUED], [[S000005113], [RA000665]], [FULLY_CORROBORATED]], [[2012-06-06 21:26:00], [2020-07-17 18:10:00], [EVK05KS7XY1DEII3R011], [2018-05-08 19:16:00], [LAPSED], [[4386463], [RA000602]], [FULLY_CORROBORATED]]]|
+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Full code snippet:

After running the above code, I get the following error:

```
pyspark.sql.utils.AnalysisException: cannot resolve 'flatten(`records`)' due to data type mismatch: The argument should be an array of arrays, but '`records`' is of array<struct<Entity:struct<EntityCategory:struct<$:string>,EntityStatus:struct<$:string>,HeadquartersAddress:struct<AdditionalAddressLine:struct<$:string>,City:struct<$:string>,Country:struct<$:string>,FirstAddressLine:struct<$:string>,PostalCode:struct<$:string>,Region:struct<$:string>>,LegalAddress:struct<AdditionalAddressLine:array<struct<$:string>>,City:struct<$:string>,Country:struct<$:string>,FirstAddressLine:struct<$:string>,PostalCode:struct<$:string>,Region:struct<$:string>>,LegalForm:struct<EntityLegalFormCode:struct<$:string>,OtherLegalForm:struct<$:string>>,LegalJurisdiction:struct<$:string>,LegalName:struct<$:string>,RegistrationAuthority:struct<RegistrationAuthorityEntityID:struct<$:string>,RegistrationAuthorityID:struct<$:string>>>,Extension:struct<gleif:Geocoding:string>,LEI:struct<$:string>,Registration:struct<InitialRegistrationDate:struct<$:timestamp>,LastUpdateDate:struct<$:timestamp>,ManagingLOU:struct<$:string>,NextRenewalDate:struct<$:timestamp>,RegistrationStatus:struct<$:string>,ValidationAuthority:struct<ValidationAuthorityEntityID:struct<$:string>,ValidationAuthorityID:struct<$:string>>,ValidationSources:struct<$:string>>>> type.;;                                   'Project [flatten(records#0) AS flatten(records)#2] 
```
Sample input JSON:

```json
{
  "records": [
    {
      "LEI": {
        "$": "001GPB6A9XPE8XJICC14"
      },
      "Entity": {
        "LegalName": {
          "$": "FIDELITY ADVISOR SERIES I - Fidelity Advisor Leveraged Company Stock Fund"
        },
        "LegalAddress": {
          "FirstAddressLine": {
            "$": "245 SUMMER STREET"
          },
          "City": {
            "$": "BOSTON"
          },
          "Region": {
            "$": "US-MA"
          },
          "Country": {
            "$": "US"
          },
          "PostalCode": {
            "$": "02110"
          }
        },
        "HeadquartersAddress": {
          "FirstAddressLine": {
            "$": "245 Summer Street"
          },
          "City": {
            "$": "Boston"
          },
          "Region": {
            "$": "US-MA"
          },
          "Country": {
            "$": "US"
          },
          "PostalCode": {
            "$": "02210"
          }
        },
        "RegistrationAuthority": {
          "RegistrationAuthorityID": {
            "$": "RA000665"
          },
          "RegistrationAuthorityEntityID": {
            "$": "S000005113"
          }
        },
        "LegalJurisdiction": {
          "$": "US-MA"
        },
        "EntityCategory": {
          "$": "FUND"
        },
        "LegalForm": {
          "EntityLegalFormCode": {
            "$": "8888"
          },
          "OtherLegalForm": {
            "$": "OTHER"
          }
        },
        "EntityStatus": {
          "$": "ACTIVE"
        }
      },
      "Registration": {
        "InitialRegistrationDate": {
          "$": "2012-11-29T16:33:00.000Z"
        },
        "LastUpdateDate": {
          "$": "2020-06-03T14:33:00.000Z"
        },
        "RegistrationStatus": {
          "$": "ISSUED"
        },
        "NextRenewalDate": {
          "$": "2021-05-29T07:50:00.000Z"
        },
        "ManagingLOU": {
          "$": "EVK05KS7XY1DEII3R011"
        },
        "ValidationSources": {
          "$": "FULLY_CORROBORATED"
        },
        "ValidationAuthority": {
          "ValidationAuthorityID": {
            "$": "RA000665"
          },
          "ValidationAuthorityEntityID": {
            "$": "S000005113"
          }
        }
      },
      "Extension": {
        "gleif:Geocoding": {
          "gleif:original_address": {
            "$": "245 Summer Street, 02210, Boston, US-MA, US"
          },
          "gleif:relevance": {
            "$": "0.92"
          },
          "gleif:match_type": {
            "$": "pointAddress"
          },
          "gleif:lat": {
            "$": "42.3514"
          },
          "gleif:lng": {
            "$": "-71.05385"
          },
          "gleif:geocoding_date": {
            "$": "2017-10-23T19:14:11"
          },
          "gleif:bounding_box": {
            "$": "TopLeft.Latitude: 42.3525242, TopLeft.Longitude: -71.0553711, BottomRight.Latitude: 42.3502758, BottomRight.Longitude: -71.0523289"
          },
          "gleif:match_level": {
            "$": "houseNumber"
          },
          "gleif:formatted_address": {
            "$": "245 Summer St, Boston, MA 02210, United States"
          },
          "gleif:mapped_location_id": {
            "$": "NT_PYMT6GOD3rrAC9q2Al5jZB_yQTN"
          },
          "gleif:mapped_street": {
            "$": "Summer St"
          },
          "gleif:mapped_housenumber": {
            "$": "245"
          },
          "gleif:mapped_postalcode": {
            "$": "02210"
          },
          "gleif:mapped_city": {
            "$": "Boston"
          },
          "gleif:mapped_district": {
            "$": "Downtown Boston"
          },
          "gleif:mapped_state": {
            "$": "MA"
          },
          "gleif:mapped_country": {
            "$": "USA"
          }
        }
      }
    },
    {
      "LEI": {
        "$": "004L5FPTUREIWK9T2N63"
      },
      "Entity": {
        "LegalName": {
          "$": "Hutchin Hill Capital, LP"
        },
        "LegalAddress": {
          "FirstAddressLine": {
            "$": "C/O Corporation Service Company"
          },
          "AdditionalAddressLine": [
            {
              "$": "2711 Centerville Road"
            },
            {
              "$": "Suite 400"
            }
          ],
          "City": {
            "$": "Wilmington"
          },
          "Region": {
            "$": "US-DE"
          },
          "Country": {
            "$": "US"
          },
          "PostalCode": {
            "$": "19808"
          }
        },
        "HeadquartersAddress": {
          "FirstAddressLine": {
            "$": "22nd Floor"
          },
          "AdditionalAddressLine": {
            "$": "888 7th Avenue"
          },
          "City": {
            "$": "New York"
          },
          "Region": {
            "$": "US-NY"
          },
          "Country": {
            "$": "US"
          },
          "PostalCode": {
            "$": "10106"
          }
        },
        "RegistrationAuthority": {
          "RegistrationAuthorityID": {
            "$": "RA000602"
          },
          "RegistrationAuthorityEntityID": {
            "$": "4386463"
          }
        },
        "LegalJurisdiction": {
          "$": "US-DE"
        },
        "LegalForm": {
          "EntityLegalFormCode": {
            "$": "T91T"
          },
          "OtherLegalForm": {
            "$": "LIMITED PARTNERSHIP"
          }
        },
        "EntityStatus": {
          "$": "ACTIVE"
        }
      },
      "Registration": {
        "InitialRegistrationDate": {
          "$": "2012-06-06T15:56:00.000Z"
        },
        "LastUpdateDate": {
          "$": "2020-07-17T12:40:00.000Z"
        },
        "RegistrationStatus": {
          "$": "LAPSED"
        },
        "NextRenewalDate": {
          "$": "2018-05-08T13:46:00.000Z"
        },
        "ManagingLOU": {
          "$": "EVK05KS7XY1DEII3R011"
        },
        "ValidationSources": {
          "$": "FULLY_CORROBORATED"
        },
        "ValidationAuthority": {
          "ValidationAuthorityID": {
            "$": "RA000602"
          },
          "ValidationAuthorityEntityID": {
            "$": "4386463"
          }
        }
      },
      "Extension": {
        "gleif:Geocoding": [
          {
            "gleif:original_address": {
              "$": "22nd Floor, 888 7th Avenue, 10106, New York, US-NY, US"
            },
            "gleif:relevance": {
              "$": "0.94"
            },
            "gleif:match_type": {
              "$": "pointAddress"
            },
            "gleif:lat": {
              "$": "40.76537"
            },
            "gleif:lng": {
              "$": "-73.98088"
            },
            "gleif:geocoding_date": {
              "$": "2017-10-25T06:53:52"
            },
            "gleif:bounding_box": {
              "$": "TopLeft.Latitude: 40.7664942, TopLeft.Longitude: -73.9823642, BottomRight.Latitude: 40.7642458, BottomRight.Longitude: -73.9793958"
            },
            "gleif:match_level": {
              "$": "houseNumber"
            },
            "gleif:formatted_address": {
              "$": "888 7th Ave, New York, NY 10106, United States"
            },
            "gleif:mapped_location_id": {
              "$": "NT_42almrnte4m8ALt9ONHN2C_4gDO"
            },
            "gleif:mapped_street": {
              "$": "7th Ave"
            },
            "gleif:mapped_housenumber": {
              "$": "888"
            },
            "gleif:mapped_postalcode": {
              "$": "10106"
            },
            "gleif:mapped_city": {
              "$": "New York"
            },
            "gleif:mapped_district": {
              "$": "Clinton"
            },
            "gleif:mapped_state": {
              "$": "NY"
            },
            "gleif:mapped_country": {
              "$": "USA"
            }
          },
          {
            "gleif:original_address": {
              "$": "C/O Corporation Service Company, 2711 Centerville Road, Suite 400, null, US, US-DE, 19808, Wilmington"
            },
            "gleif:relevance": {
              "$": "0.93"
            },
            "gleif:match_type": {
              "$": "pointAddress"
            },
            "gleif:lat": {
              "$": "39.75411"
            },
            "gleif:lng": {
              "$": "-75.62652"
            },
            "gleif:geocoding_date": {
              "$": "2016-08-16T03:54:45"
            },
            "gleif:bounding_box": {
              "$": "TopLeft.Latitude: 39.7552342, TopLeft.Longitude: -75.6279822, BottomRight.Latitude: 39.7529858, BottomRight.Longitude: -75.6250578"
            },
            "gleif:match_level": {
              "$": "houseNumber"
            },
            "gleif:formatted_address": {
              "$": "2711 Centerville Rd, Wilmington, DE 19808, United States"
            },
            "gleif:mapped_location_id": {
              "$": "NT_8wi0yH62lxXql.LtXORq-C_ycTMxA"
            },
            "gleif:mapped_street": {
              "$": "Centerville Rd"
            },
            "gleif:mapped_housenumber": {
              "$": "2711"
            },
            "gleif:mapped_postalcode": {
              "$": "19808"
            },
            "gleif:mapped_city": {
              "$": "Wilmington"
            },
            "gleif:mapped_state": {
              "$": "DE"
            },
            "gleif:mapped_country": {
              "$": "USA"
            }
          }
        ]
      }
    }
  ]
}
```

[enter image description here][1]


  [1]: https://i.stack.imgur.com/ajOJM.png