Apache Spark: renaming nested field names when creating a nested PySpark DataFrame

I am creating a DataFrame (df) from the query below and using agg to build a nested DataFrame.

SELECT DISTINCT 
POA_KEY addressIdentifier,
PROV_SPCLTY_CERTFN_STTS_CD spcltyBoardCertificationCode,
PROV_SPCLTY_CERTFN_STTS_CD txnmyBoardCertificationCode,
SPCLTY_CD_VAL specialtyCode,
SPCLTY_CD_VAL_NM specialtyCodeName,
SPCLTY_CD_VAL_DESC specialtyCodeDesc,
SPCLTY_CTGRY_CD_VAL specialtyCategoryCode,
SPCLTY_CTGRY_CD_VAL_NM specialtyCategoryName,
SPCLTY_CTGRY_CD_VAL_DESC specialtyCategoryDesc,
TXNMY_CD_VAL taxonomyCode,
TXNMY_CD_VAL_NM taxonomyCodeName,
TXNMY_CD_VAL_DESC taxonomyCodeDesc
FROM TEST A
I need to rename contactListCode -> Code, contactListDesc -> Desc, and contactListNm -> Name.

My current code, followed by the expected output:

contact_df_gp = exprt_df.groupby('addressIdentifier').agg(
    f.collect_list(
        f.struct('contactListCode', 'contactListDesc', 'contactListNm', 'phoneNumber')
    ).alias('contactLis'),
    f.collect_list(
        f.struct('displayUrl', 'urlName')
    ).alias('webContactList')
)
{"addressIdentifier":1000105107,"contact":[{"Code":"B","Desc":"BUSINESS","Name":"BUSINESS","phoneNumber":"8037735227"},{"Code":"B","Desc":"BUSINESS","Name":"BUSINESS","phoneNumber":"8037735227"}],"contactweb":[{"displayUrl":"FALSE"},{"displayUrl":"FALSE"}]}
{"addressIdentifier":1000000001,"contact":[{"Code":"B","Desc":"BUSINESS","Name":"BUSINESS","phoneNumber":"7045403667"},{"Code":"B","Desc":"BUSINESS","Name":"BUSINESS","phoneNumber":"7045403667"},{"contactListCode":"B","contactListDesc":"BUSINESS","contactListNm":"BUSINESS","phoneNumber":"7045403667"},{"contactListCode":"B","contactListDesc":"BUSINESS","contactListNm":"BUSINESS","phoneNumber":"7045403667"}],"contactweb":[{"displayUrl":"FALSE"},{"displayUrl":"FALSE"},{"displayUrl":"FALSE"},{"displayUrl":"FALSE"}]}
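# Answer: alias each column inside f.struct so the nested fields take the new names.
import pyspark.sql.functions as f

contact_df_gp = exprt_df.groupby('addressIdentifier').agg(
    f.collect_list(
        f.struct(
            f.col('contactListCode').alias('Code'),  # contactListCode -> Code
            f.col('contactListDesc').alias('Desc'),  # contactListDesc -> Desc
            f.col('contactListNm').alias('Name'),    # contactListNm -> Name
            f.col('phoneNumber')                     # name kept unchanged
        )
    ).alias('contactLis'),
    f.collect_list(
        f.struct('displayUrl', 'urlName')
    ).alias('webContactList')
)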