Apache Spark: renaming nested field names when creating a nested PySpark DataFrame
I am creating a DataFrame from the query below, and then using agg to build a nested DataFrame from it.
SELECT DISTINCT
POA_KEY addressIdentifier,
PROV_SPCLTY_CERTFN_STTS_CD spcltyBoardCertificationCode,
PROV_SPCLTY_CERTFN_STTS_CD txnmyBoardCertificationCode,
SPCLTY_CD_VAL specialtyCode,
SPCLTY_CD_VAL_NM specialtyCodeName,
SPCLTY_CD_VAL_DESC specialtyCodeDesc,
SPCLTY_CTGRY_CD_VAL specialtyCategoryCode,
SPCLTY_CTGRY_CD_VAL_NM specialtyCategoryName,
SPCLTY_CTGRY_CD_VAL_DESC specialtyCategoryDesc,
TXNMY_CD_VAL taxonomyCode,
TXNMY_CD_VAL_NM taxonomyCodeName,
TXNMY_CD_VAL_DESC taxonomyCodeDesc
FROM TEST A
I need to rename the nested fields contactListCode -> Code, contactListDesc -> Desc, and contactListNm -> Name.
My current code:
contact_df_gp = exprt_df.groupby('addressIdentifier').agg(
    f.collect_list(
        f.struct('contactListCode', 'contactListDesc', 'contactListNm', 'phoneNumber')
    ).alias('contactLis'),
    f.collect_list(
        f.struct('displayUrl', 'urlName')
    ).alias('webContactList')
)
Expected output:
{"addressIdentifier":1000105107,"contact":[{"Code":"B","Desc":"BUSINESS","Name":"BUSINESS","phoneNumber":"8037735227"},{"Code":"B","Desc":"BUSINESS","Name":"BUSINESS","phoneNumber":"8037735227"}],"contactweb":[{"displayUrl":"FALSE"},{"displayUrl":"FALSE"}]}
{"addressIdentifier":1000000001,"contact":[{"Code":"B","Desc":"BUSINESS","Name":"BUSINESS","phoneNumber":"7045403667"},{"Code":"B","Desc":"BUSINESS","Name":"BUSINESS","phoneNumber":"7045403667"},{"contactListCode":"B","contactListDesc":"BUSINESS","contactListNm":"BUSINESS","phoneNumber":"7045403667"},{"contactListCode":"B","contactListDesc":"BUSINESS","contactListNm":"BUSINESS","phoneNumber":"7045403667"}],"contactweb":[{"displayUrl":"FALSE"},{"displayUrl":"FALSE"},{"displayUrl":"FALSE"},{"displayUrl":"FALSE"}]}
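Alias each column inside f.struct; the alias becomes the name of the nested field:
from pyspark.sql import functions as f

contact_df_gp = exprt_df.groupby('addressIdentifier').agg(
    f.collect_list(
        f.struct(
            # The alias given here is used as the field name inside the struct.
            f.col('contactListCode').alias('Code'),
            f.col('contactListDesc').alias('Desc'),
            f.col('contactListNm').alias('Name'),
            # phoneNumber keeps its original name.
            f.col('phoneNumber')
        )
    ).alias('contactLis'),
    f.collect_list(
        f.struct('displayUrl', 'urlName')
    ).alias('webContactList')
)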