PySpark: casting NullType fields to string under a struct-type column

Tags: apache-spark, pyspark, aws-glue

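I have a dataframe with the schema below. Under the translations column, each language struct (no, pt, ...) has a translation_version field whose type is null. I want to cast every translation_version to string. There are 17 languages under translations.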

root
|-- translations: struct (nullable = true)
|    |-- no: struct (nullable = true)
|    |    |-- Description: string (nullable = true)
|    |    |-- class: string (nullable = true)
|    |    |-- description: string (nullable = true)
|    |    |-- translation_version: null (nullable = true) // Want to cast as string
|    |-- pt: struct (nullable = true)
|    |    |-- Description: string (nullable = true)
|    |    |-- class: string (nullable = true)
|    |    |-- description: string (nullable = true)
|    |    |-- translation_version: null (nullable = true)
|    |-- fr: struct (nullable = true)
|    |    |-- Description: string (nullable = true)
|    |    |-- class: string (nullable = true)
|    |    |-- description: string (nullable = true)
|    |    |-- translation_version: null (nullable = true)
I tried df = df.na.fill('null'), but it changed nothing. I also tried casting with the following code:

df = df.withColumn("translations", F.col("translations").cast("struct<struct<translation_version: string>>"))
But this returned the following error:

pyspark.sql.utils.ParseException: u"\nmismatched input '<' expecting ':'(line 1, pos 13)\n\n== SQL ==\nstruct<struct<translation_version: string>>\n-------------^^^\n"

This should do the trick:

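from pyspark.sql.functions import col, struct
from pyspark.sql.types import StructType, StructField, StringType

# Target type for each language struct: same fields, translation_version as string
schema = StructType([StructField("Description", StringType(), True),
                     StructField("class", StringType(), True),
                     StructField("description", StringType(), True),
                     StructField("translation_version", StringType(), True)
                     ]
                    )

df_1 = (
    df
    # Flatten translations so each language struct becomes a top-level column
    .select("translations.*")
    # Cast each language struct to the target type and rebuild translations
    .withColumn("translations", struct(
        col("fr").cast(schema).alias("fr"),
        col("pt").cast(schema).alias("pt"),
        col("no").cast(schema).alias("no")
        )
    )
    # Drop the flattened language columns, leaving only translations
    .drop("fr", "pt", "no")
)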
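The cast string in the question failed to parse because the nested struct had no field name: in a DDL type string every struct member is written as name:type. With 17 languages it may be easier to generate that string than to list each column by hand. The following is a minimal sketch, not a definitive solution. It assumes every language struct has the same four fields shown in the schema; the helper name build_translations_ddl is hypothetical, and the backtick quoting is only a precaution in case a field name such as no collides with a SQL keyword.

```python
def build_translations_ddl(languages, fields):
    """Build a DDL type string in which every listed field is typed as string.

    Hypothetical helper for illustration; it only assembles text and
    does not touch Spark.
    """
    # One inner struct definition, shared by all languages
    inner = ",".join("`{}`:string".format(f) for f in fields)
    # One named member per language, each with the same inner struct
    outer = ",".join("`{}`:struct<{}>".format(lang, inner) for lang in languages)
    return "struct<{}>".format(outer)


fields = ["Description", "class", "description", "translation_version"]
# Assumption: extend this list to all 17 languages, in the same order as df's schema
languages = ["no", "pt", "fr"]
ddl = build_translations_ddl(languages, fields)

# Usage (needs the question's df and a running SparkSession, so not executed here):
# from pyspark.sql import functions as F
# df = df.withColumn("translations", F.col("translations").cast(ddl))
```

As far as I know, Spark casts one struct to another by field position, so the number and order of languages and fields in the generated string must match the existing schema. The language list could probably be read from df.schema["translations"].dataType.fieldNames() instead of being typed by hand.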