Exploding an array of structs into columns in PySpark

I want to explode an array of structs into columns (as defined by the struct fields). For example,

should be converted to

|-- name: string (nullable = true)
|-- sbox_ctr: double (nullable = true)
|-- wise_ctr: double (nullable = true)
How can I do this?

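from pyspark.sql.functions import explode
from pyspark.sql.types import ArrayType, DoubleType, LongType, StringType, StructType
from pyspark.sql.utils import AnalysisException


def hasColumn(df, path):
    # Assumed helper (the original answer calls it without defining it):
    # True if `path` can be resolved against df, False otherwise.
    try:
        df[path]
        return True
    except AnalysisException:
        return False


def get_final_dataframe(pathname, df):
    """Walk a dotted path, exploding arrays and selecting struct fields.

    Returns (DataFrame, "string" | "numeric" | -1), or (-1, -1) if the
    path cannot be resolved.
    """
    cur_names = pathname.split(".")
    root_name = cur_names[0]
    delimiter = "."

    if len(cur_names) > 1:
        new_path_name = delimiter.join(cur_names[1:])
        child_path = delimiter.join(cur_names[0:2])
        for field in df.schema.fields:
            if field.name == root_name:
                if isinstance(field.dataType, ArrayType):
                    # Explode the array, keep the column name, and retry the same path.
                    return get_final_dataframe(pathname, df.select(explode(root_name).alias(root_name)))
                elif isinstance(field.dataType, StructType):
                    if hasColumn(df, child_path):
                        # Selecting "a.b" yields a column named "b", so drop the root from the path.
                        return get_final_dataframe(new_path_name, df.select(child_path))
                    else:
                        return -1, -1
                else:
                    return -1, -1
    else:
        # Last path component: classify the leaf column's type.
        for field in df.schema.fields:
            if field.name == root_name:
                if isinstance(field.dataType, StringType):
                    return df, "string"
                elif isinstance(field.dataType, (LongType, DoubleType)):
                    return df, "numeric"
                else:
                    return df, -1

    return -1, -1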
Then you can call it like this:

key = "a.b.c.name"
# key = "context.content_feature.tag.name"
df2, field_type = get_final_dataframe(key, df1)

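You're a new contributor, and you're welcome here, but this isn't how to ask a question. You should work on your code and show what you have tried so far. Read the Help Center topics to learn more about which questions you can ask and which kinds of questions to avoid. The community will be glad to answer a well-formed question.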