
Change the schema of a DataFrame to a different schema


I have a DataFrame that looks like this:

df.printSchema()

root
 |-- id: integer (nullable = true)
 |-- data: struct (nullable = true)
 |    |-- foo01: string (nullable = true)
 |    |-- bar01: string (nullable = true)
 |    |-- foo02: string (nullable = true)
 |    |-- bar02: string (nullable = true)
I want to convert it to:

root
 |-- id: integer (nullable = true)
 |-- foo: struct (nullable = true)
 |    |-- foo01: string (nullable = true)
 |    |-- foo02: string (nullable = true)
 |-- bar: struct (nullable = true)
 |    |-- bar01: string (nullable = true)
 |    |-- bar02: string (nullable = true)
What is the best way to do this?

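You can simply use the PySpark struct function:

from pyspark.sql.functions import struct

# Rebuild the nested fields as two separate structs, foo and bar.
new_df = df.select(
    'id',
    struct('data.foo01', 'data.foo02').alias('foo'),
    struct('data.bar01', 'data.bar02').alias('bar'),
)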

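One more note on the PySpark struct function: it accepts either a list of column-name strings, which simply moves those columns into the struct, or a list of Column expressions if you need more than that.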
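Since struct accepts a list of columns, the grouping does not have to be written out by hand. The following is only a minimal sketch, assuming the fields inside data are named as a letter prefix followed by digits (foo01, bar02, ...). The helper names group_fields_by_prefix and regroup_struct are hypothetical and not part of PySpark; PySpark is imported lazily so the pure-Python grouping step can be tried on its own.

```python
import re
from collections import defaultdict


def group_fields_by_prefix(field_names):
    """Group names such as 'foo01' under their leading alphabetic prefix ('foo')."""
    groups = defaultdict(list)
    for name in field_names:
        match = re.match(r"[A-Za-z_]+", name)
        # Fall back to the full name when there is no alphabetic prefix.
        prefix = match.group(0) if match else name
        groups[prefix].append(name)
    return dict(groups)


def regroup_struct(df, struct_col="data", key_col="id"):
    """Split df[struct_col] into one struct per prefix (hypothetical helper)."""
    from pyspark.sql import functions as F  # imported lazily; requires PySpark

    # Read the nested field names from the schema instead of hard-coding them.
    field_names = df.schema[struct_col].dataType.fieldNames()
    groups = group_fields_by_prefix(field_names)
    new_structs = [
        F.struct(*[F.col(f"{struct_col}.{n}") for n in names]).alias(prefix)
        for prefix, names in groups.items()
    ]
    return df.select(key_col, *new_structs)


print(group_fields_by_prefix(["foo01", "bar01", "foo02", "bar02"]))
# {'foo': ['foo01', 'foo02'], 'bar': ['bar01', 'bar02']}
```

With the DataFrame from the question, regroup_struct(df) should produce the same foo and bar structs as the hand-written select, under the naming assumption stated above.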

You can use the struct function together with select, like this:

from pyspark.sql import functions as F

finalDF = df.select( "id",
                     F.struct("data.foo01", "data.foo02").alias("foo"),
                     F.struct("data.bar01", "data.bar02").alias("bar")
                     )


finalDF.printSchema()
Schema:

root
 |-- id: string (nullable = true)
 |-- foo: struct (nullable = false)
 |    |-- foo01: string (nullable = true)
 |    |-- foo02: string (nullable = true)
 |-- bar: struct (nullable = false)
 |    |-- bar01: string (nullable = true)
 |    |-- bar02: string (nullable = true)

Thanks, I'll accept this answer. One more question that I didn't spell out in the original question: what should I do if I want to rename foo01 to foo99?

You can use struct(F.col('data.foo01').alias('new_name'), F.col('data.foo02').alias('other_new_name'))
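To make the rename from the comment concrete, here is a small sketch that builds the aliased field list from a mapping of old names to new names. The helpers rename_pairs and struct_with_renames are hypothetical names introduced only for illustration; the underlying calls (F.col, alias, F.struct) are the ones used in the comment above.

```python
def rename_pairs(struct_col, renames):
    """Return (source_path, new_name) pairs, e.g. ('data.foo01', 'foo99')."""
    return [(f"{struct_col}.{old}", new) for old, new in renames.items()]


def struct_with_renames(struct_col, renames, alias):
    """Build a struct column whose fields are renamed (hypothetical helper)."""
    from pyspark.sql import functions as F  # imported lazily; requires PySpark

    # Alias each nested field to its new name before packing it into the struct.
    fields = [F.col(path).alias(new) for path, new in rename_pairs(struct_col, renames)]
    return F.struct(*fields).alias(alias)


# Example mapping: expose foo01 as foo99 and keep foo02 unchanged.
print(rename_pairs("data", {"foo01": "foo99", "foo02": "foo02"}))
# [('data.foo01', 'foo99'), ('data.foo02', 'foo02')]
```

A call such as df.select("id", struct_with_renames("data", {"foo01": "foo99", "foo02": "foo02"}, "foo")) would then give a foo struct containing the fields foo99 and foo02.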