Expanding a column into multiple rows in Scala
Tags: scala, apache-spark, apache-spark-sql

I have the following DataFrame (df2):
+---------------+----------+----+-----+-----+
|Colours        |Model     |year|type |count|
+---------------+----------+----+-----+-----+
|red,green,white|Mitsubishi|2006|sedan|3    |
|gray,silver    |Mazda     |2010|SUV  |2    |
+---------------+----------+----+-----+-----+
I need to explode the "Colours" column so that each colour gets its own row, like this:
+-------+----------+----+-----+
|Colours|Model     |year|type |
+-------+----------+----+-----+
|red    |Mitsubishi|2006|sedan|
|green  |Mitsubishi|2006|sedan|
|white  |Mitsubishi|2006|sedan|
|gray   |Mazda     |2010|SUV  |
|silver |Mazda     |2010|SUV  |
+-------+----------+----+-----+
I created an array:
val colrs=df2.select("Colours").collect.map(_.getString(0))
and tried to add the array to the DataFrame:
val cars=df2.withColumn("c",explode($"colrs")).select("Colours","Model","year","type")
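But it doesn't work. Any help would be appreciated.

Answer:

You can use the split and explode functions on your DataFrame (df2) to do this.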
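The code for this step did not survive on the page, so here is a minimal sketch of one way it could look. It assumes "Colours" is a plain comma-separated string column; the SparkSession setup, the app name, and the rebuilt sample df2 are illustrative assumptions, not part of the original post. The attempt in the question fails because colrs is a local array collected on the driver, not a column of df2, so $"colrs" cannot be resolved.

```scala
// Intended for spark-shell or a Scala script (top-level statements).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, split}

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-colours") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Assumption: sample data rebuilt from the table in the question.
val df2 = Seq(
  ("red,green,white", "Mitsubishi", 2006, "sedan", 3),
  ("gray,silver", "Mazda", 2010, "SUV", 2)
).toDF("Colours", "Model", "year", "type", "count")

// split turns the string into an array column; explode emits one row per array element.
val cars = df2
  .withColumn("Colours", explode(split(col("Colours"), ",")))
  .select("Colours", "Model", "year", "type")
```

The second argument of split is a regular expression, so if the real data has spaces after the commas, a pattern such as "\\s*,\\s*" may be a better fit than ",".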
You should get the following output:
cars.show(false)
+-------+----------+----+-----+
|Colours|Model     |year|type |
+-------+----------+----+-----+
|red    |Mitsubishi|2006|sedan|
|green  |Mitsubishi|2006|sedan|
|white  |Mitsubishi|2006|sedan|
|gray   |Mazda     |2010|SUV  |
|silver |Mazda     |2010|SUV  |
+-------+----------+----+-----+