
Exploding a column into multiple rows in Scala

Tags: scala, apache-spark, apache-spark-sql

I have the following DataFrame (df2):

+---------------+----------+----+-----+-----+
|Colours        |Model     |year|type |count|
+---------------+----------+----+-----+-----+
|red,green,white|Mitsubishi|2006|sedan|3    |
|gray,silver    |Mazda     |2010|SUV  |2    |
+---------------+----------+----+-----+-----+
I need to explode the Colours column so that each colour gets its own row, like this:

+-------+----------+----+-----+
|Colours|Model     |year|type |
+-------+----------+----+-----+
|red    |Mitsubishi|2006|sedan|
|green  |Mitsubishi|2006|sedan|
|white  |Mitsubishi|2006|sedan|
|gray   |Mazda     |2010|SUV  |
|silver |Mazda     |2010|SUV  |
+-------+----------+----+-----+
I created an array:

val colrs = df2.select("Colours").collect.map(_.getString(0))

and tried to add it to the DataFrame:

val cars = df2.withColumn("c", explode($"colrs")).select("Colours", "Model", "year", "type")

But it doesn't work. Any help would be appreciated.

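You can use the split and explode functions on your DataFrame (df2): split the Colours string on commas to get an array column, then explode that array so that each element becomes its own row.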
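The answer's original snippet did not survive on this page, so the following is only a minimal sketch of the split-then-explode approach, not necessarily the exact code that was posted. It assumes a local SparkSession and rebuilds df2 from the sample rows in the question; the object name ExplodeColours and the column types (year and count as integers) are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, split}

// Hypothetical wrapper object; in spark-shell only the two
// `val` definitions and the final `show` call are needed.
object ExplodeColours {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explode-colours")
      .master("local[*]") // assumption: run locally for the example
      .getOrCreate()
    import spark.implicits._

    // Sample data rebuilt from the question (column types are assumed).
    val df2 = Seq(
      ("red,green,white", "Mitsubishi", 2006, "sedan", 3),
      ("gray,silver", "Mazda", 2010, "SUV", 2)
    ).toDF("Colours", "Model", "year", "type", "count")

    // split turns the comma-separated string into an array column;
    // explode then emits one output row per array element.
    val cars = df2
      .withColumn("Colours", explode(split(col("Colours"), ",")))
      .select("Colours", "Model", "year", "type")

    cars.show(false)
    spark.stop()
  }
}
```

The attempt in the question fails because colrs is a local Scala array collected to the driver, while $"colrs" looks for a DataFrame column with that name, which does not exist. Operating on the existing Colours column avoids the collect entirely.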

You should get the following output:

cars.show(false)

+-------+----------+----+-----+
|Colours|Model     |year|type |
+-------+----------+----+-----+
|red    |Mitsubishi|2006|sedan|
|green  |Mitsubishi|2006|sedan|
|white  |Mitsubishi|2006|sedan|
|gray   |Mazda     |2010|SUV  |
|silver |Mazda     |2010|SUV  |
+-------+----------+----+-----+