Scala: How to output multiple (key, value) pairs from a Spark map function


The input data looks like this:

+--------------------+-------------+--------------------+
|           StudentID|       Right |             Wrong  |
+--------------------+-------------+--------------------+
|       studentNo01  |       a,b,c |            x,y,z   |
+--------------------+-------------+--------------------+
|       studentNo02  |         c,d |              v,w   |
+--------------------+-------------+--------------------+
The desired output format is as follows:

+--------------------+---------+
|           key      |    value|
+--------------------+---------+
|     studentNo01,a  |       1 |
+--------------------+---------+
|     studentNo01,b  |       1 |
+--------------------+---------+
|     studentNo01,c  |       1 | 
+--------------------+---------+
|     studentNo01,x  |       0 | 
+--------------------+---------+
|     studentNo01,y  |       0 | 
+--------------------+---------+
|     studentNo01,z  |       0 | 
+--------------------+---------+
|     studentNo02,c  |       1 | 
+--------------------+---------+
|     studentNo02,d  |       1 | 
+--------------------+---------+
|     studentNo02,v  |       0 | 
+--------------------+---------+
|     studentNo02,w  |       0 | 
+--------------------+---------+
A Right item means 1, a Wrong item means 0.


I want to process this data with a Spark map function or a UDF, but I don't know how to do it. Could you help me? Thanks.

Use split and explode twice, then union the two results:

import org.apache.spark.sql.functions._  // split, explode, concat_ws, lit
import spark.implicits._                  // toDF and the 'column symbol syntax (sqlContext.implicits._ on Spark 1.x)

val df = List(
  ("studentNo01", "a,b,c", "x,y,z"),
  ("studentNo02", "c,d", "v,w")
).toDF("StudentID", "Right", "Wrong")

+-----------+-----+-----+
|  StudentID|Right|Wrong|
+-----------+-----+-----+
|studentNo01|a,b,c|x,y,z|
|studentNo02|  c,d|  v,w|
+-----------+-----+-----+


// explode the Right items with value 1, the Wrong items with value 0, then combine
// (on Spark 2.x+ use `union`; `unionAll` is the older name and is now deprecated)
val pair = (
  df.select('StudentID, explode(split('Right, ",")))   // one row per right answer, in column "col"
    .select(concat_ws(",", 'StudentID, 'col).as("key"))
    .withColumn("value", lit(1))
).unionAll(
  df.select('StudentID, explode(split('Wrong, ",")))   // one row per wrong answer
    .select(concat_ws(",", 'StudentID, 'col).as("key"))
    .withColumn("value", lit(0))
)


+-------------+-----+
|          key|value|
+-------------+-----+
|studentNo01,a|    1|
|studentNo01,b|    1|
|studentNo01,c|    1|
|studentNo02,c|    1|
|studentNo02,d|    1|
|studentNo01,x|    0|
|studentNo01,y|    0|
|studentNo01,z|    0|
|studentNo02,v|    0|
|studentNo02,w|    0|
+-------------+-----+
You can convert it to an RDD as follows:

// pair.rdd gives an RDD[Row]; map it to (key, value) tuples
val rdd = pair.rdd.map(r => (r.getString(0), r.getInt(1)))

Did you want DataFrame input and RDD output?
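If you specifically want the map-style flow the title asks about (DataFrame in, RDD of (key, value) pairs out), flatMap is the operation that lets one input row emit several pairs. A minimal sketch, assuming the df defined above; the pairRdd name and the column positions are my own, not from the answer above:

// one input Row fans out into several (key, value) tuples
val pairRdd = df.rdd.flatMap { row =>
  val id    = row.getString(0)                                      // StudentID
  val right = row.getString(1).split(",").map(x => (s"$id,$x", 1))  // Right items -> 1
  val wrong = row.getString(2).split(",").map(x => (s"$id,$x", 0))  // Wrong items -> 0
  right ++ wrong
}

This produces the same pairs as the explode/union version, directly as an RDD[(String, Int)].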