Converting RDD[Array[(String, String)]] to RDD[(String, String)] in Scala


I am new to Scala and have tried several ways to convert an RDD[Array[(String, String)]] into an RDD[(String, String)].

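What I want to do is select two elements (the text and the category) from a JSON record. For every word in the text, I want to create exactly one key/value pair, giving (word1, category), (word2, category), and so on.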

My example looks like this:

import org.json4s._
import org.json4s.jackson.JsonMethods._
// Example JSON line: {"reviewText": "This was a gift!", "category": "Apps"}
val rdd = sc.textFile(PathToJSONFile)
rdd.map{    
   row =>
   val json_row = parse(row)
   val myCategory = compact(json_row \ "category").toString
   val myText = compact(json_row \ "reviewText").toString.toLowerCase.split("[#&$!]").map(_.trim).filter(_.length > 1)
   myText.map{word => (word, myCategory)}
}

The output is an org.apache.spark.rdd.RDD[Array[(String, String)]], which looks like this:

Array(Array((this,"Apps"), (was,"Apps"), (a,"Apps"), (gift,"Apps")))

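What I want instead is key/value pairs in the form of an RDD[(String, String)], where the key is a word and the value is the category of its row, repeated for every word in that row.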


How can I do this? Many thanks.

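The suggestion from Psidom solved the problem: changing rdd.map to rdd.flatMap is the solution. flatMap concatenates the array produced for each row, so the result is an RDD[(String, String)] rather than an RDD[Array[(String, String)]].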
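
To illustrate the difference between map and flatMap, here is a minimal sketch that uses plain Scala collections in place of the RDD so that it runs without Spark. The object name FlatMapSketch, the sample rows, and the helper pairs are assumptions made for this illustration and are not from the original post. The tokenizer also splits on non-word characters, which is a simplification and not the regex from the question.

```scala
// Sketch only: a Seq stands in for the RDD so this runs without Spark.
object FlatMapSketch {
  // Hypothetical stand-in for the parsed JSON rows: (reviewText, category)
  val rows: Seq[(String, String)] = Seq(
    ("This was a gift!", "Apps"),
    ("Great value", "Books")
  )

  // Hypothetical helper: lower-case the text, split on runs of non-word
  // characters, drop empty tokens, and pair each word with the category.
  def pairs(text: String, category: String): Array[(String, String)] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).map(word => (word, category))

  // map keeps one Array per row, so the element type is Array[(String, String)]
  val nested: Seq[Array[(String, String)]] =
    rows.map { case (text, category) => pairs(text, category) }

  // flatMap concatenates the per-row Arrays, so the element type is (String, String)
  val flat: Seq[(String, String)] =
    rows.flatMap { case (text, category) => pairs(text, category) }
}
```

On the RDD from the question, the corresponding change is to keep the body of the closure as it is and call rdd.flatMap instead of rdd.map.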
