Scala: How to use reduceByKey when each key's value is a tuple, as in (String, (String, Int))?


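I'm trying to loop over an RDD built from a text file, count each unique word in the file, and then accumulate all the words that follow each unique word, along with their counts. Here is what I have so far: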

import org.apache.spark.{SparkConf, SparkContext}

// Connect to the Spark driver
val conf = new SparkConf().setAppName("WordStats").setMaster("local")
val sparkContext = new SparkContext(conf) // Creates a new SparkContext object

// Load the specified file into an RDD
val lines = sparkContext.textFile(System.getProperty("user.dir") + "/" + "basketball_words_only.txt")

// Split each line into words and emit (word, next word, 1) for every adjacent pair
val words = lines.flatMap(line => {
  val wordList = line.split(" ")
  for {i <- 0 until wordList.length - 1}
    yield (wordList(i), wordList(i + 1), 1)
})

If I understand correctly, we have something like this:

val lines: Seq[String] = ...
val words: Seq[(String, String, Int)] = ...
And we want something like this:

val frequencies: Map[String, Seq[(String, Int)]] = {
  words
    .groupBy(_._1)                        // word -> [(w, next, cc), ...]
    .mapValues { values =>
      values
        .map { case (w, n, cc) => (n, cc) }
        .groupBy(_._1)                    // next -> [(next, cc), ...]
        .mapValues(_.reduce(_._2 + _._2)) // next -> sum
        .toSeq
    }
}
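
The snippet above operates on an in-memory Seq. On the RDD from the question, the same aggregation could be sketched with reduceByKey by keying each adjacent pair as ((word, next), 1), summing, and then re-keying by the first word. This is only a sketch under assumptions: the object and method names are hypothetical, and groupByKey is used for the final step purely for brevity.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FollowerCounts { // hypothetical name, not from the original post
  // For each word, collect every word that follows it and how often it does so.
  def followerCounts(lines: RDD[String]): RDD[(String, Iterable[(String, Int)])] =
    lines
      .flatMap { line =>
        val wordList = line.split(" ")
        // Emit ((word, next), 1) for every adjacent pair on the line
        for (i <- 0 until wordList.length - 1)
          yield ((wordList(i), wordList(i + 1)), 1)
      }
      .reduceByKey(_ + _)                                          // (word, next) -> count
      .map { case ((word, next), count) => (word, (next, count)) } // re-key by word
      .groupByKey()                                                // word -> [(next, count), ...]
}
```

Because the counts are already summed per (word, next) pair before the grouping step, groupByKey only gathers one small record per distinct follower.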

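Thanks, @adu! You seem to understand exactly what I'm looking for. However, when I add this code to my own, I get two compile errors:
Error:(92, 36) type mismatch; found: Int, required: (String, Int) at .mapValues(_.reduce(_._2 + _._2)) // next -> sum
Error:(93, 12) type mismatch; found: Seq[(String, (String, Int))], required: Seq[(String, Int)]
For reference, I placed the code on lines 85-95 of my file.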
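
Both errors appear to trace back to the reduce call. reduce must return the same element type it consumes, here (String, Int), but _._2 + _._2 produces an Int, which explains the first error. Since the reduced value is typed as a tuple, .toSeq is then inferred as Seq[(String, (String, Int))], which explains the second. A minimal sketch of one possible fix sums the counts with map(...).sum instead. The object and method names are hypothetical, and .map replaces .mapValues only so that the result is a strict Map on both Scala 2.12 and 2.13.

```scala
object NextWordCounts { // hypothetical name, mirrors `frequencies` from the answer
  def frequencies(words: Seq[(String, String, Int)]): Map[String, Seq[(String, Int)]] =
    words
      .groupBy(_._1)                       // word -> [(word, next, count), ...]
      .map { case (word, triples) =>
        val perNext = triples
          .groupBy(_._2)                   // next -> [(word, next, count), ...]
          .map { case (next, group) => (next, group.map(_._3).sum) } // next -> summed count
          .toSeq
        word -> perNext
      }
}
```

The order of the followers inside each Seq is not specified, so compare results as sets if the order matters in a check.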