Scala: How do I use reduceByKey on a key whose value is a tuple of the form (String, (String, Int))?
I am trying to loop over an RDD built from a text file, count every unique word in the file, and then accumulate all the words that follow each unique word, along with their counts. This is what I have so far:
import org.apache.spark.{SparkConf, SparkContext}

// Connect to the Spark driver
val conf = new SparkConf().setAppName("WordStats").setMaster("local")
val spark = new SparkContext(conf) // creates a new SparkContext object
// Load the specified file into an RDD
val lines = spark.textFile(System.getProperty("user.dir") + "/" + "basketball_words_only.txt")
// Pair each word with the word that follows it, plus an initial count of 1
val words = lines.flatMap(line => {
  val wordList = line.split(" ")
  for {i <- 0 until wordList.length - 1}
    yield (wordList(i), wordList(i + 1), 1)
})
If I understand correctly, we have something like this:
val lines: Seq[String] = ...
val words: Seq[(String, String, Int)] = ...
And we want something like this:
val frequencies: Map[String, Seq[(String, Int)]] = {
  words
    .groupBy(_._1) // word -> [(w, next, cc), ...]
    .mapValues { values =>
      values
        .map { case (w, n, cc) => (n, cc) }
        .groupBy(_._1) // next -> [(next, cc), ...]
        .mapValues(_.reduce(_._2 + _._2)) // next -> sum (does not compile as posted: reduce must return a (String, Int))
        .toSeq
    }
}
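The snippet above works on plain Scala collections, while the question asks about an RDD and reduceByKey. One possible way to carry the idea over, shown here only as a sketch: key each record on the (word, next) pair so that reduceByKey can sum the counts, then re-key on the word alone to get the (String, (String, Int)) shape. The names NextWordCounts and nextWordCounts and the sample sentences are made up for illustration and do not come from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object NextWordCounts {
  // For every word, count how often each following word occurs.
  def nextWordCounts(lines: RDD[String]): RDD[(String, Iterable[(String, Int)])] =
    lines
      .flatMap { line =>
        val ws = line.split(" ").filter(_.nonEmpty)
        // Key on the (word, next) pair so reduceByKey can sum the counts.
        ws.zip(ws.drop(1)).map { case (w, next) => ((w, next), 1) }.toSeq
      }
      .reduceByKey(_ + _)                                    // ((word, next), total)
      .map { case ((w, next), total) => (w, (next, total)) } // (String, (String, Int))
      .groupByKey()                                          // word -> all (next, total) pairs

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordStats").setMaster("local")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample input standing in for the text file.
      val lines = sc.parallelize(Seq("the cat sat", "the cat ran", "the dog sat"))
      nextWordCounts(lines).collect().foreach { case (w, nexts) =>
        println(s"$w -> ${nexts.mkString(", ")}")
      }
    } finally sc.stop()
  }
}
```

Doing the reduceByKey on the composite key before grouping means that only one already-summed record per (word, next) pair reaches groupByKey, which is usually cheaper than grouping the raw pairs first.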
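Thanks, @adu! You seem to understand exactly what I'm looking for. However, when I add this code to my own, I get two compile-time errors:
Error:(92, 36) type mismatch; found: Int, required: (String, Int)
.mapValues(_.reduce(_._2 + _._2)) // next -> sum
Error:(93, 12) type mismatch; found: Seq[(String, (String, Int))], required: Seq[(String, Int)]
.toSeq
The code is on lines 85-95 of my file.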
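Both errors appear to trace back to the reduce call. On a Seq[(String, Int)], reduce expects a function that takes two (String, Int) values and returns a (String, Int), but _._2 + _._2 returns an Int, which matches the first message. The second message looks like a consequence of the same line: the compiler types the reduced value as a (String, Int), so .toSeq ends up with the wrong element type. A minimal sketch of one fix is to project out the counts first and then sum them. The wrapper names Frequencies and frequencies are hypothetical, and .map is used in place of .mapValues on the assumption that the code should compile on both Scala 2.12 and 2.13 (where mapValues on a Map is deprecated and returns a view).

```scala
object Frequencies {
  // word -> Seq of (next word, how many times it followed `word`)
  def frequencies(words: Seq[(String, String, Int)]): Map[String, Seq[(String, Int)]] =
    words
      .groupBy(_._1) // word -> [(word, next, count), ...]
      .map { case (word, values) =>
        word -> values
          .map { case (_, next, count) => (next, count) }
          .groupBy(_._1) // next -> [(next, count), ...]
          .map { case (next, pairs) => next -> pairs.map(_._2).sum } // next -> sum
          .toSeq
      }
}
```

For example, calling Frequencies.frequencies(Seq(("the", "cat", 1), ("the", "cat", 1), ("the", "dog", 1))) maps "the" to the pairs ("cat", 2) and ("dog", 1), in no guaranteed order.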