
Scala: value reduceByKey is not a member of org.apache.spark.rdd.rdd[(Int,Int)] after importing


I created this RDD:

scala> val data=sc.textFile("sparkdata.txt")
Then I returned the contents of the file:

scala> data.collect
I split the existing data into individual words using:

scala> val splitdata = data.flatMap(line => line.split(" "));
scala> splitdata.persist()
scala> splitdata.collect;
Now I am performing the map-reduce operation:

scala> val mapdata = splitdata.map(word => (word,1));
scala> mapdata.collect;
scala> val reducedata = mapdata.reduceByKey(_+_);
To get the result:

scala> reducedata.collect;
But when I try to display the first 10 rows with:

splitdata.groupByKey(identity).count().show(10)
I get the following error:

<console>:38: error: value groupByKey is not a member of org.apache.spark.rdd.RDD[String]
       splitdata.groupByKey(identity).count().show(10)
                 ^
<console>:38: error: missing argument list for method identity in object Predef
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `identity _` or `identity(_)` instead of `identity`.
       splitdata.groupByKey(identity).count().show(10)
                            ^

groupByKey(), similar to reduceByKey(), is a method on pair RDDs of type RDD[(K, V)], not on generic RDDs, which is why calling it on splitdata (an RDD[String]) fails. reduceByKey() reduces an RDD[(K, V)] to another RDD[(K, V)] by combining the values of each key with the provided binary function, whereas groupByKey() transforms an RDD[(K, V)] into an RDD[(K, Iterable[V])]. To further transform the Iterable[V] of each key, one would typically apply mapValues (or flatMapValues) with a provided function.
For example:

val rdd = sc.parallelize(Seq(
  "apple", "apple", "orange", "banana", "banana", "orange", "apple", "apple", "orange"
))

rdd.map((_, 1)).reduceByKey(_ + _).collect
// res1: Array[(String, Int)] = Array((apple,4), (banana,2), (orange,3))

rdd.map((_, 1)).groupByKey().mapValues(_.sum).take(2)
// res2: Array[(String, Int)] = Array((apple,4), (banana,2))
If you just want the number of groups after applying groupByKey():

rdd.map((_, 1)).groupByKey().count()
// res3: Long = 3
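Tying this back to the original question: to show the ten most frequent words, map to (word, 1) pairs first, reduce by key, then sort by count before taking 10. A minimal sketch, assuming the spark-shell `sc` and the `sparkdata.txt` file from the question are available:

```scala
// Sketch only: assumes a running spark-shell session (SparkContext `sc`)
// and the "sparkdata.txt" file from the question.
val counts = sc.textFile("sparkdata.txt")
  .flatMap(_.split(" "))   // split lines into words
  .map((_, 1))             // pair each word with a count of 1
  .reduceByKey(_ + _)      // sum the counts per word

// Sort by descending count and take the first 10 (word, count) pairs.
counts.sortBy(-_._2).take(10).foreach(println)
```

Unlike the failing groupByKey(identity) call, this converts the RDD[String] into a pair RDD before any by-key operation, so reduceByKey and sortBy are available.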