
Scala: using cogroup on a KeyValueGroupedDataset in Spark


I want to use the cogroup method on a KeyValueGroupedDataset in Spark. Here is my Scala attempt, which fails with an error:

import org.apache.spark.sql.functions._
val x1 = Seq(("a", 36), ("b", 33), ("c", 40), ("a", 38), ("c", 39)).toDS
val g1 = x1.groupByKey(_._1)
val x2 = Seq(("a", "ali"), ("b", "bob"), ("c", "celine"), ("a", "amin"), ("c", "cecile")).toDS
val g2 = x2.groupByKey(_._1)
val cog = g1.cogroup(g2, (k: Long, iter1:Iterator[(String, Int)], iter2:Iterator[(String, String)]) =>  iter1);
But I get this error:

<console>:34: error: overloaded method value cogroup with alternatives:
  [U, R](other: org.apache.spark.sql.KeyValueGroupedDataset[String,U], f: org.apache.spark.api.java.function.CoGroupFunction[String,(String, Int),U,R], encoder: org.apache.spark.sql.Encoder[R])org.apache.spark.sql.Dataset[R] <and>
  [U, R](other: org.apache.spark.sql.KeyValueGroupedDataset[String,U])(f: (String, Iterator[(String, Int)], Iterator[U]) => TraversableOnce[R])(implicit evidence$11: org.apache.spark.sql.Encoder[R])org.apache.spark.sql.Dataset[R]
 cannot be applied to (org.apache.spark.sql.KeyValueGroupedDataset[String,(String, String)], (Long, Iterator[(String, Int)], Iterator[(String, String)]) => Iterator[(String, Int)])
       val cog = g1.cogroup(g2, (k: Long, iter1:Iterator[(String, Int)], iter2:Iterator[(String, String)]) =>  iter1);

I get the same error in Java.

The cogroup overload you are trying to use is curried, so you have to pass the other dataset and the function in separate argument lists. There is also a type mismatch on the key: groupByKey(_._1) produces String keys, so the function's key parameter must be a String, not a Long:

g1.cogroup(g2)(
  (k: String, it1: Iterator[(String, Int)], it2: Iterator[(String, String)]) => 
    it1)
Or simply:

g1.cogroup(g2)((_, it1, _) => it1)
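To see the fix end to end, here is a minimal self-contained sketch (assuming a local SparkSession and the sample data from the question); it combines both sides per key instead of discarding the second iterator:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._  // provides toDS and the implicit Encoder for the result

val x1 = Seq(("a", 36), ("b", 33), ("c", 40), ("a", 38), ("c", 39)).toDS()
val x2 = Seq(("a", "ali"), ("b", "bob"), ("c", "celine"), ("a", "amin"), ("c", "cecile")).toDS()

val g1 = x1.groupByKey(_._1)
val g2 = x2.groupByKey(_._1)

// The key parameter is a String, matching groupByKey(_._1).
// For each key, emit one row holding all ages and all names.
val cog = g1.cogroup(g2) { (key, ages, names) =>
  Iterator((key, ages.map(_._2).toList, names.map(_._2).toList))
}

cog.show(false)
// e.g. rows like (a, [36, 38], [ali, amin]) -- row order is not guaranteed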
In Java I would use the CoGroupFunction variant:

import org.apache.spark.api.java.function.CoGroupFunction;
import org.apache.spark.sql.Encoders;
import scala.Tuple2;

g1.cogroup(
  g2,
  // CoGroupFunction.call returns an Iterator<R>, so returning it1 is valid here.
  (CoGroupFunction<String, Tuple2<String, Integer>, Tuple2<String, String>, Tuple2<String, Integer>>) (key, it1, it2) -> it1,
  // The encoder must describe the result type, Tuple2<String, Integer>.
  Encoders.tuple(Encoders.STRING(), Encoders.INT()));
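The explicit cast to CoGroupFunction steers overload resolution to the Java-friendly signature and pins down the lambda's type parameters; without it, javac will generally not be able to infer U and R.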
where g1 and g2 are the KeyValueGroupedDatasets defined above.
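One more note on the Scala version: the curried overload also requires an implicit Encoder[R] for the result type. In the spark-shell this comes for free; in a standalone application you need import spark.implicits._ (or an explicit encoder) in scope.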