Scala: Trying to define a trait that is also a Spark Product

Tags: scala, apache-spark, types, functional-programming

I'm writing a small library for some design patterns I keep running into when programming with Spark. One operation I'm trying to generalize is grouping a dataset by some key, reducing each group, and then returning to the original type. A simple example:

case class Counter(id: String, count: Long)

// Let's say I have some Dataset...
val counters: Dataset[Counter]

// The operation I find myself doing quite often:
import sqlContext.implicits._
counters.groupByKey(_.id)
  .reduceGroups((a, b) => Counter(a.id, a.count + b.count))
  .map(_._2)
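
Concretely (my own illustrative numbers, not from the question), this collapses all counters that share an id into one:

// e.g. Counter("a", 1), Counter("a", 2), Counter("b", 5)
// collapses to Counter("a", 3) and Counter("b", 5)
// (the order of the resulting groups is not guaranteed)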
To generalize this, I added a new type:

trait KeyedData[K <: Product, T <: KeyedData[K, T] with Product] { self: T =>
  def key: K
  def merge(other: T): T
}
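
For concreteness, here is a sketch (mine, not the asker's) of how Counter could implement this trait; since the key itself must be a Product, the lone String id is wrapped in a Tuple1:

// Hypothetical: Counter as a KeyedData instance.
case class Counter(id: String, count: Long)
    extends KeyedData[Tuple1[String], Counter] {
  def key: Tuple1[String] = Tuple1(id)
  def merge(other: Counter): Counter = Counter(id, count + other.count)
}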
Then I created the following implicit class to add this functionality to Datasets:

implicit class KeyedDataDatasetWrapper[K <: Product, T <: KeyedData[K, T] with Product](ds: Dataset[T]) {
  def collapse(implicit sqlContext: SQLContext): Dataset[T] = {
    import sqlContext.implicits._

    ds.groupByKey(_.key).reduceGroups(_.merge(_)).map(_._2)
  }
}
Clearly something is not being recognized as a Product type, so my type parameters must be wrong somewhere, but I can't see where. Any ideas?
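
My reading of what's going on here (not part of the original question): groupByKey and map each need an implicit Encoder, and the derivation that sqlContext.implicits._ brings in only fires for concrete Product types with a TypeTag in scope. Inside collapse, T is an abstract type parameter with no TypeTag, so no Encoder[T] can be materialized. The relevant implicit in Spark's SQLImplicits has roughly this shape:

// Roughly the Product-encoder derivation in SQLImplicits; note the
// TypeTag context bound, which an unconstrained type parameter lacks.
implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
  Encoders.product[T]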

Update

I changed the implicit class to the following:

import org.apache.spark.sql.{Dataset, Encoder, Encoders, SQLContext}
import scala.reflect.runtime.universe.TypeTag

implicit class KeyedDataDatasetWrapper[K <: Product : TypeTag,
                                       T <: KeyedData[K, T] with Product : TypeTag](ds: Dataset[T]) {
  def merge(implicit sqlContext: SQLContext): Dataset[T] = {
    implicit val encK: Encoder[K] = Encoders.product[K]
    implicit val encT: Encoder[T] = Encoders.product[T]

    ds.groupByKey(_.key).reduceGroups(_.merge(_)).map(_._2)
  }
}
I now get the compile error below when calling the extension method; it seems Dataset[Counter] is not matching the Dataset[T] in the implicit class definition:

val ds: Dataset[Counter] = ...
val merged = ds.merge

value merge is not a member of org.apache.spark.sql.Dataset[Counter]
[error]     ds.merge
[error]        ^
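
For what it's worth, one thing to check (my observation, not from the question): the implicit conversion is only eligible when the compiler can prove the element type extends KeyedData[K, T] for some concrete Product key K, so Counter itself must extend the trait, as in the earlier sketch. A variant (again a sketch, not the asker's code) that sidesteps the TypeTag bounds by demanding the encoders from the call site:

import org.apache.spark.sql.{Dataset, Encoder}

// Hypothetical alternative: take the two encoders as implicit
// parameters instead of deriving them from TypeTags inside the method.
implicit class KeyedDataOps[K <: Product, T <: KeyedData[K, T] with Product](ds: Dataset[T]) {
  def collapse(implicit encK: Encoder[K], encT: Encoder[T]): Dataset[T] =
    ds.groupByKey(_.key).reduceGroups(_.merge(_)).map(_._2)
}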