Scala: trying to define a trait that is also a Product, for Spark

scala apache-spark types functional-programming

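I'm writing a small library for some design patterns I keep running into when programming with Spark. One operation I'm trying to generalize is grouping a Dataset by some key, combining the records in each group, and then returning the original type. A simple example: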

case class Counter(id: String, count: Long)

// Let's say I have some Dataset...
val counters: Dataset[Counter]

// The operation I find myself doing quite often:
import sqlContext.implicits._
counters.groupByKey(_.id)
  .reduceGroups((a, b) => Counter(a.id, a.count + b.count))
  .map(_._2)
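For reference, the same group-then-reduce pattern can be sketched on a plain Scala collection, which runs without Spark. This is only an illustration of the intended result: the Seq stands in for the Dataset, and the object name and sample values are made up for the example.

```scala
object LocalCollapse {
  case class Counter(id: String, count: Long)

  // Local stand-in for a Dataset[Counter]; the sample values are invented.
  val counters: Seq[Counter] = Seq(
    Counter("a", 1L),
    Counter("b", 5L),
    Counter("a", 2L)
  )

  // Group by key, reduce each group to a single record, then drop the key.
  // The final sortBy only makes the result order deterministic.
  val collapsed: Seq[Counter] = counters
    .groupBy(_.id)
    .values
    .map(group => group.reduce((a, b) => Counter(a.id, a.count + b.count)))
    .toSeq
    .sortBy(_.id)
}
```

Here the two records with id "a" are combined into one, and the record with id "b" passes through unchanged.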

To generalize this, I added a new type:

trait KeyedData[K <: Product, T <: KeyedData[K, T] with Product] { self: T =>
  def key: K
  def merge(other: T): T
}
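The Counter shown earlier does not extend this trait, so here is a minimal sketch of what an implementation might look like. CounterKey is a hypothetical key type introduced only for this example: String is not a Product, so the id has to be wrapped in something that is (a case class here) to satisfy the bound on K.

```scala
object KeyedDataSketch {
  // The trait from the text, with the self-type written as `self: T =>`.
  trait KeyedData[K <: Product, T <: KeyedData[K, T] with Product] { self: T =>
    def key: K
    def merge(other: T): T
  }

  // Hypothetical key type; case classes are Products, so the bound on K holds.
  case class CounterKey(id: String)

  // Counter names itself as T, which is what the recursive bound expects.
  case class Counter(id: String, count: Long) extends KeyedData[CounterKey, Counter] {
    def key: CounterKey = CounterKey(id)
    def merge(other: Counter): Counter = Counter(id, count + other.count)
  }
}
```

With this definition, merging two counters that share an id adds their counts and keeps the id of the receiver.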

Then I created the following implicit class to add the functionality to Datasets:

implicit class KeyedDataDatasetWrapper[K <: Product, T <: KeyedData[K, T] with Product](ds: Dataset[T]) {
  def collapse(implicit sqlContext: SQLContext): Dataset[T] = {
    import sqlContext.implicits._

    ds.groupByKey(_.key).reduceGroups(_.merge(_)).map(_._2)
  }
}

Clearly something is not being recognized as a Product type, so my type parameters must be wrong somewhere, but I'm not sure where. Any ideas?

Update

I changed the implicit class to the following:

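import org.apache.spark.sql.{Dataset, Encoder, Encoders, SQLContext}
import scala.reflect.runtime.universe.TypeTag

implicit class KeyedDataDatasetWrapper[K <: Product : TypeTag,
                                       T <: KeyedData[K, T] with Product : TypeTag](ds: Dataset[T]) {
  def merge(implicit sqlContext: SQLContext): Dataset[T] = {
    // Build the encoders explicitly from the TypeTags instead of importing sqlContext.implicits._
    implicit val encK: Encoder[K] = Encoders.product[K]
    implicit val encT: Encoder[T] = Encoders.product[T]

    // Uses KeyedData.merge, the combining method declared on the trait above.
    ds.groupByKey(_.key).reduceGroups(_.merge(_)).map(_._2)
  }
}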

I now get this compile error. It seems that Dataset[Counter] does not match the Dataset[T] in the implicit class definition:

: value merge is not a member of org.apache.spark.sql.Dataset[Counter]
[error]     ds.merge
[error]        ^

The call that triggers this error:

val ds: Dataset[Counter] = ...
val merged = ds.merge
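One possible way to narrow this down, sketched here without Spark: instantiate the wrapper with both type arguments written out instead of relying on the implicit conversion. The reasoning is an assumption, not a confirmed diagnosis: K appears only inside the bound of T, so the compiler may not be able to infer it from Dataset[T] alone, in which case the implicit class is silently skipped and the method looks like it does not exist. In the sketch, Seq stands in for Dataset, and KeyedSeqWrapper and CounterKey are hypothetical names.

```scala
object ExplicitTypeArgs {
  trait KeyedData[K <: Product, T <: KeyedData[K, T] with Product] { self: T =>
    def key: K
    def merge(other: T): T
  }

  case class CounterKey(id: String) // hypothetical key type
  case class Counter(id: String, count: Long) extends KeyedData[CounterKey, Counter] {
    def key: CounterKey = CounterKey(id)
    def merge(other: Counter): Counter = Counter(id, count + other.count)
  }

  // Same type parameters as the wrapper in the text, but over Seq instead of Dataset.
  implicit class KeyedSeqWrapper[K <: Product, T <: KeyedData[K, T] with Product](ds: Seq[T]) {
    def collapse: Seq[T] =
      ds.groupBy(_.key).values.map(group => group.reduce((a, b) => a.merge(b))).toSeq
  }

  val ds: Seq[Counter] = Seq(Counter("a", 1L), Counter("a", 2L), Counter("b", 5L))

  // Both type arguments are spelled out, so nothing is left to inference.
  val merged: Seq[Counter] =
    new KeyedSeqWrapper[CounterKey, Counter](ds).collapse.sortBy(_.id)
}
```

If the explicit form compiles in the Spark code as well, for example new KeyedDataDatasetWrapper[CounterKey, Counter](ds).merge with a Counter that extends KeyedData[CounterKey, Counter], that would point at type inference rather than at the encoders.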