Spark Dataset transform method with generic return type won't compile
I have the following case classes, which extend a trait:
package com.mypackage.spark.event
case class TypedEvent(id: String, timestamp: Long, `type`: String)
sealed trait Event {
  def id: String
  def timestamp: Long
}
case class CreationEvent(id: String, timestamp: Long) extends Event
case class DeleteEvent(id: String, timestamp: Long) extends Event
I need to transform a Dataset of type TypedEvent into another Dataset whose element type extends the Event trait, using the transform method of the Dataset class together with pattern matching, like below (I'm using Spark 2.3.1):
import spark.implicits._
val jsonDF = spark.read.json(pathToJsonFile)
val typedEventsDS = jsonDF.select("id", "timestamp", "type").as[TypedEvent]
val eventTypes = Array("CreateEvent", "DeleteEvent" , ...)
eventTypes.foreach(eventType => {
  val result = typedEventsDS.filter($"type" <=> eventType)
    .transform(featurize(spark, eventType)) // line 61
  /**
   * ...
   */
})
def featurize(spark: SparkSession, eventType: String): Dataset[TypedEvent] => Dataset[_ <: Event] = dataset => {
  import spark.implicits._
  eventType match {
    case "CreateEvent" => dataset.as[CreationEvent]
    case "DeleteEvent" => dataset.as[DeleteEvent]
    ...
  }
}
Error:(61, 12) no type parameters for method transform: (t:
org.apache.spark.sql.Dataset[com.mypackage.spark.event.TypedEvent] =>
org.apache.spark.sql.Dataset[U])org.apache.spark.sql.Dataset[U] exist so
that it can be applied to arguments
org.apache.spark.sql.Dataset[com.mypackage.spark.event.TypedEvent] =>
org.apache.spark.sql.Dataset[_ <: com.mypackage.spark.event.Event])
--- because ---
argument expression's type is not compatible with formal parameter type;
found:
org.apache.spark.sql.Dataset[com.mypackage.spark.event.TypedEvent] =>
org.apache.spark.sql.Dataset[_ <: com.mypackage.spark.event.Event]
required:
org.apache.spark.sql.Dataset[com.mypackage.spark.event.TypedEvent] =>
org.apache.spark.sql.Dataset[?U]
.transform(featurize(spark, eventType))
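The root of the error is the existential return type: transform is declared as `def transform[U](t: Dataset[T] => Dataset[U]): Dataset[U]`, and there is no single concrete U for which Dataset[U] matches Dataset[_ <: Event]. The same inference failure can be reproduced without Spark; below is a self-contained sketch with hypothetical Box and transformLike stand-ins for Dataset and transform (the Event types are simplified):

    sealed trait Event
    case class CreationEvent(id: String) extends Event

    class Box[T](val value: T) {
      // Same shape as Dataset.transform: U must be inferred from f's return type.
      def transformLike[U](f: Box[T] => Box[U]): Box[U] = f(this)
    }

    // Returns an existential, like featurize above:
    def featurizeLike: Box[String] => Box[_ <: Event] =
      b => new Box(CreationEvent(b.value))

    // new Box("x").transformLike(featurizeLike)
    // ^ fails to compile for the same reason: no U such that Box[U] = Box[_ <: Event]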
So I tried adding a type parameter to the transform method itself, like this:

.transform[_

and made featurize generic:

def featurize[T <: Event](spark: SparkSession, eventType: String): Dataset[TypedEvent] => Dataset[T] =
  dataset => { /* ... same ... */ }
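The generic featurize[T <: Event] version runs into the same inference problem at the call site (and `.as[T]` would additionally need an implicit `Encoder[T]` in scope). One possible workaround, sketched here and not taken from the original post, is to widen the return type to the concrete Dataset[Event], so transform can infer U = Event; a kryo encoder is used because Spark's product encoders do not cover a sealed trait:

    import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

    // Sketch: return the common supertype so transform can infer U = Event.
    def featurize(spark: SparkSession, eventType: String): Dataset[TypedEvent] => Dataset[Event] = dataset => {
      import spark.implicits._
      // Event is a sealed trait, not a case class, so derive its encoder via kryo.
      implicit val eventEncoder: Encoder[Event] = Encoders.kryo[Event]
      eventType match {
        case "CreateEvent" => dataset.as[CreationEvent].map(e => e: Event)
        case "DeleteEvent" => dataset.as[DeleteEvent].map(e => e: Event)
      }
    }

With this signature, `typedEventsDS.transform(featurize(spark, eventType))` type-checks and yields a Dataset[Event]; pattern matching on the result can recover the concrete subtype where needed.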