In Scala, how do you require an Encoder for a type that is passed through a generic function which only constrains certain traits?
I have a function called createTimeLineDS that accepts another function as input and places that function inside an internal Dataset.map call. createTimeLineDS only constrains traits on the input function's type signature, while map requires the function to return something with an Encoder. For some reason, when I pass in a function that returns a case class, it throws this error:
Unable to find encoder for type TIMELINE. An implicit Encoder[TIMELINE] is needed to store TIMELINE instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
[error] .map({ case ((event, team), user) =>
convertEventToTimeLineFunction(event, team, user)})
The code is below; I have already defined all the traits and case classes. The failure is in the last function: calling it produces the error above. I have `import sparkSession.implicits._`, so I am not sure how to do this correctly.
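The failure can be reproduced in miniature without Spark. In the sketch below, the hypothetical type class `Codec` stands in for `org.apache.spark.sql.Encoder`, and `store` stands in for `Dataset.map`: implicit resolution happens at the point where the type is still abstract, so the compiler cannot find an instance.

```scala
// Minimal pure-Scala reproduction of the error (no Spark needed).
// `Codec` is a hypothetical stand-in for org.apache.spark.sql.Encoder.
trait Codec[A] { def name: String }
object Codec {
  // Analogue of spark.implicits._ providing encoders for concrete types
  implicit val intCodec: Codec[Int] = new Codec[Int] { def name = "Int" }
}

// Stands in for Dataset.map, which demands an implicit Encoder
def store[A](a: A)(implicit c: Codec[A]): String = c.name

val ok = store(42) // compiles: Codec[Int] is in implicit scope

// Inside a generic method, A is abstract, so the compiler cannot
// summon Codec[A] -- the analogue of "Unable to find encoder for
// type TIMELINE":
// def generic[A](a: A): String = store(a)   // does not compile
```

The commented-out `generic` is exactly the shape of `createTimeLineDS`: the implicit is demanded where only the type parameter is visible, not the concrete case class.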
The traits, case classes, and the function used as a parameter:
trait Event {
  val teamId: String
  val actorId: String
}

trait TimeLine {
  val teamDomain: Option[String]
  val teamName: Option[String]
  val teamIsTest: Option[Boolean]
  val actorEmail: Option[String]
  val actorName: Option[String]
}
case class JobEventTimeline(
  jobId: String,
  jobType: Option[String],
  inPlanning: Option[Boolean],
  teamId: String,
  actorId: String,
  adminActorId: Option[String],
  sessionId: String,
  clientSessionId: Option[String],
  clientCreatedAt: Long,
  seqId: Long,
  isSideEffect: Option[Boolean],
  opAction: String,
  stepId: Option[String],
  jobBaseStepId: Option[String],
  fieldId: Option[String],
  serverReceivedAt: Option[Long],
  // "Enriched" data. Data is pulled in from other sources during stream processing
  teamDomain: Option[String] = None,
  teamName: Option[String] = None,
  teamIsTest: Option[Boolean] = None,
  actorEmail: Option[String] = None,
  actorName: Option[String] = None
) extends TimeLine
def createJobEventTimeLine(jobEvent: CaseClassJobEvent, team: Team, user: User): JobEventTimeline = {
  JobEventTimeline(
    jobEvent.jobId,
    jobEvent.jobType,
    jobEvent.inPlanning,
    jobEvent.teamId,
    jobEvent.actorId,
    jobEvent.adminActorId,
    jobEvent.sessionId,
    jobEvent.clientSessionId,
    jobEvent.clientCreatedAt,
    jobEvent.seqId,
    jobEvent.isSideEffect,
    jobEvent.opAction,
    jobEvent.stepId,
    jobEvent.jobBaseStepId,
    jobEvent.fieldId,
    jobEvent.serverReceivedAt,
    Some(team.domain),
    Some(team.name),
    Some(team.is_test),
    Some(user.email),
    Some(user.name)
  )
}
The problem function and its call site:
def createTimeLineDS[EVENT <: Event with Serializable, TIMELINE <: TimeLine]
  (convertEventToTimeLineFunction: (EVENT, Team, User) => TIMELINE)
  (sparkSession: SparkSession)
  (jobEventDS: Dataset[EVENT]): Dataset[TIMELINE] = {
  import sparkSession.implicits._
  val teamDS = FuncUtils.createDSFromPostgresql[Team](sparkSession)
  val userDS = FuncUtils.createDSFromPostgresql[User](sparkSession)
  jobEventDS
    .joinWith(teamDS, jobEventDS("teamId") === teamDS("id"), "left_outer")
    .joinWith(userDS, $"_1.actorId" === userDS("id"), "left_outer")
    .map({ case ((event, team), user) => convertEventToTimeLineFunction(event, team, user) })
}

val jobEventTimeLine = FuncUtils.createTimeLineDS(JobEventTimeline.createJobEventTimeLine)(sparkSession)(jobEventDS)
The simplest solution is to do this:
def createTimeLineDS[EVENT <: Event, TIMELINE <: TimeLine : Encoder](...)
@LuisMiguelMejíaSuárez you are a Scala genius. It does take sparkSession as a parameter, though. My question is: why does this work? — I added a more detailed answer with a couple of links and examples. Hope it helps!
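As to why the context bound works: `Dataset.map` needs an `implicit Encoder[TIMELINE]`, but inside the generic method `TIMELINE` is abstract, so nothing can be derived there. The `TIMELINE : Encoder` bound adds an implicit parameter to `createTimeLineDS` itself, deferring resolution to the call site, where the concrete type (e.g. `JobEventTimeline`) and its encoder are both known. A sketch with a toy type class `Enc` in place of Spark's `Encoder`:

```scala
// `Enc` is a hypothetical stand-in for org.apache.spark.sql.Encoder.
trait Enc[A]
object Enc {
  implicit val strEnc: Enc[String] = new Enc[String] {}
}

// Stands in for Dataset.map, which demands an implicit instance
def mapLike[A: Enc](a: A): A = a

// The context bound on OUT threads the caller's instance through,
// just like `TIMELINE <: TimeLine : Encoder` in the answer above
def createLike[OUT: Enc](convert: Int => OUT)(input: Int): OUT =
  mapLike(convert(input))

// Enc[String] is resolved here, where OUT = String is concrete
val timeline = createLike((i: Int) => i.toString)(7)
```

Without the `OUT: Enc` bound, `createLike` would not compile, for the same reason `createTimeLineDS` could not find an encoder for `TIMELINE`.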
object Team {
  // https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$@product[T%3C:Product](implicitevidence$5:reflect.runtime.universe.TypeTag[T]):org.apache.spark.sql.Encoder[T]
  implicit final val TeamEncoder: Encoder[Team] = Encoders.product
}
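Defining the encoder in the companion object works because implicits in a type's companion are part of its implicit scope, so they are found without any import. A pure-Scala analogue (the names `Enc2` and `Team2` are illustrative, not from Spark):

```scala
// `Enc2` stands in for Encoder; `Team2` for the Team case class.
trait Enc2[A] { def tag: String }

final case class Team2(id: String)
object Team2 {
  // Analogue of `implicit final val TeamEncoder: Encoder[Team] = Encoders.product`
  implicit val enc: Enc2[Team2] = new Enc2[Team2] { def tag = "Team2" }
}

def tagOf[A](implicit e: Enc2[A]): String = e.tag

// Found via Team2's companion object -- no import required
val found = tagOf[Team2]
```

This is why the call site of `createTimeLineDS` can satisfy the `Encoder` context bound for `Team` (and, via `spark.implicits._` or a similar companion-object instance, for `JobEventTimeline`) without extra wiring.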