
In Scala, how do I enforce an Encoder on a type when it is passed into a generic function that only enforces certain traits?


I have a function named `createTimeLineDS` that takes another function as input and places that function inside an internal Dataset `map` call. `createTimeLineDS` only enforces traits on the input function's type signature, while `map` requires the function to return something that has an `Encoder`.

For some reason, when I pass in a function that returns a case class, it throws this error:

    Unable to find encoder for type TIMELINE. An implicit Encoder[TIMELINE] is needed to
    store TIMELINE instances in a Dataset. Primitive types (Int, String, etc) and Product
    types (case classes) are supported by importing spark.implicits._  Support for
    serializing other types will be added in future releases.
    [error]       .map({ case ((event, team), user) =>
    [error]         convertEventToTimeLineFunction(event, team, user)})
The code is below; I have defined all the traits and case classes. The last function is the one that fails: calling it produces the error above. I do have `import sparkSession.implicits._`, so I'm not sure how to do this correctly.

The traits, case classes, and the function used as the argument:

trait Event {
  val teamId: String
  val actorId: String
}

trait TimeLine {
  val teamDomain: Option[String]
  val teamName: Option[String]
  val teamIsTest: Option[Boolean]
  val actorEmail: Option[String]
  val actorName: Option[String]
}  

case class JobEventTimeline(
                         jobId: String,
                         jobType: Option[String],
                         inPlanning: Option[Boolean],

                         teamId: String,
                         actorId: String,
                         adminActorId: Option[String],
                         sessionId: String,
                         clientSessionId: Option[String],
                         clientCreatedAt: Long,
                         seqId: Long,
                         isSideEffect: Option[Boolean],

                         opAction: String,
                         stepId: Option[String],
                         jobBaseStepId: Option[String],
                         fieldId: Option[String],

                         serverReceivedAt: Option[Long],

                         // "Enriched" data. Data is pulled in from other sources during stream processing
                         teamDomain: Option[String] = None,
                         teamName: Option[String] = None,
                         teamIsTest: Option[Boolean] = None,

                         actorEmail: Option[String] = None,
                         actorName: Option[String] = None
                       ) extends TimeLine


def createJobEventTimeLine(jobEvent: CaseClassJobEvent, team: Team, user: User): JobEventTimeline = {
    JobEventTimeline(
      jobEvent.jobId,
      jobEvent.jobType,
      jobEvent.inPlanning,
      jobEvent.teamId,
      jobEvent.actorId,
      jobEvent.adminActorId,
      jobEvent.sessionId,
      jobEvent.clientSessionId,
      jobEvent.clientCreatedAt,
      jobEvent.seqId,
      jobEvent.isSideEffect,
      jobEvent.opAction,
      jobEvent.stepId,
      jobEvent.jobBaseStepId,
      jobEvent.fieldId,
      jobEvent.serverReceivedAt,
      Some(team.domain),
      Some(team.name),
      Some(team.is_test),
      Some(user.email),
      Some(user.name)
    )
  }
The problem function and the function call:

def createTimeLineDS[EVENT <: Event with Serializable, TIMELINE <: TimeLine]
  (convertEventToTimeLineFunction: (EVENT, Team, User) => TIMELINE)
  (sparkSession: SparkSession)
  (jobEventDS: Dataset[EVENT]): Dataset[TIMELINE] = {
    import sparkSession.implicits._
    val teamDS = FuncUtils.createDSFromPostgresql[Team](sparkSession)
    val userDS = FuncUtils.createDSFromPostgresql[User](sparkSession)
    jobEventDS
      .joinWith(teamDS, jobEventDS("teamId") === teamDS("id"), "left_outer")
      .joinWith(userDS, $"_1.actorId" === userDS("id"), "left_outer")
      .map({ case ((event, team), user) => convertEventToTimeLineFunction(event, team, user) })
  }

val jobEventTimeLine = FuncUtils.createTimeLineDS(JobEventTimeline.createJobEventTimeLine)(sparkSession)(jobEventDS)

The simplest solution is to do this:

def createTimeLineDS[EVENT <: Event, TIMELINE <: TimeLine : Encoder](...)
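The `: Encoder` context bound is shorthand for an extra implicit parameter list, so the signature above is roughly equivalent to the following sketch (the parameter name `ev` is illustrative, and the body is elided):

```scala
import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// Desugared form of the context bound: the compiler threads an implicit
// Encoder[TIMELINE] from the call site into the method. This is what the
// inner .map(...) needs -- importing spark.implicits._ inside the method
// cannot summon an encoder for an abstract type parameter, because at
// that point the compiler does not know the concrete type of TIMELINE.
def createTimeLineDS[EVENT <: Event, TIMELINE <: TimeLine]
  (convertEventToTimeLineFunction: (EVENT, Team, User) => TIMELINE)
  (sparkSession: SparkSession)
  (jobEventDS: Dataset[EVENT])
  (implicit ev: Encoder[TIMELINE]): Dataset[TIMELINE] = ???
```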

@LuisMiguelMejíaSuárez you are a Scala genius. It does work with `sparkSession` as a parameter, though. My question is: why does this work? — I added a more detailed answer with a couple of links and examples. Hope it helps!

object Team {
  // https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$@product[T%3C:Product](implicitevidence$5:reflect.runtime.universe.TypeTag[T]):org.apache.spark.sql.Encoder[T]
  implicit final val TeamEncoder: Encoder[Team] = Encoders.product
}
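With the context bound in place, the call site only has to have an `Encoder[JobEventTimeline]` in implicit scope, and since `JobEventTimeline` is a case class (a `Product`), `spark.implicits._` can derive one. A sketch of the call site, assuming the modified `createTimeLineDS` signature:

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

val sparkSession = SparkSession.builder.master("local[*]").getOrCreate()
// Brings an implicit Encoder[JobEventTimeline] into scope, because
// JobEventTimeline is a Product type (a case class):
import sparkSession.implicits._

// TIMELINE is inferred as JobEventTimeline, and the compiler fills in
// the implicit Encoder[JobEventTimeline] at this call site:
val jobEventTimeLine =
  FuncUtils.createTimeLineDS(JobEventTimeline.createJobEventTimeLine)(sparkSession)(jobEventDS)

// Equivalently, an encoder could be derived explicitly:
// val enc = Encoders.product[JobEventTimeline]
```

The companion-object approach shown for `Team` works for the same reason: implicits defined in a type's companion object are part of its implicit scope, so they are found without any import.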