Scala Spark SQL generic dataset loading

I want to create a generic trait for loading datasets:

import org.apache.spark.sql.{Dataset, SparkSession}

case class Foo(name: String)

trait Loader[T] {
  def load(implicit spark: SparkSession): Dataset[T] = {
    import spark.implicits._
    spark.read
      .json(path)
      .as[T]
      .filter(filterDataset)
  }

  val path: String
  val filterDataset: T => Boolean
}

object FooLoader extends Loader[Foo] {
  val path = "/path/to/foo.json"
  val filterDataset: Foo => Boolean = foo => foo.name.nonEmpty
}

This fails with "Unable to find encoder for type stored in a Dataset". I can work around it by moving the .as[T] call into each concrete object:

import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

trait Loader[T] {
  def load(implicit spark: SparkSession): Dataset[T] = {
    toDS(
      spark.read
        .json(path)
    ).filter(filterDataset)
  }

  val path: String
  val filterDataset: T => Boolean

  def toDS(df: DataFrame)(implicit spark: SparkSession): Dataset[T]
}

object FooLoader extends Loader[Foo] {
  val path = "/path/to/foo.json"
  val filterDataset: Foo => Boolean = foo => foo.name.nonEmpty 

  def toDS(df: DataFrame)(implicit spark: SparkSession): Dataset[Foo] = {
    import spark.implicits._
    df.as[Foo]
  }
}
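
As a quick sanity check, here is a minimal sketch of how this second version can be called (the session setup below is an assumption for illustration, not part of the question):

import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical local session, for illustration only.
implicit val spark: SparkSession = SparkSession.builder()
  .appName("loader-example")
  .master("local[*]")
  .getOrCreate()

// toDS supplies the Encoder[Foo], so no encoder is needed at the call site.
val foos: Dataset[Foo] = FooLoader.load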
However, this solution requires implementing the toDS method in every concrete loader. I know how to achieve generic data loading with a generic function, e.g. def load[T: Encoder](path: String) = spark.read.json(path).as[T], but my goal is to use a generic trait. How can I tell the compiler that the type T has an Encoder?
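
One way to express this, shown here as a sketch rather than something from the original post, is to require the Encoder[T] as an implicit parameter of load itself; the trait then stays fully generic, and each call site supplies the encoder via import spark.implicits._:

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

trait Loader[T] {
  val path: String
  val filterDataset: T => Boolean

  // Requiring Encoder[T] implicitly lets the compiler resolve .as[T]
  // even though the trait never sees the concrete type.
  def load(implicit spark: SparkSession, encoder: Encoder[T]): Dataset[T] =
    spark.read
      .json(path)
      .as[T]
      .filter(filterDataset)
}

object FooLoader extends Loader[Foo] {
  val path = "/path/to/foo.json"
  val filterDataset: Foo => Boolean = _.name.nonEmpty
}

At the call site, import spark.implicits._ brings an Encoder[Foo] into scope, so FooLoader.load compiles without any per-class toDS implementation.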