
Kryo/Chill Scala serializers: serializing a custom class that contains other classes


I want to serialize a Scalding TypedPipe[MyClass] and deserialize it in Spark 1.5.1.

I am able to use Kryo together with Twitter's Chill for Scala to serialize/deserialize a "simple" case class that contains only "primitives" such as Boolean and Map:

//In Scalding
import java.io.ByteArrayOutputStream
import com.esotericsoftware.kryo.io.Output
import com.twitter.chill.ScalaKryoInstantiator
import org.apache.hadoop.io.{BytesWritable, NullWritable}

case class MyClass(foo: Boolean) extends Serializable

val data = ... //TypedPipe[MyClass]

def serialize[A](data: A): Array[Byte] = {
  val instantiator = new ScalaKryoInstantiator
  instantiator.setRegistrationRequired(false)
  val kryo = instantiator.newKryo()
  val bao = new ByteArrayOutputStream
  val output = new Output(bao)
  kryo.writeObject(output, data)
  output.close()
  bao.toByteArray
}

data.map(t => (NullWritable.get, new BytesWritable(serialize(t))))
  .write(WritableSequenceFile(outPath))

//In Spark:
import java.io.ByteArrayInputStream
import com.esotericsoftware.kryo.io.Input
import com.twitter.chill.ScalaKryoInstantiator

def deserialize[A](ser: Array[Byte], clazz: Class[A]): A = {
  val instantiator = new ScalaKryoInstantiator
  instantiator.setRegistrationRequired(false)
  val kryo = instantiator.newKryo()
  val input = new Input(new ByteArrayInputStream(ser))
  kryo.readObject(input, clazz)
}

sc.sequenceFile(inPath, classOf[NullWritable], classOf[BytesWritable]).map(_._2)
  .map(t => deserialize(t.get, classOf[MyClass])) //where 'sc' is SparkContext
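The write/read pair above can be checked locally with a plain byte-array round trip. The sketch below uses standard Java serialization in place of Kryo/Chill (so it runs without the chill jar on the classpath); the structure mirrors the serialize/deserialize helpers in the post, and MyClass matches the definition above.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

case class MyClass(foo: Boolean) extends Serializable

// Same shape as the Kryo-based helper: object -> Array[Byte]
def serialize[A](data: A): Array[Byte] = {
  val bao = new ByteArrayOutputStream
  val out = new ObjectOutputStream(bao)
  out.writeObject(data)
  out.close()
  bao.toByteArray
}

// Same shape as the Kryo-based helper: Array[Byte] -> object
def deserialize[A](ser: Array[Byte]): A = {
  val in = new ObjectInputStream(new ByteArrayInputStream(ser))
  in.readObject().asInstanceOf[A]
}

val original = MyClass(foo = true)
val roundTripped = deserialize[MyClass](serialize(original))
assert(roundTripped == original)
```

In the post, Kryo plays the role of ObjectOutputStream/ObjectInputStream; the SequenceFile payload is just the resulting byte array wrapped in a BytesWritable.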
I am also able to serialize/deserialize a "complex" class whose members are other custom classes, whether written by me or not (e.g. org.joda.time.LocalDate). I use Kryo's default serializers and register the classes during serialization and deserialization in the same order, as mentioned in the Kryo documentation:

//In Scalding
import java.io.ByteArrayOutputStream
import com.esotericsoftware.kryo.io.Output
import com.twitter.chill.ScalaKryoInstantiator
import org.joda.time.LocalDate
import org.joda.time.chrono.{GregorianChronology, ISOChronology}

class MyClass2(val bar: MyClass, val someDate: LocalDate) extends Serializable

def serialize[A](data: A): Array[Byte] = {
  val instantiator = new ScalaKryoInstantiator
  instantiator.setRegistrationRequired(false)
  val kryo = instantiator.newKryo()
  kryo.register(classOf[MyClass2])
  kryo.register(classOf[MyClass])
  kryo.register(classOf[LocalDate])
  kryo.register(classOf[ISOChronology])
  kryo.register(classOf[GregorianChronology])
  val bao = new ByteArrayOutputStream
  val output = new Output(bao)
  kryo.writeObject(output, data)
  output.close()
  bao.toByteArray
}

//In Spark
import java.io.ByteArrayInputStream
import com.esotericsoftware.kryo.io.Input
import com.twitter.chill.ScalaKryoInstantiator
import org.joda.time.LocalDate
import org.joda.time.chrono.{GregorianChronology, ISOChronology}

def deserialize[A](ser: Array[Byte], clazz: Class[A]): A = {
  val instantiator = new ScalaKryoInstantiator
  instantiator.setRegistrationRequired(false)
  val kryo = instantiator.newKryo()
  kryo.register(classOf[MyClass2])
  kryo.register(classOf[MyClass])
  kryo.register(classOf[LocalDate])
  kryo.register(classOf[ISOChronology])
  kryo.register(classOf[GregorianChronology])
  val input = new Input(new ByteArrayInputStream(ser))
  kryo.readObject(input, clazz)
}
a) As mentioned, this works, but it seems overly verbose. Am I missing a simpler way?
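Regarding (a), chill also ships a pooled helper that removes most of the boilerplate. The sketch below assumes chill's ScalaKryoInstantiator.defaultPool API (verify the exact names against your chill version); toBytesWithClass embeds the class name in the payload, so no Class[_] argument is needed when reading back.

```scala
import com.twitter.chill.ScalaKryoInstantiator

// A shared KryoPool built from ScalaKryoInstantiator, with the same
// Scala-friendly defaults as instantiator.newKryo() in the post.
val pool = ScalaKryoInstantiator.defaultPool

case class MyClass(foo: Boolean)

val bytes: Array[Byte] = pool.toBytesWithClass(MyClass(true))
val back = pool.fromBytes(bytes).asInstanceOf[MyClass]
assert(back == MyClass(true))
```

The pool also reuses Kryo instances across calls, which avoids rebuilding a Kryo per record the way a serialize-helper inside a map would.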


b) When I registered only LocalDate, Spark complained that it did not "know" ISOChronology. When I registered ISOChronology, it complained that it did not know GregorianChronology. Once I also registered GregorianChronology, Spark stopped complaining and everything worked. Is there a way to register LocalDate together with everything it contains?
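As a partial answer to (b): the declared field types of a class can be collected with plain reflection, so registering a whole object graph can be scripted. transitiveFieldClasses below is a hypothetical helper, not from the post; note that it only sees static field types, so a runtime subclass such as GregorianChronology hiding behind an abstract Chronology field would still be missed, which is exactly why it had to be registered by hand above.

```scala
import scala.collection.mutable

// Hypothetical helper: collects the declared field types of a class,
// transitively, as candidates to pass to kryo.register.
def transitiveFieldClasses(root: Class[_]): Set[Class[_]] = {
  val seen = mutable.Set.empty[Class[_]]
  def visit(c: Class[_]): Unit = {
    // Skip primitives and JDK classes; seen.add is false on revisits.
    if (!c.isPrimitive && !c.getName.startsWith("java.") && seen.add(c))
      c.getDeclaredFields.foreach(f => visit(f.getType))
  }
  visit(root)
  seen.toSet
}

case class Inner(flag: Boolean)
case class Outer(inner: Inner, label: String)

// Boolean is primitive and String is a JDK class, so only the two
// user-defined classes are collected.
val found = transitiveFieldClasses(classOf[Outer])
assert(found == Set(classOf[Outer], classOf[Inner]))
```

The classes found this way could then be registered in a stable order (e.g. sorted by name) on both the Scalding and the Spark side, so the registration IDs match.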

Why write your own serializer/deserializer at all? To use the Kryo serializer in Spark, you only need a simple configuration on the SparkContext, something like:

val conf = new SparkConf().setAppName("MyName").set("spark.serializer", "org.apache.spark.serializer.KryoSerializer").registerKryoClasses(Array(classOf[MyClass1], classOf[MyClass2], ...))

That does look like it helps with (a), but only inside Spark. Thanks a lot.
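Spelled out over several lines, the configuration from the answer would look roughly like this (a sketch; MyClass1 and MyClass2 stand for whatever classes the job actually ships):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Kryo is configured once on the SparkContext; Spark then uses it for
// shuffled and cached data, with no hand-written helpers needed.
val conf = new SparkConf()
  .setAppName("MyName")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyClass1], classOf[MyClass2]))
val sc = new SparkContext(conf)
```

Setting spark.kryo.registrationRequired to true additionally makes Spark fail fast on any unregistered class, which surfaces a chain like LocalDate, ISOChronology, GregorianChronology in one run instead of one error at a time.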