Scala: how to unit test serializable classes for Spark?

Tags: scala, unit-testing, apache-spark, serialization, kryo

I just found a class serialization bug in Spark.

=> Now I want to write a unit test for it, but I don't know how.

Notes:

  • The failure occurs in objects that are broadcast and then (de)serialized (see the sketch after this list)
  • I want to test exactly what Spark will do, to assert that it works once deployed
  • The class to serialize is a standard class (not a case class) that extends Serializable
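
For context, here is a minimal sketch of the kind of driver-side code whose failure I want to reproduce (sc, rdd and MyClass are illustrative placeholders, not my real code):

val data = new MyClass(42)
val dataBc = sc.broadcast(data)                    // serialized on the driver
rdd.map(x => dataBc.value.yo.length + x).collect() // deserialized on each executor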

By digging into Spark's broadcast code, I found a way. It relies on private Spark code, so it may break if Spark changes internally, but for now it still works.

Add the test class to a package that starts with org.apache.spark, for example:

package org.apache.spark.my_company_tests

// [imports]

/**
 * test data that need to be broadcast in spark (using kryo)
 */
class BroadcastSerializationTests extends FlatSpec with Matchers {

  behavior of "MyClass" // FlatSpec: declare the subject before using `it`

  it should "serialize a transient val, which should be lazy" in {

    val data = new MyClass(42) // data to test
    val conf = new SparkConf()


    // Serialization
    //   code found in TorrentBroadcast.(un)blockifyObject that is used by TorrentBroadcastFactory
    val blockSize = 4 * 1024 * 1024 // 4Mb
    val out = new ChunkedByteBufferOutputStream(blockSize, ByteBuffer.allocate)
    val ser = new KryoSerializer(conf).newInstance() // Here I test using KryoSerializer, you can use JavaSerializer too
    val serOut = ser.serializeStream(out)

    Utils.tryWithSafeFinally { serOut.writeObject(data) } { serOut.close() }

    // Deserialization
    val blocks = out.toChunkedByteBuffer.getChunks()
    val in = new SequenceInputStream(blocks.iterator.map(new ByteBufferInputStream(_)).asJavaEnumeration)
    val serIn = ser.deserializeStream(in)

    val data2 = Utils.tryWithSafeFinally { serIn.readObject[MyClass]() } { serIn.close() }

    // run test on data2
    data2.yo shouldBe data.yo
  }
}

class MyClass(i: Int) extends Serializable {
  @transient val yo = 1 to i // add lazy to make the test pass: a non-lazy transient val is not recomputed after deserialization
}
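
For reference, the variant that makes the test pass only adds lazy, so the transient field is recomputed on first access after deserialization (a minimal sketch):

class MyClass(i: Int) extends Serializable {
  @transient lazy val yo = 1 to i // recomputed on demand after deserialization
}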

What about the imports?
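
In case it helps, these are the imports the snippet above seems to need (assuming a Spark 2.x package layout and ScalaTest's FlatSpec; the exact paths are an assumption and may differ between Spark versions):

import java.io.SequenceInputStream
import java.nio.ByteBuffer

import scala.collection.JavaConverters._ // for asJavaEnumeration

// assumed Spark 2.x paths; ChunkedByteBufferOutputStream, ByteBufferInputStream
// and Utils are private[spark], hence the org.apache.spark.* test package trick
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.util.io.ChunkedByteBufferOutputStream
import org.apache.spark.util.{ByteBufferInputStream, Utils}

import org.scalatest.{FlatSpec, Matchers}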