Apache Spark: saving an RDD with saveAsObjectFile throws Exception "has a not serializable result: org.apache.hadoop.hbase.io.ImmutableBytesWritable"

I need to serialize an RDD read from HBase into the Alluxio in-memory file system, as a way of caching it and periodically updating it for incremental Spark computation.

The code looks like this, but it hits the exception from the title:

import alluxio.AlluxioURI
import alluxio.client.file.FileSystem
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val inputTableNameEvent = HBaseTables.TABLE_XXX.tableName
val namedeRDDName = "EventAllCached2Update"
val alluxioPath = "alluxio://hadoop1:19998/"
val fileURI = alluxioPath + namedeRDDName
val path: AlluxioURI = new AlluxioURI("/" + namedeRDDName)

// Alluxio file system client, used below to drop any previous snapshot
val fs: FileSystem = FileSystem.Factory.get()

val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, inputTableNameEvent)

// Read the HBase table as an RDD[(ImmutableBytesWritable, Result)]
val rdd = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
                classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
                classOf[org.apache.hadoop.hbase.client.Result])
val numbers = rdd.count()
println("rdd count: " + numbers)

// Replace the previous snapshot; saveAsObjectFile is where the exception is thrown
if (fs.exists(path))
  fs.delete(path)
rdd.saveAsObjectFile(fileURI)

Can someone tell me how to map ImmutableBytesWritable to another type to get around this problem? The mapping also needs to be reversible, because later I have to read the saved objects back with objectFile and turn them into an RDD[(ImmutableBytesWritable, Result)] for further updates and computation.
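For illustration only (not part of the original question), here is a minimal sketch of the kind of reversible mapping being asked about: map each pair down to plain byte arrays before saving, since Array[Byte] survives Java serialization, and rebuild the ImmutableBytesWritable key when reading back. This round-trips only the row key plus the column values you extract, not the full Result; "CF" and "column1" are placeholder names.

import org.apache.hadoop.hbase.io.ImmutableBytesWritable

// Keep only serializable pieces: a copy of the raw row key plus the value bytes of one column
val serializableRDD = rdd.map { case (key, result) =>
  (key.copyBytes(), result.getValue("CF".getBytes(), "column1".getBytes()))
}
serializableRDD.saveAsObjectFile(fileURI)

// Later: read the pairs back and restore the ImmutableBytesWritable key
val restored = sc.objectFile[(Array[Byte], Array[Byte])](fileURI)
  .map { case (keyBytes, value) => (new ImmutableBytesWritable(keyBytes), value) }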

You need to convert the RDD into Row objects, something like the example below, and then save it to HDFS. The parsed RDD can then be used like any other RDD.

import org.apache.spark.sql.Row

// Drop the ImmutableBytesWritable key, then turn each HBase Result into a Row of strings
val parsedRDD = yourRDD.map(tuple => tuple._2).map(result => (
      Row((result.getRow.map(_.toChar).mkString),
      (result.getColumn("CF".getBytes(),"column1".getBytes()).get(0).getValue.map(_.toChar).mkString),
      (result.getColumn("CF".getBytes(),"column2".getBytes()).get(0).getValue.map(_.toChar).mkString),
      (result.getColumn("CF".getBytes(),"column3".getBytes()).get(0).getValue.map(_.toChar).mkString),
      (result.getColumn("CF".getBytes(),"column4".getBytes()).get(0).getValue.map(_.toChar).mkString),
      (result.getColumn("CF".getBytes(),"column5".getBytes()).get(0).getValue.map(_.toChar).mkString),
      (result.getColumn("CF".getBytes(),"column5".getBytes()).get(0).getValue.map(_.toChar).mkString)
      )))

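The answer stops before the save and read-back steps the question also needs. A short sketch of those steps, assuming the same fileURI as in the question (any HDFS or Alluxio URI works): because Row objects are Java-serializable, saveAsObjectFile and sc.objectFile round-trip them directly.

import org.apache.spark.sql.Row

// Persist the Row RDD, e.g. to the Alluxio URI built in the question
parsedRDD.saveAsObjectFile(fileURI)

// Later: read the serialized Rows back for the incremental computation
val restoredRows = sc.objectFile[Row](fileURI)
println("restored count: " + restoredRows.count())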