Java: Cannot write/save data to Ignite directly from a Spark RDD

Tags: java, scala, apache-spark, jdbc, ignite

I am trying to write a DataFrame to Ignite using JDBC.

Spark version: 2.1

Ignite version: 2.3

JDK: 1.8

Scala: 2.11.8

Here is my code snippet:

def WriteToIgnite(hiveDF:DataFrame,targetTable:String):Unit = {

  // The connection is obtained once, on the driver
  val conn = DataSource.conn
  var psmt:PreparedStatement = null

  try {
    OperationIgniteUtil.deleteIgniteData(conn,targetTable)

    hiveDF.foreachPartition({
      partitionOfRecords => {
        partitionOfRecords.foreach(
          row => for ( i <- 0 until row.length ) {
            psmt = OperationIgniteUtil.getInsertStatement(conn, targetTable, hiveDF.schema)
            psmt.setObject(i+1, row.get(i))
            psmt.execute()
          }
        )
      }
    })

  }catch {
    case e: Exception =>  e.printStackTrace()
  } finally {
    conn.close
  }
}
Then I run it on Spark, and it prints this error message:

org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:924)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:923)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:923)
    at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2305)
    at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2305)
    at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2305)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
    at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
    at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2304)
    at com.pingan.pilot.ignite.common.OperationIgniteUtil$.WriteToIgnite(OperationIgniteUtil.scala:72)
    at com.pingan.pilot.ignite.etl.HdfsToIgnite$.main(HdfsToIgnite.scala:36)
    at com.pingan.pilot.ignite.etl.HdfsToIgnite.main(HdfsToIgnite.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.apache.ignite.internal.jdbc2.JdbcConnection
Serialization stack:
    - object not serializable (class: org.apache.ignite.internal.jdbc2.JdbcConnection, value: org.apache.ignite.internal.jdbc2.JdbcConnection@7ebc2975)
    - field (class: com.pingan.pilot.ignite.common.OperationIgniteUtil$$anonfun$WriteToIgnite$1, name: conn$1, type: interface java.sql.Connection)
    - object (class com.pingan.pilot.ignite.common.OperationIgniteUtil$$anonfun$WriteToIgnite$1, )
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
    ... 27 more

Does anyone know how to fix this?
Thanks!

You have to extend the Serializable interface:

object Test extends Serializable { 
  def WriteToIgnite(hiveDF:DataFrame,targetTable:String):Unit = {
   ???
  }
}

I hope it solves your problem.

The problem here is that you cannot serialize the connection to Ignite, DataSource.conn. The closure you provide to foreachPartition contains the connection as part of its scope, and that is why Spark cannot serialize it.
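
If you want to stay with plain JDBC, a common workaround is to open the connection inside the foreachPartition closure instead, so that no connection object has to be captured from the driver. A minimal sketch, reusing the helpers from your code and assuming a hypothetical DataSource.createConnection() factory that the executors can call:

val schema = hiveDF.schema // StructType is serializable, so it is safe to capture

hiveDF.foreachPartition { partitionOfRecords =>
  // Open the connection on the executor instead of capturing the driver-side one.
  // DataSource.createConnection() is a hypothetical factory method.
  val conn = DataSource.createConnection()
  val psmt = OperationIgniteUtil.getInsertStatement(conn, targetTable, schema)
  try {
    partitionOfRecords.foreach { row =>
      // Bind every column of the row, then execute one INSERT per row.
      for (i <- 0 until row.length) {
        psmt.setObject(i + 1, row.get(i).asInstanceOf[AnyRef])
      }
      psmt.executeUpdate()
    }
  } finally {
    psmt.close()
    conn.close()
  }
}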

Fortunately, Ignite provides a custom RDD implementation that lets you save values to it. You need to create an IgniteContext first and then retrieve Ignite's shared RDD, which provides distributed access to Ignite, in order to save your RDD's values:

val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration())
...

// Retrieve Ignite's shared RDD
val igniteRdd = igniteContext.fromCache("partitioned")
igniteRdd.saveValues(hiveDF.rdd)
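
For reference, here is a slightly more complete sketch of that approach. It assumes sparkContext is your application's existing SparkContext, that a cache named "partitioned" is configured on the cluster, and that the first column can serve as a String key in the savePairs variant; all of these are illustrative assumptions, not the exact setup of your project:

import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.sql.Row

// The configuration closure runs on each executor, so no Ignite object
// has to be serialized and shipped from the driver.
val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration())

// "partitioned" is an assumed cache name; it must be configured on the cluster.
val igniteRdd = igniteContext.fromCache[String, Row]("partitioned")

// saveValues stores the rows under automatically generated affinity keys.
igniteRdd.saveValues(hiveDF.rdd)

// If explicit keys are needed, build (key, value) pairs and use savePairs instead;
// here the first column is assumed to be a String key (illustrative only).
igniteRdd.savePairs(hiveDF.rdd.map(row => (row.getString(0), row)))
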
For more information, see the Apache Ignite documentation.

This should help.