Spark DataFrame write to Hive fails with "Zero length BigInteger" error


My Spark program pulls data from a SQL Server DB over JDBC. It ran fine for several months and now throws the following error:

java.lang.NumberFormatException: Zero length BigInteger
        at java.math.BigInteger.<init>(BigInteger.java:302)
        at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getDecimal(UnsafeRow.java:439)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply2077_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
        at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
        at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:263)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
I created a cut-down version of the program that pulls just 10 sample records, persists the RDD, and writes to a Hive table. It fails with the same error. I have tried the following:

  • Pull only the non-numeric fields, persist the RDD, and write to the table --> succeeds
  • Pull only the numeric fields, persist the RDD, and write to the table --> succeeds
  • Pull all columns and write to the table without persisting the RDD --> succeeds

  • The error occurs only when I pull both the numeric and non-numeric fields, persist the RDD (rdd.persist), and write to the Hive table.

    I am using Spark 1.6.1. Any input is much appreciated.
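For context on what the top of the stack trace means: the `Zero length BigInteger` message comes from the JVM itself. `java.math.BigInteger` rejects an empty byte array, and `UnsafeRow.getDecimal` (visible in the trace) reads a decimal column's unscaled value as raw bytes and hands them to that constructor, so a row whose decimal payload comes back empty fails exactly this way. A minimal reproduction of just the JVM-level failure, with no Spark involved:

```java
import java.math.BigInteger;

public class ZeroLengthRepro {
    // BigInteger(byte[]) throws NumberFormatException("Zero length BigInteger")
    // when given an empty array -- the same exception seen in the Spark trace.
    static String reproduce() {
        try {
            new BigInteger(new byte[0]);
            return "no exception";
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce());
    }
}
```

This does not explain *why* the persisted rows end up with empty decimal bytes, but it confirms the error originates in decimal deserialization rather than in the Hive write itself.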

    You may have an empty string somewhere that Spark expects to be a number (converted to a BigInteger). As an aside, when you say "the error occurs only when I persist the RDD", do you mean when writing to the table? Or do you literally mean calling rdd.persist()? The latter would be true, but is probably not what you meant. Just checking.

    The error occurs only when I pull all the fields, call rdd.persist, and then write the data to the Hive table.

    @Remo did you find a solution?