Apache Spark: DataFrame write to Hive fails with "Zero length BigInteger" error
My Spark program pulls data from a SQL Server database via JDBC. It ran fine for several months, but now throws the following error:
java.lang.NumberFormatException: Zero length BigInteger
at java.math.BigInteger.<init>(BigInteger.java:302)
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getDecimal(UnsafeRow.java:439)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply2077_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:263)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I created a scaled-down version of the program that pulls just 10 sample records, persists the RDD, and writes to a Hive table. It fails with the same error.
I have tried a few things and narrowed it down:
The error occurs only when I pull both numeric and non-numeric fields, persist the RDD (rdd.persist), and then write to the Hive table.
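For context, the exception at the top of the trace is thrown by the `java.math.BigInteger` string constructor, which rejects zero-length input. It can be reproduced in isolation (a standalone demonstration, not part of the original program):

```java
import java.math.BigInteger;

public class ZeroLengthBigIntegerDemo {
    public static void main(String[] args) {
        // A valid numeric string parses fine.
        System.out.println(new BigInteger("42"));

        // An empty string triggers the same exception seen in the Spark trace.
        try {
            new BigInteger("");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // "Zero length BigInteger"
        }
    }
}
```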
I am using Spark 1.6.1. Any input is much appreciated.

You may have an empty string somewhere that Spark expects to be a number (converted to a BigInteger). Incidentally, when you say "the error occurs only when I persist the RDD", do you mean when writing to the table, or when calling rdd.persist()? The latter would be true, but is probably not what you meant; just checking.

The error occurs only when I pull all the fields, do rdd.persist, and then write the data to the Hive table.

@Remo did you find a solution?
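If the cause is indeed an empty string arriving in a decimal column, one possible workaround (a sketch under that assumption, not taken from this thread) is to select the problematic column as a string on the JDBC side and parse it defensively before converting it back to a decimal, mapping empty or blank values to null:

```java
import java.math.BigDecimal;

// Hypothetical helper: parse a raw string from the source database into a
// BigDecimal, treating null/empty/blank input as absent (null) instead of
// letting the parser throw NumberFormatException.
public class SafeDecimal {
    public static BigDecimal parseOrNull(String raw) {
        if (raw == null) {
            return null;
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        return new BigDecimal(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("12.50")); // 12.50
        System.out.println(parseOrNull(""));      // null
        System.out.println(parseOrNull("  "));    // null
    }
}
```

Returning null keeps the value representable as a SQL NULL in the Hive table; alternatively, rows with unparseable values can be filtered out before the write.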