PySpark: error reading a Hive parquet table with a bigint column
When I try to read a parquet table with a bigint column using PySpark, it throws an error. Any suggestions?
df = spark.table("db.table").filter("partitiondate='2020-10-22'")
df.count()  # this works
df.show(1)  # gives the error below
[Stage 5:=======>(1+1)/4]20/10/23 14:14:04 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 275, anp-r12wn03.c03.hadoop.td.com, executor 13): java.lang.UnsupportedOperationException: parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
    at parquet.column.Dictionary.decodeToLong(Dictionary.java:52)
    at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:295)
    at org.apache.spark.sql.execution.vectorized.ColumnarBatch$Row.getLong(ColumnarBatch.java:191)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_6$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:235)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:835)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:835)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:380)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[Stage 5:======================>(2+2)/4]20/1
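A note on the trace: a `PlainBinaryDictionary` failing in `decodeToLong` usually means the Hive metastore declares the column as `bigint` while the parquet files physically store it as binary/string. `count()` succeeds because it never decodes the column values, while `show(1)` does. The sketch below is an assumption-based workaround, not part of the original post: it disables the vectorized parquet reader (so Spark falls back to the row-based parquet-mr reader), or reads the files directly and casts. The table path and column name are hypothetical; it requires an active `SparkSession` named `spark`.

```python
from pyspark.sql.functions import col

# Workaround 1 (assumption): disable the vectorized parquet reader, which is
# where OnHeapColumnVector.getLong fails, and retry the same query.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
df = spark.table("db.table").filter("partitiondate='2020-10-22'")
df.show(1)

# Workaround 2 (assumption): read the parquet files with their physical schema
# and cast explicitly. The path and column name here are hypothetical.
raw = spark.read.parquet("/warehouse/db.db/table/partitiondate=2020-10-22")
fixed = raw.withColumn("id_col", col("id_col").cast("bigint"))
fixed.show(1)
```

The longer-term fix would be to rewrite the partition so the physical parquet type matches the declared Hive type, rather than relying on either workaround.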