PySpark: Hive table with a bigint column stored as Parquet

Tags: pyspark, hive, parquet, bigint

When I try to read a Parquet-backed Hive table with a bigint column using PySpark, it gives an error. Any suggestions?

df = spark.table("db.table").filter("partitiondate='2020-10-22'")
df.count()   # this works
df.show(1)   # gives an error
[Stage 5:=======>(1+1)/4]20/10/23 14:14:04 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 275, anp-r12wn03.c03.hadoop.td.com, executor 13): java.lang.UnsupportedOperationException: parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
    at parquet.column.Dictionary.decodeToLong(Dictionary.java:52)
    at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:295)
    at org.apache.spark.sql.execution.vectorized.ColumnarBatch$Row.getLong(ColumnarBatch.java:191)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_6$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:235)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:835)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:835)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:380)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

[Stage 5:======================>(2+2)/4]20/1
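The PlainBinaryDictionary in the trace, together with the failure in decodeToLong, suggests the Parquet files physically store the column as binary/string while the Hive metastore declares it bigint; that would also explain why count() succeeds (no column values are decoded) but show(1) fails. Below is a minimal diagnostic sketch under that assumption; the warehouse path and the column name some_bigint_col are placeholders, not values from the question.

# Sketch: compare the table schema with the schema actually written in the files.
# 1. Schema as Hive/Spark sees it (the column should show up as bigint here).
spark.table("db.table").printSchema()

# 2. Schema as written in the Parquet files; if the column appears as string/binary
#    here, the metastore type and the file type disagree.
#    The path below is an assumption -- substitute the table's real location.
raw = spark.read.parquet("/warehouse/db.db/table/partitiondate=2020-10-22")
raw.printSchema()

# 3. Possible workarounds (untested against this exact table):
#    a) read the files directly and cast the mismatched column
from pyspark.sql import functions as F
fixed = raw.withColumn("some_bigint_col", F.col("some_bigint_col").cast("bigint"))

#    b) or try the non-vectorized Parquet reader, which sometimes tolerates
#       such mismatches better than the vectorized one
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.table("db.table").filter("partitiondate='2020-10-22'").show(1)

If the file schema really does disagree with the metastore, the durable fix is to correct the table DDL or rewrite the partition with the intended type rather than casting at read time.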