Apache Spark java.util.NoSuchElementException: None.get error for Spark's show() operation

Tags: apache-spark, pyspark, apache-spark-sql, greenplum

I am trying to retrieve data from a Greenplum database and display it with PySpark. This is the code I have implemented:

import pyspark

from pyspark import SparkContext,SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext

spark = SparkSession \
        .builder \
        .appName("spkapp") \
        .master("local[*]") \
        .config("spark.debug.maxToStringFields", "100")\
        .config("spark.sql.broadcastTimeout", "36000")\
        .config("spark.network.timeout", "600s")\
        .config('spark.executor.cores','1')\
        .getOrCreate()

gscPythonOptions = {
    "url": "jdbc:postgresql://localhost:5432/db_name",
    "user": "my_user",
    "password": "",
    "dbschema": "public"

}

gpdf_swt = spark.read.format("greenplum").options(**gscPythonOptions,dbtable="products",partitionColumn= "id").load()

gpdf_swt.printSchema()

gpdf_swt.show()
But when I run the Python file with spark-submit, I get an error like the following:

20/12/30 21:23:33 ERROR TaskSetManager: Task 2 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "/home/credit_card/summary_table_creation2Test.py", line 38, in <module>
    gpdf_swt.count()
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 524, in show
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o84.show.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage 0.0 (TID 2, localhost, executor driver): java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:347)
    at scala.None$.get(Option.scala:345)
    at io.pivotal.greenplum.spark.jdbc.Jdbc$.getDistributedTransactionId(Jdbc.scala:500)
    at io.pivotal.greenplum.spark.externaltable.GreenplumRowIterator.<init>(GreenplumRowIterator.scala:100)
    at io.pivotal.greenplum.spark.GreenplumRDD.compute(GreenplumRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Any help in overcoming this error is appreciated.

Edit:
My Greenplum version is 6.4.0. There is a similar question, but its solution only applies to Greenplum versions above 6.7.1.
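
If it helps to double-check what the server actually reports, the version can be queried through Spark's plain JDBC source. A minimal sketch, assuming a PostgreSQL JDBC driver is on the driver classpath and reusing the connection details from gscPythonOptions above (the "query" option requires Spark 2.4+):

# Query the Greenplum server version over plain JDBC.
# Connection details mirror gscPythonOptions from the question.
version_df = spark.read.format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/db_name") \
    .option("user", "my_user") \
    .option("password", "") \
    .option("query", "SELECT version()") \
    .load()
version_df.show(truncate=False)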

Does this answer your question?

My Greenplum version is 6.4.0, but the post above says the problem only appears on versions above 6.7.1.

@Harini What version of the Spark connector are you using?

@frankgh I tried versions 1.6.2 and 1.7.0, submitted with:

/usr/local/spark/bin/spark-submit --driver-class-path /root/greenplum/greenplum-spark_2.11-1.6.2.jar summary_table_creation

The Greenplum user (i.e., my_user) needs SELECT privileges on the pg_settings and gp_distributed_xacts catalog tables. I suspect your user does not have them, which causes the underlying PSQLException to be discarded and converted into a NoSuchElementException.
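
If missing catalog privileges are indeed the cause, one way to verify is to connect as the same user outside Spark and attempt those SELECTs directly. A minimal sketch, assuming psycopg2 is installed; gp_distributed_xacts is my reading of the comment above, inferred from getDistributedTransactionId in the stack trace, so treat the exact table name as an assumption:

# Check whether my_user can SELECT from the catalog tables the connector
# is believed to touch. Table names are assumptions (see note above).
import psycopg2

conn = psycopg2.connect(host="localhost", port=5432,
                        dbname="db_name", user="my_user", password="")
cur = conn.cursor()
for table in ("pg_settings", "gp_distributed_xacts"):
    try:
        cur.execute("SELECT * FROM " + table + " LIMIT 1")
        cur.fetchone()
        print(table + ": SELECT OK")
    except psycopg2.Error as err:
        print(table + ": SELECT failed -> " + str(err).strip())
        conn.rollback()  # clear the aborted transaction before the next probe
cur.close()
conn.close()

If a probe fails, a superuser can grant access with GRANT SELECT ON pg_settings TO my_user; (and likewise for the other table). Separately, note that --driver-class-path only puts the connector jar on the driver's classpath; with master local[*] that is enough, but on a real cluster the jar would also need to reach the executors, e.g. via --jars.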