Py4JJavaError in a Python DataFrame inner join


Running standalone spark-2.3.0-bin-hadoop2.7 inside a Docker container.

  • df1 = 5 rows
  • df2 = 10 rows
  • The datasets are very small

    df1 schema: DataFrame[id: bigint, name: string]
    df2 schema: DataFrame[id: decimal(12,0), age: int]
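For context, a minimal sketch of how two DataFrames with these schemas could be built; the sample rows below are made up, and only the column names and types mirror the question:

    from decimal import Decimal
    from pyspark.sql import SparkSession
    from pyspark.sql.types import (DecimalType, IntegerType, LongType,
                                   StringType, StructField, StructType)

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical rows; only the schemas match df1 and df2 above.
    df1 = spark.createDataFrame(
        [(1, "alice"), (2, "bob")],
        StructType([StructField("id", LongType()),
                    StructField("name", StringType())]))

    df2 = spark.createDataFrame(
        [(Decimal(1), 30), (Decimal(3), 40)],
        StructType([StructField("id", DecimalType(12, 0)),
                    StructField("age", IntegerType())]))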

Inner join:

df3 = df1.join(df2, df1.id == df2.id, 'inner')

df3 schema: DataFrame[id: bigint, name: string, age: int]

Running df3.show(5) produces the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/apache/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 466, in collect
    port = self._jdf.collectToPython()
  File "/usr/local/lib/python3.6/dist-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/apache/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.6/dist-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o43.collectToPython.
: org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:136)

The JRE version I was using was not compatible with Spark 2.3.

After updating the JRE in the Docker image to openjdk-8-jre, the error was resolved.
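If it is not obvious which Java runtime the container (and therefore Spark) is actually using, a quick check like the sketch below can confirm it before and after the image change. This is only a debugging aid; the _jvm handle is an internal PySpark attribute and the check assumes java is on the container's PATH:

    import subprocess
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Java runtime found on the container's PATH (Spark 2.3 expects Java 8).
    print(subprocess.check_output(["java", "-version"],
                                  stderr=subprocess.STDOUT).decode())

    # Version of the JVM that PySpark actually launched (internal py4j handle).
    print(spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))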

Can you call df1.show() and df2.show() without getting an error?
Yes, df1.show() and df2.show() work fine.
One comment also suggested raising the broadcast timeout:

conf = SparkConf().set("spark.sql.broadcastTimeout", "-1")
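A sketch of how that conf could be applied when building the session (the same key can also be set on an existing session via spark.conf.set). Note that this only relaxes the broadcast-exchange timeout, whose default is 300 seconds, and was not the actual fix here:

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Relax the broadcast-exchange timeout, as suggested in the comment above.
    conf = SparkConf().set("spark.sql.broadcastTimeout", "-1")

    spark = (SparkSession.builder
             .config(conf=conf)
             .getOrCreate())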