Serial version mismatch in PySpark HiveContext collect() with the Apache Spark yarn-client

Tags: apache-spark, apache-spark-sql

I'm currently running into a problem querying a Hive table when using the "yarn-client" master, as shown below.
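For reference, the relevant part of /home/user/hellospark_2_.py looked roughly like this (a minimal sketch reconstructed from the traceback below; "table" and "column" are placeholder names):

    from pyspark import SparkConf, SparkContext
    from pyspark.sql import HiveContext

    # Minimal sketch reconstructed from the traceback; "table" and
    # "column" stand in for the real Hive table and column names.
    conf = SparkConf().setAppName("hellospark").setMaster("yarn-client")
    sc = SparkContext(conf=conf)
    sqlContext = HiveContext(sc)

    # This is the line that fails (line 31 in the traceback below)
    test = sqlContext.sql("SELECT * FROM table WHERE column = 'x'").collect()
    print(test)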

This code works fine if master="local" is used instead.

With yarn-client, I get the following error and cause:

17/06/26 12:22:43 ERROR TaskSetManager: Task 1 in stage 1.0 failed 4 times; aborting job
Traceback (most recent call last):
  File "/home/user/hellospark_2_.py", line 31, in <module>
    test = sqlContext.sql("SELECT * FROM table WHERE column = 'x'").collect()
  File "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 280, in collect
  File "/opt/mapr/spark/spark-1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
  File "/opt/mapr/spark/spark-1.6.1/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o44.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (): java.io.InvalidClassException: org.apache.spark.sql.catalyst.expressions.Literal; local class incompatible: stream classdesc serialVersionUID = -4259705229845269663, local class serialVersionUID = 3305180847846277455
After some testing, I found that this error only occurs when using .collect(), and only when querying rows from a Hive dataframe. Querying column names, the list of tables, or a table description all work fine.
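Roughly, here is what did and did not work (a sketch; "some_table" stands in for the real table name):

    sqlContext.sql("SHOW TABLES").collect()          # works
    sqlContext.sql("DESCRIBE some_table").collect()  # works
    df = sqlContext.sql("SELECT * FROM some_table")  # works (only builds the plan)
    df.columns                                       # works (no job is launched)
    df.collect()                                     # fails with InvalidClassException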

From what I've found online, this can be caused by a Spark, Scala, or Hadoop version mismatch. In my tests, the Spark version is the same (1.6.1) whether it runs locally or in yarn-client mode. I also compared the dataframe.py file locally and remotely, and it appears to be identical in both cases.
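Since the InvalidClassException points at org.apache.spark.sql.catalyst.expressions.Literal, one further check I could run is to compare the Spark jars visible on each worker node with those on the driver. A rough diagnostic sketch (it assumes the MapR install path from the traceback, and that plain RDD jobs still succeed under yarn-client):

    import socket
    import subprocess

    def spark_jars(_):
        # List the Spark jars on whichever node runs this task; the path
        # assumes the MapR layout seen in the traceback above.
        out = subprocess.check_output(
            "ls /opt/mapr/spark/spark-*/lib/*.jar", shell=True)
        return (socket.gethostname(), out)

    # Spread tasks over several partitions to reach more than one executor.
    for host, jars in sc.parallelize(range(8), 8).map(spark_jars).collect():
        print(host)
        print(jars)

If a worker reports a different spark-sql/catalyst jar than the driver, that would explain the serialVersionUID mismatch.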

I would appreciate any insight or help with this problem.

Thank you for your time.
