java.io.InvalidClassException when running a Spark job (Java / Apache Spark)
I am trying to run a Spark job with spark-submit. When I run it from Eclipse, the job runs without any problem. But when I copy the same jar file to a remote machine and run the job there, I get the following exception:
17/08/09 10:19:15 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-10-50-70-180.ec2.internal): java.io.InvalidClassException: org.apache.spark.executor.TaskMetrics; local class incompatible: stream classdesc serialVersionUID = -2231953621568687904, local class serialVersionUID = -6966587383730940799
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1829)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1986)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
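For context, the exception above comes from Java serialization's built-in compatibility check: the serialVersionUID recorded in the byte stream must match the UID of the class loaded in the receiving JVM, otherwise readObject throws InvalidClassException ("local class incompatible"). A minimal sketch of that round trip, where `Task` is a hypothetical stand-in for a Spark-internal class such as org.apache.spark.executor.TaskMetrics:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a class serialized between driver and executor.
class Task implements Serializable {
    private static final long serialVersionUID = 1L; // pinned explicitly
    String name = "count";
}

public class UidCheck {
    public static void main(String[] args) throws Exception {
        // "Driver" side: serialize the task.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Task());
        }
        // "Executor" side: deserialize. The UID recorded in the stream is
        // compared with the local Task class's UID; if the executor loaded a
        // Task from a different build with a different UID, this readObject
        // call would throw java.io.InvalidClassException instead of returning.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Task t = (Task) in.readObject();
            System.out.println(t.name); // prints "count"
        }
    }
}
```

Note that pinning serialVersionUID only helps when both JVMs load the same class definition; it cannot fix a classpath that mixes two different Spark builds.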
I saw some other links on SO and tried what they suggested. These are my dependencies:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.0.2</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.0.2</version>
  <scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-yarn_2.10 -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-yarn_2.11</artifactId>
  <version>2.0.2</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.11</artifactId>
  <version>2.0.2</version>
  <scope>provided</scope>
</dependency>
Edit 2:
I also tried adding serialVersionUID = -2231953621568687904L to the relevant class, but that did not solve the problem.

I finally resolved it. I commented out all the dependencies and uncommented them one at a time. First I uncommented the spark-core dependency, and the problem went away. Then I uncommented another dependency in my project, and the problem came back. On investigation I found that this second dependency transitively depended on a different version (2.10) of spark-core, which was causing the conflict. I added an exclusion to that dependency as follows:
<dependency>
  <groupId>com.data.utils</groupId>
  <artifactId>data-utils</artifactId>
  <version>1.0-SNAPSHOT</version>
  <exclusions>
    <exclusion>
      <groupId>javax.ws.rs</groupId>
      <artifactId>javax.ws.rs-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
    </exclusion>
  </exclusions>
</dependency>
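As an aside, the comment-out-and-retry loop can be shortened: Maven itself can report which dependency drags in the conflicting artifact. A sketch, run from the project root (the -Dincludes filter restricts the output to Spark artifacts):

```shell
# Print the dependency tree, filtered to org.apache.spark artifacts,
# to spot which dependency transitively pulls in spark-core_2.10.
mvn dependency:tree -Dincludes=org.apache.spark
```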
This solved the problem, in case anyone else gets stuck on it. Thanks to @Josepraven for the valuable comments.

We saw this issue when slightly different jar versions were in use on the Spark master and one or more Spark slaves.
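A related detail on why "slightly different jar versions" translate into different UIDs: when a class does not declare serialVersionUID, the JVM derives a default one from the class's name and structure (fields, methods, interfaces), so two builds that differ even slightly will disagree. A small illustration, where `TaskV1` and `TaskV2` are hypothetical stand-ins for two builds of the same class, the second adding one field:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Two hypothetical "builds" of a class: the second adds one field.
class TaskV1 implements Serializable { int count; }
class TaskV2 implements Serializable { int count; long elapsedMillis; }

public class DefaultUidDemo {
    public static void main(String[] args) {
        // Neither class declares serialVersionUID, so the JVM computes a
        // default one from each class's shape via a hash.
        long v1 = ObjectStreamClass.lookup(TaskV1.class).getSerialVersionUID();
        long v2 = ObjectStreamClass.lookup(TaskV2.class).getSerialVersionUID();
        // The computed UIDs differ, which is exactly the
        // "local class incompatible" situation from the stack trace.
        System.out.println(v1 != v2);
    }
}
```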
I faced this issue because I had copied my jar only to the master node. Once I copied the jar to all the slave nodes, my application started working fine.

Could you also post your spark-submit command? — Sure, I will add it now.

There is a version mismatch between Spark and the job you are submitting. Please check the Spark version in your cluster and use that version in your pom file.

@Josepraven I checked the Spark version on the machine and it returned 2.0.2, so I changed the Spark version in my pom file to 2.0.2. Please have a look at the dependencies I have added.

@SathiyaNarayanan I think the problem is w.r.t. serialization. Please check this.