What does this error in Java Apache Spark mean?


I am learning Spark 1.2 by running it on my local machine with one master and one worker. I start Spark by running
./sbin/start-all.sh

The master and the worker come up, and I can see both in the UI. If I run the program from GitHub, it works when I configure the Spark context like this:

String[] jars = {"pathto/nlp.jar"};
SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount").setMaster("spark://myurl:7077").setJars(jars);
In my Java code, I split a large document into sentences like this:

JavaRDD<Iterator<List<HasWord>>> sentences = lines.flatMap(new FlatMapFunction<String, Iterator<List<HasWord>>>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Iterable<Iterator<List<HasWord>>> call(String s) {
        return (Iterable<Iterator<List<HasWord>>>) new DocumentPreprocessor(s).iterator();
    }
});
Now I want to try filtering those sentences (for now, I just want to filter them by always returning true).
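
A minimal sketch of such an always-true filter (the exact filter code is not included in the post), written in the same anonymous-inner-class style using org.apache.spark.api.java.function.Function, might look like this:

JavaRDD<Iterator<List<HasWord>>> filtered = sentences.filter(
    new Function<Iterator<List<HasWord>>, Boolean>() {
        private static final long serialVersionUID = 1L;

        @Override
        public Boolean call(Iterator<List<HasWord>> sentence) {
            // Keep every sentence for now
            return true;
        }
    });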

I get a long stack trace:

15/01/30 16:47:18 INFO DAGScheduler: Job 0 failed: count at JavaWordCount.java:134, took 1.203987 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 17, lens.att.net): java.io.InvalidClassException: nlp.nlp.JavaWordCount$1; local class incompatible: stream classdesc serialVersionUID = 1, local class serialVersionUID = 8625903781884920246
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
If I do not declare the serial version ID, I also get a (different) stack trace:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task 3.3 in stage 1.0 (TID 68, lens.att.net): java.io.InvalidClassException: nlp.nlp.JavaWordCount$2; local class incompatible: stream classdesc serialVersionUID = 3752701569517815536, local class serialVersionUID = 6132153642693122455
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
It seems that some class is not declaring its serial version ID correctly. But I get an error whether or not I include the serial ID (as shown above).

Notes

I am running this in Eclipse. I have a Maven project in Eclipse, configured with:

<dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-core_2.10</artifactId>
   <version>1.2.0</version>
</dependency>

I am also running Spark on my local machine. I downloaded it into the directory pathto/spark-1.2.0-bin-hadoop2.4.

What needs a serial version ID? What is going wrong here?

The class that the exception is complaining about is nlp.nlp.JavaWordCount$1. That is the "name" of an anonymous inner class.

Looking at your code, I would say it is your anonymous FlatMapFunction class. (The hint is that you see the ID "1" in the error message.)

Are you using the same JAR file on the serialization side and on the deserialization side? If not, my guess is that one of them is missing:

    private static final long serialVersionUID = 1L;
The fix should be to use the same JARs.

But if the JARs are already the same ... then this is strange.

As a possible workaround, try converting the anonymous inner class into a (named) nested class ... or even a top-level class. If that works, you can use that data point to help track down the real problem.
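
As a rough sketch of that refactoring (the class name here is made up; the method body is copied unchanged from the anonymous class in the question):

public static class SplitIntoSentences implements FlatMapFunction<String, Iterator<List<HasWord>>> {
    private static final long serialVersionUID = 1L;

    @Override
    public Iterable<Iterator<List<HasWord>>> call(String s) {
        // Same body as the question's anonymous FlatMapFunction
        return (Iterable<Iterator<List<HasWord>>>) new DocumentPreprocessor(s).iterator();
    }
}

// ... and in the driver:
JavaRDD<Iterator<List<HasWord>>> sentences = lines.flatMap(new SplitIntoSentences());

A named class keeps its name stable across recompiles (anonymous classes are numbered in the order they appear in the file), which makes this kind of serialVersionUID mismatch easier to pin down.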

If you are using different versions of Spark in the same cluster, that could be the cause. It is advisable to use the same version everywhere.

In my case, this appeared to be a problem with the Spark program being out of sync with its JAR dependencies.

My program loads the JAR like this:

String[] jars = {"pathto/mydependencies.jar"};
SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount").setMaster("spark://mylaptop:7077").setJars(jars);

If I make a change to the main program and then run it in debug mode in Eclipse, I get this error. However, if I re-export to pathto/mydependencies.jar, that fixes it.
