How do I use a Java 8 lambda expression so that it can run on Apache Spark?
I am learning Spark 2.1.0 and am currently writing a Spark WordCount program in Eclipse.
My Spark instance runs on a remote machine named c1. The master is
spark://c1:7077
My files are on Hadoop 2.3.7. The input file is
hdfs://c1:9000/tmp/README.md
Here is my code:
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        String input = "hdfs://c1:9000/tmp/README.md";
        String output = "hdfs://c1:9000/tmp/output";

        SparkConf conf = new SparkConf().setAppName("Word Count").setMaster("spark://c1:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> textFile = sc.textFile(input);
        JavaPairRDD<String, Integer> counts = textFile
                .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);

        counts.saveAsTextFile(output);
        sc.close();
    }
}
I googled this problem and found an answer. In the comments I saw: check whether the class containing the lambda expression (SparkDemo in that case) is available on the remote side. I don't understand this sentence. Can I only run my program on the Spark machine itself? Do I have to copy my program to c1 and run it there?
So my question is:
How can I run the program on my own computer instead of on c1?
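One common way to drive the job from your own machine is to package the program into a jar and hand that jar to the cluster, so that the executors on c1 can load the class that defines the lambda. A minimal sketch (the project layout, jar name, and version below are hypothetical, not from the original post):

```shell
# Package the project into a jar (assuming a Maven project; adjust for your build tool).
mvn package

# Submit from the local machine. The driver runs where spark-submit runs, but the
# application jar is shipped to the executors so they can resolve the lambda's
# enclosing class during deserialization.
spark-submit \
  --class WordCount \
  --master spark://c1:7077 \
  target/wordcount-0.1.jar
```

Alternatively, calling conf.setJars(new String[]{"target/wordcount-0.1.jar"}) in the code has a similar effect, provided the path points at a jar that actually contains the compiled WordCount class.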
Possible duplicate. — I don't think my question is a duplicate: the answer with 19 upvotes explains how this problem arises, and I have already tried the setJars method, but the problem persists.

As that answer explains, the ClassCastException is a serialization artifact that occurs after another exception. Although a missing class appears to be a common cause, in your case it may still be some other exception. The bad news is that there is no way to discover the real cause just by looking at the exception that masks the original one; you have to investigate yourself. It is easy to check whether the problem is class lookup: replace the lambda expression with an inner class, run again, and see whether a more informative exception appears…
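The remark about the class having to be "available on the remote side" can be illustrated with plain Java, independent of Spark. A minimal sketch (the interface and class names here are made up for the demo): a lambda is only serializable when its target interface is Serializable, and deserializing it requires the capturing class on the classpath of the receiving JVM, which is why Spark executors need your application jar.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LambdaShipping {
    // A lambda is serializable only if its target interface extends Serializable
    // (Spark's function interfaces, like FlatMapFunction, all do).
    interface SerRunnable extends Runnable, Serializable {}

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // A plain Runnable lambda cannot be serialized at all.
        Runnable plain = () -> System.out.println("hello");
        System.out.println("plain lambda serializable: " + (plain instanceof Serializable));

        // A serializable lambda round-trips fine *in the same JVM*, because
        // deserialization looks up the capturing class (LambdaShipping) to
        // re-create the lambda. On a JVM that does not have this class --
        // e.g. a Spark executor that never received the application jar --
        // that lookup fails, and the failure surfaces as the ClassCastException
        // on SerializedLambda shown in the question's stack trace.
        SerRunnable ser = () -> System.out.println("hello");
        byte[] bytes = serialize(ser);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            Runnable back = (Runnable) ois.readObject();
            back.run();
        }
    }
}
```

Running the anonymous-inner-class variant the answer suggests changes nothing about this requirement, but an inner class fails with a plain, informative ClassNotFoundException instead of the confusing SerializedLambda cast error, which is what makes it useful as a diagnostic.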
17/03/12 22:25:50 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.122.2, executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.f$3 of type org.apache.spark.api.java.function.FlatMapFunction in instance of org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2237)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2122)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)