Java: "Task not serializable" exception when running an Apache Spark job
The following Java program is written with Apache Spark. It tries to read lists of positive and negative words from their respective files, compare them against the master file, and filter the results accordingly.
import java.io.Serializable;
import java.io.FileNotFoundException;
import java.io.File;
import java.util.*;
import java.util.Iterator;
import java.util.List;
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.Function;
public class SimpleApp implements Serializable {
    public static void main(String[] args) {
        String logFile = "/tmp/master.txt"; // Should be some file on your system
        String positive = "/tmp/positive.txt"; // Should be some file on your system
        String negative = "/tmp/negative.txt"; // Should be some file on your system
        JavaSparkContext sc = new JavaSparkContext("local[4]", "Twitter Analyzer", "/home/welcome/Downloads/spark-1.1.0/", new String[]{"target/scala-2.10/Simple-assembly-0.1.0.jar"});
        JavaRDD<String> positiveComments = sc.textFile(logFile).cache();
        List<String> positiveList = GetSentiments(positive);
        List<String> negativeList = GetSentiments(negative);
        final Iterator<String> iterator = positiveList.iterator();
        int i = 0;
        while (iterator.hasNext())
        {
            JavaRDD<String> numAs = positiveComments.filter(new Function<String, Boolean>()
            {
                public Boolean call(String s)
                {
                    return s.contains(iterator.next());
                }
            });
            numAs.saveAsTextFile("/tmp/output/" + i);
            i++;
        }
    }

    public static List<String> GetSentiments(String fileName) {
        List<String> input = new ArrayList<String>();
        try
        {
            Scanner sc = new Scanner(new File(fileName));
            while (sc.hasNextLine()) {
                input.add(sc.nextLine());
            }
        }
        catch (FileNotFoundException e) {
            // do stuff here..
        }
        return input;
    }
}
Any pointers?

When you create an anonymous class, the compiler does some work behind the scenes. This code:
JavaRDD<String> numAs = positiveComments.filter(new Function<String, Boolean>()
{
    public Boolean call(String s)
    {
        return s.contains(iterator.next());
    }
});
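Some Java facts

Is there a reason the iterator is not serializable? The iterators returned by the standard collections, such as the one from ArrayList, do not implement Serializable. Because the anonymous class uses the local variable iterator, the compiler stores a copy of it in a hidden field, so the code above is effectively rewritten as follows: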
JavaRDD<String> numAs = positiveComments.filter(new Function<String, Boolean>()
{
    // Hidden field generated for the captured variable; it is serialized along with the function
    private Iterator<String> $iterator;
    public Boolean call(String s)
    {
        return s.contains($iterator.next());
    }
});
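The capture can be reproduced without Spark. The sketch below is only an illustration: SerializableFunction is a hypothetical stand-in for Spark's Function<String, Boolean> (which is likewise Serializable), and canSerialize, capturingIterator, and capturingString are made-up helper names. It writes each function object with plain Java serialization, which is roughly what happens when a task is shipped to an executor under the default serializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CaptureDemo {
    // Hypothetical stand-in for Spark's Function<String, Boolean>; not part of any library.
    interface SerializableFunction extends Serializable {
        Boolean call(String s);
    }

    // Returns true if Java serialization accepts the object,
    // false if it reaches a field that is not serializable.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same shape as the code in the question: the anonymous class captures the iterator.
    static SerializableFunction capturingIterator(List<String> words) {
        final Iterator<String> iterator = words.iterator();
        return new SerializableFunction() {
            public Boolean call(String s) {
                return s.contains(iterator.next());
            }
        };
    }

    // Only a String is captured, and String is serializable.
    static SerializableFunction capturingString(List<String> words) {
        final String value = words.iterator().next();
        return new SerializableFunction() {
            public Boolean call(String s) {
                return s.contains(value);
            }
        };
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>(Arrays.asList("good", "great"));
        System.out.println(canSerialize(capturingIterator(words))); // prints false
        System.out.println(canSerialize(capturingString(words)));   // prints true
    }
}
```

Both helpers are static, so neither anonymous class holds a reference to an enclosing instance. The only difference between them is the type of the captured variable.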
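This is why the job fails with "Task not serializable": Spark has to serialize the function, and the captured Iterator is not serializable. To avoid this, read the next value before creating the function, so that only a String is captured:
// Must be final (or effectively final in Java 8+) to be used inside the anonymous class
final String value = iterator.next();
JavaRDD<String> numAs = positiveComments.filter(new Function<String, Boolean>()
{
    public Boolean call(String s)
    {
        return s.contains(value);
    }
});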
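To see the corrected loop end to end, here is a minimal Spark-free sketch under stated assumptions: SerializableFunction again stands in for Spark's Function<String, Boolean>, an in-memory list stands in for the RDD, and roundTrip, containsWord, and filterPerWord are hypothetical helper names. Each function is serialized and deserialized before use, to mimic a function being sent to an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedLoopDemo {
    // Hypothetical stand-in for Spark's Function<String, Boolean>; not part of any library.
    interface SerializableFunction extends Serializable {
        Boolean call(String s);
    }

    // Serializes the function to bytes and reads it back, failing loudly if that is not possible.
    static SerializableFunction roundTrip(SerializableFunction f) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(f);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableFunction) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The anonymous class captures only the String parameter.
    static SerializableFunction containsWord(final String word) {
        return new SerializableFunction() {
            public Boolean call(String s) {
                return s.contains(word);
            }
        };
    }

    // Builds one filtered list per word, in the order of the word list.
    static List<List<String>> filterPerWord(List<String> lines, List<String> words) {
        List<List<String>> results = new ArrayList<List<String>>();
        for (String word : words) { // the word is read here, outside the function
            SerializableFunction f = roundTrip(containsWord(word));
            List<String> matches = new ArrayList<String>();
            for (String line : lines) {
                if (f.call(line)) {
                    matches.add(line);
                }
            }
            results.add(matches);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("good day", "bad day", "great news");
        List<String> words = Arrays.asList("good", "great");
        System.out.println(filterPerWord(lines, words)); // prints [[good day], [great news]]
    }
}
```

Reading the word in the loop itself also means the loop advances once per word. In the original program next() was called only inside the function, so hasNext() in the while condition would never have become false.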