
Java Spark reading a text file locally throws Exception in thread "main" org.apache.spark.SparkException: Task not serializable


I am writing my first Spark program in Java and cannot figure out the error below. I have gone through many related questions on Stack Overflow, but they did not seem relevant to my problem. I am using the latest version of Spark, 2.4.4, and I am running the application locally.

Here is my program:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkTextFile {

    public static void main(String[] args) {

        // Run locally, using as many worker threads as there are cores.
        SparkConf conf = new SparkConf().setAppName("textfilereading").setMaster("local[*]");
        JavaSparkContext context = new JavaSparkContext(conf);

        // Read the CSV file as an RDD of lines and print every line.
        JavaRDD<String> textRDD = context.textFile("/Users/user/Downloads/AccountHistory.csv");
        textRDD.foreach(System.out::println);

        context.close();
    }
}
I do not understand why this error occurs, since apart from reading from the file I am not using any object that would need to be serialized.

I then changed the line below

textRDD.foreach(System.out::println);

to collect the results first, so I could see the output:

textRDD.collect().forEach(System.out::println);

Now I see a different error message:

Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:253)
    at scala.Option.getOrElse(Option.scala:138)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:253)
    at scala.Option.getOrElse(Option.scala:138)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:945)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
    at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:361)
    at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:360)
    at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
I do not understand the cause of this error either. Could someone please explain how to interpret this error and how to fix it?
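This second error is unrelated to the user code; it is the classic symptom of a Guava version conflict. Hadoop clients before the 2.7 line call Guava's no-argument Stopwatch constructor inside FileInputFormat.getSplits, and newer Guava releases removed that public constructor, so an older hadoop-client combined with a newer Guava on the classpath fails exactly as shown. A quick way to check which Guava jar actually wins on the classpath (a diagnostic sketch, not part of the original program):

// Diagnostic sketch: print the location of the Guava jar that was loaded.
// If this points at a newer Guava jar while hadoop-client is older than
// the 2.7 line, the IllegalAccessError above is the expected failure mode.
System.out.println(com.google.common.base.Stopwatch.class
        .getProtectionDomain().getCodeSource().getLocation());

The usual remedies are to move to a hadoop-client from the 2.7 line, which replaced Guava's Stopwatch with Hadoop's own StopWatch in FileInputFormat, or to pin Guava to an older version that still has the public constructor.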

Read the link below; it contains all the information you need to understand the problem :)

(Disclaimer: I am not a Java developer, so I will try to answer based on my Scala experience.)

You are using a higher-order function here: foreach. Higher-order functions serialize the arguments supplied to them and send them across the RDD's partitions (which are usually distributed over machines on the network). I am not sure whether System.out.println counts as a "serializable object" in Java. So one approach is to use Java's lambda notation and change the code above as follows:

textRDD.foreach(s -> System.out.println(s));


Hope that helps! :)

I had already seen that link before you posted it, and it did not help with the problem I was having. Your suggestion worked, though. I do not know why I did not think of it, or why I cannot use a method reference here. Thanks for your answer.
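For what it's worth, the difference between the two forms most likely comes down to what each one captures (this is a reading of Java's semantics, not something stated in the thread). A bound method reference such as System.out::println evaluates System.out once, when the function object is created, so the resulting non-serializable PrintStream becomes part of the function's state. The equivalent lambda only reads the static System.out field when it actually runs on an executor, so nothing non-serializable is captured:

// Fails with "Task not serializable": the method reference is bound to the
// current value of System.out, so the non-serializable PrintStream is stored
// inside the function object that Spark has to ship to the executors.
textRDD.foreach(System.out::println);

// Works: the lambda body re-reads the static field System.out at call time
// on each executor, so the serialized function captures nothing.
textRDD.foreach(s -> System.out.println(s));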