
Java Spark map method throws serialization exception


I'm new to Spark and I've run into a serialization problem in my map function. Here are the relevant pieces of code:

private Function<Row, String> SparkMap() throws IOException {
    return new Function<Row, String>() {
        public String call(Row row) throws IOException {
            /* some code */
        }
    };
}

public static void main(String[] args) throws Exception {
    MyClass myClass = new MyClass();
    SQLContext sqlContext = new SQLContext(sc);
    DataFrame df = sqlContext.load(args[0], "com.databricks.spark.avro");

    JavaRDD<String> output = df.javaRDD().map(myClass.SparkMap());
}
Here is the error log:

Caused by: java.io.NotSerializableException: myPackage.MyClass
Serialization stack:
    - object not serializable (class: myPackage.MyClass, value: myPackage.MyClass@281c8380)
    - field (class: myPackage.MyClass$1, name: this$0, type: class myPackage.MyClass)
    - object (class myPackage.MyClass$1, myPackage.MyClass$1@28ef1bc8)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
    ... 12 more

If I declare the SparkMap method as static, it runs. How is that possible?

The exception is self-explanatory:

object not serializable (class: myPackage.MyClass, value: myPackage.MyClass@281c8380)
Simply make your MyClass implement Serializable.
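
A minimal sketch of that fix, assuming MyClass has no other non-serializable fields (the body of call is only a placeholder for the elided /* some code */):

import java.io.IOException;
import java.io.Serializable;

import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Row;

public class MyClass implements Serializable {

    // The anonymous Function keeps a hidden this$0 reference to the
    // enclosing MyClass instance (the field named in the serialization
    // stack above), so MyClass itself must be Serializable for Spark
    // to ship the closure to the executors.
    private Function<Row, String> SparkMap() throws IOException {
        return new Function<Row, String>() {
            public String call(Row row) throws IOException {
                return row.toString(); // placeholder for the original /* some code */
            }
        };
    }
}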

It works when static because then only the function itself has to be serialized, not the whole myClass object: an anonymous class created inside a static method has no reference to an enclosing instance.
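
For comparison, a sketch of the static variant (same placeholder body as above). The anonymous class carries no outer-instance field here, and Spark's Function interface already extends Serializable, so nothing non-serializable is captured:

// Declared static: the anonymous Function no longer captures MyClass,
// so only the function object itself is serialized and shipped.
private static Function<Row, String> SparkMap() throws IOException {
    return new Function<Row, String>() {
        public String call(Row row) throws IOException {
            return row.toString(); // placeholder for the original /* some code */
        }
    };
}

main can then call SparkMap() directly, without constructing a MyClass instance at all.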