Spark Kryo serializer and Broadcast<Map<Object, Iterable<GowallaDataLocation>>>: java.io.IOException: java.lang.UnsupportedOperationException

Tags: java, apache-spark, kryo

When I try to access the broadcast variable, I get the following exception:

    17/03/26 03:04:23 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 10, 192.168.56.5, executor 1): java.io.IOException: java.lang.UnsupportedOperationException
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
        at GowallaTask$2.call(GowallaTask.java:214)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:351)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:351)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

    Caused by: java.lang.UnsupportedOperationException
        at java.util.AbstractMap.put(AbstractMap.java:209)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:162)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:244)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$10.apply(TorrentBroadcast.scala:286)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1303)
        at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:287)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:221)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1269)
        ... 19 more

I get the exception when using the KryoSerializer:

    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    conf.set("spark.kryoserializer.buffer.mb", "24");

Here is my code:

JavaPairRDD<Object, Iterable<GowallaDataLocation>> line_RDD_2 = sc
            .textFile("/home/piero/gowalla_location.txt", 2).map(new GowallaMapperDataLocation())
            .groupBy(new Function<GowallaDataLocation, Object>() {

                /**
                 * 
                 */
                private static final long serialVersionUID = -6773509902594100325L;

                @Override
                public Object call(GowallaDataLocation v1) throws Exception {
                    DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd");

                    return dateFormat.format(v1.getDATE());
                }
            }).persist(StorageLevel.MEMORY_AND_DISK_SER());



Broadcast<Map<Object, Iterable<GowallaDataLocation>>> broadcastVar_2 = sc.broadcast(line_RDD_2.collectAsMap());
    //System.out.println(broadcastVar_2.getValue().size());

    JavaRDD<Object> keys = line_RDD_2.keys().persist(StorageLevel.MEMORY_ONLY_SER());
    line_RDD_2.unpersist();

    keys.foreach(new VoidFunction<Object>() {

        /**
         * 
         */
        private static final long serialVersionUID = -8148877518271969523L;

        @Override
        public void call(Object t) throws Exception {
            // TODO Auto-generated method stub
            //System.out.println("KEY:" + t + " ");
            Iterable<GowallaDataLocation> dr = broadcastVar_2.getValue().get(t);

        }

    });

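I suspect this happens because you are broadcasting line_RDD_2.collectAsMap() directly. The broadcast value is then typed only as Map, so Kryo does not know the right concrete implementation and ends up working with an AbstractMap internally. That matches the stack trace, where MapSerializer.read calls AbstractMap.put, whose default implementation throws UnsupportedOperationException.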

For example, if I do this:

Map<String, String> a = new HashMap<String, String>();
a.put("a", "b");
Set<String> c = a.keySet();
c.add("e");
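The last line throws an UnsupportedOperationException, because the key set view returned by keySet() does not support add. If my guess is correct, you can probably solve it like this: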

Map<Object, Iterable<GowallaDataLocation>> a = new HashMap<>();
a.putAll(line_RDD_2.collectAsMap());
Broadcast<Map<Object, Iterable<GowallaDataLocation>>> broadcastVar_2 = sc.broadcast(a);

Let me know if this works.
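
To make the suspected mechanism easier to check without Spark or Kryo on the classpath, here is a minimal sketch in plain Java. ReadOnlyView is a hypothetical stand-in for a Map that only extends AbstractMap and never overrides put; it is an assumption for illustration, not the class that collectAsMap() actually returns. The sketch shows that put fails on such a map and succeeds once the entries are copied into a HashMap, which is the same idea as the putAll fix above.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ReadOnlyMapDemo {

    /**
     * Hypothetical read-only map: it implements only entrySet(), so put()
     * falls back to AbstractMap.put, which throws UnsupportedOperationException.
     */
    static final class ReadOnlyView<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> backing;

        ReadOnlyView(Map<K, V> backing) {
            this.backing = backing;
        }

        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            return Collections.unmodifiableMap(backing).entrySet();
        }
    }

    /** Returns true if put() on the given map throws UnsupportedOperationException. */
    static boolean putIsUnsupported(Map<String, String> map) {
        try {
            map.put("probe", "value");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("20100101", "a");

        // A map that can be read but not written, like the one in the stack trace.
        Map<String, String> view = new ReadOnlyView<>(source);

        // Copying into a HashMap gives a concrete, writable implementation.
        Map<String, String> copy = new HashMap<>(view);

        System.out.println("view rejects put: " + putIsUnsupported(view));
        System.out.println("copy rejects put: " + putIsUnsupported(copy));
    }
}
```

If the same thing is happening during deserialization of the broadcast, handing sc.broadcast a HashMap copy should give Kryo a map type it can rebuild with put.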
