Warning: file_get_contents(/data/phpspider/zhask/data//catemap/9/java/386.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
需要帮助加入spark RDD';java中的s_Java_Apache Spark_Spark Cassandra Connector - Fatal编程技术网

需要帮助加入spark RDD';java中的s

需要帮助加入spark RDD';java中的s,java,apache-spark,spark-cassandra-connector,Java,Apache Spark,Spark Cassandra Connector,需要在spark中执行以下连接操作 JavaPairRDD<String, Tuple2<Optional<MarkToMarketPNL>, Optional<MarkToMarketPNL>>> finalMTMPNLRDD = openMTMPNL.fullOuterJoin(closedMTMPNL); 我认为错误不在问题中包含的代码中。Spark正在尝试对RDD运行count。您包含的代码不调用count,因此这是一个符号。但是这个异常

需要在spark中执行以下连接操作

JavaPairRDD<String, Tuple2<Optional<MarkToMarketPNL>, Optional<MarkToMarketPNL>>> finalMTMPNLRDD = openMTMPNL.fullOuterJoin(closedMTMPNL);

我认为错误不在问题中包含的代码中。Spark正在尝试对RDD运行
count
。您包含的代码不调用
count
,因此这是一个符号。但是这个异常表明被计数的RDD有一个用Java创建的迭代器,现在正在转换为Scala迭代器。此时,这个迭代器实际上是
null


您的代码是否在某个地方生成了迭代器?可能在
mapPartitions
调用或类似的调用中?

此异常是由于某个函数返回空值造成的。您可以返回null,然后过滤null元组,例如:

JavaPairRDD<String,MarkToMarketPNL> openMTMPNL = openMTM.keyBy(new Function<MarkToMarketPNL,String>(){
            public String call(MarkToMarketPNL mtm) throws Exception
            {
                    return mtm.getTaxlot();
            }
        }).filter(new Function<Tuple2<String, MarkToMarketPNL>, Boolean>() {

        @Override
        public Boolean call(Tuple2<String, MarkToMarketPNL> arg) throws Exception {
            return arg == null ? false : true;
       }
    }); 
javapairdd openMTMPNL=openMTM.keyBy(新函数(){
公共字符串调用(MarkToMarketPNL mtm)引发异常
{
返回mtm.getTaxlot();
}
}).filter(新函数(){
@凌驾
公共布尔调用(Tuple2 arg)引发异常{
返回arg==null?false:true;
}
}); 

我也面临同样的问题。在内部执行联接操作时,将创建。如果Iterable对象中有一个为null,我们将看到上面所示的null指针异常


在执行联接之前,请确保所有值均不为null

我的第一个猜测是有些MTM是空的。
java.lang.NullPointerException
15/06/28 01:19:30 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.NullPointerException
    at scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:53)
    at scala.collection.IterableLike$class.toIterator(IterableLike.scala:89)
    at scala.collection.AbstractIterable.toIterator(Iterable.scala:54)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1626)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
15/06/28 01:19:30 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.NullPointerException
    at scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:53)
    at scala.collection.IterableLike$class.toIterator(IterableLike.scala:89)
    at scala.collection.AbstractIterable.toIterator(Iterable.scala:54)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1626)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
JavaPairRDD<String,MarkToMarketPNL> openMTMPNL = openMTM.keyBy(new Function<MarkToMarketPNL,String>(){
            public String call(MarkToMarketPNL mtm) throws Exception
            {
                    return mtm.getTaxlot();
            }
        }).filter(new Function<Tuple2<String, MarkToMarketPNL>, Boolean>() {

        @Override
        public Boolean call(Tuple2<String, MarkToMarketPNL> arg) throws Exception {
            return arg == null ? false : true;
       }
    });