Spark job cancelled when calling take() on an RDD of (String, List(String)) tuples (Java/Scala)

My Spark job produces an RDD of (String, List(String)) tuples. Saving the whole RDD works fine, but when I save only the result of take(100) it does not work: sometimes it returns 100 keys but no values. Can anyone help me with this? Thanks. Here is the Spark error log:

User class threw exception: org.apache.spark.SparkException: Job 1 cancelled as part of cancellation of all jobs
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1515)
at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1443)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:735)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:735)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:735)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:735)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onError(DAGScheduler.scala:1749)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:52)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:641)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1957)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1970)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1990)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1154)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1095)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1069)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1035)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1035)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1035)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:961)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:961)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:961)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:960)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1522)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1501)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1501)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1501)
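
For context, here is a minimal sketch in Scala of the two save paths the question describes; itemRdd, the sample data, and the output paths are illustrative assumptions rather than the asker's actual code:

import org.apache.spark.{SparkConf, SparkContext}

object TakeVsSaveSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("take-vs-save-sketch"))

    // An RDD of (String, List[String]) tuples, as described in the question.
    val itemRdd = sc.parallelize(Seq(
      ("23", List("1", "2", "3")),
      ("42", List("7", "8"))
    ))

    // Path 1: saving the whole RDD (this works for the asker).
    itemRdd.saveAsTextFile("/tmp/full-output")

    // Path 2: take(100) returns at most 100 tuples to the driver;
    // re-parallelizing and saving that sample is the path that fails.
    val sample = itemRdd.take(100)
    sc.makeRDD(sample).repartition(1).saveAsTextFile("/tmp/sample-output")

    sc.stop()
  }
}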

Hi Cisol, please share the code that produces this error.
itemRdd.map { item => val ids = item._2.map(x => x.substring(0, x.indexOf(":"))); (item._1, ids) }.take(100); sparkContext.makeRDD(itemRdd).repartition(1).saveAsTextFile("/test"). itemRdd is of type RDD[(String, String)], for example: [("23", "1:1,2:2,3:3")]
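
Reformatted, the snippet from that comment might look roughly like the sketch below. Two assumptions are made here: the value string is split on commas before extracting the ids (the original applies map directly to item._2), and makeRDD wraps the array returned by take(100) rather than itemRdd itself:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical, readable form of the snippet posted in the comment above.
val sparkContext = new SparkContext(new SparkConf().setAppName("take-comment-sketch"))

// RDD[(String, String)] as described in the comment, e.g. ("23", "1:1,2:2,3:3").
val itemRdd = sparkContext.parallelize(Seq(("23", "1:1,2:2,3:3")))

val taken = itemRdd.map { item =>
  // Assumption: split the value on commas, then keep the part before each ":".
  val ids = item._2.split(",").map(x => x.substring(0, x.indexOf(":"))).toList
  (item._1, ids)
}.take(100)

// Assumption: the take(100) result is re-parallelized before saving.
sparkContext.makeRDD(taken).repartition(1).saveAsTextFile("/test")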