
Apache Spark MLlib: java.lang.UnknownError: no bin was found for continuous variable


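I am using the decision tree algorithm and I get the following error. I have more than 500 features. Is that a problem? Any help would be great.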

java.lang.UnknownError: no bin was found for continuous variable.
    at org.apache.spark.mllib.tree.DecisionTree$.findBin$1(DecisionTree.scala:492)
    at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findBinsForLevel$1(DecisionTree.scala:529)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
    at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
    at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
    at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/08/13 16:36:06 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.UnknownError: no bin was found for continuous variable.
    at org.apache.spark.mllib.tree.DecisionTree$.findBin$1(DecisionTree.scala:492)
    at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findBinsForLevel$1(DecisionTree.scala:529)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
    at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
    at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
    at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

This was indeed caused by unclean input data. A few rows had "NaN" entries in a column. Once we cleaned those rows out, everything worked fine.
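For anyone hitting the same error, here is a minimal sketch of the kind of check that finds such rows before training. It is plain Python and does not reproduce the original job: the `(label, features)` row layout and the helper names `is_clean` and `split_clean` are assumptions made for illustration. A plausible reason the NaN values trigger the error is that every comparison against NaN is false, so a value like that never falls inside any bin's threshold range.

```python
import math

def is_clean(label, features):
    """Return True when the label and every feature value are finite (no NaN, no inf)."""
    return math.isfinite(label) and all(math.isfinite(x) for x in features)

def split_clean(rows):
    """Split (label, features) rows into rows that are safe to train on and rejected rows."""
    clean, rejected = [], []
    for label, features in rows:
        target = clean if is_clean(label, features) else rejected
        target.append((label, features))
    return clean, rejected

# Hypothetical sample data: one good row, one with NaN, one with infinity.
rows = [
    (1.0, [0.5, 2.0, 3.1]),
    (0.0, [float("nan"), 1.0, 4.2]),
    (1.0, [0.1, float("inf"), 0.0]),
]

clean, rejected = split_clean(rows)
print(len(clean), len(rejected))  # prints "1 2"
```

With PySpark and an RDD of `LabeledPoint`, the same predicate could probably be applied as `data.filter(lambda p: is_clean(p.label, p.features.toArray()))` before the data is passed to the decision tree trainer. Keeping the rejected rows around, rather than silently dropping them, makes it easier to see which columns produce the bad values.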