Scala / Apache Spark: accuracy differs on every run, and runtime exceptions are sometimes thrown
Here it is, in case anyone wants to reproduce it. For some reason, every time I run this program it returns different values. It is actually a GitHub project I wanted to look at. Part of the dataset is missing, but I managed to run it without it. The problem is that it sometimes runs fine, but returns a different accuracy each time. Sometimes I get assertion exceptions and unsupported-operation exceptions. Does anyone know why this happens? I am bagging with logistic regression and random forest on separate Spark MLlib pipelines. It runs fine, but returns a different accuracy level and confusion matrix every time. Sometimes it throws an exception; the stack trace is given below.
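The run-to-run variation is what you would expect from bagging with an unseeded random source: each bootstrap sample differs per run, so each trained model (and its accuracy and confusion matrix) differs too. Below is a minimal plain-Scala sketch of that effect (names like `bootstrap` are illustrative, not from the project); fixing the seed makes the sample, and hence the fitted model, reproducible:

```scala
import scala.util.Random

object BootstrapSeedDemo {
  // A bootstrap sample: draw |data| items with replacement.
  def bootstrap[T](data: IndexedSeq[T], rng: Random): IndexedSeq[T] =
    IndexedSeq.fill(data.length)(data(rng.nextInt(data.length)))

  def main(args: Array[String]): Unit = {
    val data = (1 to 10).toIndexedSeq
    // Same seed -> same sample -> same trained model and accuracy.
    val a = bootstrap(data, new Random(42L))
    val b = bootstrap(data, new Random(42L))
    assert(a == b)
    // `new Random()` without a seed draws a different sample each run,
    // which is why the accuracy changes between runs.
  }
}
```

Spark's own `RDD.sample(withReplacement, fraction, seed)` takes a seed parameter for the same reason; passing a constant there should stabilize the results between runs.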
Dataset size: 2
B- Sample size: 5
17/04/28 13:19:43 INFO LBFGS: Step Size: 0.7559
17/04/28 13:19:43 INFO LBFGS: Val and Grad Norm: 0.143776 (rel: 0.793) 0.160762
17/04/28 13:19:43 INFO LBFGS: Step Size: 1.000
17/04/28 13:19:43 INFO LBFGS: Val and Grad Norm: 0.127285 (rel: 0.115) 0.0815899
17/04/28 13:19:44 INFO LBFGS: Step Size: 1.000
17/04/28 13:19:44 INFO LBFGS: Val and Grad Norm: 0.120321 (rel: 0.0547) 0.0207179
17/04/28 13:19:45 INFO LBFGS: Step Size: 1.000
17/04/28 13:19:45 INFO LBFGS: Val and Grad Norm: 0.119759 (rel: 0.00467) 0.00553480
17/04/28 13:19:46 INFO LBFGS: Step Size: 1.000
17/04/28 13:19:46 INFO LBFGS: Val and Grad Norm: 0.119721 (rel: 0.000315) 0.00214368
17/04/28 13:19:46 INFO LBFGS: Step Size: 1.000
17/04/28 13:19:46 INFO LBFGS: Val and Grad Norm: 0.119716 (rel: 3.85e-05) 0.000959314
17/04/28 13:19:47 INFO LBFGS: Step Size: 1.000
17/04/28 13:19:47 INFO LBFGS: Val and Grad Norm: 0.119715 (rel: 9.22e-06) 0.000185495
17/04/28 13:19:47 INFO LBFGS: Step Size: 1.000
17/04/28 13:19:47 INFO LBFGS: Val and Grad Norm: 0.119715 (rel: 4.08e-07) 2.80789e-05
17/04/28 13:19:48 INFO LBFGS: Step Size: 1.000
17/04/28 13:19:48 INFO LBFGS: Val and Grad Norm: 0.119715 (rel: 1.01e-08) 1.58237e-06
Dataset size: 2
B- Sample size: 0
Exception in thread "main" java.lang.UnsupportedOperationException: empty collection
at org.apache.spark.rdd.RDD.first(RDD.scala:1191)
at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:167)
at org.apache.spark.ml.classification.BaggedLogisticRegression$$anonfun$train$1.apply(BaggedLogisticRegression.scala:123)
at org.apache.spark.ml.classification.BaggedLogisticRegression$$anonfun$train$1.apply(BaggedLogisticRegression.scala:99)
at scala.collection.immutable.Range.foreach(Range.scala:141)
at org.apache.spark.ml.classification.BaggedLogisticRegression.train(BaggedLogisticRegression.scala:99)
at org.apache.spark.ml.classification.BaggedLogisticRegression.train(BaggedLogisticRegression.scala:63)
at org.apache.spark.ml.impl.estimator.Predictor.fit(Predictor.scala:102)
at org.apache.spark.ml.impl.estimator.Predictor.fit(Predictor.scala:82)
at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:118)
at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:114)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:42)
at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:43)
at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:114)
at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:79)
at org.apache.spark.ml.Estimator$$anonfun$fit$1.apply(Estimator.scala:68)
at org.apache.spark.ml.Estimator$$anonfun$fit$1.apply(Estimator.scala:68)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.ml.Estimator.fit(Estimator.scala:68)
at org.apache.spark.ml.tuning.CrossValidator$$anonfun$fit$1.apply(CrossValidator.scala:110)
at org.apache.spark.ml.tuning.CrossValidator$$anonfun$fit$1.apply(CrossValidator.scala:105)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.ml.tuning.CrossValidator.fit(CrossValidator.scala:105)
at org.apache.spark.ml.tuning.CrossValidator.fit(CrossValidator.scala:78)
at org.apache.spark.ml.Estimator.fit(Estimator.scala:44)
at com.arvind.majorproject.CrossValidation.CrossValidation$.crossValidate(CrossValidation.scala:124)
at com.arvind.majorproject.main.Main$.main(Main.scala:140)
at com.arvind.majorproject.main.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Run ERROR: Aborting.
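Note the log line `B- Sample size: 0` just before the crash: with only 2 rows in the dataset, a fraction-based random sample can come back empty, and `GeneralizedLinearAlgorithm.run` then calls `first()` on an empty RDD, which is exactly the `UnsupportedOperationException: empty collection` above. A hedged sketch of a guard (the name `safeSample` and the fallback policy are hypothetical, not from the project):

```scala
import scala.util.Random

object SafeSampleDemo {
  // Fraction-based sampling on a tiny dataset can return zero rows;
  // training on that empty sample then fails with "empty collection".
  // Guard by checking the sample before fitting; here we fall back to
  // the full (tiny) dataset, but skipping the bag would also work.
  def safeSample[T](data: Seq[T], fraction: Double, rng: Random): Seq[T] = {
    val s = data.filter(_ => rng.nextDouble() < fraction)
    if (s.nonEmpty) s else data
  }

  def main(args: Array[String]): Unit = {
    val tiny = Seq(1, 2) // mirrors "Dataset size: 2" in the log
    val rng = new Random(0L)
    // Every sample is non-empty, so training never sees an empty input.
    for (_ <- 1 to 100) assert(safeSample(tiny, 0.5, rng).nonEmpty)
  }
}
```

In the Spark version the same check would be `if (sampled.isEmpty()) ...` on the sampled RDD before handing it to the estimator inside `BaggedLogisticRegression.train`.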
Please provide more information. If you care more about the exceptions, give details of your cluster, the stack trace, and the code that causes them. If you care more about the varying accuracy, can you provide an example of how it varies? I think you have two separate problems that need different skills to solve, so you would be better off asking two separate questions. Updated with the stack trace, please have a look.