Apache Spark: What is the maxIter parameter in MultilayerPerceptronClassifier (Spark MLlib)?


What is maxIter in MultilayerPerceptronClassifier in Spark MLlib?

1. The maxIter parameter tells the optimization algorithm the maximum number of iterations it is allowed to take in order to find the minimum error,

or

2. The maxIter parameter is the maximum number of epochs, i.e. the maximum number of times the entire dataset is passed through the network?


The Spark gradient optimizer works using the RDD treeAggregate function. On each iteration it takes a fraction of the RDD (the default fraction is 1, so each iteration processes the whole RDD) and distributes the gradient computation across the workers. In that case one iteration can be regarded as one epoch. This approach keeps the optimization process simple on Spark. There are also more advanced distributed deep learning optimizer implementations, such as BigDL, that let you set a batch size and use the BlockManager to compute the distributed gradient aggregation on each iteration; there, one iteration corresponds to the execution of one mini-batch.
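To make that concrete, here is a minimal PySpark sketch (not from the original question) of training an MLP with maxIter and blockSize set explicitly; the LibSVM path and the layer sizes are placeholders. With the default l-bfgs solver, each of the maxIter iterations aggregates gradients over the full training set, which is why an iteration behaves like an epoch here.

from pyspark.sql import SparkSession
from pyspark.ml.classification import MultilayerPerceptronClassifier

spark = SparkSession.builder.appName("mlp-maxiter-demo").getOrCreate()

# Placeholder training data: a DataFrame with "features" and "label" columns.
train = spark.read.format("libsvm").load("path/to/training_data.txt")

# layers = [input size, hidden layer sizes..., number of classes]; values are illustrative.
mlp = MultilayerPerceptronClassifier(
    layers=[4, 5, 3],
    maxIter=100,    # upper bound on optimizer iterations; each one sees the whole dataset
    blockSize=128,  # how input vectors are stacked inside partitions, not a mini-batch size
    solver="l-bfgs",
    seed=42)

model = mlp.fit(train)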


I went through the source code of the MultilayerPerceptronClassifier class and found that the maxIter parameter is one of the stopping criteria for the gradient computation, while blockSize is used with Spark's mapPartitions method. Many thanks to @EmiCareOfCell44 for the help.
class pyspark.ml.classification.MultilayerPerceptronClassifier(featuresCol='features', labelCol='label', predictionCol='prediction', maxIter=100, tol=1e-06, seed=None, layers=None, blockSize=128, stepSize=0.03, solver='l-bfgs', initialWeights=None, probabilityCol='probability', rawPredictionCol='rawPrediction')
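The signature above shows the defaults maxIter=100 and tol=1e-06; as the comment notes, maxIter caps the optimizer iterations, while tol is the convergence tolerance, and training stops as soon as either criterion is met. A minimal, hedged sketch of adjusting both (layer sizes are illustrative):

from pyspark.ml.classification import MultilayerPerceptronClassifier

# maxIter caps the number of optimizer iterations; tol is the convergence
# tolerance. Whichever criterion is reached first stops training.
mlp = (MultilayerPerceptronClassifier(layers=[4, 5, 3])  # illustrative layer sizes
       .setMaxIter(200)   # allow more iterations than the default 100
       .setTol(1e-07))    # tighten the convergence tolerance

For reference, the RDD treeAggregate implementation that the answer refers to is reproduced below, from Spark's RDD.scala: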
  /**
   * Aggregates the elements of this RDD in a multi-level tree pattern.
   * This method is semantically identical to [[org.apache.spark.rdd.RDD#aggregate]].
   *
   * @param depth suggested depth of the tree (default: 2)
   */
  def treeAggregate[U: ClassTag](zeroValue: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U,
      depth: Int = 2): U = withScope {
    require(depth >= 1, s"Depth must be greater than or equal to 1 but got $depth.")
    if (partitions.length == 0) {
      Utils.clone(zeroValue, context.env.closureSerializer.newInstance())
    } else {
      val cleanSeqOp = context.clean(seqOp)
      val cleanCombOp = context.clean(combOp)
      val aggregatePartition =
        (it: Iterator[T]) => it.aggregate(zeroValue)(cleanSeqOp, cleanCombOp)
      var partiallyAggregated: RDD[U] = mapPartitions(it => Iterator(aggregatePartition(it)))
      var numPartitions = partiallyAggregated.partitions.length
      val scale = math.max(math.ceil(math.pow(numPartitions, 1.0 / depth)).toInt, 2)
      // If creating an extra level doesn't help reduce
      // the wall-clock time, we stop tree aggregation.

      // Don't trigger TreeAggregation when it doesn't save wall-clock time
      while (numPartitions > scale + math.ceil(numPartitions.toDouble / scale)) {
        numPartitions /= scale
        val curNumPartitions = numPartitions
        partiallyAggregated = partiallyAggregated.mapPartitionsWithIndex {
          (i, iter) => iter.map((i % curNumPartitions, _))
        }.foldByKey(zeroValue, new HashPartitioner(curNumPartitions))(cleanCombOp).values
      }
      val copiedZeroValue = Utils.clone(zeroValue, sc.env.closureSerializer.newInstance())
      partiallyAggregated.fold(copiedZeroValue)(cleanCombOp)
    }
  }
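As a final illustration (not part of the original answer), treeAggregate is also exposed on PySpark RDDs. The toy example below just sums squares, but it shows the per-partition seqOp followed by the tree-shaped combOp merge that the distributed gradient aggregation relies on; the data and partition count are illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tree-aggregate-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(1, 1001), numSlices=8)  # illustrative data and partition count

total = rdd.treeAggregate(
    0.0,
    lambda acc, x: acc + float(x) * float(x),  # seqOp: accumulate within each partition
    lambda a, b: a + b,                        # combOp: merge partial results in a tree
    depth=2)                                   # suggested tree depth, as in the Scala code above

print(total)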