The desired string is not printed in the following Scala code

Tags: scala, apache-spark

Why doesn't the accumulator variable in the code below print the aggregated string?


import org.apache.spark.sql.SparkSession

object mapRDD {
  def main(args: Array[String]): Unit = {

    val spark = SparkSession.builder()
      .master("local")
      .appName("sparkSessionName")
      .getOrCreate()

    spark.sparkContext.setLogLevel("WARN")

    val data = Seq("Project",
      "Gutenberg’s",
      "Alice’s",
      "Adventures",
      "in",
      "Wonderland")

    val rdd = spark.sparkContext.parallelize(data)

    var accumulator: String = "WHY IS THE AGGREGATED STRING NOT PRINTED?"
    for (eachElementOfRDD <- rdd) {
      accumulator = accumulator ++ eachElementOfRDD
    }
    println(accumulator)
  }
}

Two things need to change in the code to get the expected output:

  • create the accumulator with spark.sparkContext.collectionAccumulator[String]
  • replace the for comprehension with rdd.foreach and add each element to the accumulator inside it

The body of for (eachElementOfRDD <- rdd) { ... } runs in tasks on the executors, so each task only updates its own deserialized copy of the accumulator variable; the driver-side var that println reads is never modified. A CollectionAccumulator is Spark's mechanism for sending such values back to the driver.

val accumulator = spark.sparkContext.collectionAccumulator[String]
rdd.foreach(eachElementOfRDD => accumulator.add(eachElementOfRDD))
println(accumulator.value)
Output:

[Gutenberg’s, Alice’s, Adventures, in, Wonderland, Project]
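For reference, a minimal runnable sketch of the whole program with both changes applied (assuming Spark 2.x or later, where SparkSession and collectionAccumulator are available; object and variable names follow the question):

import org.apache.spark.sql.SparkSession

object mapRDD {
  def main(args: Array[String]): Unit = {

    val spark = SparkSession.builder()
      .master("local")
      .appName("sparkSessionName")
      .getOrCreate()

    spark.sparkContext.setLogLevel("WARN")

    val data = Seq("Project", "Gutenberg’s", "Alice’s", "Adventures", "in", "Wonderland")
    val rdd = spark.sparkContext.parallelize(data)

    // Accumulator registered on the driver; executor tasks can only add to it.
    val accumulator = spark.sparkContext.collectionAccumulator[String]

    // foreach runs on the executors; each add() is merged back into the driver-side accumulator.
    rdd.foreach(eachElementOfRDD => accumulator.add(eachElementOfRDD))

    // value is read on the driver and returns a java.util.List[String].
    println(accumulator.value)

    spark.stop()
  }
}

If the goal is simply one concatenated string rather than a distributed side effect, collecting the (small) RDD to the driver and concatenating there, e.g. rdd.collect().mkString(" "), would be the more conventional approach; the accumulator version mirrors the fix above.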

My concern is with the for (eachElementOfRDD <- rdd) part of the code.
@mannat please have a look. Does this help?