Apache Spark: How do I count the number of characters in files using RDD.wholeTextFiles?


I created an RDD as follows:

val manylines = sc.wholeTextFiles("c:\\spark\\*.txt")
scala> manylines
res23: org.apache.spark.rdd.RDD[(String, String)] = d:\spark\*.txt
MapPartitionsRDD[1] at wholeTextFiles at <console>:24

How can I count the number of characters in each line of each file?

If you want to process the whole contents of each file, use mapValues instead of map:

manylines.mapValues(_.length)
But if you want to process individual lines, you have to go a step further, for example:

manylines.flatMapValues(_.split("\n")).
  mapValues(_.length)
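To see what these transformations produce, here is a minimal sketch that simulates the same flatMapValues/mapValues logic with plain Scala collections (no Spark cluster needed); the file names and contents are made-up examples, not from the question:

```scala
// Hypothetical stand-in for the (filename, contents) pairs
// that wholeTextFiles would return.
val files = List(
  ("a.txt", "hello\nworld!"),
  ("b.txt", "spark")
)

// Equivalent of flatMapValues(_.split("\n")):
// one (filename, line) pair per line in each file.
val lines = files.flatMap { case (name, content) =>
  content.split("\n").map(line => (name, line))
}

// Equivalent of mapValues(_.length):
// replace each line with its character count.
val lineLengths = lines.map { case (name, line) => (name, line.length) }
```

Here lineLengths keeps the file name as the key, so you can still tell which file each per-line count came from.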

If you are interested in the total number of characters across the files, you can map each value to its length and then use the implicit conversion to DoubleRDDFunctions to call sum():

manylines.map{ case(key, value) => value }.map(_.length).sum
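The same total can be sketched with plain Scala collections to check the logic locally (sum on a standard collection plays the role of the DoubleRDDFunctions sum); the sample data is invented for illustration:

```scala
// Hypothetical (filename, contents) pairs, as wholeTextFiles would yield.
val files = List(("a.txt", "hello"), ("b.txt", "spark!"))

// Drop the keys, map each file's contents to its length, and sum:
// mirrors map { case (key, value) => value }.map(_.length).sum
val total = files.map { case (_, content) => content }.map(_.length).sum
```

Note this counts every character in the file contents, including newline characters; on an RDD, the sum comes back as a Double.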