
Apache Spark: NameError: name 'flatMap' is not defined


When I try

tokens = cleaned_book(flatMap(normalize_tokenize))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'flatMap' is not defined

On the other hand,

sc.parallelize([3,4,5]).flatMap(lambda x: range(1,x)).collect()
runs fine in the same PySpark shell and returns:

[1, 2, 1, 2, 3, 1, 2, 3, 4]

Why do I get the NameError?

OK, here is a Scala example with a tokenizer; it makes me think you are approaching this the wrong way.

import org.apache.spark.rdd.RDD

// Split each line of the RDD into an Array of words
def tokenize(f: RDD[String]) = {
  f.map(_.split(" "))
}

val dfsFilename = "/FileStore/tables/some.txt"
val readFileRDD = spark.sparkContext.textFile(dfsFilename)
// Flatten the per-line arrays of words, then do a standard word count
val wcounts = tokenize(readFileRDD).flatMap(x => x).map(word => (word, 1)).reduceByKey(_ + _)
wcounts.collect()
This works fine; you need the functional, method-call style, hence the .flatMap and .map in that sequence. I find the inline approach simpler, but I note that .flatMap was also mentioned in the comments.

Have you tried cleaned_book.flatMap(normalize_tokenize)? The bare name flatMap is not defined in your shell because flatMap is a method on the RDD, so it has to be called on cleaned_book itself rather than passed around as a free function.
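
For completeness, here is a minimal PySpark sketch of that fix. It assumes cleaned_book is an RDD of text lines and that normalize_tokenize returns a list of tokens per line (both names come from your question); the file path and the tokenizer body below are only illustrative.

import re

# assumes a running PySpark shell, where sc (SparkContext) already exists
cleaned_book = sc.textFile("/FileStore/tables/some.txt")

def normalize_tokenize(line):
    # illustrative tokenizer: lower-case and split on non-word characters
    return [t for t in re.split(r"\W+", line.lower()) if t]

# flatMap is called on the RDD itself, not as a stand-alone function
tokens = cleaned_book.flatMap(normalize_tokenize)

# the same word count as the Scala example above
wcounts = tokens.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
wcounts.collect()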