In Spark, does the filter function convert the data into tuples?


mapreduce apache-spark cloud


Just wondering: does filter convert the data into tuples? For example:

val filesLines = sc.textFile("file.txt")
val split_lines = filesLines.map(_.split(";"))

val filteredData = split_lines.filter(x => x(4)=="Blue")

// From here on, if we want to map the data, would it use tuple syntax, i.e. x._3, or array syntax, i.e. x(3)?

or

filter does not change the RDD: the filtered data is still an RDD[Array[String]].

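No. All filter does is take a predicate function and apply it to every element of the collection; any element for which the predicate returns false is left out of the result set. The elements that do pass are returned unchanged, so the data keeps the same type: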

filesLines //RDD[String] (lines of the file)
split_lines //RDD[Array[String]] (lines delimited by semicolon)
filteredData //RDD[Array[String]] (lines delimited by semicolon where the 5th item is Blue)

So, to use filteredData, you have to access each element as an array, using parentheses with the appropriate index:

// Wrap the two fields in parentheses so the lambda returns a tuple: RDD[(String, String)]
val blueRecords = filteredData.map(x => (x(0), x(1)))
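To check the types without a cluster, here is a minimal sketch that uses a plain Scala Seq as a stand-in for the RDD. This is an assumption made for illustration: the sample lines and the object name FilterKeepsType are made up, and a Seq is not an RDD, but RDD.filter has the same type-preserving shape (it takes a T => Boolean and returns an RDD[T]).

```scala
object FilterKeepsType {
  // Hypothetical sample data standing in for sc.textFile("file.txt")
  val fileLines: Seq[String] = Seq(
    "1;car;small;2010;Blue",
    "2;bike;large;2012;Red",
    "3;boat;medium;2015;Blue"
  )

  // Each line becomes an Array[String] of its semicolon-delimited fields
  val splitLines: Seq[Array[String]] = fileLines.map(_.split(";"))

  // filter only drops elements; the element type is still Array[String]
  val filteredData: Seq[Array[String]] = splitLines.filter(x => x(4) == "Blue")

  // Arrays are indexed with parentheses, not with ._n
  val fourthFields: Seq[String] = filteredData.map(x => x(3))

  // Tuples appear only when a map builds them explicitly
  val pairs: Seq[(String, String)] = filteredData.map(x => (x(0), x(1)))

  // Only now is tuple syntax (._1, ._2) valid
  val ids: Seq[String] = pairs.map(_._1)

  def main(args: Array[String]): Unit = {
    println(fourthFields)
    println(ids)
  }
}
```

The explicit type annotations are there so the compiler confirms each step: annotating filteredData as a sequence of tuples, or writing x._3 inside the first map, would fail to compile.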
