Apache Spark: filtering an RDD with a broadcast variable
I am learning about broadcast variables and am trying to use one to filter words out of an RDD, but I can't get it to work. Here is my sample data.

content.txt
Hello this is Rogers.com
This is Bell.com
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce
remove.txt
Hello, is, this, the
Script
scala> val content = sc.textFile("FilterCount/Content.txt")
scala> val contentRDD = content.flatMap(x => x.split(","))
scala> val remove = sc.textFile("FilterCount/Remove.txt")
scala> val removeRDD = remove.flatMap(x => x.split(",")).map(w => w.trim)
scala> val bRemove = sc.broadcast(removeRDD.collect().toList)
scala> val filtered = contentRDD.filter{case (word) => !bRemove.value.contains(word)}
scala> filtered.foreach(print)
Output
Hello this is Rogers.comThis is Bell.comApache Spark TrainingThis is Spark Learning SessionSpark is faster than MapReduce
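As shown above, the filtered list still contains the words from the broadcast variable. How can I remove them?

This is because you are splitting the file on ",", but the words in your file are separated by spaces. Each line therefore stays a single element, such as "Hello this is Rogers.com", which never matches an entry in the broadcast list.

Replace it with: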
scala> val content = sc.textFile("FilterCount/Content.txt")
scala> val contentRDD = content.flatMap(x => x.split(" "))
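To see why the choice of delimiter matters, here is a minimal plain-Scala sketch (no Spark required) that applies both splits to one line of the sample data. The object name `SplitDemo` and its fields are made up for illustration; only the sample line and the removal list come from the question.

```scala
object SplitDemo {
  // One line from content.txt and the removal list from remove.txt
  val line = "Hello this is Rogers.com"
  val remove: List[String] = "Hello, is, this, the".split(",").map(_.trim).toList

  // Splitting on "," finds no comma, so the whole line stays one element
  val byComma: List[String] = line.split(",").toList
  // Splitting on " " yields the individual words
  val bySpace: List[String] = line.split(" ").toList

  val keptByComma: List[String] = byComma.filter(w => !remove.contains(w))
  val keptBySpace: List[String] = bySpace.filter(w => !remove.contains(w))

  def main(args: Array[String]): Unit = {
    println(keptByComma) // List(Hello this is Rogers.com)
    println(keptBySpace) // List(Rogers.com)
  }
}
```

With the comma split, the only element is the full line, so nothing is filtered out. With the space split, "Hello", "this" and "is" are matched and removed.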
Use this to ignore case:
val filtered = contentRDD.filter { word =>
  !bRemove.value.map(_.toLowerCase).contains(word.toLowerCase)
}
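The filter above lowercases the whole removal list again for every word it checks. One possible variation, offered as a sketch and not as part of the original answer, is to normalize the list once on the driver and broadcast it as a `Set`. The object name `BroadcastFilterSketch`, the helper `filterWords`, and the local `SparkSession` setup are assumptions for illustration; in `spark-shell` you would use the existing `sc` instead.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastFilterSketch {
  // Returns the words in `lines` that are not in `removeWords`, ignoring case.
  def filterWords(spark: SparkSession,
                  lines: Seq[String],
                  removeWords: Seq[String]): Array[String] = {
    val sc = spark.sparkContext
    // Trim and lowercase once on the driver; a Set gives fast membership checks
    val bRemove = sc.broadcast(removeWords.map(_.trim.toLowerCase).toSet)
    sc.parallelize(lines)
      .flatMap(_.split(" "))                                      // words are space-separated
      .filter(word => !bRemove.value.contains(word.toLowerCase)) // case-insensitive check
      .collect()
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("BroadcastFilterSketch")
      .getOrCreate()
    try {
      val kept = filterWords(
        spark,
        Seq("Hello this is Rogers.com", "This is Bell.com"),
        Seq("Hello", " is", " this", " the"))
      println(kept.mkString(" "))
    } finally {
      spark.stop()
    }
  }
}
```

Using `collect()` before printing also brings the results back to the driver, which makes the output easier to inspect than `foreach(print)` when running on a cluster.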
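Hope this works.

Are you sure your check "Hello" contains ("Hello this is Rogers.com") is correct?

Thanks! This is a classic mistake. I spent hours on it and couldn't catch it because I was looking in the wrong place. +1 for showing how to ignore case.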