Scala DataFrame filter question: how can I do this?
Environment: Spark 1.6, Scala

My DataFrame looks like below:

DF =

DT         | col1 | col2
-----------|------|-----
2017011011 | AA   | BB
2017011011 | CC   | DD
2017011015 | PP   | BB
2017011015 | QQ   | DD
2017011016 | AA   | BB
2017011016 | CC   | DD
2017011017 | PP   | BB
2017011017 | QQ   | DD

How can I filter it to get the rows for the last 3 dates, like this SQL:

select * from DF where DT in (select distinct DT from DF order by DT desc limit 3)

Expected output:

2017011015 | PP | BB
2017011015 | QQ | DD
2017011016 | AA | BB
2017011016 | CC | DD
2017011017 | PP | BB
2017011017 | QQ | DD

Thanks,
Hossain

Answer (tested on Spark 1.6.1):
import sqlContext.implicits._

val df = sqlContext.createDataFrame(Seq(
  (2017011011, "AA", "BB"),
  (2017011011, "CC", "DD"),
  (2017011015, "PP", "BB"),
  (2017011015, "QQ", "DD"),
  (2017011016, "AA", "BB"),
  (2017011016, "CC", "DD"),
  (2017011017, "PP", "BB"),
  (2017011017, "QQ", "DD")
)).select(
  $"_1".as("DT"),
  $"_2".as("col1"),
  $"_3".as("col2")
)

// Collect the 3 most recent distinct dates to the driver.
// In Spark 1.6, DataFrame.map returns an RDD, so take(3) yields Array[Int].
val dates = df.select($"DT")
  .distinct()
  .orderBy($"DT".desc)
  .map(_.getInt(0))
  .take(3)

// Keep only rows whose DT equals one of those dates (an OR of equality tests).
val result = df.filter(dates.map($"DT" === _).reduce(_ || _))
result.show()
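The selection logic itself ("keep rows whose date is among the 3 largest distinct dates") can be checked without a Spark cluster. Here is a minimal plain-Scala sketch on ordinary collections, mirroring the DataFrame pipeline above:

```scala
// Sample rows, same shape as the DataFrame: (DT, col1, col2).
val rows = Seq(
  (2017011011, "AA", "BB"), (2017011011, "CC", "DD"),
  (2017011015, "PP", "BB"), (2017011015, "QQ", "DD"),
  (2017011016, "AA", "BB"), (2017011016, "CC", "DD"),
  (2017011017, "PP", "BB"), (2017011017, "QQ", "DD")
)

// Distinct dates, sorted descending, top 3 -- the analogue of
// df.select($"DT").distinct().orderBy($"DT".desc).take(3).
val top3 = rows.map(_._1).distinct.sorted(Ordering[Int].reverse).take(3).toSet

// The analogue of the df.filter(...) step.
val kept = rows.filter(r => top3.contains(r._1))
```

As a side note, if your Spark version has `Column.isin` (available since Spark 1.5), the filter step in the answer can also be written as `df.filter($"DT".isin(dates: _*))`, which avoids building the OR chain by hand.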