Scala DataFrame filter problem, how can I do this?


Environment: Spark 1.6, Scala

My DataFrame looks like the one below:

DF =

DT         | col1 | col2
-----------|------|-----
2017011011 | AA   | BB
2017011011 | CC   | DD
2017011015 | PP   | BB
2017011015 | QQ   | DD
2017011016 | AA   | BB
2017011016 | CC   | DD
2017011017 | PP   | BB
2017011017 | QQ   | DD

How do I filter it to get a result like the SQL query: select * from DF where DT in (select distinct DT from DF order by DT desc limit 3)

The output should contain only the last 3 dates:

2017011015 | PP   | BB
2017011015 | QQ   | DD
2017011016 | AA   | BB
2017011016 | CC   | DD
2017011017 | PP   | BB
2017011017 | QQ   | DD

Thanks, Hossain

Tested on Spark 1.6.1:

import sqlContext.implicits._

// Build the sample DataFrame with columns DT, col1, col2.
val df = sqlContext.createDataFrame(Seq(
  (2017011011, "AA", "BB"),
  (2017011011, "CC", "DD"),
  (2017011015, "PP", "BB"),
  (2017011015, "QQ", "DD"),
  (2017011016, "AA", "BB"),
  (2017011016, "CC", "DD"),
  (2017011017, "PP", "BB"),
  (2017011017, "QQ", "DD")
)).select(
  $"_1".as("DT"),
  $"_2".as("col1"),
  $"_3".as("col2")
) 

val dates = df.select($"DT")
  .distinct()
  .orderBy(-$"DT")
  .map(_.getInt(0))
  .take(3)

// Keep only rows whose DT matches one of the three collected dates,
// i.e. the OR-chain (DT === d1) || (DT === d2) || (DT === d3).
val result = df.filter(dates.map($"DT" === _).reduce(_ || _))
result.show()
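
As a side note, the same membership test can be written more concisely with Column.isin (available since Spark 1.5); a minimal sketch reusing the dates array collected above:

// Equivalent filter expressed as an IN-style predicate.
val result2 = df.filter($"DT".isin(dates: _*))
result2.show()

This avoids building the OR-chain by hand and reads closer to the original SQL intent.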