
Scala: filtering an RDD by timestamp

Tags: scala, apache-spark, rdd, spark-cassandra-connector

I have the following code:

val imei = "86656"
val date = "2017-04-09"
val gpsdt = "2017-04-09 00:20:10"
val rdd = sc.cassandraTable("test", "xyz").select("id", "date", "dttime").where("id=? and date=?", imei, date)
So now I have an RDD that brings back the whole day's data for a particular imei, but I want to filter its rows against the given gpsdt and get 2 rows: one row with the timestamp just greater than the given time, and a second row just less than it. How can I achieve this?

My Cassandra schema is:

create table xyz (id text, date text, dttime timestamp, roll text, primary key ((id, date), dttime));

Thanks,

You can split the rdd into two parts:

1. where dttime is greater than gpsdt: sort by dttime ascending and take the first row

2. where dttime is less than gpsdt: sort by dttime descending and take the first row

Finally, union the two and you should get the desired rows.
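Note that dttime is a Cassandra timestamp, so the connector hands it back as a java.util.Date; comparing it against the gpsdt string directly will not behave as intended. A minimal sketch of the preliminary step, assuming the yyyy-MM-dd HH:mm:ss format shown in the question (the fmt and gpsDate names are just for illustration):

import java.text.SimpleDateFormat
import java.util.Date

// Parse the target timestamp once on the driver. java.util.Date is
// Serializable, so gpsDate can be captured by the RDD closures below.
// Assumption: gpsdt is in the driver's default time zone; set a TimeZone
// on the formatter explicitly if your cluster stores timestamps as UTC.
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
val gpsDate: Date = fmt.parse(gpsdt)

The code at the bottom of the thread uses this gpsDate for both comparisons.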



Thanks to @Alex Ott for pointing out that cassandraTable returns an RDD, not a DataFrame!

@Ramesh, I get the error: value $ is not a member of StringContext, and after adding the import my error changed to: type mismatch; found: org.apache.spark.sql.Column, required: com.datastax.spark.connector.CassandraRow => Boolean.

@RameshMaharjan After you edited your answer I understand the steps, but I need a concrete query or command to use, since I can't find greater-than or less-than operations on the RDD. It would be helpful if you could provide the query. Thanks.

Getting error: value getAs is not a member of com.datastax.spark.connector.CassandraRow
The working RDD version (CassandraRow exposes get[T] rather than getAs, and the comparison is done on the parsed gpsDate rather than on strings):

val justGreater = rdd.filter(row => row.get[Date]("dttime").after(gpsDate))  // rows strictly after gpsdt
  .sortBy(row => row.get[Date]("dttime")).take(1)                            // keep the earliest of them
val justLess = rdd.filter(row => row.get[Date]("dttime").before(gpsDate))    // rows strictly before gpsdt
  .sortBy(row => row.get[Date]("dttime"), ascending = false).take(1)         // keep the latest of them
justGreater.union(justLess)
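Alternatively, because dttime is the clustering column within the (id, date) partition, both lookups can be pushed down to Cassandra itself instead of scanning and sorting the RDD in Spark. This is a sketch under the assumption that the spark-cassandra-connector version in use provides where, limit, withAscOrder and withDescOrder on the table scan; check it against your connector's API:

// Let Cassandra do the work: rows are stored sorted by dttime inside
// the (id, date) partition, so a bounded, ordered, LIMIT 1 read is cheap.
val justAfter = sc.cassandraTable("test", "xyz")
  .select("id", "date", "dttime")
  .where("id = ? and date = ? and dttime > ?", imei, date, gpsDate)
  .withAscOrder    // smallest dttime above gpsdt comes first
  .limit(1)        // limit applies per Spark partition; only one Cassandra partition is read here
val justBefore = sc.cassandraTable("test", "xyz")
  .select("id", "date", "dttime")
  .where("id = ? and date = ? and dttime < ?", imei, date, gpsDate)
  .withDescOrder   // largest dttime below gpsdt comes first
  .limit(1)
val result = justAfter.union(justBefore).collect()   // up to two CassandraRow results

This avoids pulling a whole day of data into Spark just to discard all but two rows.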