
Scala: filter after a specific date


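I'm trying to filter after a specific date in Spark. I have the following RDD: an array of pairs of strings, where the first element is a date and the second is a path. I want to find which paths were changed after a specific date: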

val cleanRDD = oivRDD.map(x => (x(5), x(7)))
res16: Array[(String, String)] = Array((2015-06-24,/), (2015-07-17,/cdh), (2015-06-26,/datameer), (2015-06-24,/devl), (2015-08-11,/dqa), (2015-03-12,/lake), (2015-02-13,/osa))

I'm using Java's SimpleDateFormat:

import java.text.SimpleDateFormat
val sampleDate = new SimpleDateFormat("yyyy-MM-dd")
val filterRDD = cleanRDD.filter(x => dateCompare(x))
My date comparison function:

  def dateCompare(input:(String, String)): Boolean = {
    val date1 = sampleDate.format(input._1)
    val date2 = sampleDate.parse(date1)
    val date3 = sampleDate.parse("2015-07-01")
    if (date2.compareTo(date3) > 0)  true
    else
      false
  }
I get the following error:

15/08/12 10:21:16 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 10, edhpdn2128.kdc.capitalone.com): java.lang.IllegalArgumentException: Cannot format given Object as a Date
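A minimal reproduction of this exception outside Spark, assuming only the JDK's java.text.SimpleDateFormat (the names fmt, parsed, roundTrip and failure are illustrative): parse turns a String into a Date, while format goes the other way and rejects a String argument.

```scala
import java.text.SimpleDateFormat

// Same pattern as in the question.
val fmt = new SimpleDateFormat("yyyy-MM-dd")

// parse: String => java.util.Date. This direction accepts "2015-06-24".
val parsed: java.util.Date = fmt.parse("2015-06-24")

// format: Date => String. Formatting the parsed Date gives the string back.
val roundTrip: String = fmt.format(parsed)

// Passing a String resolves to Format.format(Object), and DateFormat only
// accepts a Date or a Number there, so it throws IllegalArgumentException.
val failure: Option[String] =
  try { fmt.format("2015-06-24"); None }
  catch { case e: IllegalArgumentException => Some(e.getMessage) }

println(failure) // Some(Cannot format given Object as a Date)
```

This is the same call the question's dateCompare makes on its first line, which is why every task fails with that message.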


With the new DataFrame API, this is a valid expression:

dfLogging.filter(dfLogging("when") >= "2015-01-01")
where the column has a timestamp type:

scala> dfLogging.printSchema()
root
 |-- id: long (nullable = true)
 |-- when: timestamp (nullable = true)
 |-- ...

This syntax is for Scala, but it should be similar in Java and Python.
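For the RDD version, a possible shortcut, assuming every date is a zero-padded yyyy-MM-dd string as in the sample: the lexicographic order of such strings matches chronological order, so a plain String.compareTo against the cutoff can replace the parsing. In this sketch a local Seq (a hypothetical stand-in named pairs) replaces cleanRDD so it runs without Spark; on the real RDD the same predicate would be passed to cleanRDD.filter.

```scala
// Sample pairs copied from the question's res16 output; a local Seq stands in
// for cleanRDD (hypothetical name "pairs") so the sketch runs without Spark.
val pairs: Seq[(String, String)] = Seq(
  ("2015-06-24", "/"), ("2015-07-17", "/cdh"), ("2015-06-26", "/datameer"),
  ("2015-06-24", "/devl"), ("2015-08-11", "/dqa"), ("2015-03-12", "/lake"),
  ("2015-02-13", "/osa"))

// Assumes strictly zero-padded yyyy-MM-dd strings: for those, a positive
// String.compareTo result means "strictly after" the cutoff date.
val cutoff = "2015-07-01"
val changedAfter = pairs.filter { case (date, _) => date.compareTo(cutoff) > 0 }

changedAfter.foreach(println)
// (2015-07-17,/cdh)
// (2015-08-11,/dqa)
```

This breaks as soon as a date is not zero-padded (for example 2015-7-1), so parsing is the safer choice when the input format is not guaranteed.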
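sampleDate.format(input._1)

Removing this line should work: input._1 is already a String, and SimpleDateFormat.format expects a Date (or a Number), which is why it throws IllegalArgumentException. Pass input._1 straight to sampleDate.parse instead.

Oh wow, such a simple mistake. That fixed it!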
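A sketch of the corrected function, assuming the fix from the answer above (drop the format call and parse input._1 directly), shown next to a hypothetical java.time alternative named isAfterCutoff that is not from the answers. A local Seq named pairs stands in for cleanRDD so the example runs without Spark; on the RDD the call would be cleanRDD.filter(dateCompare).

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate

val sampleDate = new SimpleDateFormat("yyyy-MM-dd")

// The question's function with the format(...) line removed: the String in
// input._1 is parsed straight into a java.util.Date.
def dateCompare(input: (String, String)): Boolean = {
  val date2 = sampleDate.parse(input._1)
  val date3 = sampleDate.parse("2015-07-01")
  date2.compareTo(date3) > 0 // true only for dates strictly after the cutoff
}

// Hypothetical alternative (not from the answers): LocalDate.parse reads
// ISO yyyy-MM-dd strings by default, and LocalDate is immutable.
val cutoff = LocalDate.parse("2015-07-01")
def isAfterCutoff(input: (String, String)): Boolean =
  LocalDate.parse(input._1).isAfter(cutoff)

// Local stand-in for cleanRDD, using the sample data from the question.
val pairs = Seq(
  ("2015-06-24", "/"), ("2015-07-17", "/cdh"), ("2015-06-26", "/datameer"),
  ("2015-06-24", "/devl"), ("2015-08-11", "/dqa"), ("2015-03-12", "/lake"),
  ("2015-02-13", "/osa"))

val viaSimpleDateFormat = pairs.filter(dateCompare)
val viaJavaTime = pairs.filter(isAfterCutoff)
```

Parsing the cutoff once outside the predicate, as isAfterCutoff does, avoids re-parsing "2015-07-01" for every record.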