Sorting by timestamp does not work for a date-time column in Scala Spark
Here is my DataFrame:
+-------------+-------------------------+--------------+--------+---------+--------------------+------------------+----------------+----------------------------------+--------------------+-----------------------+-----------------------+-----------+-----------------------------------+--------------------------------+------------------------------+------------+
|DataPartition|TimeStamp                |OrganizationId|SourceId|AuditorId|AuditorEnumerationId|AuditorOpinionCode|AuditorOpinionId|AuditorOpinionOnInternalControlsId|IsPlayingAuditorRole|IsPlayingCSRAuditorRole|IsPlayingTaxAdvisorRole|FFAction|!||AuditorOpinionOnInternalControlCode|AuditorOpinionOnGoingConcernCode|AuditorOpinionOnGoingConcernId|tobefiltered|
+-------------+-------------------------+--------------+--------+---------+--------------------+------------------+----------------+----------------------------------+--------------------+-----------------------+-----------------------+-----------+-----------------------------------+--------------------------------+------------------------------+------------+
|Japan        |2018-04-04T09:53:35+00:00|4295877275    |181     |3185     |3023399             |UNQ               |3010546         |3010546                           |true                |false                  |false                  |O|!|       |null                               |null                            |null                          |O|!|        |
|Japan        |2018-04-04T08:36:57+00:00|4295877275    |189     |3185     |3023399             |UNQ               |3010546         |3010546                           |true                |false                  |false                  |O|!|       |null                               |null                            |null                          |O|!|        |
|Japan        |2018-04-04T08:39:19+00:00|4295877275    |173     |3185     |3023399             |UNQ               |3010546         |3010546                           |true                |false                  |false                  |O|!|       |null                               |null                            |null                          |O|!|        |
|Japan        |2018-04-04T08:24:17+00:00|4295877275    |196     |5913     |3026579             |UWE               |3010547         |null                              |true                |false                  |false                  |I|!|       |null                               |null                            |null                          |I|!|        |
|Japan        |2018-04-04T08:24:17+00:00|4295877275    |196     |3185     |3023399             |UNQ               |3010546         |3010546                           |true                |false                  |false                  |I|!|       |null                               |null                            |null                          |I|!|        |
|Japan        |2018-04-04T09:53:35+00:00|4295877275    |196     |null     |null                |null              |null            |null                              |null                |null                   |null                   |D|!|       |null                               |null                            |null                          |I|!|        |
+-------------+-------------------------+--------------+--------+---------+--------------------+------------------+----------------+----------------------------------+--------------------+-----------------------+-----------------------+-----------+-----------------------------------+--------------------------------+------------------------------+------------+
This is how I try to get the latest row for each combination of two columns:
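import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._ // for the $"..." syntax; assumes a SparkSession named spark
val windowSpec3 = Window.partitionBy("OrganizationId", "SourceId").orderBy(unix_timestamp($"timestamp", "yyyy-MM-dd HH:mm:ss.SSS").cast("timestamp").desc)
val latestForEachKey3 = latestForEachKey.withColumn("rank", row_number().over(windowSpec3)).filter($"rank" === 1).drop("rank").drop("tobefiltered", "TimeStamp")
latestForEachKey3.show(false)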
This gives me the following output:
+-------------+--------------+--------+---------+--------------------+------------------+----------------+----------------------------------+--------------------+-----------------------+-----------------------+-----------+-----------------------------------+--------------------------------+------------------------------+
|DataPartition|OrganizationId|SourceId|AuditorId|AuditorEnumerationId|AuditorOpinionCode|AuditorOpinionId|AuditorOpinionOnInternalControlsId|IsPlayingAuditorRole|IsPlayingCSRAuditorRole|IsPlayingTaxAdvisorRole|FFAction|!||AuditorOpinionOnInternalControlCode|AuditorOpinionOnGoingConcernCode|AuditorOpinionOnGoingConcernId|
+-------------+--------------+--------+---------+--------------------+------------------+----------------+----------------------------------+--------------------+-----------------------+-----------------------+-----------+-----------------------------------+--------------------------------+------------------------------+
|Japan        |4295877275    |181     |3185     |3023399             |UNQ               |3010546         |3010546                           |true                |false                  |false                  |O|!|       |null                               |null                            |null                          |
|Japan        |4295877275    |189     |3185     |3023399             |UNQ               |3010546         |3010546                           |true                |false                  |false                  |O|!|       |null                               |null                            |null                          |
|Japan        |4295877275    |173     |3185     |3023399             |UNQ               |3010546         |3010546                           |true                |f
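Since Spark 3.0, datetime patterns are interpreted with java.time DateTimeFormatter rules (Spark 2.x used SimpleDateFormat). A check against the JDK alone therefore gives a quick hint of whether the window's sort key can be computed from these strings at all. The sketch below assumes JDK 8+ and does not need Spark. PatternCheck and parses are hypothetical names, and the sample value is taken from the DataFrame above:

```scala
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper for checking a datetime pattern against one sample value.
object PatternCheck {
  // A TimeStamp value copied from the question's DataFrame.
  val Sample = "2018-04-04T09:53:35+00:00"

  /** True only if `pattern` parses the whole of `value` under java.time rules. */
  def parses(pattern: String, value: String = Sample): Boolean =
    Try(DateTimeFormatter.ofPattern(pattern).parse(value)).isSuccess

  def main(args: Array[String]): Unit = {
    // Pattern used in windowSpec3: expects a space and milliseconds, so this prints false.
    println(parses("yyyy-MM-dd HH:mm:ss.SSS"))
    // Quoted 'T' separator plus XXX for the +00:00 offset, so this prints true.
    println(parses("yyyy-MM-dd'T'HH:mm:ssXXX"))
  }
}
```

Under the same rules, "yyyy-MM-dd'T'HH:mm:ss" on its own is also rejected, because "+00:00" is left unparsed. SimpleDateFormat, which Spark 2.x relied on, instead ignores any text left over once the pattern has been consumed.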
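For SourceId 196 the newest row is the one stamped 09:53:35:
2018-04-04T09:53:35+00:00|4295877275 |196 |null |null
The TimeStamp values are ISO-8601 strings with a T separator, no milliseconds and a +00:00 offset, so they do not match the pattern passed to unix_timestamp:
"yyyy-MM-dd HH:mm:ss.SSS"
With default (non-ANSI) settings, unix_timestamp returns null when parsing fails. Every row then gets the same null sort key, and row_number picks an arbitrary row within each partition. The pattern has to follow the layout of the data instead:
"yyyy-MM-dd'T'HH:mm:ss"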
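Below is a minimal sketch of the corrected window. It assumes Spark 2.2 or later, where to_timestamp(Column, String) is available, and a local SparkSession. It replaces unix_timestamp(...).cast("timestamp") with to_timestamp and appends XXX to the pattern above so that the +00:00 offset is parsed rather than left over. That suffix is an assumption beyond the pattern quoted here. LatestPerKey and latestPerKey are hypothetical names, and the sample rows are a simplified subset of the question's DataFrame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number, to_timestamp}

// Hypothetical helper object; not part of the question's code.
object LatestPerKey {
  // Layout of values such as 2018-04-04T09:53:35+00:00:
  // quoted 'T' separator, no fractional seconds, XXX for the "+00:00" offset.
  val IsoPattern = "yyyy-MM-dd'T'HH:mm:ssXXX"

  /** Keeps only the newest row for each (OrganizationId, SourceId) pair. */
  def latestPerKey(df: DataFrame): DataFrame = {
    val byKeyNewestFirst = Window
      .partitionBy("OrganizationId", "SourceId")
      .orderBy(to_timestamp(col("TimeStamp"), IsoPattern).desc)

    df.withColumn("rank", row_number().over(byKeyNewestFirst))
      .filter(col("rank") === 1)
      .drop("rank")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("latest-per-key")
      .getOrCreate()
    import spark.implicits._

    // Simplified subset of the question's rows; FFAction values drop the |!| suffix.
    val df = Seq(
      ("2018-04-04T08:24:17+00:00", 4295877275L, 196, "I"),
      ("2018-04-04T08:24:17+00:00", 4295877275L, 196, "I"),
      ("2018-04-04T09:53:35+00:00", 4295877275L, 196, "D"),
      ("2018-04-04T09:53:35+00:00", 4295877275L, 181, "O")
    ).toDF("TimeStamp", "OrganizationId", "SourceId", "FFAction")

    latestPerKey(df).orderBy("SourceId").show(false)
    spark.stop()
  }
}
```

row_number breaks ties arbitrarily. If the newest timestamp for a key can occur twice, adding a second, unique orderBy column would make the chosen row deterministic.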