
Apache Spark: how do I parse a date?

Tags: apache-spark, apache-spark-sql

My dates are formatted like this:

2006-04-01 01:00:00.000 +0200

but I only need:

2006-04-01

It cannot be parsed via the UNIX timestamp functions:

valid_wdf
  .withColumn("MYDateOnly", to_date(from_unixtime(unix_timestamp("Formatted Date","yyyy-MM-dd"))))
  .show()
and it fails with something like:

org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2006-04-01 00:00:00.000 +0200' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.


I'd like to understand why the parser behaves this way. Any explanation would be appreciated.

Let's run the following query with Spark 3.0.1 and look at the exception:

Seq("2006-04-01 01:00:00.000 +0200")
  .toDF("d")
  .select(unix_timestamp($"d","yyyy-MM-dd"))
  .show
The exception does explain the SparkUpgradeException, but you have to look at the bottom of the stack trace, which says:

org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2006-04-01 01:00:00.000 +0200' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
  at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkParsedDiff$1.applyOrElse(DateTimeFormatterHelper.scala:150)
...
Caused by: java.time.format.DateTimeParseException: Text '2006-04-01 01:00:00.000 +0200' could not be parsed, unparsed text found at index 10
  at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2049)
  at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1874)
  at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.$anonfun$parse$1(TimestampFormatter.scala:78)
  ... 140 more
There is "unparsed text found at index 10" because the pattern yyyy-MM-dd does not cover the remainder of the input date.
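This behavior is easy to reproduce without Spark at all: Spark 3's new parser delegates to java.time (the Iso8601TimestampFormatter frame in the stack trace above), and a plain DateTimeFormatter sketch shows the same failure at index 10, which is exactly where yyyy-MM-dd runs out of pattern:

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeParseException}

// The same strict java.time parser Spark 3's new parser delegates to.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd")

// A bare date parses fine: the pattern covers the whole input.
val ok = LocalDate.parse("2006-04-01", fmt)

// The full timestamp fails: the pattern is exhausted at index 10,
// leaving " 01:00:00.000 +0200" as unparsed text.
val err =
  try { fmt.parse("2006-04-01 01:00:00.000 +0200"); "no error" }
  catch { case e: DateTimeParseException => e.getMessage }

println(ok)  // 2006-04-01
println(err) // ... unparsed text found at index 10
```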

See Spark's Datetime Patterns documentation for the valid date and time format patterns. The simplest fix seems to be to use the date_format standard function:

val q = Seq("2006-04-01 01:00:00.000 +0200")
  .toDF("d")
  .select(date_format($"d","yyyy-MM-dd")) // date_format
scala> q.show
+--------------------------+
|date_format(d, yyyy-MM-dd)|
+--------------------------+
|                2006-04-01|
+--------------------------+

Follow-up from the asker: after using date_format, for the actual value 2006-04-01 00:00:00.000 +0200 I got | 2006-03-31 |.
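That one-day shift is most likely a time-zone effect: when Spark casts the string to a timestamp, the result is rendered in the session time zone (spark.sql.session.timeZone), so an instant at midnight +02:00 can land on the previous calendar day in a zone further west. A plain java.time sketch (not Spark code) of the same arithmetic, assuming a UTC session zone:

```scala
import java.time.{OffsetDateTime, ZoneId}
import java.time.format.DateTimeFormatter

// Parse the full timestamp, keeping its +0200 offset.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS Z")
val odt = OffsetDateTime.parse("2006-04-01 00:00:00.000 +0200", fmt)

// In its own offset the calendar date is April 1st...
val localDate = odt.toLocalDate

// ...but the same instant viewed from UTC (or any zone west of +02:00)
// is still March 31st, which matches the shift reported above.
val utcDate = odt.atZoneSameInstant(ZoneId.of("UTC")).toLocalDate

println(localDate) // 2006-04-01
println(utcDate)   // 2006-03-31
```

If the calendar date as written in the string is what you want, one option is to set spark.sql.session.timeZone to match the data's offset before casting.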