
Apache Spark: how do I parse a date?

Tags: apache-spark, apache-spark-sql

My dates are formatted like this:

2006-04-01 01:00:00.000 +0200

but I only need:

2006-04-01

It cannot be parsed via the UNIX timestamp functions:

valid_wdf
  .withColumn("MYDateOnly", to_date(from_unixtime(unix_timestamp("Formatted Date","yyyy-MM-dd"))))
  .show()
and it fails with something like:

org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2006-04-01 00:00:00.000 +0200' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.


I'd like to understand why the parser behaves this way. Any explanation would be appreciated.

Let's run the following query with Spark 3.0.1 and look at the exception:

Seq("2006-04-01 01:00:00.000 +0200")
  .toDF("d")
  .select(unix_timestamp($"d","yyyy-MM-dd"))
  .show
The exception does explain the SparkUpgradeException, but you have to look at the bottom of the stack trace, which says:

org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2006-04-01 01:00:00.000 +0200' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
  at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkParsedDiff$1.applyOrElse(DateTimeFormatterHelper.scala:150)
...
Caused by: java.time.format.DateTimeParseException: Text '2006-04-01 01:00:00.000 +0200' could not be parsed, unparsed text found at index 10
  at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2049)
  at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1874)
  at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.$anonfun$parse$1(TimestampFormatter.scala:78)
  ... 140 more
There is "unparsed text found at index 10" because the pattern yyyy-MM-dd does not cover the remainder of the input date.
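This behavior is easy to reproduce without Spark at all: Spark 3's new parser delegates to java.time (the Iso8601TimestampFormatter frame in the stack trace above), and a plain DateTimeFormatter sketch shows the same failure at index 10, which is exactly where yyyy-MM-dd runs out of pattern:

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeParseException}

// The same strict java.time parser Spark 3's new parser delegates to.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd")

// A bare date parses fine: the pattern covers the whole input.
val ok = LocalDate.parse("2006-04-01", fmt)

// The full timestamp fails: the pattern is exhausted at index 10,
// leaving " 01:00:00.000 +0200" as unparsed text.
val err =
  try { fmt.parse("2006-04-01 01:00:00.000 +0200"); "no error" }
  catch { case e: DateTimeParseException => e.getMessage }

println(ok)  // 2006-04-01
println(err) // ... unparsed text found at index 10
```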

See Spark's Datetime Patterns documentation for the valid date and time format patterns. The simplest fix seems to be to use the date_format standard function:

val q = Seq("2006-04-01 01:00:00.000 +0200")
  .toDF("d")
  .select(date_format($"d","yyyy-MM-dd")) // date_format
scala> q.show
+--------------------------+
|date_format(d, yyyy-MM-dd)|
+--------------------------+
|                2006-04-01|
+--------------------------+

Follow-up from the asker: after using date_format, for the actual value 2006-04-01 00:00:00.000 +0200 I got | 2006-03-31 |.
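That one-day shift is most likely a time-zone effect: when Spark casts the string to a timestamp, the result is rendered in the session time zone (spark.sql.session.timeZone), so an instant at midnight +02:00 can land on the previous calendar day in a zone further west. A plain java.time sketch (not Spark code) of the same arithmetic, assuming a UTC session zone:

```scala
import java.time.{OffsetDateTime, ZoneId}
import java.time.format.DateTimeFormatter

// Parse the full timestamp, keeping its +0200 offset.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS Z")
val odt = OffsetDateTime.parse("2006-04-01 00:00:00.000 +0200", fmt)

// In its own offset the calendar date is April 1st...
val localDate = odt.toLocalDate

// ...but the same instant viewed from UTC (or any zone west of +02:00)
// is still March 31st, which matches the shift reported above.
val utcDate = odt.atZoneSameInstant(ZoneId.of("UTC")).toLocalDate

println(localDate) // 2006-04-01
println(utcDate)   // 2006-03-31
```

If the calendar date as written in the string is what you want, one option is to set spark.sql.session.timeZone to match the data's offset before casting.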