PySpark: converting a string to a timestamp gives the wrong time


I used the following code to convert the string column timstm_hm to the timestamp column timstm_hm_timestamp. Here is the code:

from pyspark.sql.functions import col, unix_timestamp
df = df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-mm-dd HH:mm").cast("timestamp"))
Here is the result:

-------------------------------------------------
|   timstm_hm         |   timstm_hm_timestamp   |  
-------------------------------------------------
|2018-02-08 11:04     | 2018-01-08 11:04:00     | 
-------------------------------------------------
|2018-02-27 20:34     | 2018-01-27 20:34:00     | 
-------------------------------------------------
|2018-02-23 19:47     | 2018-01-23 19:47:00     | 
-------------------------------------------------

Why is there a one-month difference after the conversion? It's strange: it works for January, but from February onward it doesn't.

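You just need to replace the lowercase mm in the date part of the pattern with uppercase MM. In Java-style datetime patterns, MM means month-of-year while mm means minute-of-hour, so "yyyy-mm-dd" never reads the month at all. The month falls back to January, which is why January dates looked correct and every later month came out wrong.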

from pyspark.sql.functions import col, unix_timestamp
# Uppercase MM = month-of-year; lowercase mm = minute-of-hour
df.withColumn("timestm_hm_timestamp", unix_timestamp(col("timstm_hm"), "yyyy-MM-dd HH:mm").cast("timestamp")).show()

+----------------+--------------------+
|       timstm_hm|timestm_hm_timestamp|
+----------------+--------------------+
|2018-02-08 11:04| 2018-02-08 11:04:00|
+----------------+--------------------+
For more information, see the Java date format documentation.
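To see why every result lands in January, here is a deliberately simplified model of a lenient pattern parser. It is a hypothetical toy (toy_parse, FIELD_FOR_LETTER and the 1970-01-01 defaults are assumptions of this sketch, not Spark or Java internals): each pattern letter fills one field, a repeated letter overwrites the earlier value, and any field the pattern never mentions keeps its default.

```python
import re

# Hypothetical, heavily simplified model of a lenient Java-style parser.
# It is NOT Spark's or Java's implementation; it only mirrors the behaviour
# seen in the question: each pattern letter writes one field, a repeated
# field is overwritten, and fields the pattern never sets keep a default.
FIELD_FOR_LETTER = {
    "y": "year",
    "M": "month",   # uppercase M: month-of-year
    "d": "day",
    "H": "hour",
    "m": "minute",  # lowercase m: minute-of-hour
    "s": "second",
}

# Assumed defaults for fields the pattern never sets.
DEFAULTS = {"year": 1970, "month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0}


def toy_parse(text: str, pattern: str) -> dict:
    """Parse fixed-width `text` with a Java-style `pattern` (toy model)."""
    fields = dict(DEFAULTS)
    # Split the pattern into runs of one repeated letter, or single literals.
    tokens = [m.group(0) for m in re.finditer(r"([A-Za-z])\1*|.", pattern)]
    pos = 0
    for tok in tokens:
        width = len(tok)
        chunk = text[pos:pos + width]
        letter = tok[0]
        if letter in FIELD_FOR_LETTER:
            # A later occurrence of the same field overwrites the earlier one.
            fields[FIELD_FOR_LETTER[letter]] = int(chunk)
        elif chunk != tok:
            raise ValueError(f"expected literal {tok!r}, got {chunk!r}")
        pos += width
    return fields


wrong = toy_parse("2018-02-08 11:04", "yyyy-mm-dd HH:mm")
right = toy_parse("2018-02-08 11:04", "yyyy-MM-dd HH:mm")
print(wrong["month"], wrong["minute"])  # 1 4  (month never parsed; "02" went into minute, then overwritten)
print(right["month"], right["minute"])  # 2 4
```

With the lowercase pattern, "02" is written into the minute field and then overwritten by "04", while the month stays at its default of 1, matching the 2018-01-08 11:04:00 in the question.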

Alternatively, you can get the same output with to_timestamp, again using uppercase MM for the month:

from pyspark.sql.functions import to_timestamp
df.withColumn("timestm_hm_timestamp", to_timestamp("timstm_hm","yyyy-MM-dd HH:mm" )).show()

+----------------+--------------------+
|       timstm_hm|timestm_hm_timestamp|
+----------------+--------------------+
|2018-02-08 11:04| 2018-02-08 11:04:00|
+----------------+--------------------+
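Outside Spark, the same case distinction shows up when translating the pattern into Python's strptime directives. The helper below is a hypothetical sketch (java_to_strptime and its token table are assumptions, covering only the letters used in this question, not a general converter): it maps MM to %m and mm to %M, so the mix-up becomes visible in the translated format string.

```python
from datetime import datetime

# Hypothetical token table: only the Java/Spark pattern letters used here.
# Matching is case-sensitive, which is the whole point of the example.
JAVA_TO_PY = [
    ("yyyy", "%Y"),
    ("MM", "%m"),  # month-of-year
    ("dd", "%d"),
    ("HH", "%H"),
    ("mm", "%M"),  # minute-of-hour
    ("ss", "%S"),
]


def java_to_strptime(pattern: str) -> str:
    """Translate a simple Java-style pattern into a strptime format string."""
    out, i = [], 0
    while i < len(pattern):
        for java, py in JAVA_TO_PY:
            if pattern.startswith(java, i):
                out.append(py)
                i += len(java)
                break
        else:
            # Literal separator such as '-', ' ' or ':'; escape '%' for strptime.
            out.append("%%" if pattern[i] == "%" else pattern[i])
            i += 1
    return "".join(out)


print(java_to_strptime("yyyy-mm-dd HH:mm"))  # %Y-%M-%d %H:%M  (minutes where the month should be)
fmt = java_to_strptime("yyyy-MM-dd HH:mm")
print(fmt)                                   # %Y-%m-%d %H:%M
print(datetime.strptime("2018-02-08 11:04", fmt))  # 2018-02-08 11:04:00
```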
