PySpark: Converting a string to a timestamp gives the wrong time
I use the following code to convert the string column timstm_hm to a timestamp column timstm_hm_timestamp. Here is the code:
from pyspark.sql.functions import col, unix_timestamp
df = df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-mm-dd HH:mm").cast("timestamp"))
Here is the result:
-------------------------------------------------
| timstm_hm | timstm_hm_timestamp |
-------------------------------------------------
|2018-02-08 11:04 | 2018-01-08 11:04:00 |
-------------------------------------------------
|2018-02-27 20:34 | 2018-01-27 20:34:00 |
-------------------------------------------------
|2018-02-23 19:47 | 2018-01-23 19:47:00 |
-------------------------------------------------
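Why is the converted value off by one month? It's strange, because it works for January but not from February onwards.
You just need to replace mm with uppercase MM in the date part of the pattern. In Java date patterns, lowercase mm means minute of hour and uppercase MM means month of year. With "yyyy-mm-dd HH:mm" the month is never parsed, so it falls back to its default of January, which is why January dates looked correct. The "02" in the date is read as a minute value and is then overwritten by the real minutes at the end of the string.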
from pyspark.sql.functions import col, unix_timestamp
# MM = month of year, mm = minute of hour
df.withColumn("timestm_hm_timestamp", unix_timestamp(col("timstm_hm"), "yyyy-MM-dd HH:mm").cast("timestamp")).show()
+----------------+--------------------+
| timstm_hm|timestm_hm_timestamp|
+----------------+--------------------+
|2018-02-08 11:04| 2018-02-08 11:04:00|
+----------------+--------------------+
For more information, see the Java date format patterns.
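To see why only the January dates came out right, here is a small toy model of the lenient, pattern-based parsing involved: a field the pattern never mentions keeps a default (January for the month), and a letter that appears twice keeps its last parsed value. The names toy_parse and DEFAULTS are hypothetical and exist only for illustration; this is a simplified sketch under those assumptions, not Spark's or Java's actual parser.

```python
import re
from datetime import datetime

# Toy model only (hypothetical helper, not Spark's or Java's real parser):
# fields that the pattern never mentions keep an epoch-style default,
# so a missing month silently becomes January.
DEFAULTS = {"y": 1970, "M": 1, "d": 1, "H": 0, "m": 0}


def toy_parse(text: str, pattern: str) -> datetime:
    """Parse text with a tiny subset of Java-style pattern letters.

    Each run of a letter (e.g. 'yyyy') consumes exactly that many digits, and
    if a letter appears twice the later value overwrites the earlier one.
    Only y, M, d, H and m are supported.
    """
    fields = dict(DEFAULTS)
    pos = 0
    for match in re.finditer(r"([A-Za-z])\1*|[^A-Za-z]", pattern):
        token = match.group(0)
        if token[0].isalpha():
            if token[0] not in DEFAULTS:
                raise ValueError(f"unsupported pattern letter {token[0]!r}")
            width = len(token)
            fields[token[0]] = int(text[pos:pos + width])
            pos += width
        else:
            # Literal characters such as '-', ' ' and ':' must match exactly.
            if text[pos:pos + 1] != token:
                raise ValueError(f"expected {token!r} at position {pos}")
            pos += 1
    return datetime(fields["y"], fields["M"], fields["d"], fields["H"], fields["m"])


# 'mm' in the date part is read as minutes, then overwritten by the real minutes.
print(toy_parse("2018-02-08 11:04", "yyyy-mm-dd HH:mm"))  # 2018-01-08 11:04:00
print(toy_parse("2018-02-08 11:04", "yyyy-MM-dd HH:mm"))  # 2018-02-08 11:04:00
```

The first call reproduces the one-month shift from the question's table, and the second shows the corrected pattern. For a January input such as "2018-01-15 09:30", both patterns give the same result, which is why the bug only shows up from February onwards.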
Alternatively, you can get the same output by using to_timestamp with the uppercase MM pattern:
from pyspark.sql.functions import to_timestamp
df.withColumn("timestm_hm_timestamp", to_timestamp("timstm_hm","yyyy-MM-dd HH:mm" )).show()
+----------------+--------------------+
| timstm_hm|timestm_hm_timestamp|
+----------------+--------------------+
|2018-02-08 11:04| 2018-02-08 11:04:00|
+----------------+--------------------+