pyspark: difference in hours between two date columns
I want to calculate the number of hours between two date columns in pyspark. I could only find how to calculate the number of days between dates.
dfs_4.show()
+--------------------+--------------------+
| request_time| max_time|
+--------------------+--------------------+
|2017-11-17 00:18:...|2017-11-20 23:59:...|
|2017-11-17 00:07:...|2017-11-20 23:59:...|
|2017-11-17 00:35:...|2017-11-20 23:59:...|
|2017-11-17 00:10:...|2017-11-20 23:59:...|
|2017-11-17 00:03:...|2017-11-20 23:59:...|
|2017-11-17 00:45:...|2017-11-20 23:59:...|
|2017-11-17 00:35:...|2017-11-20 23:59:...|
|2017-11-17 00:59:...|2017-11-20 23:59:...|
|2017-11-17 00:28:...|2017-11-20 23:59:...|
|2017-11-17 00:11:...|2017-11-20 23:59:...|
|2017-11-17 00:13:...|2017-11-20 23:59:...|
|2017-11-17 00:42:...|2017-11-20 23:59:...|
|2017-11-17 00:07:...|2017-11-20 23:59:...|
|2017-11-17 00:40:...|2017-11-20 23:59:...|
|2017-11-17 00:15:...|2017-11-20 23:59:...|
|2017-11-17 00:05:...|2017-11-20 23:59:...|
|2017-11-17 00:50:...|2017-11-20 23:59:...|
|2017-11-17 00:40:...|2017-11-20 23:59:...|
|2017-11-17 00:25:...|2017-11-20 23:59:...|
|2017-11-17 00:35:...|2017-11-20 23:59:...|
+--------------------+--------------------+
Calculating the number of days:
from pyspark.sql import functions as F
dfs_5 = dfs_4.withColumn('date_diff', F.datediff(F.to_date(dfs_4.max_time), F.to_date(dfs_4.request_time)))
dfs_5.show()
+--------------------+--------------------+---------+
| request_time| max_time|date_diff|
+--------------------+--------------------+---------+
|2017-11-17 00:18:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:07:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:35:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:10:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:03:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:45:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:35:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:59:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:28:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:11:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:13:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:42:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:07:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:40:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:15:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:05:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:50:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:40:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:25:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:35:...|2017-11-20 23:59:...| 3|
+--------------------+--------------------+---------+
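How can I do the same thing in hours?
Thanks for your help.

You can use F.hour to extract the hour from each datetime field and subtract one from the other into a new column. When the time difference spans more than one day, you also need to add the whole days in between. So I would create the date_diff column as you did and then try the following: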
from pyspark.sql import functions as F
# date_diff exists only in dfs_5 (built from dfs_4 in the question), so start from dfs_5
dfs_6 = dfs_5.withColumn('hours_diff', (dfs_5.date_diff * 24) +
                         F.hour(dfs_5.max_time) - F.hour(dfs_5.request_time))
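To sanity-check the arithmetic of that formula on a single row outside Spark, here is a minimal plain-Python sketch. The timestamps are assumptions: the table in the question truncates the seconds ("00:18:..."), so the values below are made up for illustration and are not taken from the actual data.

```python
from datetime import datetime

# Hypothetical row: the source output truncates the seconds,
# so these exact values are assumed for illustration only.
request_time = datetime(2017, 11, 17, 0, 18, 0)
max_time = datetime(2017, 11, 20, 23, 59, 0)

# Same arithmetic as the Spark expression:
# calendar-day difference * 24, plus the difference of the hour-of-day fields.
date_diff = (max_time.date() - request_time.date()).days
hours_diff = date_diff * 24 + max_time.hour - request_time.hour

# Exact elapsed time in hours, for comparison.
exact_hours = (max_time - request_time).total_seconds() / 3600

print(date_diff)              # 3
print(hours_diff)             # 95
print(round(exact_hours, 2))  # 95.68
```

With these assumed values the formula gives 95, while the elapsed time is about 95.68 hours, because the formula ignores minutes and seconds. If fractional hours matter, one option in Spark is to subtract F.unix_timestamp of the two columns and divide the result by 3600.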
Glad I could help. Please upvote if this was a useful answer.