Apache Spark: can unix_timestamp() return Unix time in milliseconds?


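I'm trying to get the Unix time in milliseconds (13 digits) from a timestamp field, but currently it is returned in seconds (10 digits).

Even though 2017-01-18 11:00:00.123 and 2017-01-18 11:00:00.000 are different, I get the same Unix time, 1484758800, for both.

What am I missing?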

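unix_timestamp() returns the Unix timestamp in seconds.

The last 3 digits of the timestamp string are the same as the last 3 digits of the value in milliseconds (1.999 s = 1999 ms), so you can take the last 3 digits of the timestamp string and append them to the end of the seconds value to get milliseconds.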
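The append idea can be sketched in PySpark as follows. This is only an illustration under stated assumptions: the helper names `append_millis` and `with_epoch_millis_by_concat` and the column name `ts` are hypothetical, the column is assumed to be a TimestampType, and timestamps are assumed to fall on or after 1970-01-01. Instead of slicing a printed string, the sketch takes the three fraction digits from `date_format` with the pattern `SSS`, so it does not depend on how the timestamp happens to be rendered.

```python
def append_millis(epoch_seconds: int, fraction_digits: str) -> int:
    """Pure-Python mirror of the idea: glue 3 fraction digits onto the seconds."""
    if len(fraction_digits) != 3 or not fraction_digits.isdigit():
        raise ValueError("expected exactly three digits")
    return int(f"{epoch_seconds}{fraction_digits}")


def with_epoch_millis_by_concat(df, ts_col="ts"):
    """Same idea as a Spark column expression (requires PySpark).

    Assumes `ts_col` names a TimestampType column (hypothetical name).
    """
    # Imported lazily so append_millis can be tried without Spark installed.
    from pyspark.sql import functions as F

    seconds = F.unix_timestamp(F.col(ts_col)).cast("string")  # whole epoch seconds
    fraction = F.date_format(F.col(ts_col), "SSS")            # zero-padded milliseconds
    return df.withColumn("epoch_millis", F.concat(seconds, fraction).cast("long"))
```

For the values in the question, `append_millis(1484758800, "123")` gives 1484758800123, a 13-digit value.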

An implementation of the suggested approach.

Sample DataFrame and its schema:

+----------------------------+
|TIME                        |
+----------------------------+
|22-Jul-2018 04:21:18.792 UTC|
|23-Jul-2018 04:21:25.888 UTC|
+----------------------------+
root
|-- TIME: string (nullable = true)

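Convert the string time (including milliseconds) to a unix_timestamp of type double. Use the substring function (start_position = -7, length_of_substring = 3) to extract the milliseconds from the string, then add them to the unix_timestamp as a fraction of a second. (Cast the substring to float before the addition.)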
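The code for this step did not survive on this page, so here is a hedged reconstruction of what the description implies. The helper names, the format pattern `dd-MMM-yyyy HH:mm:ss.SSS z`, and the cast to double (the text says float) are assumptions chosen to match the sample `TIME` strings; the pattern may need adjusting for your Spark version and locale.

```python
def millis_suffix(time_str: str) -> int:
    """Pure-Python mirror of substring(TIME, -7, 3): the 3 digits before ' UTC'."""
    return int(time_str[-7:-4])


def add_fraction(epoch_seconds: int, time_str: str) -> float:
    """Whole epoch seconds plus the millisecond part as a fraction of a second."""
    return epoch_seconds + millis_suffix(time_str) / 1000


def with_unix_timestamp(df):
    """Add a double column 'unix_timestamp' to a DataFrame with a string TIME column.

    Requires PySpark; the format pattern below is an assumption.
    """
    from pyspark.sql import functions as F  # lazy import: helpers above need no Spark

    whole_seconds = F.unix_timestamp(df["TIME"], "dd-MMM-yyyy HH:mm:ss.SSS z")
    millis = F.substring(df["TIME"], -7, 3).cast("double") / 1000
    return df.withColumn("unix_timestamp", whole_seconds + millis)
```

With this sketch, `df1 = with_unix_timestamp(df)` would produce the `df1` used in the next step. The substring offsets only work while every string ends in exactly three fraction digits followed by a space and a three-letter zone name.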

Convert the unix timestamp (double) to the timestamp data type in Spark:

df2 = df1.withColumn("TimestampType",F.to_timestamp(df1["unix_timestamp"]))
df2.show(n=2,truncate=False)

This gives the following output:

+----------------------------+----------------+-----------------------+
|TIME                        |unix_timestamp  |TimestampType          |
+----------------------------+----------------+-----------------------+
|22-Jul-2018 04:21:18.792 UTC|1.532233278792E9|2018-07-22 04:21:18.792|
|23-Jul-2018 04:21:25.888 UTC|1.532319685888E9|2018-07-23 04:21:25.888|
+----------------------------+----------------+-----------------------+

Check the schema:

df2.printSchema()


root
 |-- TIME: string (nullable = true)
 |-- unix_timestamp: double (nullable = true)
 |-- TimestampType: timestamp (nullable = true)

The milliseconds are hidden in the fractional part of the timestamp.

Try this:

df = df.withColumn("time_in_milliseconds", col("time").cast("double"))

You will get 1484758800.792, where 792 is the milliseconds.

At least it works for me (Scala, Spark, Hive).
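The cast above yields seconds with a fraction, not the 13-digit integer the question asks for. One possible follow-up, sketched in PySpark, is to multiply by 1000 and round before casting to long. The helper names and the output column name `epoch_millis` are hypothetical. Rounding is used here as a precaution because the product of a double and 1000 can land slightly below the intended integer, in which case a plain cast would truncate it.

```python
def to_epoch_millis(epoch_seconds: float) -> int:
    """Pure-Python mirror: fractional epoch seconds -> whole milliseconds."""
    return round(epoch_seconds * 1000)


def with_epoch_millis(df, ts_col="time"):
    """Add a long column of epoch milliseconds from a timestamp column (requires PySpark)."""
    from pyspark.sql import functions as F  # lazy import: helper above needs no Spark

    seconds = F.col(ts_col).cast("double")            # e.g. 1484758800.792
    millis = F.round(seconds * 1000).cast("long")     # round, then drop the ".0"
    return df.withColumn("epoch_millis", millis)
```

A double has enough precision for present-day epoch values at millisecond resolution, but this route is not suitable if you need every microsecond preserved.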

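Up to Spark 3.0.1, it is not possible to convert a timestamp to Unix time in milliseconds using the built-in SQL function unix_timestamp.

According to Spark's source code, timestamps are exposed externally as java.sql.Timestamp and stored internally as longs, which can hold timestamps with microsecond precision.

So if you define a UDF that takes a java.sql.Timestamp as input, you can call getTime to get a Long in milliseconds. If you apply unix_timestamp, you only get Unix time with a precision of seconds.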

import org.apache.spark.sql.functions.{col, to_timestamp, udf, unix_timestamp}
import spark.implicits._  // needed for toDF; assumes a SparkSession named spark

// getTime returns milliseconds since the epoch
val tsConversionToLongUdf = udf((ts: java.sql.Timestamp) => ts.getTime)

Applying this to various timestamps:

val df = Seq("2017-01-18 11:00:00.000", "2017-01-18 11:00:00.111", "2017-01-18 11:00:00.110", "2017-01-18 11:00:00.100")
  .toDF("timestampString")
  .withColumn("timestamp", to_timestamp(col("timestampString")))
  .withColumn("timestampConversionToLong", tsConversionToLongUdf(col("timestamp")))
  .withColumn("timestampUnixTimestamp", unix_timestamp(col("timestamp")))

df.printSchema()
df.show(false)

// returns
root
 |-- timestampString: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampConversionToLong: long (nullable = false)
 |-- timestampUnixTimestamp: long (nullable = true)

+-----------------------+-----------------------+-------------------------+----------------------+
|timestampString        |timestamp              |timestampConversionToLong|timestampUnixTimestamp|
+-----------------------+-----------------------+-------------------------+----------------------+
|2017-01-18 11:00:00.000|2017-01-18 11:00:00    |1484733600000            |1484733600            |
|2017-01-18 11:00:00.111|2017-01-18 11:00:00.111|1484733600111            |1484733600            |
|2017-01-18 11:00:00.110|2017-01-18 11:00:00.11 |1484733600110            |1484733600            |
|2017-01-18 11:00:00.100|2017-01-18 11:00:00.1  |1484733600100            |1484733600            |
+-----------------------+-----------------------+-------------------------+----------------------+

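This cannot be done with unix_timestamp(), but since Spark 3.1.0 there is a built-in function called unix_millis():

unix_millis(timestamp) - Returns the number of milliseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.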
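A short PySpark sketch of calling this built-in follows. The helper names and the column names `timestamp` and `epoch_millis` are hypothetical. The SQL function is reached through `F.expr`, on the assumption that this works on any Spark version that ships `unix_millis` in SQL, regardless of whether the Python API exposes a dedicated wrapper. The first helper is only a pure-Python mirror of the documented behaviour, included so the truncation rule can be checked without a cluster.

```python
from datetime import datetime, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_millis_py(ts: datetime) -> int:
    """Pure-Python mirror: whole milliseconds since the epoch, finer precision truncated.

    Expects a timezone-aware datetime.
    """
    delta = ts - _EPOCH
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 1000 + delta.microseconds // 1000


def with_unix_millis(df, ts_col="timestamp"):
    """Add a long column using Spark's SQL built-in unix_millis (Spark 3.1.0 or later)."""
    from pyspark.sql import functions as F  # lazy import: helper above needs no Spark

    # ts_col is assumed to be a plain identifier naming a TimestampType column.
    return df.withColumn("epoch_millis", F.expr(f"unix_millis({ts_col})"))
```

Because the value comes straight from the internally stored timestamp, this avoids both string slicing and floating-point arithmetic.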
