Scala Spark: wrong timestamp parsing

I am creating the following DataFrame:
syncs.select($"event.timestamp",to_date($"event.timestamp".cast(TimestampType))).show
This includes rows such as:
timestamp|to_date(CAST(`event.timestamp` AS TIMESTAMP))|
-------------+---------------------------------------------+
1589509800768| 52339-07-25|
1589509802730| 52339-07-25|
1589509809092| 52339-07-25|
1589509810402| 52339-07-25|
1589509812112| 52339-07-25|
1589509817489| 52339-07-25|
1589509818065| 52339-07-25|
1589509818902| 52339-07-25|
1589509819020| 52339-07-25|
1589509819425| 52339-07-25|
1589509819830| 52339-07-25|
According to an epoch converter, 1589509800768
is Friday, May 15, 2020 02:30:00 UTC.
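This can be double-checked outside Spark with plain `java.time`, interpreting the value as epoch *milliseconds* (a minimal sketch; no Spark needed):

```scala
import java.time.Instant

object CheckEpochMillis extends App {
  // 1589509800768 interpreted as milliseconds since the Unix epoch
  val instant = Instant.ofEpochMilli(1589509800768L)
  println(instant) // 2020-05-15T02:30:00.768Z
}
```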
I don't understand why I am getting these far-future dates. Does the conversion from timestamp to date also need some kind of date format?

Spark expects epoch time in seconds, not in milliseconds, so you can divide it by 1000:
scala> val values = List(1589509800768L)
values: List[Long] = List(1589509800768)

scala> val df = values.toDF()
df: org.apache.spark.sql.DataFrame = [value: bigint]

scala> df.show(false)
+-------------+
|value        |
+-------------+
|1589509800768|
+-------------+

scala> df.select((col("value") / 1000).cast(TimestampType).as("current_time")).show(false)
+-----------------------+
|current_time           |
+-----------------------+
|2020-05-14 19:30:00.768|
+-----------------------+

scala> df.select((col("value") / 1000).cast(TimestampType).as("current_time")).withColumn("time_utc",
     |   expr("to_utc_timestamp(current_time, 'PST')")
     | ).show(false)
+-----------------------+-----------------------+
|current_time           |time_utc               |
+-----------------------+-----------------------+
|2020-05-14 19:30:00.768|2020-05-15 02:30:00.768|
+-----------------------+-----------------------+
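The far-future `52339-07-25` date in the question is consistent with this seconds-vs-milliseconds mixup: casting the raw value directly to `TimestampType` treats 1589509800768 as *seconds*, which lands roughly 50,000 years after 1970. A quick `java.time` check of that interpretation (an illustrative sketch, not Spark itself):

```scala
import java.time.{Instant, ZoneOffset}

object WhyFarFuture extends App {
  // Interpreting the raw value as *seconds* since the epoch puts us in
  // year 52339, matching the to_date output in the question.
  val asSeconds = Instant.ofEpochSecond(1589509800768L).atOffset(ZoneOffset.UTC)
  println(asSeconds.getYear) // 52339
}
```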
First, you should convert the milliseconds to seconds, and then to a timestamp or a date:
import org.apache.spark.sql.SparkSession

object ToTimestamp extends App {

  val spark = SparkSession
    .builder()
    .appName("ToTimestamp")
    .master("local[*]")
    .config("spark.sql.shuffle.partitions", "4") // change to a more reasonable default number of partitions for our data
    .config("spark.app.id", "ToTimestamp")       // to silence the Metrics warning
    .getOrCreate()

  val sc = spark.sparkContext

  import org.apache.spark.sql.functions._
  import spark.implicits._

  val data = sc.parallelize(List(1589509800768L, 1589509802730L, 1589509809092L, 1589509810402L)).toDF("millis")

  val toTimestamp = data.withColumn("timestamp", from_unixtime(col("millis") / 1000))
  toTimestamp.show(truncate = false)
  /*
  +-------------+-------------------+
  |millis       |timestamp          |
  +-------------+-------------------+
  |1589509800768|2020-05-15 04:30:00|
  |1589509802730|2020-05-15 04:30:02|
  |1589509809092|2020-05-15 04:30:09|
  |1589509810402|2020-05-15 04:30:10|
  +-------------+-------------------+
  */

  val toDate = toTimestamp.selectExpr("millis", "timestamp").withColumn("date", to_date(col("timestamp")))
  toDate.show(truncate = false)
  /*
  +-------------+-------------------+----------+
  |millis       |timestamp          |date      |
  +-------------+-------------------+----------+
  |1589509800768|2020-05-15 04:30:00|2020-05-15|
  |1589509802730|2020-05-15 04:30:02|2020-05-15|
  |1589509809092|2020-05-15 04:30:09|2020-05-15|
  |1589509810402|2020-05-15 04:30:10|2020-05-15|
  +-------------+-------------------+----------+
  */
}
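One difference between the two answers worth noting: `from_unixtime` produces a string truncated to whole seconds (the output above shows `04:30:00`, not `04:30:00.768`), while `(col("value") / 1000).cast(TimestampType)` keeps millisecond precision because the division yields a fractional value. The same distinction can be sketched in plain Scala with `java.time` (illustrative, outside Spark):

```scala
import java.time.Instant

object MillisPrecision extends App {
  val millis = 1589509800768L

  // Integer division drops the sub-second part, like the whole-second
  // rendering from from_unixtime:
  val wholeSeconds = Instant.ofEpochSecond(millis / 1000)
  println(wholeSeconds) // 2020-05-15T02:30:00Z

  // Carrying the remainder as nanoseconds preserves millisecond precision,
  // like the (col("value") / 1000).cast(TimestampType) approach:
  val withMillis = Instant.ofEpochSecond(millis / 1000, (millis % 1000) * 1000000L)
  println(withMillis) // 2020-05-15T02:30:00.768Z
}
```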