
Scala Spark: wrong timestamp parsing


I am creating the following DataFrame:

syncs.select($"event.timestamp", to_date($"event.timestamp".cast(TimestampType))).show

which contains rows such as:

    timestamp|to_date(CAST(`event.timestamp` AS TIMESTAMP))|
-------------+---------------------------------------------+
1589509800768|                                  52339-07-25|
1589509802730|                                  52339-07-25|
1589509809092|                                  52339-07-25|
1589509810402|                                  52339-07-25|
1589509812112|                                  52339-07-25|
1589509817489|                                  52339-07-25|
1589509818065|                                  52339-07-25|
1589509818902|                                  52339-07-25|
1589509819020|                                  52339-07-25|
1589509819425|                                  52339-07-25|
1589509819830|                                  52339-07-25|
According to an epoch converter, 1589509800768 is Friday, May 15, 2020 02:30:00 UTC.


I don't understand why I am getting these dates in the far future. Does the conversion from timestamp to date also require some kind of date format?
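The value is indeed an epoch timestamp in milliseconds, which can be confirmed with plain java.time, no Spark needed. A minimal check, using the first literal from the rows above:

```scala
import java.time.Instant

object MillisCheck extends App {
  // 1589509800768 interpreted as *milliseconds* since the Unix epoch
  val inst = Instant.ofEpochMilli(1589509800768L)
  println(inst) // 2020-05-15T02:30:00.768Z -- matches Fri, May 15, 2020 02:30:00 UTC
}
```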

Spark expects the epoch time in seconds, not milliseconds, so you can divide it by 1000:

scala> val values = List(1589509800768L)
values: List[Long] = List(1589509800768)

scala> val df = values.toDF()
df: org.apache.spark.sql.DataFrame = [value: bigint]

scala> df.show(false)
+-------------+
|value        |
+-------------+
|1589509800768|
+-------------+

scala> df.select((col("value") / 1000).cast(TimestampType).as("current_time")).show(false)
+-----------------------+
|current_time           |
+-----------------------+
|2020-05-14 19:30:00.768|
+-----------------------+

scala> df.select((col("value") / 1000).cast(TimestampType).as("current_time")).withColumn("time_utc",
     |   expr("to_utc_timestamp(current_time, 'PST')")
     | ).show(false)
+-----------------------+-----------------------+
|current_time           |time_utc               |
+-----------------------+-----------------------+
|2020-05-14 19:30:00.768|2020-05-15 02:30:00.768|
+-----------------------+-----------------------+
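The far-future date also falls out of plain arithmetic: the cast treats the number as seconds, and 1589509800768 seconds is roughly fifty millennia. A quick java.time sketch of both interpretations (no Spark involved):

```scala
import java.time.Instant

object WhyFarFuture extends App {
  // Interpreted as *seconds*, the value lands about 50,000 years in the future --
  // exactly the 52339-07-25 date shown in the question.
  val asSeconds = Instant.ofEpochSecond(1589509800768L)
  println(asSeconds) // +52339-07-25T04:12:48Z

  // Interpreted as *milliseconds*, it is the expected 2020 instant.
  val asMillis = Instant.ofEpochMilli(1589509800768L)
  println(asMillis) // 2020-05-15T02:30:00.768Z
}
```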


First, convert the milliseconds to seconds, and then convert to a timestamp or a date:

import org.apache.spark.sql.SparkSession

object ToTimestamp extends App {

  val spark = SparkSession
    .builder()
    .appName("ToTimestamp")
    .master("local[*]")
    .config("spark.sql.shuffle.partitions", "4") // change to a more reasonable default number of partitions for this data
    .config("spark.app.id", "ToTimestamp")       // silence the metrics warning
    .getOrCreate()

  val sc = spark.sparkContext

  import org.apache.spark.sql.functions._
  import spark.implicits._

  val data = sc.parallelize(List(1589509800768L, 1589509802730L, 1589509809092L, 1589509810402L)).toDF("millis")

  val toTimestamp = data.withColumn("timestamp", from_unixtime(col("millis") / 1000))
  toTimestamp.show(truncate = false)
  /*
  +-------------+-------------------+
  |millis       |timestamp          |
  +-------------+-------------------+
  |1589509800768|2020-05-15 04:30:00|
  |1589509802730|2020-05-15 04:30:02|
  |1589509809092|2020-05-15 04:30:09|
  |1589509810402|2020-05-15 04:30:10|
  +-------------+-------------------+
  */

  val toDate = toTimestamp.selectExpr("millis", "timestamp").withColumn("date", to_date(col("timestamp")))
  toDate.show(truncate = false)
  /*
  +-------------+-------------------+----------+
  |millis       |timestamp          |date      |
  +-------------+-------------------+----------+
  |1589509800768|2020-05-15 04:30:00|2020-05-15|
  |1589509802730|2020-05-15 04:30:02|2020-05-15|
  |1589509809092|2020-05-15 04:30:09|2020-05-15|
  |1589509810402|2020-05-15 04:30:10|2020-05-15|
  +-------------+-------------------+----------+
  */
}
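Note that from_unixtime renders in Spark's session time zone, which is why this answer prints 04:30 while the underlying instant is 02:30 UTC; presumably this answerer's session ran in a UTC+2 zone. The offset itself can be checked with plain java.time (the Berlin zone below is only an illustration of a UTC+2 zone, not something stated in the question):

```scala
import java.time.{Instant, ZoneId}

object ZoneCheck extends App {
  val inst = Instant.ofEpochMilli(1589509800768L)
  println(inst.atZone(ZoneId.of("UTC")))           // 2020-05-15T02:30:00.768Z[UTC]
  // A UTC+2 zone (e.g. CEST in May) shifts the wall-clock time to 04:30:
  println(inst.atZone(ZoneId.of("Europe/Berlin"))) // 04:30 local time
}
```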
