Apache Spark - converting seconds to HH/mm aa

Tags: apache-spark, apache-spark-sql

I have a script that currently windows data into 30-minute buckets and computes the average over each 30-minute window.

To make my window work the way I want, I need to convert a base timestamp in the format
MM/dd/yyyy hh:mm:ss aa
into a unix_timestamp that keeps only the hour and minute.

Current code:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val taxiSub = spark.read.format("csv")
  .option("header", true)
  .option("inferSchema", true)
  .load("/user/zeppelin/taxi/taxi_subset.csv")
taxiSub.createOrReplaceTempView("taxiSub")

// Normalize the pickup/dropoff strings to "MM/dd/yyyy HH:mm"
val time = taxiSub
  .withColumn("Pickup", from_unixtime(unix_timestamp(col("tpep_pickup_datetime"), "MM/dd/yyyy hh:mm:ss aa"), "MM/dd/yyyy HH:mm"))
  .withColumn("Dropoff", from_unixtime(unix_timestamp(col("tpep_dropoff_datetime"), "MM/dd/yyyy hh:mm:ss aa"), "MM/dd/yyyy HH:mm"))

// Keep only the hour and minute of the pickup time, as seconds since midnight
val stamp = time
  .withColumn("tmp", to_timestamp(col("Pickup"), "MM/dd/yyyy HH:mm"))
  .withColumn("StartTimestamp", unix_timestamp(concat_ws(":", hour(col("tmp")), minute(col("tmp"))), "HH:mm"))
  .drop("tmp")

// Trailing 30-minute (1800-second) range window
val windowSpec = Window.orderBy("StartTimestamp").rangeBetween(-1800, Window.currentRow)
val byRange = stamp
  .withColumn("avgPassengers", avg(col("passenger_count")).over(windowSpec))
  .orderBy(desc("StartTimestamp"))
  .withColumn("EndTime", col("StartTimestamp") + 1800)

byRange.createOrReplaceTempView("byRangeTable")
spark.sql("SELECT StartTimestamp, EndTime, avg(avgPassengers) AS AvgPassengers FROM byRangeTable GROUP BY StartTimestamp, EndTime ORDER BY AvgPassengers DESC").show(truncate = false)
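For reference, the trailing average that Window.orderBy("StartTimestamp").rangeBetween(-1800, Window.currentRow) computes can be sketched in plain Python (names and data here are illustrative, not from the post):

```python
def trailing_avg(rows, width=1800):
    """rows: list of (start_seconds, passenger_count) pairs, any order.

    For each row, average all values whose start time falls in the
    trailing window [t - width, t] -- the same range that
    rangeBetween(-width, currentRow) covers in Spark.
    """
    rows = sorted(rows)
    out = []
    for t, _ in rows:
        vals = [v for (s, v) in rows if t - width <= s <= t]
        out.append((t, sum(vals) / len(vals)))
    return out

print(trailing_avg([(26940, 2), (28140, 3), (28200, 1)]))
# -> [(26940, 2.0), (28140, 2.5), (28200, 2.0)]
```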
Current output:

+--------------+-------+------------------+
|StartTimestamp|EndTime|AvgPassengers     |
+--------------+-------+------------------+
|28140         |29940  |2.0851063829787235|
|28200         |30000  |2.0833333333333335|
|26940         |28740  |2.0714285714285716|
+--------------+-------+------------------+

How can I convert "StartTimestamp" and "EndTime" back to the HH/mm aa format? That is, I am trying to turn the output above into:

+--------------+------------+------------------+
|StartTimestamp|EndTime     |AvgPassengers     |
+--------------+------------+------------------+
|07:49:00 am   |08:19:00 am |2.0851063829787235|
|07:50:00 am   |08:20:00 am |2.0833333333333335|
|07:29:00 am   |07:59:00 am |2.0714285714285716|
+--------------+------------+------------------+
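Since StartTimestamp here is really seconds since midnight (28140 s = 7 h 49 min), the target strings are plain divmod arithmetic. A small Python sketch of that conversion (the function name is illustrative, not from the post):

```python
def seconds_to_12h(sec):
    """Format seconds-since-midnight as 'hh:mm:ss am/pm'."""
    h24, rem = divmod(sec, 3600)
    m, s = divmod(rem, 60)
    suffix = "am" if h24 < 12 else "pm"
    h12 = h24 % 12 or 12  # 0 -> 12, 13 -> 1, etc.
    return f"{h12:02d}:{m:02d}:{s:02d} {suffix}"

print(seconds_to_12h(28140))  # 07:49:00 am
print(seconds_to_12h(29940))  # 08:19:00 am
```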

使用_unixtime()函数中的
,输出格式为
'hh:mm:ss a'

Example:

spark.sql("select from_unixtime('28140','hh:mm:ss a')").show()
//+-----------+
//|        _c0|
//+-----------+
//|01:49:00 AM|
//+-----------+
//in dataframe api
df.withColumn("StartTimestamp",from_unixtime(col("StartTimestamp"),"hh:mm:ss a")).
withColumn("EndTime",from_unixtime(col("EndTime"),"hh:mm:ss a")).show()

//in sql
sqlContext.sql("select from_unixtime(StartTimestamp,'hh:mm:ss a') as StartTimestamp,from_unixtime(EndTime,'hh:mm:ss a') as EndTime,AvgPassengers from tmp").show()

//timestamp values differ from question based on session timezone
//+--------------+-----------+------------------+
//|StartTimestamp|    EndTime|     AvgPassengers|
//+--------------+-----------+------------------+
//|   01:49:00 AM|02:19:00 AM|2.0851063829787235|
//|   01:50:00 AM|02:20:00 AM|2.0833333333333335|
//+--------------+-----------+------------------+
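The comment above about timestamps differing by session timezone can be checked outside Spark: from_unixtime treats its argument as epoch seconds and renders it in the session timezone, so 28140 only prints as 07:49:00 AM when that zone is UTC; a UTC-6 session (as in the answer's output) shifts it to 01:49:00 AM. A Python sketch of the same arithmetic:

```python
from datetime import datetime, timezone, timedelta

epoch = 28140  # seconds since 1970-01-01 00:00:00 UTC

# Rendered in UTC: matches the seconds-of-day reading -> 07:49:00 AM
print(datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%I:%M:%S %p"))

# Rendered in a UTC-6 zone: -> 01:49:00 AM, as in the answer's tables
print(datetime.fromtimestamp(epoch, tz=timezone(timedelta(hours=-6))).strftime("%I:%M:%S %p"))
```

In Spark, setting spark.conf.set("spark.sql.session.timeZone", "UTC") before calling from_unixtime should therefore yield the 07:49:00 AM reading the question expects.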
