Scala: Spark conditional lag function over a window

I have a DataFrame in which a value, label, is associated with each (id, bin, date, hour) combination.

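I want to append several columns to this DataFrame that hold the label at the same hour on the previous day, at the previous hour on the previous day, and so on. I know how to get the first of these columns with the lag function: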

val dateWindow = Window.partitionBy($"id", $"bin").orderBy($"hour", $"date")
val expandedDf = data.withColumn("yesterdaySameHour", lag($"label", 1, 0.0).over(dateWindow))
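However, I don't know how to get the value of label at hour-1 on the previous day. Is there a way to build a conditional lag that filters out rows whose hour is greater than or equal to the current row's hour? If not, what is the right way to do this?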


Many thanks.

You have to define the windows according to what you want to compute. You will probably need to apply the lag function twice.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

val dW = Window.partitionBy("id", "bin", "hour").orderBy("date")
val hW = Window.partitionBy("id", "bin", "date").orderBy("hour")

df.withColumn("yesterdaySameHour", lag("label", 1, 0.0).over(dW))
  .withColumn("todayPreviousHour", lag("label", 1, 0.0).over(hW))
  .withColumn("yestedayPreviousHour", lag(lag("label", 1, 0.0).over(dW), 1, 0.0).over(hW))
  .orderBy("date", "hour", "bin")
  .show(false)
This gives you the following result:

+----------+----+---+---+-----+-----------------+-----------------+--------------------+
|date      |hour|id |bin|label|yesterdaySameHour|todayPreviousHour|yestedayPreviousHour|
+----------+----+---+---+-----+-----------------+-----------------+--------------------+
|2019_12_19|7   |1  |0  |-1   |0                |0                |0                   |
|2019_12_19|7   |1  |2  |-2   |0                |0                |0                   |
|2019_12_19|7   |1  |3  |-3   |0                |0                |0                   |
|2019_12_19|8   |1  |0  |1    |0                |-1               |0                   |
|2019_12_19|8   |1  |2  |2    |0                |-2               |0                   |
|2019_12_19|8   |1  |3  |3    |0                |-3               |0                   |
|2019_12_20|7   |1  |0  |4    |-1               |0                |0                   |
|2019_12_20|7   |1  |2  |5    |-2               |0                |0                   |
|2019_12_20|7   |1  |3  |6    |-3               |0                |0                   |
|2019_12_20|8   |1  |0  |7    |1                |4                |-1                  |
|2019_12_20|8   |1  |2  |8    |2                |5                |-2                  |
|2019_12_20|8   |1  |3  |9    |3                |6                |-3                  |
+----------+----+---+---+-----+-----------------+-----------------+--------------------+
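To try this end to end, the input can be rebuilt from the date, hour, id, bin and label columns of the result table above. The following is a minimal sketch, assuming a local SparkSession and a label column of type Double (so values print as -1.0 rather than -1). The session settings and the names `input` and `result` are illustrative and not part of the original answer. It applies the second lag to the already computed yesterdaySameHour column instead of nesting the two lag calls, which should yield the same values, and it spells the last column name yesterdayPreviousHour.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

// Local session for experimentation only (assumed settings).
val spark = SparkSession.builder().master("local[*]").appName("lag-twice-sketch").getOrCreate()
import spark.implicits._

// Input rows rebuilt from the result table; label is assumed to be a Double.
val input = Seq(
  ("2019_12_19", 7, 1, 0, -1.0), ("2019_12_19", 7, 1, 2, -2.0), ("2019_12_19", 7, 1, 3, -3.0),
  ("2019_12_19", 8, 1, 0, 1.0),  ("2019_12_19", 8, 1, 2, 2.0),  ("2019_12_19", 8, 1, 3, 3.0),
  ("2019_12_20", 7, 1, 0, 4.0),  ("2019_12_20", 7, 1, 2, 5.0),  ("2019_12_20", 7, 1, 3, 6.0),
  ("2019_12_20", 8, 1, 0, 7.0),  ("2019_12_20", 8, 1, 2, 8.0),  ("2019_12_20", 8, 1, 3, 9.0)
).toDF("date", "hour", "id", "bin", "label")

// Same hour across days, and same day across hours.
val dW = Window.partitionBy("id", "bin", "hour").orderBy("date")
val hW = Window.partitionBy("id", "bin", "date").orderBy("hour")

val result = input
  .withColumn("yesterdaySameHour", lag("label", 1, 0.0).over(dW))
  .withColumn("todayPreviousHour", lag("label", 1, 0.0).over(hW))
  // Lag the previous-day value by one hour within the same day.
  .withColumn("yesterdayPreviousHour", lag($"yesterdaySameHour", 1, 0.0).over(hW))

result.orderBy("date", "hour", "bin").show(false)
```

Note that lag is positional: it returns the previous row in the partition, so this only matches "yesterday" and "previous hour" when no dates or hours are missing from the data.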

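Move date into the partitionBy.

Thanks, I understand that. My question is whether I can get "yesterday, previous hour" with windows. At the moment I do a self-join and filter on the date/hour difference, but that feels like overkill.

You can combine lag twice to get hour-1 on the previous day. I will update the answer within a day.

@ZeynepAkkalyoncuYilmaz, I have updated my answer. I hope this solves your problem.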
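As an alternative to both the self-join and the double lag, a range-based window frame can express "exactly N hours earlier" directly. This is not the approach from the answer, only a hedged sketch: it assumes the date strings follow the yyyy_MM_dd pattern seen in the output, that label is a Double, and it introduces a hypothetical hourIndex column and exactlyBack helper. The sample rows are made up to include a missing day.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.{Window, WindowSpec}
import org.apache.spark.sql.functions.{coalesce, col, datediff, first, lit, to_date}

// Local session for experimentation only (assumed settings).
val spark = SparkSession.builder().master("local[*]").appName("range-lag-sketch").getOrCreate()
import spark.implicits._

// Made-up sample with a gap: there are no rows for 2019_12_19.
val df = Seq(
  ("2019_12_18", 8, 1, 0, 1.0),
  ("2019_12_20", 7, 1, 0, 4.0),
  ("2019_12_20", 8, 1, 0, 7.0),
  ("2019_12_21", 8, 1, 0, 9.0)
).toDF("date", "hour", "id", "bin", "label")

// Hours elapsed since 1970-01-01, used as a numeric ordering key (hypothetical column).
val hourIndex =
  datediff(to_date(col("date"), "yyyy_MM_dd"), to_date(lit("1970-01-01"))) * 24 + col("hour")

// Frame that contains only the row exactly `hoursBack` hours before the current one.
def exactlyBack(hoursBack: Long): WindowSpec =
  Window.partitionBy("id", "bin").orderBy("hourIndex").rangeBetween(-hoursBack, -hoursBack)

val result = df
  .withColumn("hourIndex", hourIndex)
  // first() is null when the frame is empty, so fall back to 0.0 like the lag default.
  .withColumn("yesterdaySameHour", coalesce(first("label").over(exactlyBack(24)), lit(0.0)))
  .withColumn("yesterdayPreviousHour", coalesce(first("label").over(exactlyBack(25)), lit(0.0)))

result.orderBy("date", "hour").show(false)
```

With this sample, a positional lag over dates would report the 2019_12_18 label as "yesterday" for 2019_12_20, whereas the range frame finds no matching row and falls back to the default. The same frame also handles hour 0, where the previous hour belongs to the previous day.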