
.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) error in Spark Scala


Hi, I'm trying to propagate the last value of each window to the rest of that window in a column called count, so I can build a flag that identifies whether a record is the last one in its window. I tried it this way, but had no success.

Sample DF:

val df_197 = Seq[(Int, Int, Int, Int)](
    (1,1,7,10), (1,10,4,300), (1,3,14,50), (1,20,24,70), (1,30,12,90),
    (2,10,4,900), (2,25,30,40), (2,15,21,60), (2,5,10,80)
  ).toDF("policyId", "FECMVTO", "aux", "IND_DEF")
   .orderBy(asc("policyId"), asc("FECMVTO"))
df_197.show
Result (I need the count column to be 5 for every row of the first partition and 4 for every row of the second partition):
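For reference, df_197.show on the data above should give:

+--------+-------+---+-------+
|policyId|FECMVTO|aux|IND_DEF|
+--------+-------+---+-------+
|       1|      1|  7|     10|
|       1|      3| 14|     50|
|       1|     10|  4|    300|
|       1|     20| 24|     70|
|       1|     30| 12|     90|
|       2|      5| 10|     80|
|       2|     10|  4|    900|
|       2|     15| 21|     60|
|       2|     25| 30|     40|
+--------+-------+---+-------+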

Then I read that when you use orderBy after the window partitionBy clause, you must specify the clause .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) to achieve what I need. But when I tried it as shown below, I ran into the error that follows the code:

val juntar_riesgo = 1
val var_entidad_2 = $"aux"

// Partition by one or two fields depending on the value of the variable juntar_riesgo
// window_number_2 will be created based on this partitioning
val winSpec = if(juntar_riesgo == 1) {
  Window.partitionBy($"policyId").orderBy($"FECMVTO")  
        .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
} else {
  Window.partitionBy(var_entidad_2,$"policyId").orderBy("FECMVTO")
        .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
}

val df_198 = df_197.withColumn("window_number", row_number().over(winSpec))
                   .withColumn("count", last("window_number",true) over (winSpec))
                   .withColumn("FLG_LAST_WDW", when(col("window_number") === col("count"),1).otherwise(lit(0))).show

ERROR: org.apache.spark.sql.AnalysisException: Window Frame specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$()) must match the required frame specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$());

Thanks for your help.

You shouldn't use last here; use max without specifying an ordering:

val df_198 = df_197
  .withColumn("window_number", row_number().over(Window.partitionBy($"policyId").orderBy($"FECMVTO")))
  .withColumn("count", max("window_number") over (Window.partitionBy($"policyId")))
  .withColumn("FLG_LAST_WDW", when(col("window_number") === col("count"),1).otherwise(lit(0))).show


+--------+-------+---+-------+-------------+-----+------------+
|policyId|FECMVTO|aux|IND_DEF|window_number|count|FLG_LAST_WDW|
+--------+-------+---+-------+-------------+-----+------------+
|       1|      1|  7|     10|            1|    5|           0|
|       1|      3| 14|     50|            2|    5|           0|
|       1|     10|  4|    300|            3|    5|           0|
|       1|     20| 24|     70|            4|    5|           0|
|       1|     30| 12|     90|            5|    5|           1|
|       2|      5| 10|     80|            1|    4|           0|
|       2|     10|  4|    900|            2|    4|           0|
|       2|     15| 21|     60|            3|    4|           0|
|       2|     25| 30|     40|            4|    4|           1|
+--------+-------+---+-------+-------------+-----+------------+
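As an aside on the original error: row_number() requires the default running frame (unboundedPreceding, currentRow), which is exactly what the AnalysisException is saying, so attaching .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) to the same spec conflicts with that required frame. If you do want to keep the last-based approach, a minimal sketch would use two separate window specs, one unframed for row_number and one spanning the whole partition for last (the names orderedSpec, fullSpec, and df_198b are illustrative, not from the original post):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Spec for row_number(): ordered, keeping the default running frame it requires.
val orderedSpec = Window.partitionBy($"policyId").orderBy($"FECMVTO")

// Separate spec spanning the whole partition, so last() sees the final row.
val fullSpec = orderedSpec.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

val df_198b = df_197
  .withColumn("window_number", row_number().over(orderedSpec))
  .withColumn("count", last("window_number").over(fullSpec))
  .withColumn("FLG_LAST_WDW", when(col("window_number") === col("count"), 1).otherwise(lit(0)))
df_198b.show

Over the full-partition frame, last("window_number") returns the row_number of the final row in each partition, which equals the partition size, so the result matches the max version above.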
Note that you can shorten the code by computing row_number in descending order and then taking row_number === 1:

val df_198 = df_197
  .withColumn("FLG_LAST_WDW", when(row_number().over(Window.partitionBy($"policyId").orderBy($"FECMVTO".desc)) === 1, 1).otherwise(0))
  .show
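Since row_number assigns a unique rank even when FECMVTO values tie, this flags exactly one row per policyId partition as the last one.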