Scala: Why is drop() needed at the end of a Spark window function query?


scala apache-spark


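I am new to Spark window functions and am working through a few examples to learn more. Take a look at the example below, which uses drop() together with withColumn(). I have searched the Spark docs extensively but could not work out what it is for.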

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Get the top record in each subject with the highest fee
val wSpec = Window.partitionBy($"Subject").orderBy($"Fee".desc)
val dfTop = input.withColumn("rn", row_number().over(wSpec)).where($"rn" === 1).drop("rn") // Note: 'input' holds my data
dfTop.show()

Can someone explain the purpose of drop() here? What happens if I don't use drop()?

Thanks.

Why do we need to use drop() at the end?

We don't. We do it to remove a temporary column that no longer carries any useful information.
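To make the "temporary column" concrete, here is a minimal sketch. It assumes made-up sample data (the question's input is not shown), and the object name RankBeforeFilter and the helper ranked are hypothetical names used only for illustration. It shows that rn holds each row's rank within its subject, and that once the rows are filtered to rn = 1 the column is the same on every remaining row.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object RankBeforeFilter {
  // Hypothetical helper: adds rn, the rank of each row within its Subject,
  // ordered by Fee from highest to lowest.
  def ranked(input: DataFrame): DataFrame = {
    val wSpec = Window.partitionBy(col("Subject")).orderBy(col("Fee").desc)
    input.withColumn("rn", row_number().over(wSpec))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rn-demo").getOrCreate()
    import spark.implicits._

    // Assumed sample data standing in for the question's 'input'.
    val input = Seq(
      ("Math", "Alice", 300),
      ("Math", "Bob", 500),
      ("Physics", "Carol", 400),
      ("Physics", "Dan", 200)
    ).toDF("Subject", "Name", "Fee")

    // Before filtering: rn is 1, 2, ... within each subject.
    ranked(input).show()

    // After filtering: rn is 1 on every remaining row, so it adds nothing.
    ranked(input).where(col("rn") === 1).show()

    spark.stop()
  }
}
```

The filter needs a named column to compare against, which is why rn is added first and only dropped once the where clause has used it.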

What if I don't use drop()?

You will have one extra column (rn) in the result, nothing more, nothing less.

drop() simply removes columns that you don't want to carry any further; there is no deeper significance to it.

// Get the top record in each subject with the highest fee
val wSpec = Window.partitionBy($"Subject").orderBy($"Fee".desc)
val dfTop = input.withColumn("rn", row_number().over(wSpec)).where($"rn" === 1).drop("rn") // Note: 'input' holds my data
dfTop.show()

You can check this for yourself as follows:

// With drop() commented out, the "rn" column stays in the output
val dfTop = input.withColumn("rn", row_number().over(wSpec)).where($"rn" === 1) //.drop("rn")
dfTop.show()

dfTop.drop("rn").show()
//"rn" column is gone
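For anyone without the original data, the same check can be sketched in a self-contained form. The sample rows, the object name DropDemo, and the helper topWithRn are assumptions made up for illustration. The sketch also shows two documented properties of drop(): it returns a new DataFrame and leaves the one it was called on unchanged, and it is a no-op when the named column is not in the schema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object DropDemo {
  // Hypothetical helper: top row per Subject by Fee, with rn still attached.
  def topWithRn(input: DataFrame): DataFrame = {
    val wSpec = Window.partitionBy(col("Subject")).orderBy(col("Fee").desc)
    input.withColumn("rn", row_number().over(wSpec)).where(col("rn") === 1)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("drop-demo").getOrCreate()
    import spark.implicits._

    // Assumed sample data standing in for the question's 'input'.
    val input = Seq(
      ("Math", "Alice", 300),
      ("Math", "Bob", 500),
      ("Physics", "Carol", 400)
    ).toDF("Subject", "Name", "Fee")

    val withRn = topWithRn(input)
    val withoutRn = withRn.drop("rn")

    println(withRn.columns.mkString(", "))    // Subject, Name, Fee, rn
    println(withoutRn.columns.mkString(", ")) // Subject, Name, Fee

    // drop() did not modify withRn; it still has the rn column.
    println(withRn.columns.contains("rn"))    // true

    // Dropping a column that is already gone is a no-op, not an error.
    println(withoutRn.drop("rn").columns.mkString(", ")) // Subject, Name, Fee

    spark.stop()
  }
}
```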

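Can't you just try it yourself?

Yes, I will do that shortly. I just wanted to understand what drop() does internally. Thanks.

The real question is why the temporary column rn is needed in the first place.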