Sql: agg with filter and groupBy in Spark
I am aggregating with a groupBy and applying some filters to an existing Spark/Scala DataFrame. When I run the code, I get "cannot resolve 'flag' given input columns:".
Can anyone guide me on how to rewrite the code?
val someDF = Seq(
(1, 111,100,100,"C","5th","Y",11),
(1, 111,100,100,"C","5th","Y",11),
(2, 222,200,200,"C","5th","Y",22),
(2, 222,200,200,"C","5th","Y",22)
).toDF("id","rollno","sub1","sub2","flag","class","status","sno")
var df2 = someDF.groupBy("id","rollno")
.agg(sum("sub1").alias("sub1"),sum("sub2").alias("sub2"))
.filter(col("flag") === "C")
.filter(length(col("rollno")) >= 2)
.filter(col("class") === ("5th") || col("class") === ("6th"))
.filter(substring(col("rollno"), 1, 2) === col("sno"))
.filter(col("status") === "Y")
.select("id", "rollno", "sub1", "sub2", "flag", "class", "sno", "status")
Error:
org.apache.spark.sql.AnalysisException: cannot resolve '`flag`' given input columns: [id, rollno, sub1, sub2];;
'Filter ('flag = C)
Expected Result:
+---+------+----+----+----+-----+------+---+
| id|rollno|sub1|sub2|flag|class|status|sno|
+---+------+----+----+----+-----+------+---+
| 1| 111| 200| 200| C| 5th| Y| 11|
| 2| 222| 400| 400| C| 5th| Y| 22|
+---+------+----+----+----+-----+------+---+
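After the aggregation, the other columns are gone: the result contains only the grouping columns (id, rollno) and the aggregated columns (sub1, sub2), so you cannot filter on flag, class, sno or status any more. You need to filter before grouping. If you also want to keep the other columns in the result, you need to group by them as well.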
import org.apache.spark.sql.functions.{col, length, substring, sum}

// Filter while the original columns still exist, then group by every column that must survive the aggregation
val df2 = someDF
  .filter(col("flag") === "C")
  .filter(length(col("rollno")) >= 2)
  .filter(col("class") === "5th" || col("class") === "6th")
  .filter(substring(col("rollno"), 1, 2) === col("sno"))
  .filter(col("status") === "Y")
  .groupBy("id", "rollno", "flag", "class", "sno", "status")
  .agg(sum("sub1").alias("sub1"), sum("sub2").alias("sub2"))
  .select("id", "rollno", "sub1", "sub2", "flag", "class", "sno", "status")
df2.show
+---+------+----+----+----+-----+---+------+
| id|rollno|sub1|sub2|flag|class|sno|status|
+---+------+----+----+----+-----+---+------+
| 1| 111| 200| 200| C| 5th| 11| Y|
| 2| 222| 400| 400| C| 5th| 22| Y|
+---+------+----+----+----+-----+---+------+
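To see why the original ordering fails, it may help to look at the columns that exist right after the aggregation. The following is a minimal sketch, not part of the original answer; the object name `SchemaAfterAgg`, its helper methods, the app name and the `local[*]` master are all assumptions made so the example runs on its own.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

// Hypothetical helper object, introduced only for this illustration
object SchemaAfterAgg {

  // Rebuilds the sample data from the question
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1, 111, 100, 100, "C", "5th", "Y", 11),
      (1, 111, 100, 100, "C", "5th", "Y", 11),
      (2, 222, 200, 200, "C", "5th", "Y", 22),
      (2, 222, 200, 200, "C", "5th", "Y", 22)
    ).toDF("id", "rollno", "sub1", "sub2", "flag", "class", "status", "sno")
  }

  // Same groupBy/agg as in the question, without the filters that follow it
  def aggregateOnly(df: DataFrame): DataFrame =
    df.groupBy("id", "rollno")
      .agg(sum("sub1").alias("sub1"), sum("sub2").alias("sub2"))

  def main(args: Array[String]): Unit = {
    // Assumed local session; adjust master/appName for a real cluster
    val spark = SparkSession.builder()
      .appName("schema-after-agg")
      .master("local[*]")
      .getOrCreate()

    val aggregated = aggregateOnly(sampleData(spark))

    // Prints: id, rollno, sub1, sub2
    // flag, class, status and sno are no longer available to filter on
    println(aggregated.columns.mkString(", "))

    spark.stop()
  }
}
```

The printed column list matches the "given input columns" part of the AnalysisException, which is why any filter on flag has to run before the groupBy.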
Apply the filters before grouping.
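Filtering first still leaves a choice about how to carry the non-aggregated columns through. As an alternative to adding them all to the groupBy (this is a swapped-in technique, not the approach from the answer above), one could group by id and rollno only and pick the remaining columns with `first`. The sketch below assumes those columns have a single value per group; if they differ within a group, `first` returns an arbitrary one. The names `FilterThenAggregate` and `filterThenAggregate`, the app name and the `local[*]` master are hypothetical, and the explicit casts to string are added here to make visible the int-to-string conversion that the original code relies on implicitly.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first, length, substring, sum}

// Hypothetical helper object, introduced only for this illustration
object FilterThenAggregate {

  def filterThenAggregate(df: DataFrame): DataFrame = {
    // rollno and sno are integers in the sample data; cast explicitly for the string checks
    val rollnoStr = col("rollno").cast("string")
    val snoStr = col("sno").cast("string")

    df
      // All row-level conditions run while flag, class, sno and status still exist
      .filter(
        col("flag") === "C" &&
          length(rollnoStr) >= 2 &&
          col("class").isin("5th", "6th") &&
          substring(rollnoStr, 1, 2) === snoStr &&
          col("status") === "Y"
      )
      .groupBy("id", "rollno")
      .agg(
        sum("sub1").alias("sub1"),
        sum("sub2").alias("sub2"),
        // Carry the remaining columns through; assumes one value per group
        first("flag").alias("flag"),
        first("class").alias("class"),
        first("sno").alias("sno"),
        first("status").alias("status")
      )
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session; adjust master/appName for a real cluster
    val spark = SparkSession.builder()
      .appName("filter-then-aggregate")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val someDF = Seq(
      (1, 111, 100, 100, "C", "5th", "Y", 11),
      (1, 111, 100, 100, "C", "5th", "Y", 11),
      (2, 222, 200, 200, "C", "5th", "Y", 22),
      (2, 222, 200, 200, "C", "5th", "Y", 22)
    ).toDF("id", "rollno", "sub1", "sub2", "flag", "class", "status", "sno")

    filterThenAggregate(someDF).orderBy("id").show()

    spark.stop()
  }
}
```

Grouping by every column, as in the answer, is the safer choice when the extra columns can vary within an id/rollno pair, because each distinct combination then gets its own row instead of being collapsed.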