Apache Spark: DataFrame column names are not updated with the alias

Tags: apache-spark, apache-spark-sql, spark-dataframe

I am running an aggregation on a DataFrame I created. Here are the steps:

val initDF = spark.read.format("csv").schema(someSchema).option("header","true").load(filePath).as[someCaseClass]

var maleFemaleDistribution = initDF.select("DISTRICT","GENDER","ENROLMENT_ACCEPTED","ENROLMENT_REJECTED").groupBy("DISTRICT").agg(
     count( lit(1).alias("OVERALL_COUNT")),
     sum(when(col("GENDER") === "M", 1).otherwise(0).alias("MALE_COUNT")),
     sum(when(col("GENDER") === "F", 1).otherwise(0).alias("FEMALE_COUNT"))
      ).orderBy("DISTRICT")
However, when I call printSchema on the resulting DataFrame, the columns are not named with the aliases I supplied. Instead, it shows:

maleFemaleDistribution.printSchema
root
 |-- DISTRICT: string (nullable = true)
 |-- count(1 AS `OVERALL_COUNT`): long (nullable = false)
 |-- sum(CASE WHEN (GENDER = M) THEN 1 ELSE 0 END AS `MALE_COUNT`): long (nullable = true)
 |-- sum(CASE WHEN (GENDER = F) THEN 1 ELSE 0 END AS `FEMALE_COUNT`): long (nullable = true)
whereas I want the column names to be:

maleFemaleDistribution.printSchema
root
 |-- DISTRICT: string (nullable = true)
 |-- OVERALL_COUNT: long (nullable = false)
 |-- MALE_COUNT: long (nullable = true)
 |-- FEMALE_COUNT: long (nullable = true) 

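I am looking for help understanding why the aliases are not applied in the new DataFrame. How should I modify the code so that the column names match the aliases?

I have not tried running this query, but it should look like this: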

var maleFemaleDistribution = initDF.select("DISTRICT","GENDER","ENROLMENT_ACCEPTED","ENROLMENT_REJECTED").groupBy("DISTRICT").agg(
     count(lit(1)).alias("OVERALL_COUNT"),
     sum(when(col("GENDER") === "M", 1).otherwise(0)).alias("MALE_COUNT"),
     sum(when(col("GENDER") === "F", 1).otherwise(0)).alias("FEMALE_COUNT")
      ).orderBy("DISTRICT")

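The alias must be applied after the sum operation, on the aggregate column itself rather than on the expression passed into it. So instead of this: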

sum(when(col("GENDER") === "M", 1).otherwise(0).alias("MALE_COUNT"))
it should be this:

sum(when(col("GENDER") === "M", 1).otherwise(0)).alias("MALE_COUNT")
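
To make the difference concrete, here is a minimal sketch that contrasts the two placements on a toy DataFrame. The local SparkSession, the sample rows, and the names `df`, `inner`, and `outer` are assumptions for illustration and are not part of the original question. The exact auto-generated name for the inner-alias case may vary between Spark versions, so no output is shown for it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, lit, sum, when}

// Assumption: a local session and toy data, only for illustration.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("alias-placement")
  .getOrCreate()
import spark.implicits._

val df = Seq(("A", "M"), ("A", "F"), ("B", "M")).toDF("DISTRICT", "GENDER")

// Alias inside sum(...): it names the inner CASE expression, so the
// aggregate column still gets an auto-generated name.
val inner = df.groupBy("DISTRICT").agg(
  sum(when(col("GENDER") === "M", 1).otherwise(0).alias("MALE_COUNT"))
)

// Alias outside sum(...): it names the aggregate column itself.
val outer = df.groupBy("DISTRICT").agg(
  count(lit(1)).alias("OVERALL_COUNT"),
  sum(when(col("GENDER") === "M", 1).otherwise(0)).alias("MALE_COUNT"),
  sum(when(col("GENDER") === "F", 1).otherwise(0)).alias("FEMALE_COUNT")
)

println(inner.columns.mkString(", "))
println(outer.columns.mkString(", "))
```

Comparing the two printed column lists is a quick way to check where an alias actually landed without reading the full schema.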

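Thanks, Gaurang. But that is what I am already doing, and I still do not see the column names I specified. I mentioned this in the actual post.

@RajeshRavindran It is not quite the same thing. Have you run this query?

+1 Thanks, Gaurang. I had put the alias inside the sum; it should come after the sum. Thanks for pointing that out. I was confused by the placement of the parentheses. Lesson learned.

Cool! Please accept one of the answers, and upvote it if it helped you.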