Apache Spark: converting a SQL query into an equivalent Spark query


I am using spark-sql-2.4.1v with Java 8.

I have a scenario/snippet like the one below:

Dataset<Row> df = ...; // loaded data from a CSV file
// this has columns like "code1","code2","code3","code4","code5","code6", and "class"

df.createOrReplaceTempView("temp_tab");

List<String> codesList = Arrays.asList("code1", "code5"); // codes of interest to be calculated

codesList.stream().forEach(code -> {

    String query = "select "
            + " avg(" + code + ") as mean, "
            + " percentile(" + code + ", 0.25) as p25 "
            + " from temp_tab "
            + " group by class";

    Dataset<Row> resultDs = sparkSession.sql(query);
});

How can I write this using functions.expr() and functions.agg()?

Something like df.groupBy("class").agg(avg("code").alias("mean"), percentile("code", 0.25).alias("p25"))? @blackbishop there is no function called percentile. @CostiCiudatu in the above case, how can I collect the resultDs datasets using reduce/map functions? You can use it with
expr
:
expr("percentile(code, 0.25)").alias("p25")
@blackbishop getting an error, cannot resolve "
code
" given input columns: ... here I need the code from codesList, i.e. codesList.stream().forEach(code -> {