Scala Spark: inner join and get the min()
I cannot get the join and the result columns right. After the join I need to get the min() of a column.
SELECT
t.ad,
t.DId,
t.BY,
t.BM,
t.cid,
MIN(p.PS) AS PS
FROM
Tempity t inner join ples p
on t.cid = p.cid
and p.PType = t.TeO
AND p.pto = 'cccc'
AND p.cid = 2
GROUP BY t.aid
,t.DId
,t.BYear
,t.BM
,t.cid;
I am converting the above SQL query to the DataFrame API as follows:
val RS = Tempity.join(DF_LES,Tempity("cid") <=> DF_PLES("cid") && DF_PLES("clientid") <=> 2 && Tempity("TO") <=> DF_LES("PType") && DF_LES("pto") <=> "cccc" ,"inner").select("aid","DId","BM","BY","TO","cid").groupBy("aid","DId","BM","BY").select("aid","DId","BM","BY","TO","cid").show
Use Tempity("cid") instead of "cid", because the bare column name is ambiguous after the join.
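import org.apache.spark.sql.functions._ // for min()
val RS = Tempity.join(DF_PLES,
    Tempity("cid") <=> DF_PLES("cid") &&
    DF_PLES("clientid") <=> 2 &&
    Tempity("TO") <=> DF_PLES("PType") &&
    DF_PLES("pto") <=> "cccc",
    "inner"
  )
  // groupBy takes either all Strings or all Columns, so every key is passed as a Column
  .groupBy(Tempity("aid"), Tempity("DId"), Tempity("BM"), Tempity("BY"), Tempity("cid"))
  // aggregate after grouping; the alias mirrors MIN(p.PS) AS PS
  .agg(min(DF_PLES("PS")).as("PS"))
RS.show()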
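To make the fix concrete, here is a minimal, self-contained sketch of the same join followed by groupBy and agg(min(...)), in a form that can be pasted into spark-shell with :paste. The sample rows, the local SparkSession settings and the app name are made up for illustration; only the column names come from the question. It assumes DF_PLES has a clientid column, and it uses === (plain equality, like = in SQL) where the code above uses the null-safe <=>.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.min

// Local session for the sketch (settings are illustrative assumptions).
val spark = SparkSession.builder.master("local[1]").appName("join-min-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data; only the column names are taken from the question.
val Tempity = Seq(
  (1, 10, 5, 2020, "A", 2),
  (1, 10, 5, 2020, "B", 2)
).toDF("aid", "DId", "BM", "BY", "TO", "cid")

val DF_PLES = Seq(
  (2, 2, "A", "cccc", 7.0),
  (2, 2, "A", "cccc", 3.0),
  (2, 2, "B", "zzzz", 1.0) // filtered out by the pto condition
).toDF("cid", "clientid", "PType", "pto", "PS")

// Inner join on the same conditions as the SQL ON clause.
val joined = Tempity.join(
  DF_PLES,
  Tempity("cid") === DF_PLES("cid") &&
    DF_PLES("clientid") === 2 &&
    Tempity("TO") === DF_PLES("PType") &&
    DF_PLES("pto") === "cccc",
  "inner"
)

// groupBy yields a RelationalGroupedDataset; agg turns it back into a DataFrame.
val RS = joined
  .groupBy(Tempity("aid"), Tempity("DId"), Tempity("BM"), Tempity("BY"), Tempity("cid"))
  .agg(min(DF_PLES("PS")).as("PS"))

RS.show()
```

groupBy returns a RelationalGroupedDataset, which has no select method, so an aggregation such as agg has to come next. With the sample rows above, only the two "A"/"cccc" rows survive the join, and the result should be a single row whose PS is 3.0.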
:49: error: value select is not a member of org.apache.spark.sql.RelationalGroupedDataset
The error is thrown for val RS = Tempity.join(DF_LES, Tempity("cid") <=> DF_PLES("cid") && DF_PLES("clientid") <=> 2 && Tempity("TO") <=> DF_PLES("PType") && DF_PLES("pto") <=> "cccc", "inner").groupBy("aid","DId","BM","BY", Tempity("cid")).agg(min(DF_PLES("PS")))
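I want to perform the operations on the DataFrames themselves rather than through sql("") on tables. If possible, please edit your answer and add the DataFrame code to it.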
Alternatively, register the DataFrames as temporary views and run the original SQL against them:
import org.apache.spark.sql.SparkSession
val spark: SparkSession = SparkSession.builder.master("local").getOrCreate()
// Create temporary views from the DataFrames
Tempity.createOrReplaceTempView("Tempity")
DF_PLES.createOrReplaceTempView("ples")
import spark.sql
// Now run the same SQL and print the result
sql("""
SELECT t.ad, t.DId, t.BY, t.BM, t.cid, MIN(p.PS) AS PS
FROM Tempity t
INNER JOIN ples p
ON t.cid = p.cid AND p.PType = t.TeO AND p.pto = 'cccc' AND p.cid = 2
GROUP BY t.ad, t.DId, t.BY, t.BM, t.cid
""").show()