Scala: dynamically grouping by multiple columns


scala apache-spark


I'm new to Spark, and I need to group a DataFrame by multiple columns. Its schema is shown below:

root
 |-- Id: integer (nullable = true)
 |-- Traffic Volume Count Location Address: string (nullable = true)
 |-- Street: string (nullable = true)
 |-- Date of Count: string (nullable = true)
 |-- Total Passing Vehicle Volume: integer (nullable = true)
 |-- Vehicle Volume By Each Direction of Traffic: string (nullable = true)
 |-- Latitude: double (nullable = true)
 |-- Longitude: double (nullable = true)
 |-- Location: string (nullable = true)

I need to group by the two columns Street and Total Passing Vehicle Volume, and the following code does that:

trafficDf.groupBy("Street","Total Passing Vehicle Volume").count().orderBy("Street").show(100)

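The problem is that I don't know in advance how many columns I need to group by. That is runtime information that I will receive as JSON, and I have to extract the grouping columns from that JSON.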

I know I can use createOrReplaceTempView to register my DataFrame as a table and run SQL queries on it. But I would like to know whether there is a way to do this without writing SQL.

I know that df.select() can take an expr(), like:

df.select(expr("Id as new_Id, Street as new_Street")).show()

But if I pass the same kind of string to groupBy(), I get an error:

var dynamic_condition="Street, Total Passing Vehicle Volume" // this will be created from json where i'll get column names by looping through runtime info
trafficDf.groupBy(expr(dynamic_condition)).count().show()

Error:

mismatched input ',' expecting <EOF>(line 1, pos 6)

== SQL ==
Street, Total Passing Vehicle Volume

I must be doing something wrong. I have checked the documentation for groupBy(), and I don't think it accepts expr() as an argument, though perhaps it does. Any help would be appreciated.

Note: I know it is possible to write SQL queries on top of the DataFrame, but I am looking for another approach.

In the example above, instead of passing the column list as a single String, you need to pass it as a List[String]. From the API documentation:

def groupBy(col1: String, cols: String*): RelationalGroupedDataset

An example snippet is shown below:


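import org.apache.spark.sql.DataFrame

// groupBy alone returns a RelationalGroupedDataset; count() aggregates it back into a DataFrame.
def dynamicGroup(df: DataFrame, cols: List[String]): DataFrame = {
  df.groupBy(cols.head, cols.tail: _*).count()
}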

You can call it like this:

val listOfStrings =  List("A", "B", "C")
val result = dynamicGroup(df, listOfStrings)

You can try this:

val grpCols = dynamic_condition.split(",").map(c => col(c.trim))

and then:

df.groupBy(grpCols: _*)
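
To tie the two answers together, here is a minimal end-to-end sketch, assuming Spark SQL is on the classpath and the column names have already been extracted from the JSON into a comma-separated string. The object name `DynamicGroupByExample`, the helpers `parseColumns` and `countBy`, and the sample rows are hypothetical and only serve the illustration; `groupBy(cols: Column*)`, `col` and `count()` are the existing Spark calls used in the answers above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DynamicGroupByExample {
  // Hypothetical helper: split a comma-separated string into trimmed, non-empty column names.
  def parseColumns(raw: String): Seq[String] =
    raw.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Hypothetical helper: group by the given column names and count the rows in each group.
  def countBy(df: DataFrame, names: Seq[String]): DataFrame = {
    require(names.nonEmpty, "at least one grouping column is required")
    df.groupBy(names.map(n => col(n)): _*).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dynamic-group-by")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, only to exercise the helpers.
    val trafficDf = Seq(
      ("Main St", 100),
      ("Main St", 100),
      ("Oak Ave", 250)
    ).toDF("Street", "Total Passing Vehicle Volume")

    // In the question this string would be built at runtime from the JSON input.
    val names = parseColumns("Street, Total Passing Vehicle Volume")
    countBy(trafficDf, names).orderBy("Street").show()

    spark.stop()
  }
}
```

Trimming each name matters here because splitting on "," leaves a leading space in front of the second column name, and a name with a leading space would not resolve against the schema.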