
Scala: How do I pass list elements to the concat function?


I'm currently concatenating columns in a DataFrame like this:

val Finalraw = raw.withColumn("primarykey", concat($"prod_id",$"frequency",$"fee_type_code"))
The problem is that I don't want to hard-code the columns, because the number of columns changes each time. I have a list of the column names:

columnNames: List[String] = List("prod_id", "frequency", "fee_type_code")

So the question is: how do I pass the list elements to the `concat` function instead of hard-coding the column names?

The `concat` function takes multiple columns as input, but what you have is a list of strings, so you need to transform the list to fit the method's signature.

First use `map` to turn the strings into `Column` objects, then unpack the list with `: _*` to pass the arguments to `concat` correctly:

val Finalraw = raw.withColumn("primarykey", concat(columnNames.map(col):_*))
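To see why both steps are needed, here is a minimal, Spark-free sketch. `Col`, `makeCol`, and `concatCols` are hypothetical stand-ins that mirror the shapes of Spark's `Column`, `col` (`String => Column`), and `concat` (`Column* => Column`):

```scala
// Hypothetical stand-ins mirroring the Spark signatures:
//   col:    String => Column
//   concat: Column* => Column
case class Col(name: String)

def makeCol(name: String): Col = Col(name)

def concatCols(cols: Col*): Col =
  Col(cols.map(_.name).mkString("+"))

val columnNames = List("prod_id", "frequency", "fee_type_code")

// Step 1: map the strings to column objects.
// Step 2: expand the resulting list into varargs with `: _*`.
val key = concatCols(columnNames.map(makeCol): _*)
// key == Col("prod_id+frequency+fee_type_code")
```

The same two-step shape, `list.map(col)` followed by `: _*`, is exactly what the one-liner above does.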

For an explanation of the `: _*` syntax, see this answer.

Map the list elements to a `List[org.apache.spark.sql.Column]` in a separate variable. Check this out:

scala> val df = Seq(("a","x-","y-","z")).toDF("id","prod_id","frequency","fee_type_code")
df: org.apache.spark.sql.DataFrame = [id: string, prod_id: string ... 2 more fields]

scala> df.show(false)
+---+-------+---------+-------------+
|id |prod_id|frequency|fee_type_code|
+---+-------+---------+-------------+
|a  |x-     |y-       |z            |
+---+-------+---------+-------------+


scala> val arr = List("prod_id", "frequency", "fee_type_code")
arr: List[String] = List(prod_id, frequency, fee_type_code)

scala> val arr_col = arr.map(col(_))
arr_col: List[org.apache.spark.sql.Column] = List(prod_id, frequency, fee_type_code)

scala> df.withColumn("primarykey",concat(arr_col:_*)).show(false)
+---+-------+---------+-------------+----------+
|id |prod_id|frequency|fee_type_code|primarykey|
+---+-------+---------+-------------+----------+
|a  |x-     |y-       |z            |x-y-z     |
+---+-------+---------+-------------+----------+


scala>
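The `: _*` ascription used in both answers is plain Scala varargs expansion, not anything Spark-specific. A minimal sketch, where `joinAll` is a hypothetical stand-in for any variadic function:

```scala
// A function declared with a repeated parameter (String*)
// accepts either individual arguments or a sequence
// expanded with the `: _*` type ascription.
def joinAll(parts: String*): String = parts.mkString

val pieces = List("x-", "y-", "z")

// joinAll(pieces) would not compile: a List is not a String*.
// `: _*` tells the compiler to expand the list into varargs:
val joined = joinAll(pieces: _*)
// joined == "x-y-z"
```

This is why `concat(arr_col: _*)` works above: `concat` is declared as `concat(exprs: Column*)`, and the ascription expands the `List[Column]` into individual arguments.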
