Apache Spark: add a column to a Spark Dataset built from two other columns


I have a Dataset in Spark that looks like this:

+----+-------+
| age|   name|
+----+-------+
|  15|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

Now I want to add a column whose value is the string value of age followed by the string value of name, like this:

+----+-------+-----------+
| age|   name|cbdkey     |
+----+-------+-----------+
|  15|Michael|  15Michael|
|  30|   Andy|  30Andy   |
|  19| Justin|  19Justin |
+----+-------+-----------+

I tried:

df.withColumn("cbdkey",col("age").+(col("name"))).show()

But every value in the new column cbdkey is null. What should I do? Thanks in advance.

You can use the concat function:

import org.apache.spark.sql.functions.{col, concat}
df.withColumn("cbdkey", concat(col("age"), col("name"))).show
+---+-------+---------+
|age|   name|   cbdkey|
+---+-------+---------+
| 15|Michael|15Michael|
| 30|   Andy|   30Andy|
| 19| Justin| 19Justin|
+---+-------+---------+

If you need a custom separator, use concat_ws:

import org.apache.spark.sql.functions.{col, concat_ws}
df.withColumn("cbdkey", concat_ws(",", col("age"), col("name"))).show
+---+-------+----------+
|age|   name|    cbdkey|
+---+-------+----------+
| 15|Michael|15,Michael|
| 30|   Andy|   30,Andy|
| 19| Justin| 19,Justin|
+---+-------+----------+
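Some background on the original attempt: + on a Column is numeric addition, so Spark tries to read name as a number, and with ANSI mode off a non-numeric string such as Michael becomes null, which makes the whole sum null. The two functions above also differ in how they treat null inputs: concat returns null if any input is null, while concat_ws skips null inputs. The sketch below illustrates that difference. The local SparkSession, the object and helper names, and the sample rows are assumptions made for illustration, and the explicit cast to string only makes visible what Spark otherwise does implicitly.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, concat_ws}

object ConcatNullDemo {
  // Adds one key column per function so their null handling can be compared.
  def withKeys(df: DataFrame): DataFrame = {
    // Explicit cast for clarity; the answer above relies on an implicit cast.
    val age = col("age").cast("string")
    df.withColumn("viaConcat", concat(age, col("name")))
      .withColumn("viaConcatWs", concat_ws("", age, col("name")))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("concat-null-demo")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows; the last one has a null name.
    val df = Seq((15, "Michael"), (30, "Andy"), (19, null: String))
      .toDF("age", "name")

    // For the last row, viaConcat is null while viaConcatWs is "19".
    withKeys(df).show()

    spark.stop()
  }
}
```

If null names are possible in your data, this difference may decide which function fits better, or whether to wrap the inputs in coalesce first.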


Another approach is to write a UDF (user-defined function) and call it on the DataFrame:

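import org.apache.spark.sql.functions.{col, udf}

// Join the integer age and the string name into a single string
val concatUDF = udf { (age: Int, name: String) =>
  s"$age$name"
}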

df.withColumn("cbdkey", concatUDF(col("age"), col("name"))).show()

Output:

+---+-------+---------+
|age|   name|   cbdkey|
+---+-------+---------+
| 15|Michael|15Michael|
| 30|   Andy|   30Andy|
| 19| Justin| 19Justin|
+---+-------+---------+


Thank you, your answer works.

A UDF is not needed here. Spark SQL supports concat and concat_ws.
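To illustrate the comment about Spark SQL support, the same column can be built from a SQL expression rather than a UDF. This is a minimal sketch, assuming a local SparkSession and hand-built sample rows; the object name, the helper name, and the view name people are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object SqlConcatDemo {
  // Builds cbdkey from a SQL expression string instead of a UDF.
  def withCbdKey(df: DataFrame): DataFrame =
    df.withColumn("cbdkey", expr("concat(age, name)"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("sql-concat-demo")
      .getOrCreate()
    import spark.implicits._

    // Sample rows matching the question's data.
    val df = Seq((15, "Michael"), (30, "Andy"), (19, "Justin"))
      .toDF("age", "name")

    // Expression form on the DataFrame.
    withCbdKey(df).show()

    // Query form through a temporary view (view name is hypothetical).
    df.createOrReplaceTempView("people")
    spark.sql(
      "SELECT age, name, concat_ws(',', age, name) AS cbdkey FROM people"
    ).show()

    spark.stop()
  }
}
```

Built-in functions like these are visible to Spark's optimizer, whereas a UDF is opaque to it, which is one reason to prefer them when they cover the need.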