How to add a string to a string array column in a Spark Dataset (Java)

java apache-spark

I have a Dataset<Row> like the following:

+-----+--------------+
|val  |  history     |
+-----+--------------+
|500  |[a=456, a=500]|
|800  |[a=456, a=500]|
|784  |[a=456, a=500]|
+-----+--------------+

Here val is a String and history is an array of Strings. I am trying to append the contents of the val column to the history column, so that my dataset looks like this:

+-----+---------------------+
|val  |  history            |
+-----+---------------------+
|500  |[a=456, b=500, c=500]|
|800  |[a=456, b=500, c=800]|
|784  |[a=456, b=500, c=784]|
+-----+---------------------+

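A similar question is discussed here, but I don't know Scala and could not work out an equivalent Java solution.

Please help me achieve this in Java.

I wrote a solution, but I'm not sure whether it can be optimized further: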

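    // Append "c=<val>" to each row's history array.
    // `schema` is passed as the encoder argument, so it must be an Encoder<Row>.
    dataset.map((MapFunction<Row, Row>) row -> {
        Seq<String> seq = row.getAs("history");
        List<String> list = new ArrayList<>(JavaConversions.seqAsJavaList(seq));
        list.add("c=" + row.<String>getAs("val"));
        return RowFactory.create(row.<String>getAs("val"), list.toArray(new String[0]));
    }, schema);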
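As a rough sketch, the per-row step in the snippet above (copy the existing history, then append "c=" plus val) can be pulled into a plain Java helper with no Spark dependency, which makes it easy to unit-test on its own. The class name HistoryAppender and the method appendVal are hypothetical names chosen for illustration; they are not part of Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HistoryAppender {

    // Hypothetical helper: returns a new array holding the existing
    // history entries followed by "c=<val>". The input list is not modified.
    public static String[] appendVal(List<String> history, String val) {
        List<String> extended = new ArrayList<>(history);
        extended.add("c=" + val);
        return extended.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] result = appendVal(Arrays.asList("a=456", "a=500"), "500");
        System.out.println(Arrays.toString(result)); // prints [a=456, a=500, c=500]
    }
}
```

Inside the map function, one would presumably call it as appendVal(JavaConversions.seqAsJavaList(seq), row.<String>getAs("val")) and pass the returned array to RowFactory.create.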

In Spark 2.4 (but not in earlier versions), you can use the concat function to concatenate two arrays. In your case, you can do the following:

    df.withColumn("val2", concat(lit("c="), col("val")))
      .select(concat(col("history"), array(col("val2"))));

Note: the first concat joins strings, while the second concat joins arrays. array(col("val2")) creates a single-element array.

Are you sure this is an array of strings? Why does it have keys?

It's just a convenient format. But it is a string.