Apache Spark "valueArray is not declared" exception when using the Spark map function on a Dataset containing a HashMap


I have a class:


import java.io.Serializable;
import java.util.Map;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;

@Getter
@Setter
@NoArgsConstructor
public class TestInput implements Serializable {
    private Map<String, String> key1;
}
I read the dataset and try to manipulate it with the Spark map function:

Dataset<TestInput> input = sparkSession.read().json(inputPath).as(Encoders.bean(TestInput.class));

Dataset<TestInput> output = input.map((MapFunction<TestInput, TestInput>) x -> x, Encoders.bean(TestInput.class));
But the map function fails (also when converting it to a different Java POJO) with the error:

A method named "valueArray" is not declared in any enclosing class nor any supertype, nor through a static import
What am I doing wrong?


Also, if I write the input straight back to a file, I get my original JSON back, so Spark apparently understands the map but cannot convert it to the desired POJO.

If you look at the schema of input, you will see that key1 is not a map but an array of strings. That is why you see the error. Set your schema to a map and you are good to go.
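Since the question is in Java, the same fix can be sketched there too (a sketch, reusing the question's sparkSession, inputPath, and TestInput; it declares key1 as a MAP type up front instead of letting Spark infer it):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Declare key1 as MAP<STRING, STRING> so the bean encoder can bind it.
StructType expectedSchema = new StructType()
    .add("key1", DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType));

Dataset<TestInput> input = sparkSession.read()
    .schema(expectedSchema)   // force the map type before applying the encoder
    .json(inputPath)
    .as(Encoders.bean(TestInput.class));
```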

Scala code:

import scala.jdk.CollectionConverters._ // on Scala 2.12, use scala.collection.JavaConverters._
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

val expectedSchema = StructType(Seq(StructField("key1", MapType(StringType, StringType))))

val input = spark.read
  .schema(expectedSchema)
  .json(path)
  .as(Encoders.bean(classOf[TestInput]))

// key1 should be of map type
input.printSchema()

val out = input.map(
  (x: TestInput) => {
    x.setKey1((Map(("newkey", "newval")) ++ x.getKey1.asScala).asJava)
    x
  },
  Encoders.bean(classOf[TestInput]),
)

input.show(truncate = false)
out.show(truncate = false)
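The merge inside the lambda, Map(("newkey", "newval")) ++ x.getKey1.asScala, just adds a default entry that the existing entries override. For readers coming from the Java side, a plain-Java sketch of the same merge (class and method names here are illustrative, not part of the answer):

```java
import java.util.HashMap;
import java.util.Map;

public class MergeSketch {
    // Start from the default entry, then copy the existing map on top,
    // so existing keys win -- the same semantics as `defaults ++ existing` in Scala.
    static Map<String, String> withDefault(Map<String, String> existing) {
        Map<String, String> merged = new HashMap<>();
        merged.put("newkey", "newval");
        merged.putAll(existing);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> in = new HashMap<>();
        in.put("a", "b");
        Map<String, String> out = withDefault(in);
        System.out.println(out.get("newkey") + "," + out.get("a")); // prints newval,b
    }
}
```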