Scala: read a Hive struct type and modify its values


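I am reading a Hive table as a DataFrame and loading it into a new Dataset. I read specific (string) values from a struct type, and I want to format those values before storing them in a case class.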

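import org.apache.spark.sql.functions.concat_ws
import session.implicits._ // enables $"..." and .as[FootWearInformation]

session.read
  .table(footWear)
  .where($"date" === runDate) // filter first, while "date" is still a column
  .select(
    $"id",
    $"footWearCategory".as("category"),
    concat_ws(",", $"listelements".getField("sneaker").getField("colors")).as("availableColors"))
  .as[FootWearInformation]

case class FootWearInformation(id: String, category: String, availableColors: String)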
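For example: I read the struct field "listelements.sneaker.colors", which returns an array because there are several colors. Before storing them in the new Dataset, I want the colors formatted like this: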

"red","blue","yellow" (each color in double quotes, comma-separated)

and stored as a single string.

concat_ws joins the array elements with commas, but I also need to wrap each element in double quotes.
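For reference, the quoting can probably also be done with built-in functions only, without a UDF. This is a minimal sketch, assuming the colors field is an `array<string>` column: use `","` (quote, comma, quote) as the `concat_ws` separator, then add one quote at each end with `concat`. The object and method names (`QuotedJoin`, `quotedCsv`) are hypothetical, chosen for illustration.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{concat, concat_ws, lit}

// Hypothetical helper, not part of Spark: builds "a","b","c" from an array<string> column.
object QuotedJoin {
  // concat_ws puts "," (quote-comma-quote) between elements and skips null elements;
  // concat then adds the leading and trailing double quote.
  // Caveat: an empty array yields "" (two quote characters), not an empty string.
  def quotedCsv(arr: Column): Column =
    concat(lit("\""), concat_ws("\",\"", arr), lit("\""))
}
```

In the question's `select`, this would replace the existing `concat_ws` call, e.g. `QuotedJoin.quotedCsv($"listelements".getField("sneaker").getField("colors")).as("availableColors")`. Staying with built-in functions lets Catalyst optimize the expression, which a UDF would prevent.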

UDF:

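Write a UDF that takes an array and returns a string in the required format. If you need help with the UDF, post a sample dataset.

Thanks for the suggestion. Writing a UDF solved the problem.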
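The UDF suggested above could look roughly like this. It is a sketch, not the asker's actual code: `ColorFormat`, `formatColors` and `formatColorsUdf` are hypothetical names, and it assumes the colors field is an `array<string>`, which Spark hands to a Scala UDF as a `Seq[String]`. The formatting logic is a plain Scala function so it can be tested without a SparkSession.

```scala
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

// Hypothetical helper object illustrating the UDF approach.
object ColorFormat {
  // Wraps each color in double quotes and joins them with commas.
  // A null array becomes "" and null elements are dropped rather than printed as "null".
  def formatColors(colors: Seq[String]): String =
    Option(colors).getOrElse(Seq.empty[String])
      .filter(_ != null)
      .map(c => "\"" + c + "\"")
      .mkString(",")

  // Typed Scala UDF: array<string> in, string out.
  val formatColorsUdf: UserDefinedFunction = udf(formatColors _)
}
```

It is used in place of `concat_ws` in the `select`, e.g. `ColorFormat.formatColorsUdf($"listelements".getField("sneaker").getField("colors")).as("availableColors")`. Unlike a purely built-in expression, this returns an empty string for an empty or null array.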