Scala: reading a Hive struct type and modifying its values
Tags: scala, apache-spark, hive
I am reading a Hive table as a DataFrame and retrieving it into a new Dataset. I am reading specific values (strings) from a struct type, and I want to format these values before storing them in a case class.
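import org.apache.spark.sql.functions.concat_ws
import session.implicits._  // enables $"..." and .as[FootWearInformation]

case class FootWearInformation(id: String, category: String, availableColors: String)

session.read
  .table(footWear)
  .select(
    $"id",
    $"footWearCategory".as("category"),
    // joins the elements of the colors array with commas, without quoting them
    concat_ws(",", $"listelements".getField("sneaker").getField("colors")).as("availableColors"))
  .where($"date" === runDate)
  .as[FootWearInformation]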
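For example: I read the struct field listelements.sneaker.colors, which returns an array because there are several colors. Before storing them in the new Dataset, I want the colors formatted like this: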
"red","blue","yellow" (each value in double quotes, separated by commas)
and stored as a single string.
concat_ws joins the array elements with commas, but I also need each element wrapped in double quotes.
UDF:
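Write a UDF that takes an array and returns a string in the required format. If you need help with the UDF, please post a sample dataset.
Thank you for the suggestion. Writing a UDF solved the problem.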
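The thread does not show the UDF that was eventually written, so the following is only a minimal sketch of what such a UDF might look like. The names QuoteColors, quoteAndJoin and quoteAndJoinUdf, the local SparkSession, and the two-column sample DataFrame are all hypothetical stand-ins, and mapping a null array to an empty string is an assumption rather than something the question specifies.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object QuoteColors {
  // Pure formatting logic: wraps every element in double quotes and joins with commas.
  // A null array (e.g. a missing struct field) becomes an empty string; this is an assumption.
  def quoteAndJoin(colors: scala.collection.Seq[String]): String =
    Option(colors).getOrElse(Seq.empty[String]).map(c => "\"" + c + "\"").mkString(",")

  // Spark hands array<string> columns to Scala UDFs as a Seq[String].
  val quoteAndJoinUdf = udf(quoteAndJoin _)

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().master("local[1]").appName("quote-colors").getOrCreate()
    import session.implicits._

    // Hypothetical stand-in for the Hive table, flattened to two columns.
    val df = Seq(("1", Seq("red", "blue", "yellow"))).toDF("id", "colors")

    // availableColors holds the quoted, comma-separated colors as a single string.
    df.select($"id", quoteAndJoinUdf($"colors").as("availableColors")).show(false)

    session.stop()
  }
}
```

In the query from the question, the concat_ws(",", ...) call would presumably be replaced by quoteAndJoinUdf($"listelements".getField("sneaker").getField("colors")), leaving the rest of the select unchanged. Keeping the formatting in a plain function separate from the udf wrapper makes it possible to unit-test the logic without starting Spark.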