Scala: remove a field from an array of structs in Spark

I want to remove a field from an array of structs, like this:
import scala.collection.mutable
import org.apache.spark.sql.functions

case class myObj(id: String, item_value: String, delete: String)
case class myObj2(id: String, item_value: String)

val df2 = Seq(
  ("1", "2", "..100values", Seq(myObj("A", "1a", "1"), myObj("B", "4r", "2"))),
  ("1", "2", "..100values", Seq(myObj("X", "1p", "11"), myObj("V", "7w", "8")))
).toDF("1", "2", "100fields", "myArr")

val deleteColumn: (mutable.WrappedArray[myObj] => mutable.WrappedArray[myObj2]) = {
  (array: mutable.WrappedArray[myObj]) => array.map(o => myObj2(o.id, o.item_value))
}

val myUDF3 = functions.udf(deleteColumn)
df2.withColumn("newArr", myUDF3($"myArr")).show(false)
The error is clear:
Exception in thread "main" org.apache.spark.SparkException: Failed to execute user defined function(anonfun$1: (array<struct<id:string,item_value:string,delete:string>>) => array<struct< id:string,item_value:string>>)
The types don't match, but what I want to do is map one struct type to another.
The reason I use a UDF is that df.map() is not suited to mapping a single column; it forces you to handle all the columns. So I haven't found a good way to apply this mapping to just one column.

You can rewrite the UDF so that it takes a Row instead of your custom object, as shown below. Spark passes a struct column into a UDF as a Row, not as your case class, which is why the original version fails:
val deleteColumn = udf((value: Seq[Row]) => {
value.map(row => MyObj2(row.getString(0), row.getString(1)))
})
df2.withColumn("newArr", deleteColumn($"myArr")).show(false)
Output:
+---+---+-----------+---------------------+----------------+
|1 |2 |100fields |myArr |newArr |
+---+---+-----------+---------------------+----------------+
|1 |2 |..100values|[[A,1a,1], [B,4r,2]] |[[A,1a], [B,4r]]|
|1 |2 |..100values|[[X,1p,11], [V,7w,8]]|[[X,1p], [V,7w]]|
+---+---+-----------+---------------------+----------------+
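As an aside (not from the original thread): on Spark 2.4 or later the same field removal can be sketched without a UDF at all, using the built-in transform higher-order function via a SQL expression. The column and field names below assume the example DataFrame above:

```scala
import org.apache.spark.sql.functions.expr

// Rebuild each struct in the array without the "delete" field.
// transform(...) is Spark's built-in higher-order function (2.4+),
// so no serialization through a Scala UDF is needed.
val noUdf = df2.withColumn(
  "newArr",
  expr("transform(myArr, x -> struct(x.id as id, x.item_value as item_value))")
)
noUdf.show(false)
```

Staying inside Spark's built-in functions keeps the operation visible to the Catalyst optimizer, which a black-box UDF is not.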
Comment: This returns an error: no encoder available for Seq[MyObj2] at val deleteColumn = f.udf((value: Seq[Row]) => {
Comment: @MrElefant is MyObj2 your case class? I suspect the error mentioning Seq[MyObj2] points at your case class definition.
Comment: Exactly @Shankar, the error was about Seq[MyObj2]; sorry, I had changed the names. Let me take a look.