Scala: Removing a field from an array.struct in Spark


I want to remove a field from an array.struct, as shown below:

import scala.collection.mutable
import org.apache.spark.sql.functions

case class myObj(id: String, item_value: String, delete: String)
case class myObj2(id: String, item_value: String)

val df2 = Seq(
  ("1", "2", "..100values", Seq(myObj("A", "1a", "1"), myObj("B", "4r", "2"))),
  ("1", "2", "..100values", Seq(myObj("X", "1p", "11"), myObj("V", "7w", "8")))
).toDF("1", "2", "100fields", "myArr")

// Attempt: map each myObj to a myObj2 inside a UDF (this is the version that fails at runtime)
val deleteColumn: mutable.WrappedArray[myObj] => mutable.WrappedArray[myObj2] = {
  (array: mutable.WrappedArray[myObj]) => array.map(o => myObj2(o.id, o.item_value))
}
val myUDF3 = functions.udf(deleteColumn)
df2.withColumn("newArr", myUDF3($"myArr")).show(false)

The error is clear:

Exception in thread "main" org.apache.spark.SparkException: Failed to execute user defined function(anonfun$1: (array<struct<id:string,item_value:string,delete:string>>) => array<struct<id:string,item_value:string>>)

The types do not match, but what I want to do is convert from one struct to the other.


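I am using a UDF because df.map() is not well suited to mapping one specific column: it forces you to spell out all the columns. So I have not found a better way to apply this mapping to a single column.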

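You can rewrite the UDF so that it takes a Seq of Row instead of the custom object, as shown below. Spark passes struct values to a UDF as Row, not as instances of your case class, which is why the original version fails.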

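import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// MyObj2 is the two-field target case class (named myObj2 in the question)
val deleteColumn = udf((value: Seq[Row]) => {
  // Read id and item_value by position; the third field (delete) is dropped
  value.map(row => MyObj2(row.getString(0), row.getString(1)))
})

df2.withColumn("newArr", deleteColumn($"myArr"))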
Output:

+---+---+-----------+---------------------+----------------+
|1  |2  |100fields  |myArr                |newArr          |
+---+---+-----------+---------------------+----------------+
|1  |2  |..100values|[[A,1a,1], [B,4r,2]] |[[A,1a], [B,4r]]|
|1  |2  |..100values|[[X,1p,11], [V,7w,8]]|[[X,1p], [V,7w]]|
+---+---+-----------+---------------------+----------------+
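
For reference, here is a self-contained sketch of the same idea that can be run as a small application. It is only an illustration under some assumptions: the object name DropStructField, the helper dropDelete, and the local[*] master are made up for this example, and the case classes are named MyObj and MyObj2. The second variant uses the built-in transform SQL function, which assumes Spark 2.4 or later and avoids the UDF altogether.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{expr, udf}

// Defined at top level (not inside a method) so Spark can derive their schemas.
case class MyObj(id: String, item_value: String, delete: String)
case class MyObj2(id: String, item_value: String)

// Hypothetical wrapper object, used only for this sketch.
object DropStructField {

  // Keep the first two fields (id, item_value) of each struct; drop the third (delete).
  def dropDelete(rows: Seq[Row]): Seq[MyObj2] =
    rows.map(r => MyObj2(r.getString(0), r.getString(1)))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is fine for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-struct-field")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("1", "2", "..100values", Seq(MyObj("A", "1a", "1"), MyObj("B", "4r", "2"))),
      ("1", "2", "..100values", Seq(MyObj("X", "1p", "11"), MyObj("V", "7w", "8")))
    ).toDF("1", "2", "100fields", "myArr")

    // Variant 1: UDF over Seq[Row], since struct elements reach a UDF as Row.
    val dropDeleteUdf = udf((rows: Seq[Row]) => dropDelete(rows))
    df.withColumn("newArr", dropDeleteUdf($"myArr")).show(false)

    // Variant 2 (assumes Spark 2.4+): rebuild each struct with the built-in
    // transform higher-order function, so no UDF is needed.
    df.withColumn(
      "newArr",
      expr("transform(myArr, x -> named_struct('id', x.id, 'item_value', x.item_value))")
    ).show(false)

    spark.stop()
  }
}
```

Both variants leave the other columns untouched, which is the point of not using df.map() here. The UDF reads fields by position, so it depends on the field order of the struct; the transform variant refers to fields by name.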

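It returns an error ("No [...] available for Seq[MyObj2]") at: val deleteColumn = f.udf((value: Seq[Row]) => {

@MrElefant Is MyObj2 your case class, and is the error about Seq[MyObj2]? I think your case class has Seq[MyObj2].

Exactly, @Shankar, the error is related to Seq[MyObj2]. Sorry, I changed the name. Let's take a look.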