
UDF to filter a map by key in Scala


I have a Spark DataFrame with the following schema:

root
 |-- mapkey: map (nullable = true)
 |    |-- key: string
 |    |-- value: array (valueContainsNull = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- id: string (nullable = true)
 |    |    |    |-- bt: string (nullable = true)
 |    |    |    |-- bp: double (nullable = true)
 |    |    |    |-- z: struct (nullable = true)
 |    |    |    |    |-- w: integer (nullable = true)
 |    |    |    |    |-- h: integer (nullable = true)
 |-- uid: string (nullable = true)
I want to write a UDF that filters mapkey so that only the entries whose key equals uid are kept, returning just the values that pass the filter. I am trying the following:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

val filterMap = udf((m: Map[String, Seq[Row]], uid: String) => {
  val s = Set(uid)
  m.filterKeys(s.contains)
})
but I get the following error:

java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Row is not supported
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:762)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:704)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:703)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:722)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:704)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:703)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:726)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:704)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:703)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:700)
  at org.apache.spark.sql.functions$.udf(functions.scala:3200)


Can someone point out what is wrong with the UDF?

As the exception says, Spark cannot derive a schema for org.apache.spark.sql.Row by reflection, so it looks like your only option is to use case classes that match the inner structure of this Row:

case class MyStruct(w: Int, h: Int)
case class Element(id: String, bt: String, bp: Double, z: MyStruct)
Then you can use them in your UDF (surprisingly enough):
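val filterMap = udf((m: Map[String, Seq[Element]], uid: String) => {
  m.filterKeys(_ == uid)
})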


I have not been able to get this approach to work; I always get a "spark GenericRowWithSchema cannot be cast to XXX" error. The only way I have found to pass a complex type into a UDF is to pass it as a Row (see the sketch after the example below).

Interesting; something may have changed in how Spark handles complex types, because this works on Spark 2.3.0, and I wonder which version you are using. I would be interested to know whether the full example below works on your version.
import org.apache.spark.sql.functions.udf
import spark.implicits._ // already in scope in spark-shell; needed for toDF and $"..."

// sample data:
val df = Seq(
  (Map(
    "key1" -> Array(Element("1", "bt1", 0.1, MyStruct(1, 2)), Element("11", "bt11", 0.2, MyStruct(1, 3))),
    "key2" -> Array(Element("2", "bt2", 0.2, MyStruct(12, 22)))
  ), "key2")
).toDF("mapkey", "uid")

df.printSchema() // prints the schema shown in the question, as expected

// define UDF:
val filterMap = udf((m: Map[String, Seq[Element]], uid: String) => {
  m.filterKeys(_ == uid)
})

// use UDF:
df.withColumn("result", filterMap($"mapkey", $"uid")).show(false)

// prints:
// +-----------------------------------------------------------------+
// |result                                                           |
// +-----------------------------------------------------------------+
// |Map(key1 -> WrappedArray([1,bt1,0.1,[1,2]], [11,bt11,0.2,[1,3]]))|
// +-----------------------------------------------------------------+
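
For completeness, here is a minimal sketch of the Row-based workaround described in the comment above. It is an illustration, not from the original post: the struct values are declared as Row on the input side (Spark tolerates input types it cannot reflect on), but the return type must still be reflectable, which is why this hypothetical filterMapIds returns plain id strings rather than the Row values themselves.

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Hypothetical sketch: accept the struct values as Row on the input side,
// but return only a reflectable type (the ids stored under the matching key),
// since returning Row values is exactly what triggers the original error.
val filterMapIds = udf((m: Map[String, Seq[Row]], uid: String) =>
  m.getOrElse(uid, Seq.empty).map(_.getAs[String]("id"))
)

df.withColumn("ids", filterMapIds($"mapkey", $"uid")).show(false)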