Scala Spark can't get value from 'MapType'

scala, apache-spark, apache-spark-sql

I'm writing a UDAF with the following buffer schema:

bufferSchema: StructType = StructType(
  StructField("grades", MapType(
    StructType(StructField("subject", StringType) :: StructField("subject_type", StringType) :: Nil),
    ArrayType(StructType(StructField("date", LongType) :: StructField("grade", IntegerType) :: Nil)))) :: Nil)
It looks like Spark internally interprets the key type as GenericRowWithSchema rather than a plain (String, String). So whenever I try to get a value from the map:

override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {

  var buffer_scoresMap = buffer.getAs[Map[(String, String), Array[...]]](0)
  buffer_scoresMap.get(("k1", "k2"))
returns None, even though the key is definitely in the map (I can even see it in the debugger). I also tried mutating the key into a GenericRowWithSchema and back to (String, String), then getting from the map, but with no success.


Any ideas?

Indeed, tuples are converted into structs and are not converted back into tuples when they are part of a deeply nested column. In other words, buffer_scoresMap actually has the type Map[Row, Array[...]], so you can construct a Row and use it to fetch items from the map:

var buffer_scoresMap = buffer.getAs[Map[Row, Array[...]]](0)
buffer_scoresMap.get(Row("k1","k2")) // should not be None if key exists
Here's a short example demonstrating this:

// create a simple DF with similar schema: 
case class Record(grades: Map[(String, String), Array[Int]])
val df = sc.parallelize(Seq(Record(Map(("a", "b") -> Array(1, 2))))).toDF("grades")

// this indeed fails:
df.rdd.map(r => r.getAs[Map[(String, String), Array[Int]]](0).get(("a", "b"))).first() // None

// but this works:
df.rdd.map(r => r.getAs[Map[Row, Array[Int]]](0).get(Row("a", "b"))).first() // Some(WrappedArray(1, 2))
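If you'd rather keep working with tuple keys in the rest of your aggregation logic, you can convert the struct-keyed map back yourself. A minimal sketch (the helper name `toTupleKeys` is hypothetical, and it assumes every key is a two-field struct of strings, matching the schema above):

```scala
import org.apache.spark.sql.Row

// Hypothetical helper: rebuild a tuple-keyed map from the Row-keyed map
// that Spark hands back for a deeply nested MapType column.
def toTupleKeys(m: Map[Row, Array[Int]]): Map[(String, String), Array[Int]] =
  m.map { case (k, v) => (k.getString(0), k.getString(1)) -> v }

// Example: a Row-keyed map as returned by getAs[Map[Row, Array[Int]]](0)
val rowKeyed = Map(Row("a", "b") -> Array(1, 2))
val tupleKeyed = toTupleKeys(rowKeyed)
tupleKeyed(("a", "b")) // Array(1, 2)
```

This only pays off if you look keys up many times per row; for a single lookup, building a `Row("k1", "k2")` key directly, as shown above, avoids the extra map traversal.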