How to get the field names inside an ArrayType column


json scala apache-spark


Here is my schema:

    root
     |-- tags: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- context: string (nullable = true)
     |    |    |    |-- key: string (nullable = true)

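I want to get the names of the elements context and key, and change the data type of these fields to array.

When I try to get the fields using map, it prints something like this: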

arraydf.schema.fields.map(field1 =>
  println("FIELDS: " + field1))

Output:
FIELDS: StructField(tags,ArrayType(ArrayType(StructType(StructField(context,StringType,true), StructField(key,StringType,true)),true),true),true)

I want my schema to look like the one below: the elements under the struct type should be ArrayType, and I want a generic way to do this. Please help.

    root
     |-- tags: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- context: array (nullable = true)
     |    |    |    |-- key: array (nullable = true)

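As far as I understand, you just want to access an element, right? That is done with dot notation for StructType and with getItem (or square brackets []) for ArrayType.

So, if you want to get the values, try something like:

arraydf.selectExpr("tags[0][0].context", "tags[0][0].key")

I also suggest you take a look at the functions; they might be useful.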
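The same access can also be written with the Column API, using getItem for each array level and getField for the struct. The following is a minimal sketch, not a definitive implementation: the `Tag` case class, the `firstTag` helper, and the sample values are assumptions made up for illustration, and only the column name `tags` and the field names `context` and `key` come from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NestedAccessSketch {
  // Stand-in for the struct in the question's schema (assumption).
  case class Tag(context: String, key: String)

  // Hypothetical helper: selects context and key from the first struct
  // of the first inner array of the "tags" column.
  def firstTag(df: DataFrame): DataFrame = {
    val first = col("tags").getItem(0).getItem(0)
    df.select(first.getField("context").as("context"), first.getField("key").as("key"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("nested-access").getOrCreate()
    import spark.implicits._
    // One row whose "tags" column is array<array<struct<context,key>>>.
    val arraydf = Seq(Seq(Seq(Tag("c1", "k1")))).toDF("tags")
    firstTag(arraydf).show()
    spark.stop()
  }
}
```

Binding the shared prefix `tags[0][0]` to a value once keeps the two selected columns in step if the index ever changes.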

Pattern match on the structure:

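import org.apache.spark.sql.types._
import org.apache.spark.sql.DataFrame

// Returns (name, dataType) for each field of the struct nested two array levels deep in column c.
def fields(df: DataFrame, c: String): Array[(String, DataType)] = df.schema(c) match {
  // StructField(name, dataType, nullable, metadata); ArrayType(elementType, containsNull)
  case StructField(_, ArrayType(ArrayType(ss: StructType, _), _), _, _) =>
    ss.fields.map(s => (s.name, s.dataType))
  // Any other column shape has no such nested struct, so return nothing instead of throwing a MatchError.
  case _ => Array.empty[(String, DataType)]
}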

For example:

scala> fields(Seq(Seq(Seq((1, 2)))).toDF, "value")
res7: Array[(String, org.apache.spark.sql.types.DataType)] = Array((_1,IntegerType), (_2,IntegerType))
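Building on that pattern match, the generic schema rewrite the question asks for could be sketched as a recursive function over DataType. This is only a sketch under assumptions: the names `wrapStructFields` and `rewriteSchema` are hypothetical, it assumes each wrapped field should become an array of its original type with nullable elements, and it only computes the target schema. Converting the data itself to match is a separate step that is not covered here.

```scala
import org.apache.spark.sql.types._

object SchemaRewriteSketch {
  // Descend through any number of array levels; when a struct is found
  // directly inside an array, wrap each of its fields in ArrayType.
  def wrapStructFields(dt: DataType): DataType = dt match {
    case ArrayType(inner: ArrayType, containsNull) =>
      ArrayType(wrapStructFields(inner), containsNull)
    case ArrayType(StructType(fs), containsNull) =>
      val wrapped = fs.map(f => f.copy(dataType = ArrayType(f.dataType, containsNull = true)))
      ArrayType(StructType(wrapped), containsNull)
    // Leave every other type untouched.
    case other => other
  }

  // Apply the rewrite to every top-level column of a schema.
  def rewriteSchema(schema: StructType): StructType =
    StructType(schema.fields.map(f => f.copy(dataType = wrapStructFields(f.dataType))))
}
```

Calling `SchemaRewriteSketch.rewriteSchema(arraydf.schema)` on the schema from the question would turn `context` and `key` into array fields while leaving the two outer array levels of `tags` as they are.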

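Can you provide an example of the input/output DFs?

@Pheeleppoo I gave an example of the schema, didn't I? In my DataFrame, I read a nested JSON file and flattened the schema. At the end I ran into this problem.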