Duplicate field in schema when loading XML with Spark


I want to create a schema with this structure:

|    |-- Features: struct (nullable = true)
|    |    |-- Feature: array (nullable = true)
|    |    |    |-- element: string (containsNull = true)
This is my code:

StructField( "Features", StructType(
        Array(
          StructField( "Feature", ArrayType(
            StructType(
              Array(
                StructField( "element", StringType, true )
              )
            )
          ) )
        )
      ), true )
Result:

|    |-- Features: struct (nullable = true)
|    |    |-- Feature: array (nullable = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- element: string (nullable = true)

Any ideas?

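You should omit the innermost StructType. The name "element" is only the label that printSchema gives to the items of an array, not a field you declare yourself, so the array's element type should be StringType directly: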

import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val schema = StructType(Seq(StructField("Features", StructType(Seq(
  StructField("Feature", ArrayType(StringType))
)))))

spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema).printSchema
// root
//  |-- Features: struct (nullable = true)
//  |    |-- Feature: array (nullable = true)
//  |    |    |-- element: string (containsNull = true)
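To see the difference side by side, here is a minimal sketch that builds both variants and prints their trees. It assumes only that spark-sql is on the classpath; no SparkSession is needed because treeString works on the schema object itself. The names flat and nested are illustrative and do not appear in the question.

```scala
import org.apache.spark.sql.types._

// Array of strings: the item type is StringType itself.
// The "element" label in the printed tree is generated by Spark.
val flat = StructType(Seq(
  StructField("Features", StructType(Seq(
    StructField("Feature", ArrayType(StringType))
  )))
))

// Array of structs: wrapping the item type in a StructType declares a real
// field called "element", which yields the extra level seen in the question.
val nested = StructType(Seq(
  StructField("Features", StructType(Seq(
    StructField("Feature", ArrayType(StructType(Seq(
      StructField("element", StringType, true)
    ))))
  )))
))

println(flat.treeString)   // array items shown as "element: string"
println(nested.treeString) // array items shown as "element: struct" with a nested "element" field
```

Comparing the two trees makes it clear that the duplicated "element" level comes from the schema definition, not from the XML being loaded.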