Python PySpark: how do I create a df.schema like this?
I am trying to create the following schema:
root
|-- _ehid: string (nullable = true)
|-- duration: double (nullable = true)
|-- list: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: string (containsNull = true)
|-- request.id: string (nullable = true)
But this is the only one I can produce:
root
|-- _ehid: string (nullable = true)
|-- duration: double (nullable = true)
|-- list: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- element: string (nullable = true)
|-- request.id: string (nullable = true)
The element type shows up as struct instead of array. When I try to view my df with df.show(10), I only see null values.
My script:
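from pyspark.sql.types import StructType, StructField, StringType, DoubleType, ArrayType

schema = StructType([
    StructField("_ehid", StringType(), True),
    StructField("duration", DoubleType(), True),
    # each array element is a struct with a single string field named "element"
    StructField("list", ArrayType(StructType([
        StructField("element", StringType())
    ])), True),
    StructField("request.id", StringType(), True)])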
Use StringType directly as the element type, nested inside two ArrayTypes, instead of wrapping it in a StructType:
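from pyspark.sql.types import StructType, StructField, StringType, DoubleType, ArrayType

schema = StructType([
    StructField("_ehid", StringType(), True),
    StructField("duration", DoubleType(), True),
    # array of arrays of strings: array<array<string>>
    StructField("list", ArrayType(ArrayType(StringType())), True),
    StructField("request.id", StringType(), True)])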