StructType definition for a nested JSON file in PySpark

I have a JSON file, and I created a DataFrame for it in PySpark.

Here is the content of the JSON file:

{"xyz": [ {"c1": "a", "c2": "b", "c3": "d"}]}

This is the StructType schema I created:

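```
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# "xyz" is an array whose elements are structs with three string fields
schema = StructType([
    StructField("xyz", ArrayType(
        StructType([
            StructField("c1", StringType()),
            StructField("c2", StringType()),
            StructField("c3", StringType()),
        ])
    ))
])
```

My question is: why do the columns c1, c2, and c3 show up as arrays in the output? How can I make c1, c2, and c3 StringType, so that the square brackets disappear from the output?

```
import json

from pyspark.sql.functions import col

# Parse each line of the file as one JSON object, dropping non-ASCII characters
rdd = sc.textFile(path).map(lambda x: x.encode("ascii", "ignore")).map(lambda line: json.loads(line))
df = rdd.toDF(schema=schema)

# Select the struct fields through the array column
df_colsexp = df.select(
    col('xyz.c1').alias('c1'),
    col('xyz.c2').alias('c2'),
    col('xyz.c3').alias('c3')
)
df_colsexp.show(5, False)

df_colsexp.printSchema()
```

```
>>> df_colsexp.printSchema()
root
 |-- c1: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- c2: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- c3: array (nullable = true)
 |    |-- element: string (containsNull = true)

>>> df_colsexp.show(5,False)

Output:
c1   c2  c3
[a] [b] [c]
```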
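
The schema printed above reports arrays because the top-level field is an array of structs: selecting a struct field through an array column collects that field from every element, so the result is `array<string>` rather than `string`. The following is a minimal sketch of that behaviour and of one common way around it, exploding the array so that each element becomes its own row. It assumes the array column is named `xyz`, as in the JSON sample; the helper name `flatten_xyz` and the plain-Python model are illustrative and not part of the original post.

```python
import json

# Plain-Python model of the semantics (no Spark needed for this part).
record = json.loads('{"xyz": [{"c1": "a", "c2": "b", "c3": "d"}]}')

# Selecting `xyz.c1` on an array of structs gathers c1 from every element,
# so the result is a list (array<string>), not a single string.
c1_as_selected = [item["c1"] for item in record["xyz"]]

# Exploding first yields one row per element; each field is then a plain string.
exploded_rows = [(item["c1"], item["c2"], item["c3"]) for item in record["xyz"]]

print(c1_as_selected)
print(exploded_rows)


def flatten_xyz(df):
    """PySpark counterpart of the explode step (hypothetical helper name).

    Assumes `df` has a column `xyz` of type array<struct<c1, c2, c3>>.
    """
    # Imported inside the function so the model above runs without Spark installed.
    from pyspark.sql.functions import col, explode

    return (
        df.select(explode(col("xyz")).alias("item"))  # one row per array element
          .select(
              col("item.c1").alias("c1"),
              col("item.c2").alias("c2"),
              col("item.c3").alias("c3"),
          )
    )
```

With a DataFrame built from the schema in the question, `flatten_xyz(df).printSchema()` should then list `c1`, `c2`, and `c3` as `string`. If the array is known to hold exactly one element, indexing it instead, for example `col("xyz")[0]["c1"]`, is an alternative that keeps one output row per input row.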