
Unable to load data into the desired schema in PySpark


I have a PySpark DataFrame that looks like this:

>>> df.show(1, False)                                                           
{"data":{"probability":0.2345,"customerId":1234567,"region":"BR"},"uploadedDate":1542548806295} 
The above is the output when I do not pass any schema as input.

I am trying to load the data with a schema using the script below:

SCHEMA = StructType([ StructField('probabilityMale',LongType(),True),\
                    StructField('customerId',LongType(),True),\
                    StructField('region',StringType(),True),\
                    StructField('uploadedDate',StringType(),True)])

df = spark.read.format('csv').\
     option('header','false').\
     option('delimiter','\t').\
     schema(SCHEMA).\
     load(path)
But this does not put the data points into separate columns. I also tried inferSchema:

df = spark.read.format('csv').\
     option('header','false').\
     option('delimiter','\t').\
     option("inferSchema", "true").\
     load(path)
But I get the same output as before.


How do I specify the schema and get the data into separate columns?

Your input is JSON, so it should be read with the JSON reader rather than the CSV reader:

df = spark.read.json(path)
To get the columns individually, you can expand the data struct:

df2 = df.select('data.*', 'uploadedDate')