How to convert a JSON string to a JSON object in PySpark

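I have a DataFrame with a column of type string, but it actually contains JSON objects that follow 4 different schemas, with only a few fields in common. I need to convert it into JSON objects.

Here is the schema of the DataFrame:

query.printSchema()

The values in the DataFrame look like this:

query.show(10)

The solution I applied is:

  • Write to a text file
  • query.write.format("text").mode("overwrite").save("s3://bucketname/temp/")

  • Read it back as JSON
  • df = spark.read.json("s3a://bucketname/temp/")

  • Now print the schema; the JSON string in each row has been converted to a JSON object
  • df.printSchema()


Is there a better way to get the expected output, one where I don't have to write the DataFrame out as a text file and then read it back in as a JSON file?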

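    You can use from_json() before writing to the text file, but you need to define the schema first.

    The code looks like this:

    from pyspark.sql.functions import from_json

    # `schema` must be defined beforehand
    data = query.select(from_json("test", schema=schema).alias("value")).selectExpr("value.*")


    data.write.format("text").mode("overwrite").save("s3://bucketname/temp/")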

    Have you tried this solution, or could you try it?
Output of query.printSchema():

    root
     |-- test: string (nullable = true)
    
Output of query.show(10):

    +--------------------+
    |                test|
    +--------------------+
    |{"PurchaseActivit...|
    |{"PurchaseActivit...|
    |{"PurchaseActivit...|
    |{"Interaction":{"...|
    |{"PurchaseActivit...|
    |{"Interaction":{"...|
    |{"PurchaseActivit...|
    |{"PurchaseActivit...|
    |{"PurchaseActivit...|
    |{"PurchaseActivit...|
    +--------------------+
    only showing top 10 rows
    
Output of df.printSchema():

    root
     |-- EventDate: string (nullable = true)
     |-- EventId: string (nullable = true)
     |-- EventNotificationType: long (nullable = true)
     |-- Interaction: struct (nullable = true)
     |    |-- ContextId: string (nullable = true)
     |    |-- Created: string (nullable = true)
     |    |-- Description: string (nullable = true)
     |    |-- Id: string (nullable = true)
     |    |-- ModelContextId: string (nullable = true)
     |-- PurchaseActivity: struct (nullable = true)
     |    |-- BillingCity: string (nullable = true)
     |    |-- BillingCountry: string (nullable = true)
     |    |-- ShippingAndHandlingAmount: double (nullable = true)
     |    |-- ShippingDiscountAmount: double (nullable = true)
     |    |-- SubscriberId: long (nullable = true)
     |    |-- SubscriptionOriginalEndDate: string (nullable = true)
     |-- SubscriptionChurn: struct (nullable = true)
     |    |-- PaymentTypeCode: long (nullable = true)
     |    |-- PaymentTypeName: string (nullable = true)
     |    |-- PreviousPaidAmount: double (nullable = true)
     |    |-- SubscriptionRemoved: string (nullable = true)
     |    |-- SubscriptionStartDate: string (nullable = true)
     |-- TransactionDetail: struct (nullable = true)
     |    |-- Amount: double (nullable = true)
     |    |-- OrderShipToCountry: string (nullable = true)
     |    |-- PayPalUserName: string (nullable = true)
     |    |-- PaymentSubTypeCode: long (nullable = true)
     |    |-- PaymentSubTypeName: string (nullable = true)