How to convert a JSON string to a JSON object in PySpark
I have a DataFrame whose only column is of type string, but it actually contains JSON objects that follow 4 different schemas with very few fields in common. I need to convert it into JSON objects.

The schema of the DataFrame (query.printSchema()) and a sample of its values (query.show(10)) are listed at the end of this post.

The solution I applied is:
Write the DataFrame out as a text file
query.write.format("text").mode("overwrite").save("s3://bucketname/temp/")
Read it back as JSON
df=spark.read.json("s3a://bucketname
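Is there a better way, so that I don't have to write the DataFrame out as a text file and then read it back as JSON to get the expected output?

You can use from_json() before writing anything out, but you need to define the schema first. Note that the text writer accepts only a single string column, so the parsed columns are written as JSON here.
The code looks like this:
from pyspark.sql.functions import from_json
# `schema` is a StructType describing the JSON and must be defined beforehand
data = query.select(from_json("test", schema=schema).alias("value")).selectExpr("value.*")
data.write.format("json").mode("overwrite").save("s3://bucketname/temp/")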
Have you tried this solution, or could you give it a try?
Output of query.printSchema():
root
 |-- test: string (nullable = true)
Output of query.show(10):
+--------------------+
|                test|
+--------------------+
|{"PurchaseActivit...|
|{"PurchaseActivit...|
|{"PurchaseActivit...|
|{"Interaction":{"...|
|{"PurchaseActivit...|
|{"Interaction":{"...|
|{"PurchaseActivit...|
|{"PurchaseActivit...|
|{"PurchaseActivit...|
|{"PurchaseActivit...|
+--------------------+
only showing top 10 rows
Schema after reading the data back as JSON:
root
 |-- EventDate: string (nullable = true)
 |-- EventId: string (nullable = true)
 |-- EventNotificationType: long (nullable = true)
 |-- Interaction: struct (nullable = true)
 |    |-- ContextId: string (nullable = true)
 |    |-- Created: string (nullable = true)
 |    |-- Description: string (nullable = true)
 |    |-- Id: string (nullable = true)
 |    |-- ModelContextId: string (nullable = true)
 |-- PurchaseActivity: struct (nullable = true)
 |    |-- BillingCity: string (nullable = true)
 |    |-- BillingCountry: string (nullable = true)
 |    |-- ShippingAndHandlingAmount: double (nullable = true)
 |    |-- ShippingDiscountAmount: double (nullable = true)
 |    |-- SubscriberId: long (nullable = true)
 |    |-- SubscriptionOriginalEndDate: string (nullable = true)
 |-- SubscriptionChurn: struct (nullable = true)
 |    |-- PaymentTypeCode: long (nullable = true)
 |    |-- PaymentTypeName: string (nullable = true)
 |    |-- PreviousPaidAmount: double (nullable = true)
 |    |-- SubscriptionRemoved: string (nullable = true)
 |    |-- SubscriptionStartDate: string (nullable = true)
 |-- TransactionDetail: struct (nullable = true)
 |    |-- Amount: double (nullable = true)
 |    |-- OrderShipToCountry: string (nullable = true)
 |    |-- PayPalUserName: string (nullable = true)
 |    |-- PaymentSubTypeCode: long (nullable = true)
 |    |-- PaymentSubTypeName: string (nullable = true)