How to update nested JSON data in a Delta table (PySpark/Hive)
I want to update the value of col4 to TT, i.e. "col4":"TT".
I tried the following code (on Databricks):
%sql
select * from df2
jsonData
--------
{"col1":"AA","col2":"BB","col3":"CC","col4":"DD"}
and
update df2 set jsonData = JSON_MODIFY(jsonData, '$.col4', 'TT')
which fails with the following error:
Error in SQL statement: AnalysisException: Undefined function: 'JSON_MODIFY'.
This function is neither a registered temporary function nor a permanent function
registered in the database 'default'.
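JSON_MODIFY is a SQL Server function and is not available in Spark SQL, hence the error.
Instead, use the from_json function to flatten the JSON into columns, update col4, and finally use the to_json function to rebuild the JSON object.
Example: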
df.show(10,False)
#+-------------------------------------------------+
#|jsonData                                         |
#+-------------------------------------------------+
#|{"col1":"AA","col2":"BB","col3":"CC","col4":"DD"}|
#+-------------------------------------------------+
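from pyspark.sql.functions import col, lit, struct, to_json

# Parse the JSON string into a struct, expand it into columns, overwrite col4,
# then serialize the columns back into a single JSON string.
df.selectExpr("from_json(jsonData,'col1 string,col2 string,col3 string,col4 string') as jsn_str").\
  select("jsn_str.*").\
  withColumn("col4",lit("TT")).\
  withColumn("jsonData",to_json(struct(col("col1"),col("col2"),col("col3"),col("col4")))).\
  select("jsonData").\
  show(10,False)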
#+-------------------------------------------------+
#|jsonData                                         |
#+-------------------------------------------------+
#|{"col1":"AA","col2":"BB","col3":"CC","col4":"TT"}|
#+-------------------------------------------------+
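The parse, modify, serialize round trip that from_json and to_json perform on each row can also be sketched in plain Python. This is only an illustration of the logic, not how Spark implements it. The helper name set_json_field is hypothetical and does not come from the original answer. A function like this could be wrapped as a PySpark UDF when the JSON schema is not known in advance, although the built-in functions shown above are generally the better choice when the schema is fixed.

```python
import json


def set_json_field(json_str, key, value):
    """Return json_str with the top-level field `key` set to `value`.

    Hypothetical helper for illustration; mirrors from_json -> update -> to_json.
    """
    if json_str is None:
        return None  # leave null rows untouched
    obj = json.loads(json_str)  # parse the JSON string into a dict
    obj[key] = value            # overwrite (or add) the field
    # Compact separators keep the output free of extra spaces,
    # matching the formatting of the sample data.
    return json.dumps(obj, separators=(",", ":"))


row = '{"col1":"AA","col2":"BB","col3":"CC","col4":"DD"}'
updated = set_json_field(row, "col4", "TT")
print(updated)  # {"col1":"AA","col2":"BB","col3":"CC","col4":"TT"}
```

Unlike the schema-based approach, this version keeps any fields that are present in the JSON but not listed in a schema string.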