PySpark SQL: write a DataFrame with a JSON column to MySQL over JDBC

I am trying to bulk-write a DataFrame to a MySQL database over JDBC, using Databricks / pyspark.sql to write the DataFrame into a table. The table has one column that accepts JSON (binary) data. I converted the JSON object to a StructType with the following structure:

JSON object structure and the conversion to a DataFrame:

from pyspark.sql.types import StructType, StructField, IntegerType, TimestampType

# JSON representation of the nested column: a struct holding an array of {y1, y2} structs
schema_dict = {'fields': [
    {'metadata': {}, 'name': 'dict', 'nullable': True, 'type': {"containsNull": True, "elementType": {'fields': [
        {'metadata': {}, 'name': 'y1', 'nullable': True, 'type': 'integer'},
        {'metadata': {}, 'name': 'y2', 'nullable': True, 'type': 'integer'}
    ], "type": 'struct'}, "type": 'array'}}
], 'type': 'struct'}

cSchema = StructType([
    StructField("x1", IntegerType()), StructField("x2", IntegerType()),
    StructField("x3", IntegerType()), StructField("x4", TimestampType()),
    StructField("x5", IntegerType()), StructField("x6", IntegerType()),
    StructField("x7", IntegerType()), StructField("x8", TimestampType()),
    StructField("x9", IntegerType()), StructField("x10", StructType.fromJson(schema_dict))
])
df = spark.createDataFrame(parsedList, schema=cSchema)
The output DataFrame:

df:pyspark.sql.dataframe.DataFrame
x1:integer
x2:integer
x3:integer
x4:timestamp
x5:integer
x6:integer
x7:integer
x8:timestamp
x9:integer
x10:struct
    dict:array
        element:struct
              y1:integer
              y2:integer
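For reference, each element of parsedList (not shown above) has to line up with cSchema; a minimal hypothetical row, with the nested struct given as a one-element tuple holding the 'dict' array, could look like this:

from datetime import datetime

# Hypothetical row matching cSchema: nine scalar values plus the nested
# struct, whose single field 'dict' is a list of (y1, y2) tuples.
sample_row = (1, 2, 3, datetime(2020, 1, 1), 5, 6, 7, datetime(2020, 1, 2), 9,
              ([(10, 20), (30, 40)],))
parsedList = [sample_row]  # illustration only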
Now I try to write this DataFrame to the MySQL table:

import urllib
from pyspark.sql import SQLContext
from pyspark.sql.functions import regexp_replace, col

sqlContext = SQLContext(sc)

# JDBC connection settings; the MariaDB driver is used against a MySQL URL
driver = "org.mariadb.jdbc.Driver"
url = "jdbc:mysql://dburl?rewriteBatchedStatements=true"
trial = "dbname.tablename"
user = "dbuser"
password = "dbpassword"
properties = {
    "user": user,
    "password": password,
    "driver": driver
}
df.write.jdbc(url=url, table=trial, mode="append", properties=properties)
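As a quick pre-flight check (optional, just inspecting the schema), any field that prints as struct, array, or map has no JDBC type mapping:

df.printSchema()  # x10 shows as struct<dict:array<struct<y1:int,y2:int>>>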
I get this error:

An error occurred while calling o2118.jdbc.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 176.0 failed 4 times, most recent failure: Lost task 15.3 in stage 176.0 (TID 9528, 10.168.231.82, executor 5): java.lang.IllegalArgumentException: Can't get JDBC type for struct<dict:array<struct<y1:int,y2:int>>>
Any ideas on how to write a DataFrame with a JSON column to a MySQL table, or how to work around this error?


I am using Databricks 5.5 LTS (includes Apache Spark 2.4.3, Scala 2.11).

Is the table already defined in MySQL? If so, what is the schema for the JSON, or specifically the column type? This problem happens when the JDBC driver doesn't know how to store the object in MySQL.

@NehaJirafe Yes, it is already defined in MySQL. The JSON column's type is JSON. And I can't just leave the column out, because binary columns such as JSON and BLOB can't have default values.
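Since the comments confirm the target column is MySQL's native JSON type, one workaround commonly suggested for this error is to serialize the struct column to a JSON string before the write: Spark's JDBC writer can map StringType, and MySQL validates and parses string values inserted into JSON columns. A minimal sketch under those assumptions (df_json is a name introduced here):

from pyspark.sql.functions import to_json, col

# Replace the struct column with its JSON-string form so every column has a
# JDBC type mapping; MySQL parses the string on insert into the JSON column.
df_json = df.withColumn("x10", to_json(col("x10")))
df_json.write.jdbc(url=url, table=trial, mode="append", properties=properties)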