
Pyspark writing data from Databricks to Azure SQL: ValueError: Some of types cannot be determined after inferring

Tags: pyspark, azure-databricks, apache-arrow

I am using pyspark to write data from Azure Databricks to Azure SQL. The code runs fine when there are no null values, but when the dataframe contains null values I get the following error:

databricks/spark/python/pyspark/sql/pandas/conversion.py:300: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below:
  Unable to convert the field Product. If this column is not necessary, you may consider dropping it or converting to primitive type before the conversion.
Context: Unsupported type in conversion from Arrow: null
Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
  warnings.warn(msg)

ValueError: Some of types cannot be determined after inferring
The dataframe must be written to SQL including the null values. How can I solve this?

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

def to_sql(df, table):
  # Convert the pandas DataFrame to a Spark DataFrame and write it to the target table over JDBC
  finaldf = sqlContext.createDataFrame(df)
  finaldf.write.jdbc(url=url, table=table, mode="overwrite", properties=properties)

to_sql(data, f"TF_{table.upper()}")
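
For reference, the ValueError is raised by createDataFrame when it has to infer a type for a column that contains only nulls. A minimal sketch of an alternative approach (the column names and types below are placeholders, not taken from the original dataframe) is to pass an explicit schema so that no inference is needed:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Placeholder schema - replace with the real column names and types
schema = StructType([
    StructField("Product", StringType(), True),   # nullable, so all-null columns are accepted
    StructField("Price", DoubleType(), True),
])

finaldf = sqlContext.createDataFrame(df, schema=schema)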
Edit:

Solved this by writing a function that maps the pandas data types to SQL data types and outputs the columns and their data types as a single string:

def convert_dtype(df):
    # Map pandas dtypes to SQL Server column types
    df_mssql = {'int64': 'bigint', 'object': 'varchar(200)', 'float64': 'float'}
    mydict = {}
    for col in df.columns:
        if str(df.dtypes[col]) in df_mssql:
            mydict[col] = df_mssql.get(str(df.dtypes[col]))
    # Build "col type," fragments and strip the trailing comma
    l = " ".join([str(k[0] + " " + k[1] + ",") for k in list(mydict.items())])
    return l[:-1]
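
For example, for an illustrative pandas DataFrame (not from the original post) with an int64, an object and a float64 column, the function produces:

import pandas as pd

sample = pd.DataFrame({"id": [1, 2], "Product": ["a", None], "Price": [1.5, None]})
convert_dtype(sample)
# 'id bigint, Product varchar(200), Price float'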
Passing this string to the createTableColumnTypes option solved the issue:

jdbcDF.write \
    .option("createTableColumnTypes", convert_dtype(df)) \
    .jdbc("jdbc:postgresql:dbserver", "schema.tablename",
          properties={"user": "username", "password": "password"})

For this, you need to specify the schema in the write statement. Here is an example from the documentation:

jdbcDF.write \
    .option("createTableColumnTypes", "name CHAR(64), comments VARCHAR(1024)") \
    .jdbc("jdbc:postgresql:dbserver", "schema.tablename",
          properties={"user": "username", "password": "password"})

Hi, thanks for your answer. I wrote a small function that maps the pandas data types to a string containing the columns and SQL data types. I will edit this into my post.