Apache Spark: how to insert a DataFrame with a map column into a Hive table

Tags: apache-spark, hadoop, hive, apache-spark-sql, complextype

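I have a DataFrame with several columns, one of which is of type map(string, string). I can print the DataFrame, and that column shows up as a map with data such as Map("PUN" -> "Pune"). I want to write this DataFrame to a Hive table (stored as Avro) that has the same column declared with a map type.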

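    import org.apache.spark.sql.functions.{col, lit, map}

    // Add a constant column, then build a map(string, string) column from it
    val withMap = Df
      .withColumn("cname", lit("Pune"))
      .withColumn("city_code_name", map(lit("PUN"), col("cname")))
    withMap.show(false)

    // table: created as an external Hive table, stored as Avro, with an Avro schema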

After removing this map-type column, I am able to save the DataFrame to the Hive Avro table.

How I save to the Hive table:

spark.save - saves the Avro files

spark.sql - creates a partition on the Hive table using the Avro file location

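You can do this with, for example, saveAsTable (see the Df.write.saveAsTable(...) snippet further down).

Change the mode option to whichever one suits you.

yourdataframewithmapcolumn.write.partitionBy is the way to create partitions.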

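I want to avoid creating a temporary table and keep the same way of saving as described above: just update the DataFrame with the new map-type column and save it directly to the Hive table stored as Avro.

`spark.createDataFrame(df.rdd, st).write.format("com.databricks.spark.avro").mode(SaveMode.Overwrite).save(path)` - used to save the Avro files at the given path

`sparkSession.sql(s"alter table tablename ADD partition (p1='210') location 'path'")` - used to create the partition

Note: this works fine without the map-type column, and the Hive table was created separately before running this code.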

Error - Caused by: java.lang.NullPointerException: in topLevelRecord in union in map in
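The truncated trace points at the map inside the Avro record. One thing that may be worth checking is whether the nullability flags Spark infers for the map column match what the forced schema (`st`) and the table's Avro schema declare. This is an assumption, not something the thread confirms. The sketch below only inspects those flags; the object name `InspectMapNullability` and the helper `mapNullability` are hypothetical, and the sample DataFrame stands in for the real one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, map}
import org.apache.spark.sql.types.MapType

object InspectMapNullability {

  // Hypothetical helper: returns (column nullable, map values nullable)
  // for a map column, or None if the column is not a MapType.
  def mapNullability(df: DataFrame, column: String): Option[(Boolean, Boolean)] = {
    val field = df.schema(column)
    field.dataType match {
      case m: MapType => Some((field.nullable, m.valueContainsNull))
      case _          => None
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("inspect-map-nullability")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the DataFrame from the question
    val df = Seq("Pune").toDF("cname")
      .withColumn("city_code_name", map(lit("PUN"), col("cname")))

    df.printSchema()
    println(mapNullability(df, "city_code_name"))

    spark.stop()
  }
}
```

If the flags printed here differ from what `st` declares for the same column, that mismatch would be a reasonable first place to look before changing how the files are written.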

    Df\
        .write\
        .saveAsTable(name='tableName',
                     format='com.databricks.spark.avro',
                     mode='append',
                     path='avroFileLocation')

  test("Insert MapType.valueContainsNull == false") {
    val schema = StructType(Seq(
      StructField("m", MapType(StringType, StringType, valueContainsNull = false))))
    val rowRDD = spark.sparkContext.parallelize(
      (1 to 100).map(i => Row(Map(s"key$i" -> s"value$i"))))
    val df = spark.createDataFrame(rowRDD, schema)
    df.createOrReplaceTempView("tableWithMapValue")
    sql("CREATE TABLE hiveTableWithMapValue(m Map <STRING, STRING>)")
    sql("INSERT OVERWRITE TABLE hiveTableWithMapValue SELECT m FROM tableWithMapValue")

    checkAnswer(
      sql("SELECT * FROM hiveTableWithMapValue"),
      rowRDD.collect().toSeq)

    sql("DROP TABLE hiveTableWithMapValue")
  }

Seq(9 -> "x").toDF("i", "j")
        .write.format("hive").mode(SaveMode.Overwrite).option("fileFormat", "avro").saveAsTable("t")
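Combining the last snippet with the map column from the question, a minimal sketch might look like the following. It assumes a Spark build with Hive support on the classpath and a reachable metastore (or the default local one). The table name `city_codes`, the object name and the helper `writeAsHiveAvro` are hypothetical. Unlike the setup in the question, it lets Spark create the table rather than writing into a pre-created external table.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, lit, map}

object MapColumnToHiveAvro {

  // Hypothetical helper: writes a DataFrame as a Hive table stored as Avro,
  // replacing the table if it already exists.
  def writeAsHiveAvro(df: DataFrame, table: String): Unit =
    df.write
      .format("hive")
      .mode(SaveMode.Overwrite)
      .option("fileFormat", "avro")
      .saveAsTable(table)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("map-column-to-hive-avro")
      .master("local[*]")   // assumption: local run for illustration
      .enableHiveSupport()  // requires the spark-hive module
      .getOrCreate()
    import spark.implicits._

    // Same shape as the question: a string column plus a map(string, string) column
    val df = Seq("Pune").toDF("cname")
      .withColumn("city_code_name", map(lit("PUN"), col("cname")))

    writeAsHiveAvro(df, "city_codes")

    // Read the table back to confirm the map column round-trips
    spark.table("city_codes").show(false)

    spark.stop()
  }
}
```

Going through `saveAsTable` avoids the separate save-then-add-partition steps, at the cost of letting Spark manage the table definition. Whether that is acceptable depends on how the existing external table is used.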