Postgresql spark使用hashMap将数据帧作为json写入postgres
我正在使用Postgresql spark使用hashMap将数据帧作为json写入postgres,postgresql,scala,apache-spark,Postgresql,Scala,Apache Spark,我正在使用2.2.1 我想写一个dataframe,其中有一个映射字段作为json字段进入postgres 示例代码: import java.util.Properties import org.apache.spark.SparkConf import org.apache.spark.sql.{SaveMode, SparkSession} import scala.collection.immutable.HashMap case class ExampleJson(map: Ha
2.2.1
我想写一个dataframe,其中有一个映射字段作为json字段进入postgres
示例代码:
import java.util.Properties
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SaveMode, SparkSession}
import scala.collection.immutable.HashMap
case class ExampleJson(map: HashMap[String,Long])
object JdbcLoaderJson extends App{
val finalUrl = s"jdbc:postgresql://localhost:54321/development"
val user = "user"
val password = "123456"
val sparkConf = new SparkConf()
sparkConf.setMaster(s"local[2]")
val spark = SparkSession.builder().config(sparkConf).getOrCreate()
def writeWithJson(tableName: String) : Unit = {
def getProperties: Properties = {
val p = new Properties()
val prop = new java.util.Properties
prop.setProperty("user", user)
prop.setProperty("password", password)
prop
}
var schema = "public"
var table = tableName
val asList = List(ExampleJson(HashMap("x" -> 1L, "y" -> 2L)),
ExampleJson(HashMap("y" -> 3L, "z" -> 4L)))
val asDf = spark.createDataFrame(asList)
asDf.show(false)
asDf.write.mode(SaveMode.Overwrite).jdbc(finalUrl, tableName, getProperties)
}
writeWithJson("with_json")
}
输出:
+-------------------+
|map |
+-------------------+
|Map(x -> 1, y -> 2)|
|Map(y -> 3, z -> 4)|
+-------------------+
Exception in thread "main" java.lang.IllegalArgumentException: Can't get JDBC type for map<string,bigint>
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:172)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:172)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType(JdbcUtils.scala:171)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$schemaString$1$$anonfun$23.apply(JdbcUtils.scala:707)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$schemaString$1$$anonfun$23.apply(JdbcUtils.scala:707)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$schemaString$1.apply(JdbcUtils.scala:707)
Process finished with exit code 1
+-------------------+
|地图|
+-------------------+
|地图(x->1,y->2)|
|地图(y->3,z->4)|
+-------------------+
线程“main”java.lang.IllegalArgumentException中出现异常:无法获取映射的JDBC类型
位于org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:172)
位于org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:172)
位于scala.Option.getOrElse(Option.scala:121)
位于org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType(JdbcUtils.scala:171)
位于org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$schemaString$1$$anonfun$23.apply(JdbcUtils.scala:707)
位于org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$schemaString$1$$anonfun$23.apply(JdbcUtils.scala:707)
位于scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
位于scala.collection.AbstractMap.getOrElse(Map.scala:59)
位于org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$schemaString$1.apply(JdbcUtils.scala:707)
进程已完成,退出代码为1
实际上,我也可以使用字符串,而不是映射,这更多的是关于从spark将json列写入postgres,将HashMap数据转换为json字符串,如下所示
asDf
.select(
to_json(struct($"*"))
.as("map")
)
.write
.mode(SaveMode.Overwrite)
.jdbc(finalUrl, tableName, getProperties)
您可以发布您的目标表模式吗?所以通常spark会自动创建它,但任何包含jsonok列的表都可以添加一些示例最终输出到表中吗?