
PostgreSQL: How do I delete rows in a database using Spark?


Thanks for reading this question.

I know how to insert rows:

    df.write \
        .format('jdbc') \
        .option("url", url) \
        .option("dbtable", table) \
        .option("user", user) \
        .option("password", password) \
        .option("driver", "org.postgresql.Driver") \
        .mode('append') \
        .save()
But how do I delete rows?

Is that possible?

Spark does not support deletes. But I have done it with foreachPartition (using only the DataFrame's data).

Like this:


Use a native JDBC connection with a PreparedStatement and executeUpdate() to perform the delete. You cannot do it through Spark itself; you have to fall back to the old JDBC approach, iterating over the rows to be deleted and removing them in batches.
    df = [Row(id=1), Row(id=2), ... ]
    => DELETE FROM TABLE WHERE id in df ...
    import java.sql.{Connection, DriverManager, PreparedStatement}

    df.rdd.coalesce(2).foreachPartition(partition => {
      // brConnect is a broadcast variable holding the JDBC connection properties
      val connectionProperties = brConnect.value
      val jdbcUrl = connectionProperties.getProperty("jdbcurl")
      val user = connectionProperties.getProperty("user")
      val password = connectionProperties.getProperty("password")
      val driver = connectionProperties.getProperty("Driver")
      Class.forName(driver)

      val dbc: Connection = DriverManager.getConnection(jdbcUrl, user, password)
      dbc.setAutoCommit(false)  // commit manually, once per batch
      val db_batchsize = 1000
      val sqlString = "DELETE FROM employee WHERE id = ?"

      val pstmt: PreparedStatement = dbc.prepareStatement(sqlString)
      partition.grouped(db_batchsize).foreach(batch => {
        batch.foreach { row =>
          pstmt.setLong(1, row.getAs[Long]("id"))
          pstmt.addBatch()
        }
        pstmt.executeBatch()
        dbc.commit()
      })
      pstmt.close()
      dbc.close()
    })
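Since the question is tagged pyspark, a Python counterpart of the same per-partition batched-delete pattern may help. The following is only a sketch, not something from the original answer: the psycopg2 connection settings, the `employee` table, and the `id` column are all assumed names.

```python
from itertools import islice

DB_BATCH_SIZE = 1000  # mirrors db_batchsize in the Scala version


def batched(rows, size):
    """Yield lists of at most `size` items from any iterator."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def delete_partition(rows):
    """Runs once per partition on the executors: one connection, batched deletes."""
    import psycopg2  # imported here so it only needs to be installed on executors
    conn = psycopg2.connect(host="db-host", dbname="mydb",
                            user="user", password="password")  # assumed settings
    try:
        with conn.cursor() as cur:
            for batch in batched(rows, DB_BATCH_SIZE):
                # row.id assumes the DataFrame has an `id` column
                cur.executemany("DELETE FROM employee WHERE id = %s",
                                [(row.id,) for row in batch])
                conn.commit()  # commit once per batch, as in the Scala version
    finally:
        conn.close()

# On a real cluster you would then run:
#   df.select("id").coalesce(2).rdd.foreachPartition(delete_partition)
```

As in the Scala version, batching keeps each round trip bounded, and committing per batch avoids holding one large transaction open for the whole partition.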