Writing to an Oracle database with Apache Spark 1.4.0

I am trying to write some data to an Oracle database using the Spark 1.4.0 DataFrame.write.jdbc() function.

The symmetric read.jdbc() function, which reads data from the Oracle database into a DataFrame object, works fine. However, when I write the DataFrame back (I also tried writing back exactly the same object I had read from the database, with overwrite mode set), the following exception is thrown:

Exception in thread "main" java.sql.SQLSyntaxErrorException: ORA-00902: Ungültiger Datentyp

    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399)
    at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1017)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:655)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:249)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:566)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:215)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:58)
    at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:943)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1075)
    at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3820)
    at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:3897)
    at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:1361)
    at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:252)
    at main3$.main(main3.scala:72)
    at main3.main(main3.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
The table has two simple string columns. The write succeeds when the columns are integers; it only fails for strings.
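
For reference, a minimal sketch of the round trip in question. This is a sketch only: the URL, table names and credentials are placeholders, and a Spark 1.4 sqlContext is assumed to be in scope.

  // Placeholders throughout; sqlContext is assumed to exist.
  val url = "jdbc:oracle:thin:@your_domain:1521/dbname"
  val props = new java.util.Properties()
  props.setProperty("user", "username")
  props.setProperty("password", "userpassword")

  // Reading from Oracle works fine:
  val df = sqlContext.read.jdbc(url, "source_table", props)

  // Writing the very same DataFrame back fails with ORA-00902:
  df.write.jdbc(url, "target_table", props)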

Actually, when I dug into it, I realized that Spark maps StringType to "TEXT", a type Oracle does not recognize (it should be "VARCHAR2"); that is exactly what the German ORA-00902 message above ("Ungültiger Datentyp", invalid datatype) complains about. The generated CREATE TABLE statement therefore fails. The mapping comes from jdbc.scala in the Spark source.


The actual answer: writing back to Oracle with the DataFrame.write.jdbc() implementation that ships in 1.4.0 is not possible, but if you don't mind upgrading to Spark 1.5 there is a slightly hacky way. As noted above, there are two problems:

The naive check for whether the table already exists is incompatible with Oracle:

SELECT 1 FROM $table LIMIT 1
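
For illustration, a hedged sketch of an Oracle-compatible probe; the helper below is my own, not Spark's. Oracle has no LIMIT clause, and its idiom for "at most one row" is ROWNUM:

  // Hypothetical helper, not part of Spark: checks table existence with
  // Oracle-compatible SQL instead of the LIMIT 1 form Spark 1.4 emits.
  def tableExists(conn: java.sql.Connection, table: String): Boolean =
    try {
      val stmt = conn.prepareStatement(s"SELECT 1 FROM $table WHERE ROWNUM = 1")
      try { stmt.executeQuery(); true } finally { stmt.close() }
    } catch {
      case _: java.sql.SQLException => false
    }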
In practice, this check can be bypassed entirely by calling the table-saving utility method directly:

org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(df, url, table, props)
The second, harder problem (as you correctly guessed) is that there is no ready-made Oracle-specific data type dialect. The fix follows the same pattern:

import org.apache.spark.sql.jdbc.{JdbcDialects, JdbcType, JdbcDialect}
import org.apache.spark.sql.types._

  // Custom dialect mapping Spark SQL types to Oracle column types.
  val OracleDialect = new JdbcDialect {
    override def canHandle(url: String): Boolean =
      url.startsWith("jdbc:oracle") || url.contains("oracle")

    override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
      // Oracle has no TEXT type; VARCHAR2 needs an explicit length (255 here is arbitrary).
      case StringType => Some(JdbcType("VARCHAR2(255)", java.sql.Types.VARCHAR))
      case BooleanType => Some(JdbcType("NUMBER(1)", java.sql.Types.NUMERIC))
      case IntegerType => Some(JdbcType("NUMBER(10)", java.sql.Types.NUMERIC))
      case LongType => Some(JdbcType("NUMBER(19)", java.sql.Types.NUMERIC))
      case DoubleType => Some(JdbcType("NUMBER(19,4)", java.sql.Types.NUMERIC))
      case FloatType => Some(JdbcType("NUMBER(19,4)", java.sql.Types.NUMERIC))
      case ShortType => Some(JdbcType("NUMBER(5)", java.sql.Types.NUMERIC))
      case ByteType => Some(JdbcType("NUMBER(3)", java.sql.Types.NUMERIC))
      case BinaryType => Some(JdbcType("BLOB", java.sql.Types.BLOB))
      // Oracle DATE carries a time component, so it can hold timestamps too.
      case TimestampType => Some(JdbcType("DATE", java.sql.Types.DATE))
      case DateType => Some(JdbcType("DATE", java.sql.Types.DATE))
//      case DecimalType.Fixed(precision, scale) => Some(JdbcType("NUMBER(" + precision + "," + scale + ")", java.sql.Types.NUMERIC))
      case DecimalType.Unlimited => Some(JdbcType("NUMBER(38,4)", java.sql.Types.NUMERIC))
      // Fall back to Spark's defaults for anything unmapped.
      case _ => None
    }
  }

  // Must be registered before the write so Spark picks it up for Oracle URLs.
  JdbcDialects.registerDialect(OracleDialect)
So, finally, the working example should look something like this:

  val url: String = "jdbc:oracle:thin:@your_domain:1521/dbname"
  val driver: String = "oracle.jdbc.OracleDriver"
  val props = new java.util.Properties()
  props.setProperty("user", "username")
  props.setProperty("password", "userpassword")
  // saveTable skips the Oracle-incompatible existence check in DataFrameWriter.jdbc()
  org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(dataFrame, url, "table_name", props)

You can use org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable, as Aerondir said.

Update: starting with Spark 2.x

There is a problem: when Spark creates the JDBC table, it wraps every column name in double quotes, so all the column names become case-sensitive, and unquoted column names no longer resolve when you query the Oracle table through SQL*Plus:

select colA from myTable;    => no longer works
select "colA" from myTable;  => works
[Solution]

Unfortunately, this API is internal to Spark, so the code can break in a later release. And it did: in Spark 2.1 the fourth parameter changed from java.util.Properties to JDBCOptions, so the saveTable call shown above no longer compiles as-is.
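
A hedged sketch of what the call looks like against Spark 2.1, based on my reading of its source; since this is an internal API, verify it against the exact version you run:

  import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JdbcUtils}

  // Spark 2.1 sketch: the Properties argument is replaced by a JDBCOptions
  // value that also carries the url and table name. Internal API, may change.
  val options = new JDBCOptions(url, "table_name",
    Map("user" -> "username", "password" -> "userpassword"))
  JdbcUtils.saveTable(dataFrame, url, "table_name", options)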