Apache Spark: How to pass a date/timestamp as lowerBound/upperBound in spark-sql-2.4.1v with ojdbc14.jar?
I am using spark-sql-2.4.1v with ojdbc6.jar to read data from Oracle. I have the following Oracle table:
create table schema1.modal_vals(
  FAMILY_ID NUMBER NOT NULL,
  INSERTION_DATE DATE NOT NULL,
  ITEM_VALUE VARCHAR2(4000),
  YEAR NUMBER,
  QUARTER NUMBER,
  LAST_UPDATE_DATE DATE
);
Load sample data:
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-02','bbb-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-13','b+',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-17','bbb-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-13','bb',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-02','ccc-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-13','aa-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-OCT-13','a-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-03','bbb-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-13','b',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-FEB-03','aa+',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-13','aa+',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JAN-19','aaa+',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'30-JUN-18','ccc-',2013,2,null);
insert into modal_vals(FAMILY_ID,INSERTION_DATE,ITEM_VALUE,YEAR,QUARTER,LAST_UPDATE_DATE) values(2,'01-MAY-19','bb-',2013,2,null);
I tried to load the data into Spark SQL as follows:
// Fill in your own Oracle connection details
DataFrameReader ora_df_reader = spark.read().format("jdbc")
    .option("url", o_url)
    .option("driver", Constants.ORACLE_DRIVER)
    .option("user", o_userName)
    .option("password", o_passwd)
    .option("fetchsize", 1000);
Dataset<Row> ss = ora_df_reader
    .option("inferSchema", true)
    .option("schema", "schema1")
    .option("numPartitions", 20)
    .option("partitionColumn", "INSERTION_DATE")
    .option("lowerBound", "2002-03-31")
    .option("upperBound", "2019-05-01")
    .option("dateFormat", "yyyy-MM-dd") // Tried all of "yyyy-mm-dd", "yyyy-MM-dd", "YYYY-MM-DD", "DD-MMM-YY", "dd-MMM-yy"
    .option("dbtable", "(select * from schema1.modal_vals) t") // a subquery used as dbtable must be parenthesized and aliased
    .load();
But it gives this error:
java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:204)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:179)
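The trace above shows the bound string being handed to java.sql.Timestamp.valueOf from JDBCRelation.toInternalBoundValue, and other traces in this question show java.sql.Date.valueOf at the same call site. Both are JDK methods that accept one fixed format each, so it seems likely (an inference from the traces, not something confirmed in the question) that the dateFormat and timestampFormat options play no part in parsing the bounds. The following is a minimal sketch that only checks which of the strings tried here those two JDK methods accept; the class name BoundFormatCheck and its helper methods are hypothetical.

```java
import java.sql.Date;
import java.sql.Timestamp;

class BoundFormatCheck {

    // True if java.sql.Date.valueOf accepts the string (it expects yyyy-[m]m-[d]d).
    static boolean parsesAsDate(String s) {
        try {
            Date.valueOf(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // True if java.sql.Timestamp.valueOf accepts the string
    // (it expects yyyy-[m]m-[d]d hh:mm:ss[.f...]).
    static boolean parsesAsTimestamp(String s) {
        try {
            Timestamp.valueOf(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Bound strings taken from the attempts in this question.
        String[] candidates = {
            "2002-03-31",
            "2002-03-31 00:00:00",
            "03/31/2002",
            "31.03.2002 00:00:00"
        };
        for (String c : candidates) {
            System.out.println(c + " -> date: " + parsesAsDate(c)
                + ", timestamp: " + parsesAsTimestamp(c));
        }
    }
}
```

Under that assumption, a date-only bound fails when the column is treated as a timestamp, and a slash- or dot-separated bound fails in both cases before any query reaches Oracle.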
Second attempt:
.option("lowerBound", "2002-03-31 00:00:00")
.option("upperBound", "2019-05-01 23:59:59")
.option("timestampFormat", "yyyy-mm-dd hh:mm:ss")

.option("numPartitions", 240)
.option("lowerBound", "2002-03-31")
.option("upperBound", "2019-05-01")
.option("dateFormat", "yyyy-mm-dd")

java.sql.SQLException: ORA-12801: error signaled in parallel query server P001(2)
ORA-01861: literal does not match format string
ORA-02063: preceding 2 lines from CAPDBPROD
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:879)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:450)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:884)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3584)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3628)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1493)

.option("lowerBound", "03/31/2002 00:00:00")
.option("upperBound", "05/01/2019 23:59:59")
.option("dateFormat", "mm/dd/yyyy hh:mm:ss")

java.lang.IllegalArgumentException
at java.sql.Date.valueOf(Date.java:143)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:178)

.option("lowerBound", "03/31/2002")
.option("upperBound", "05/01/2019")
.option("dateFormat", "mm/dd/yyyy")

java.lang.IllegalArgumentException
at java.sql.Date.valueOf(Date.java:143)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:178)

.option("lowerBound", "31.03.2002 00:00:00")
.option("upperBound", "01.05.2019 23:59:59")
.option("dateFormat", "DD.MM.YYYY HH24:MI:SS")

I got an error:
ORA-01861: literal does not match format string

How do I pass a date for lowerBound/upperBound?
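Since the traces point at java.sql.Date.valueOf and java.sql.Timestamp.valueOf, one way to approach this is to convert whatever format the bounds arrive in into the strings those methods accept before passing them as options. The sketch below does that with java.time; the class BoundNormalizer and its methods are hypothetical helpers, not part of Spark. Note that java.time patterns are case-sensitive (MM is month, mm is minute, HH is hour of day), so a pattern such as "mm/dd/yyyy" reads the first field as minutes, and an Oracle-style mask such as "DD.MM.YYYY HH24:MI:SS" is not a valid Java pattern.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class BoundNormalizer {

    // Parses a date written in the given java.time pattern and returns it
    // in the yyyy-mm-dd form produced by java.sql.Date.toString().
    static String toDateBound(String text, String pattern) {
        LocalDate date = LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
        return Date.valueOf(date).toString();
    }

    // Parses a date-time written in the given java.time pattern and returns it
    // in the yyyy-mm-dd hh:mm:ss.f form produced by java.sql.Timestamp.toString().
    static String toTimestampBound(String text, String pattern) {
        LocalDateTime dateTime = LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern));
        return Timestamp.valueOf(dateTime).toString();
    }

    public static void main(String[] args) {
        // Inputs taken from the attempts in this question.
        System.out.println(toDateBound("03/31/2002", "MM/dd/yyyy"));
        System.out.println(toTimestampBound("05/01/2019 23:59:59", "MM/dd/yyyy HH:mm:ss"));
        System.out.println(toTimestampBound("31.03.2002 00:00:00", "dd.MM.yyyy HH:mm:ss"));
    }
}
```

The returned strings could then be passed as the lowerBound and upperBound option values; whether the date or the timestamp form is needed depends on how the driver reports the column type.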
According to this fix, we can use a date/timestamp column as the partition column.
If we add the option below, it resolves the problem:
.option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
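A possible explanation for why this helps, offered as an assumption rather than something the answer states: for a date-typed partition column, Spark 2.4 splits the range between lowerBound and upperBound into numPartitions strides and writes each boundary into that partition's WHERE clause as a quoted yyyy-MM-dd string literal. Oracle then has to convert that string to a DATE implicitly using the session's NLS_DATE_FORMAT, and a session default such as DD-MON-RR would not match, which is consistent with ORA-01861. The sketch below imitates that predicate generation in plain Java so the literals are visible; DatePartitionSketch is a hypothetical, simplified approximation and not Spark's actual code.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class DatePartitionSketch {

    // Builds one WHERE predicate per partition for a date column, rendering
    // each stride boundary as a quoted yyyy-MM-dd literal.
    static List<String> predicates(String column, LocalDate lower, LocalDate upper, int numPartitions) {
        if (numPartitions < 2 || !lower.isBefore(upper)) {
            throw new IllegalArgumentException("need numPartitions >= 2 and lower < upper");
        }
        long lo = lower.toEpochDay();
        long hi = upper.toEpochDay();
        // Stride in days between consecutive partition boundaries.
        long stride = hi / numPartitions - lo / numPartitions;

        List<String> result = new ArrayList<>();
        long current = lo;
        for (int i = 0; i < numPartitions; i++) {
            // The first partition has no lower clause, the last has no upper clause.
            String lowerClause = (i == 0)
                ? null : column + " >= '" + LocalDate.ofEpochDay(current) + "'";
            current += stride;
            String upperClause = (i == numPartitions - 1)
                ? null : column + " < '" + LocalDate.ofEpochDay(current) + "'";

            if (lowerClause == null) {
                result.add(upperClause + " or " + column + " is null");
            } else if (upperClause == null) {
                result.add(lowerClause);
            } else {
                result.add(lowerClause + " AND " + upperClause);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Bounds from the question, with a small partition count for readability.
        for (String p : predicates("INSERTION_DATE",
                LocalDate.of(2002, 3, 31), LocalDate.of(2019, 5, 1), 4)) {
            System.out.println(p);
        }
    }
}
```

If the column is read as a timestamp instead, the literals would carry a time part as well, so the NLS format in the session statement would presumably need to cover that too.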
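FYI, you cannot use @ to notify people who have not been active on the question/answer (through comments, edits, close votes, etc.). Also, try changing dateFormat to "yyyy-MM-dd" (from the GitHub pull request).
@Shaido, same error with .option("dateFormat", "yyyy-MM-dd").
Do you have dates or timestamps? What happens with 03/16/2016 00:00:00 and the format mm/dd/yyyy hh:mm:ss?
@AlexandrosBiratsis, as mentioned in the question, I tried the second and third ways, but no luck.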