Apache Spark: feasibility of exporting data from Hive to Netezza using Spark

This email discusses a use case my team is working on: exporting metadata and data from a Hive server to an RDBMS.

While doing this, exporting to MySQL and Oracle works fine, but the export to Netezza fails with this error message:

17/02/09 16:03:07 INFO DAGScheduler: Job 1 finished: json at RdbmsSandboxExecution.java:80, took 0.433405 s
17/02/09 16:03:07 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 143 ms on localhost (1/1)
17/02/09 16:03:07 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
Exception in thread "main" java.sql.SQLException: No suitable driver
        at java.sql.DriverManager.getDriver(DriverManager.java:278)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:49)
        at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:278)
        at org.apache.spark.sql.DataFrame.createJDBCTable(DataFrame.scala:1767)
        at com.zaloni.mica.datatype.conversion.RdbmsSandboxExecution.main(RdbmsSandboxExecution.java:81)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/09 16:03:07 INFO SparkContext: Invoking stop() from shutdown hook
17/02/09 16:03:07 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/static/sql,null}
17/02/09 16:03:07 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL1/execution/json,null}
We are using DataFrame.createJDBCTable.
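The stack trace shows DriverManager.getDriver failing inside JdbcUtils.createConnectionFactory, which is the code path Spark takes when no explicit "driver" connection property is supplied (createJDBCTable accepts none). A minimal sketch of the same export going through DataFrameWriter.jdbc instead, assuming hypothetical table names and that nzjdbc3.jar exposes org.netezza.Driver:

import java.util.Properties;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveToNetezzaSketch {
    public static void main(String[] args) {
        String jdbcUrl = args[0]; // e.g. jdbc:netezza://<host>:5480/<db>

        SparkConf conf = new SparkConf().setAppName("hive-to-netezza-export");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(jsc.sc());

        // Read the source table from Hive (table name is hypothetical).
        DataFrame df = hiveContext.table("source_db.source_table");

        // Naming the driver class explicitly makes Spark load it itself
        // rather than relying on the DriverManager.getDriver(url) lookup
        // that fails above with "No suitable driver".
        Properties props = new Properties();
        props.setProperty("driver", "org.netezza.Driver"); // assumed class inside nzjdbc3.jar
        props.setProperty("user", "<user>");
        props.setProperty("password", "<password>");

        df.write().jdbc(jdbcUrl, "TARGET_TABLE", props);

        jsc.stop();
    }
}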

The spark-submit command we are using is:

spark-submit --class <java_class_with_export_logic> --master local --deploy-mode client --conf spark.driver.extraClassPath=/absolute-path/nzjdbc3.jar --jars /absolute-path/nzjdbc3.jar /absolute-path/<application-jar> <JDBC_URL>
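With --master local everything runs in a single JVM, so spark.driver.extraClassPath is the classpath that matters here. If the same job is later pointed at a real cluster, the driver jar also has to be visible to the executors; a sketch assuming the same paths and a YARN master (the jar must exist at that path on every node, or be shipped via --jars):

spark-submit --master yarn --deploy-mode client \
  --conf spark.driver.extraClassPath=/absolute-path/nzjdbc3.jar \
  --conf spark.executor.extraClassPath=/absolute-path/nzjdbc3.jar \
  --jars /absolute-path/nzjdbc3.jar \
  --class <java_class_with_export_logic> \
  /absolute-path/<application-jar> <JDBC_URL>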

The spark-submit command should look something like:

spark-submit --master local[*] --deploy-mode client --conf spark.driver.extraClassPath=/absolute-path/nzjdbc3.jar --jars /absolute-path/nzjdbc3.jar --class <java_class_with_export_logic> /absolute-path/<application-jar> <JDBC_URL>

What is the difference between "local[*]" and "local"? "local" works fine for MySQL and Oracle.

The problem is not local[*] but the order of the options in spark-submit. I ran into the same issue a long time ago, and moving the --class option finally solved it, so I suggest pushing the --class option to the end.

That throws this error => Error: No main class set in JAR; please specify one with --class. Run with --help for usage help or --verbose for debug output.

Which Spark version? I am using 1.6.1.
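For reference on the master URLs raised above: they differ only in how many worker threads Spark runs in the local JVM, which is why the comment points at option ordering rather than the master URL as the cause.

--master local       # one worker thread
--master local[4]    # four worker threads
--master local[*]    # one worker thread per logical core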