Scala spark-sql-kafka dependency problem with spark-submit
I wrote a simple driver class in Scala that uses spark-sql-kafka for Structured Streaming, and packaged it into a jar with Eclipse + Maven. The relevant part of the pom.xml file is:
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.8</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.1.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-csv_2.11</artifactId>
        <version>1.5.0</version>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
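For context, a minimal version of the driver class described above might look like the following sketch (the bootstrap server address, topic name, and object name are placeholders, not taken from the original question):

```scala
import org.apache.spark.sql.SparkSession

// Minimal Structured Streaming driver that reads from Kafka and echoes
// records to the console. Broker and topic are hypothetical placeholders.
object KafkaStreamDriver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamDriver")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka")                                  // provided by spark-sql-kafka-0-10
      .option("kafka.bootstrap.servers", "host1:9092")  // placeholder broker
      .option("subscribe", "my-topic")                  // placeholder topic
      .load()

    val query = df
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Because spark-sql-kafka is marked `provided` in the pom, this class only compiles against it; at run time the package has to be supplied to spark-submit, which is where the download problem below comes in.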
The Kafka streaming configuration is as follows:
spark.executor.extraJavaOptions -Dhttp.proxyHost=proxyName -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxyName -Dhttps.proxyPort=8080
spark.jars.ivySettings {path}/ivysettings_proxy.xml
The ivysettings_proxy.xml file is as follows:
<ivysettings>
    <settings defaultResolver="default" />
    <credentials host="proxyName:8080" username="" passwd="" />
    <include url="${ivy.default.settings.dir}/ivysettings-public.xml" />
    <include url="${ivy.default.settings.dir}/ivysettings-shared.xml" />
    <include url="${ivy.default.settings.dir}/ivysettings-local.xml" />
    <include url="${ivy.default.settings.dir}/ivysettings-main-chain.xml" />
    <include url="${ivy.default.settings.dir}/ivysettings-default-chain.xml" />
</ivysettings>
When I run spark-submit with the command above, it tries to download from the Maven repository and other URLs, and then exits when the connections time out.

How can I make spark-submit download the dependencies through the proxy?
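The spark-submit invocation itself is not shown in the question; given the pom above, it presumably looked roughly like this hypothetical sketch (master, class name, and paths are placeholders):

```shell
# Hypothetical reconstruction of the failing invocation: --packages makes
# spark-submit resolve the Kafka connector from a remote repository, which
# is the step that times out behind the proxy.
spark-submit \
  --master yarn \
  --class com.example.KafkaStreamDriver \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.0.2 \
  --properties-file {path}/spark-submit.properties \
  {path}/my-driver.jar
```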
Thanks.

What worked for me: I changed the spark-submit properties file to:
spark.driver.extraJavaOptions -Dhttp.proxyHost=proxyName -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxyName -Dhttps.proxyPort=8080
spark.executor.extraJavaOptions -Dhttp.proxyHost=proxyName -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxyName -Dhttps.proxyPort=8080
This then caused a certificate error, so I added the required certificate to the {path}/jdk1.8.0_144/jre/lib/security/cacerts file. (I used a free program called Portecle to add the certificate to the cacerts file.) Since I run spark-submit in YARN mode, I had to copy the new cacerts file to all the nodes, with:
pscp.pssh -h cluster-hosts ./cacerts {path}/jdk1.8.0_40/jre/lib/security/
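The certificate step above can also be done with the JDK's own keytool instead of Portecle; a sketch, assuming the proxy's CA certificate has been saved locally (alias and file name are placeholders):

```shell
# Import the proxy's CA certificate into the JRE trust store.
# "changeit" is the default cacerts password; adjust paths as needed.
keytool -importcert -trustcacerts \
  -alias proxy-ca \
  -file proxy-ca.crt \
  -keystore {path}/jdk1.8.0_144/jre/lib/security/cacerts \
  -storepass changeit
```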
Comments:

Check the configured proxy connection and repositories; in the meantime, you can download that jar manually and pass it to spark-submit with the --jars option.

Thank you for your answer. Adding the jar (and building a fat jar) did not work for me. I found a solution, which I have summarized above. The solution you mention relates to the repository connection configuration, which is what I suspected and asked to have checked.

Adding the --jars option definitely works, unless you specify the --packages option together with the --jars option.
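For completeness, the workaround suggested in the comments (downloading the connector jar manually and passing it with --jars, instead of resolving it with --packages) would look roughly like this sketch (file names and class name are placeholders; the kafka-clients jar is assumed to be the connector's transitive dependency):

```shell
# Download spark-sql-kafka-0-10 and its Kafka client dependency manually,
# then pass them directly so spark-submit needs no repository access.
spark-submit \
  --master yarn \
  --class com.example.KafkaStreamDriver \
  --jars spark-sql-kafka-0-10_2.11-2.0.2.jar,kafka-clients-0.10.0.1.jar \
  {path}/my-driver.jar
```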