Scala spark-sql-kafka dependency issue with spark-submit


I wrote a simple driver class in Scala that uses spark-sql-kafka for Structured Streaming, and packaged it into a jar with Eclipse + Maven (a minimal sketch of such a driver is shown after the pom.xml excerpt below). The relevant part of the pom.xml file is as follows:

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.8</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.1.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-csv_2.11</artifactId>
        <version>1.5.0</version>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
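For reference, a minimal sketch of the kind of driver class described above, assuming a hypothetical broker address and topic name (the real class, broker, and topic names are not given in the question):

import org.apache.spark.sql.SparkSession

object KafkaStreamingDriver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-structured-streaming")
      .getOrCreate()

    import spark.implicits._

    // Read a Kafka topic as a streaming DataFrame; this is what requires
    // spark-sql-kafka-0-10 to be on the classpath at runtime.
    val messages = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
      .option("subscribe", "my-topic")                   // placeholder topic
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .as[(String, String)]

    // Write the stream to the console and block until the query is stopped.
    val query = messages.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}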
The spark-submit configuration used for the Kafka streaming job is as follows:

spark.executor.extraJavaOptions    -Dhttp.proxyHost=proxyName -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxyName -Dhttps.proxyPort=8080

spark.jars.ivySettings {path}/ivysettings_proxy.xml
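A typical spark-submit invocation that triggers this kind of Ivy dependency resolution would look roughly like the following (the class name, application jar, and properties file name are placeholders; the package coordinates are taken from the pom.xml above):

spark-submit --master yarn \
    --properties-file spark-proxy.properties \
    --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.0.2 \
    --class com.example.KafkaStreamingDriver \
    my-streaming-app.jar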
The ivysettings_proxy.xml file is as follows:

<ivysettings>
    <settings defaultResolver="default" />
    <credentials host="proxyName:8080" username="" passwd=""/>
    <include url="${ivy.default.settings.dir}/ivysettings-public.xml" />
    <include url="${ivy.default.settings.dir}/ivysettings-shared.xml" />
    <include url="${ivy.default.settings.dir}/ivysettings-local.xml" />
    <include url="${ivy.default.settings.dir}/ivysettings-main-chain.xml" />
    <include url="${ivy.default.settings.dir}/ivysettings-default-chain.xml"/>
</ivysettings>
When I run spark-submit with the above configuration, it tries to download the dependencies from the Maven repository and other URLs, and then exits when the connections time out.

How can I make spark-submit download the dependencies through the proxy?

Thanks.

What worked for me:

I changed the spark-submit properties file to:

spark.driver.extraJavaOptions  -Dhttp.proxyHost=proxyName -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxyName -Dhttps.proxyPort=8080
spark.executor.extraJavaOptions    -Dhttp.proxyHost=proxyName -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxyName -Dhttps.proxyPort=8080
This resulted in a certificate error.

Then I added the certificate to the {path}/jdk1.8.0_144\jre\lib\security\cacerts file. (I used a free tool called Portecle to add the certificate to the cacerts file.)
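As an alternative to Portecle, the JDK's own keytool can import the proxy's certificate into the same cacerts file; the alias and certificate file name below are placeholders, and changeit is the default cacerts password:

keytool -importcert -alias proxy-ca -file proxy-ca.crt -keystore {path}/jdk1.8.0_144/jre/lib/security/cacerts -storepass changeit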

Since I run spark-submit in YARN mode, I had to copy the new cacerts file to all of the nodes:

pscp.pssh -h cluster-hosts ./cacerts  {path}/jdk1.8.0_40/jre/lib/security/ 

Check the configured proxy connection and repositories; in the meantime you can download the jar manually and pass it to spark-submit with the --jars option.

Thanks for your answer. Adding the jars (and building a fat jar) did not work for me. I found a solution, which I have summarized in my answer below.

The solution you found is related to the repository connection configuration, which is exactly what I suspected and asked you to check. Adding the jars with the --jars option should definitely work, unless you specified the --packages option together with the --jars option.
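For completeness, the workaround suggested in the first comment (downloading the connector jar manually and passing it with --jars instead of resolving it with --packages) would look roughly like this; the jar path, class name, and application jar are placeholders, and the connector's transitive dependencies (such as kafka-clients) would likely need to be passed the same way:

spark-submit --master yarn \
    --jars /tmp/spark-sql-kafka-0-10_2.11-2.0.2.jar \
    --class com.example.KafkaStreamingDriver \
    my-streaming-app.jar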