Spark job does not run when the jar is in HDFS

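I am trying to run a Spark job in standalone mode, but the command does not pick up the jar from HDFS. The jar is stored at an HDFS location, and the job works fine when I run it in local mode.

Here is the command I am using: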

spark-submit --deploy-mode client --master yarn --class com.main.WordCount /spark/wc.jar
Here is my program:

    val conf = new SparkConf().setAppName("WordCount").setMaster("yarn")
    val spark = new SparkContext(conf)
    val file  = spark.textFile(args(0))

    val count = file.flatMap(f=>f.split(" ")).map(word=>(word,1)).reduceByKey(_+_).collect
    count.foreach(println)
I get the following error:

Warning: Local jar /spark/wc.jar does not exist, skipping.
java.lang.ClassNotFoundException: com.main.WordCount
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:693)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.io.FileNotFoundException: File file:/spark/wc.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:340)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:433)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:530)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:529)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:529)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1119)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1178)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
If I use deploy mode cluster instead, I get the following error:

Warning: Local jar /spark/wc.jar does not exist, skipping.
java.lang.ClassNotFoundException: com.main.WordCount
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:693)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.io.FileNotFoundException: File file:/spark/wc.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:340)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:433)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:530)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$10.apply(Client.scala:529)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:529)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1119)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1178)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

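Can you clarify what you mean by local mode? There are only two deploy modes, client and cluster. The only difference between them is where the driver runs: in client mode it runs on the machine you submit the job from, and in cluster mode it runs on one of the cluster's nodes (you do not choose which).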

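Regarding the spark-submit command:

When you run spark-submit, Spark uploads all local resources (the files passed with --files and --py-files, plus the application's main jar) to a temporary staging directory on HDFS that is created for that specific Spark application. A path without a scheme, such as /spark/wc.jar, is looked up on the local machine, which is why the log reports "File file:/spark/wc.jar does not exist". Because you passed the HDFS location, spark-submit could not find the jar locally. Keep the jar on the local machine and pass its local path.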
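As a rough sketch of that advice, the script below copies the jar out of HDFS and then submits it by its local path. The local staging path /tmp/wc.jar, the input path /spark/input.txt, and the run helper are assumptions made for illustration; only /spark/wc.jar, com.main.WordCount, and the spark-submit flags come from the question. By default the script only prints the commands. Set RUN=1 on a machine that has the Hadoop and Spark clients installed to actually execute them.

```shell
# Sketch: stage the application jar locally before calling spark-submit.
set -eu

HDFS_JAR="/spark/wc.jar"             # jar location on HDFS, taken from the question
LOCAL_JAR="/tmp/wc.jar"              # hypothetical local staging path
INPUT_PATH="${1:-/spark/input.txt}"  # hypothetical input file, read by the job as args(0)

# Hypothetical helper: print the command, and execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# 1. Copy the jar from HDFS to the local file system of the submitting machine.
run hdfs dfs -get "$HDFS_JAR" "$LOCAL_JAR"

# 2. Submit using the local path, so spark-submit can find the jar and upload it.
run spark-submit --deploy-mode client --master yarn \
  --class com.main.WordCount "$LOCAL_JAR" "$INPUT_PATH"
```

The Spark documentation also describes passing a fully qualified hdfs:// URL for the application jar. Whether that works in client mode may depend on the Spark version, so this sketch sticks to the local copy recommended above.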

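Looking at the command and the question, I do not believe you are actually trying to run in standalone mode. Please clarify. If you think this explanation is correct, please mark it as accepted.

This is the explanation I have been looking for over the past two days. Thank you very much, my program is running now.

@KumarHarsh Please update your question, because it is not formatted properly. Also include the correct command as part of the answer, for the benefit of others rather than for me.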