Apache Spark: how to deal with the spark.yarn.jars property?


My knowledge of Spark is limited, and you'll sense it after reading this question. I have just one node with Spark, Hadoop and YARN installed on it.

I'm able to code and run the word-count problem in cluster mode with the below command:

 spark-submit --class com.sanjeevd.sparksimple.wordcount.JobRunner 
              --master yarn 
              --deploy-mode cluster
              --driver-memory=2g
              --executor-memory 2g
              --executor-cores 1
              --num-executors 1
              SparkSimple-0.0.1-SNAPSHOT.jar
              hdfs://sanjeevd.br:9000/user/spark-test/word-count/input
              hdfs://sanjeevd.br:9000/user/spark-test/word-count/output
It works just fine.

Now I understand that "Spark on YARN" needs the Spark jar files available on the cluster, and if I do nothing, then every time I run my program it copies hundreds of jar files from $SPARK_HOME to each node (in my case, just one node). I can see the execution pausing for a while until it finishes the copying. See below -

16/12/12 17:24:03 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/12/12 17:24:06 INFO yarn.Client: Uploading resource file:/tmp/spark-a6cc0d6e-45f9-4712-8bac-fb363d6992f2/__spark_libs__11112433502351931.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/__spark_libs__11112433502351931.zip
16/12/12 17:24:08 INFO yarn.Client: Uploading resource file:/home/sanjeevd/personal/Spark-Simple/target/SparkSimple-0.0.1-SNAPSHOT.jar -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/SparkSimple-0.0.1-SNAPSHOT.jar
16/12/12 17:24:08 INFO yarn.Client: Uploading resource file:/tmp/spark-a6cc0d6e-45f9-4712-8bac-fb363d6992f2/__spark_conf__6716604236006329155.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/__spark_conf__.zip
Spark's documentation suggests setting the spark.yarn.jars property to avoid this copying. So I set the below property in the spark-defaults.conf file:

spark.yarn.jars hdfs://sanjeevd.br:9000//user/spark/share/lib
To make Spark runtime jars accessible from the YARN side, you can specify spark.yarn.archive or spark.yarn.jars. For details please refer to Spark Properties. If neither spark.yarn.archive nor spark.yarn.jars is specified, Spark will create a zip file with all jars under $SPARK_HOME/jars and upload it to the distributed cache.

Btw, I have copied all the jar files from the local /opt/spark/jars directory to HDFS /user/spark/share/lib. There are 206 of them.
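
For reference, this is roughly how I did the copy (a sketch; it assumes a standard Hadoop client on the node):

    # create the target directory and copy every Spark jar from the local install
    hdfs dfs -mkdir -p /user/spark/share/lib
    hdfs dfs -put /opt/spark/jars/*.jar /user/spark/share/lib/
    # sanity check -- should list all 206 jars
    hdfs dfs -ls /user/spark/share/lib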

But this breaks my job. Below is the error -

spark-submit --class com.sanjeevd.sparksimple.wordcount.JobRunner --master yarn --deploy-mode cluster --driver-memory=2g --executor-memory 2g --executor-cores 1 --num-executors 1 SparkSimple-0.0.1-SNAPSHOT.jar hdfs://sanjeevd.br:9000/user/spark-test/word-count/input hdfs://sanjeevd.br:9000/user/spark-test/word-count/output
16/12/12 17:43:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/12 17:43:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/12/12 17:43:07 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/12/12 17:43:07 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (5120 MB per container)
16/12/12 17:43:07 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
16/12/12 17:43:07 INFO yarn.Client: Setting up container launch context for our AM
16/12/12 17:43:07 INFO yarn.Client: Setting up the launch environment for our AM container
16/12/12 17:43:07 INFO yarn.Client: Preparing resources for our AM container
16/12/12 17:43:07 INFO yarn.Client: Uploading resource file:/home/sanjeevd/personal/Spark-Simple/target/SparkSimple-0.0.1-SNAPSHOT.jar -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005/SparkSimple-0.0.1-SNAPSHOT.jar
16/12/12 17:43:07 INFO yarn.Client: Uploading resource file:/tmp/spark-fae6a5ad-65d9-4b64-9ba6-65da1310ae9f/__spark_conf__7881471844385719101.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005/__spark_conf__.zip
16/12/12 17:43:08 INFO spark.SecurityManager: Changing view acls to: sanjeevd
16/12/12 17:43:08 INFO spark.SecurityManager: Changing modify acls to: sanjeevd
16/12/12 17:43:08 INFO spark.SecurityManager: Changing view acls groups to: 
16/12/12 17:43:08 INFO spark.SecurityManager: Changing modify acls groups to: 
16/12/12 17:43:08 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(sanjeevd); groups with view permissions: Set(); users  with modify permissions: Set(sanjeevd); groups with modify permissions: Set()
16/12/12 17:43:08 INFO yarn.Client: Submitting application application_1481592214176_0005 to ResourceManager
16/12/12 17:43:08 INFO impl.YarnClientImpl: Submitted application application_1481592214176_0005
16/12/12 17:43:09 INFO yarn.Client: Application report for application_1481592214176_0005 (state: ACCEPTED)
16/12/12 17:43:09 INFO yarn.Client: 
 client token: N/A
 diagnostics: N/A
 ApplicationMaster host: N/A
 ApplicationMaster RPC port: -1
 queue: default
 start time: 1481593388442
 final status: UNDEFINED
 tracking URL: http://sanjeevd.br:8088/proxy/application_1481592214176_0005/
 user: sanjeevd
16/12/12 17:43:10 INFO yarn.Client: Application report for application_1481592214176_0005 (state: FAILED)
16/12/12 17:43:10 INFO yarn.Client: 
 client token: N/A
 diagnostics: Application application_1481592214176_0005 failed 1 times due to AM Container for appattempt_1481592214176_0005_000001 exited with  exitCode: 1
For more detailed output, check application tracking page:http://sanjeevd.br:8088/cluster/app/application_1481592214176_0005Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1481592214176_0005_01_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1481593388442
     final status: FAILED
     tracking URL: http://sanjeevd.br:8088/cluster/app/application_1481592214176_0005
     user: sanjeevd
16/12/12 17:43:10 INFO yarn.Client: Deleting staging directory hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005
Exception in thread "main" org.apache.spark.SparkException: Application application_1481592214176_0005 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/12/12 17:43:10 INFO util.ShutdownHookManager: Shutdown hook called
16/12/12 17:43:10 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-fae6a5ad-65d9-4b64-9ba6-65da1310ae9f
Do you know what I'm doing wrong? The task's log shows the below -

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I understand the error about the ApplicationMaster class not being found, but my question is why it can't be found - where is this class supposed to be? I don't have an assembly jar since I'm using Spark 2.0.1, which doesn't ship one.

How is this related to the spark.yarn.jars property? That property is supposed to help Spark run on YARN, and that's it. What else do I need to do when using spark.yarn.jars?


Thanks for reading this question and for your help in advance.

If you look at the documentation of spark.yarn.jars, it says the following:

List of libraries containing Spark code to distribute to YARN containers. By default, Spark on YARN will use Spark jars installed locally, but the Spark jars can also be in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't need to be distributed each time an application runs. To point to jars on HDFS, for example, set this configuration to hdfs:///some/path. Globs are allowed.


This means you are effectively overriding SPARK_HOME/jars and telling YARN to pick up all the jars required for your application's run from the path you specify. So if you set the spark.yarn.jars property, all the dependency jars that Spark needs to run must be present in that path. If you look inside the spark-assembly.jar under SPARK_HOME/lib, the org.apache.spark.deploy.yarn.ApplicationMaster class is present there, so make sure all the Spark dependencies are present in the HDFS path you specify as spark.yarn.jars.
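
One quick way to verify this (a sketch; the jar name assumes a Spark 2.x layout where the YARN classes live in a spark-yarn_*.jar rather than an assembly):

    # check that the YARN jar made it into the HDFS path set as spark.yarn.jars
    hdfs dfs -ls /user/spark/share/lib/ | grep spark-yarn
    # check that the class is actually inside it (inspect the local copy)
    jar tf /opt/spark/jars/spark-yarn_2.11-2.0.1.jar | grep ApplicationMaster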

I was finally able to make sense of this property. I found by hit-and-trial that the correct syntax of this property is

spark.yarn.jars=hdfs://xx:9000/user/spark/share/lib/*.jar

I wasn't putting *.jar at the end, and my path was just ending with /lib. I tried putting the actual assembly jar like this -
spark.yarn.jars=hdfs://sanjeevd.brickred:9000/user/spark/share/lib/spark-yarn_2.11-2.0.1.jar
but no luck. All it said was: unable to load ApplicationMaster.
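
A quick sanity check before submitting is to confirm the glob actually matches the jars (a sketch against my own path; hdfs dfs -ls accepts globs, quoted here so the local shell doesn't expand them):

    # the same glob the property uses; this should list all 206 jars
    hdfs dfs -ls 'hdfs://sanjeevd.br:9000/user/spark/share/lib/*.jar'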


I posted my answer to a similar question someone asked as well. You can also use the spark.yarn.archive option and set it to the location of an archive (that you create) containing all the jars under the $SPARK_HOME/jars/ folder, at the root level of the archive. For example (the full sequence is consolidated after this list):

  • Create the archive:
    jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
  • Upload it to HDFS:
    hdfs dfs -put spark-libs.jar /some/path/
    2a. For a large cluster, increase the replication count of the Spark archive so that you reduce the number of times a NodeManager does a remote copy (change the number of replicas proportionally to the total number of NodeManagers):
    hdfs dfs -setrep -w 10 hdfs:///some/path/spark-libs.jar
  • Set spark.yarn.archive to hdfs:///some/path/spark-libs.jar
  • Thanks. I modified my question at the end. Since I'm using Spark 2.0.1, there is no assembly jar, which is why the ApplicationMaster class can't be found. Why doesn't Spark complain when I unset the spark.yarn.jars property? But when I uploaded all of /spark/jars to HDFS and set the spark.yarn.jars property to point at this HDFS location, Spark went crazy asking for ApplicationMaster. Btw, I don't have a /spark/lib folder either; I guess they changed that too in the 2.x version. Any help please.
  • From Spark 2.x they have stopped creating the assembly jar. If you look in the /jars folder you will find spark-yarn_*.jar, which should contain the ApplicationMaster class; verify that you have this jar in your /jars folder. If you have it and have copied it to the HDFS location, then I don't know why you get this error. :)
  • Thanks for your help. I upvoted your comment; it looks like I had a syntax issue in specifying this property.
  • Hi Sanjeev, in my case only the jars in $SPARK_HOME/jars get copied. How do you get your own jar, i.e. SparkSimple-0.0.1-SNAPSHOT.jar, copied to HDFS as well? Is this part of spark-defaults.conf? Or is the jar file already available in HDFS?