Apache Spark 2 on YARN - NullPointerException while preparing AM container
I am trying to run:
pyspark --master yarn
- Spark version: 2.0.0
- Hadoop version: 2.7.2
- The Hadoop YARN web UI starts successfully

This is what happens:
16/08/15 10:00:12 DEBUG Client: Using the default MR application classpath: $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
16/08/15 10:00:12 INFO Client: Preparing resources for our AM container
16/08/15 10:00:12 DEBUG Client:
16/08/15 10:00:12 DEBUG DFSClient: /user/mispp/.sparkStaging/application_1471254869164_0006: masked=rwxr-xr-x
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #8
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #8
16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: mkdirs took 14ms
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #9
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #9
16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: setPermission took 10ms
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #10
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #10
16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: getFileInfo took 2ms
16/08/15 10:00:12 INFO Client: Deleting staging directory hdfs://sm/user/mispp/.sparkStaging/application_1471254869164_0006
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #11
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #11
16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: delete took 14ms
16/08/15 10:00:12 ERROR SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
at scala.collection.mutable.ArrayOps$ofRef$.newBuilder$extension(ArrayOps.scala:190)
at scala.collection.mutable.ArrayOps$ofRef.newBuilder(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:246)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:186)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:484)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:480)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:480)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)
16/08/15 10:00:12 DEBUG AbstractLifeCycle: stopping org.spark_project.jetty.server.Server@69e507eb
16/08/15 10:00:12 DEBUG Server: Graceful shutdown org.spark_project.jetty.server.Server@69e507eb by
Any idea why this happens? It is installed in 3 LXD containers (one master + two workers) on a single server with 16 GB of RAM.

When you see

ERROR SparkContext: Error initializing SparkContext

while using YARN, it means the Spark application could not start because it could not get enough resources (usually memory), so that is the first thing you need to check.
You could paste your spark-defaults.conf here. Also, in case you have not noticed, the default value of spark.executor.memory is 1g. You can try overriding this value, e.g.

pyspark --executor-memory 256m

and see if it starts.
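The same override can also be made permanent in spark-defaults.conf instead of being passed on the command line each time; a minimal sketch, with illustrative values:

```
# $SPARK_HOME/conf/spark-defaults.conf -- values here are examples, not recommendations
spark.executor.memory   256m
spark.driver.memory     512m
```

Command-line flags such as --executor-memory take precedence over this file, so it is safe to experiment with both.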
Also, there is no resource configuration (e.g. yarn.nodemanager.resource.memory-mb) in your yarn-site.xml, so you may not have allocated enough resources to YARN. Given the size of the machine, it would be best to set these values explicitly.
Given the location of the error in the Spark 2.0.0 code, I suspect the error is caused by a misconfiguration of spark.yarn.jars. I would double-check against the documentation that the value of this setting in your setup is correct.

I just upvoted @tinfoiled's answer, but I want to comment here on the syntax of the spark.yarn.jars property (note that it ends with an "s"), because it took me quite a while to figure out. The correct syntax (which the OP already knew) is shown at the bottom of this post.

Initially I did not put *.jar at the end, which resulted in "Unable to load ApplicationMaster". I tried all sorts of combinations, but none of them worked. In fact, I posted a question on SOF about this same issue.

I was not even sure that what I was doing was right, but the OP's question and @tinfoiled's answer gave me some confidence, and I was finally able to make use of this property. Yes, yarn was causing the problem. What should be included on hdfs? That one got no response.
export HADOOP_PREFIX=/home/mispp/hadoop-2.7.2
export PATH=$PATH:$HADOOP_PREFIX/bin
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
spark.yarn.jars=hdfs://xxx:9000/user/spark/share/lib/*.jar
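Note that spark.yarn.jars only helps if the jars are actually present at that HDFS location; YARN localizes them from there when launching the AM container. A sketch of uploading them, assuming $SPARK_HOME points at the local Spark 2.0.0 install (adjust the HDFS path and URI to your own cluster):

```
# Upload Spark's runtime jars so YARN containers can localize them from HDFS
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put "$SPARK_HOME"/jars/*.jar /user/spark/share/lib/
```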