How to set up the required configuration from scratch to run Spark on YARN?

Tags: hadoop, apache-spark, yarn, hpc

I am new to running Spark on YARN and have been trying it on an HPC cluster. Because I have not been able to set up the required configuration correctly, I cannot connect to YARN successfully. My log is as follows:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/11/22 18:35:59 INFO SparkContext: Running Spark version 1.4.1
16/11/22 18:36:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/22 18:36:00 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
16/11/22 18:36:00 INFO SecurityManager: Changing view acls to:  
16/11/22 18:36:00 INFO SecurityManager: Changing modify acls to:  
16/11/22 18:36:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set( ); users with modify permissions: Set( )
16/11/22 18:36:01 INFO Slf4jLogger: Slf4jLogger started
16/11/22 18:36:01 INFO Remoting: Starting remoting
16/11/22 18:36:01 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@172.23.9.106:35138]
16/11/22 18:36:01 INFO Utils: Successfully started service 'sparkDriver' on port 35138.
16/11/22 18:36:01 INFO SparkEnv: Registering MapOutputTracker
16/11/22 18:36:01 INFO SparkEnv: Registering BlockManagerMaster
16/11/22 18:36:01 INFO DiskBlockManager: Created local directory at /local/tmp/spark-c335cd9f-7719-4ffd-942c-2bbcc543e3c8/blockmgr-2057f8aa-e074-4910-a1fa-3fc9c73ad1f1
16/11/22 18:36:01 INFO MemoryStore: MemoryStore started with capacity 16.6 GB
16/11/22 18:36:01 INFO HttpFileServer: HTTP File server directory is /local/tmp/spark-c335cd9f-7719-4ffd-942c-2bbcc543e3c8/httpd-63876231-4681-4e75-bfd9-f46edf02087c
16/11/22 18:36:01 INFO HttpServer: Starting HTTP Server
16/11/22 18:36:01 INFO Utils: Successfully started service 'HTTP file server' on port 46754.
16/11/22 18:36:01 INFO SparkEnv: Registering OutputCommitCoordinator
16/11/22 18:36:01 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/11/22 18:36:01 INFO SparkUI: Started SparkUI at http://172.23.9.106:4040
16/11/22 18:36:02 INFO SparkContext: Added JAR file:/hard-mounts/user/311/ /cosine-lsh.jar at http://172.23.9.106:46754/jars/cosine-lsh.jar with timestamp 1479836162928
16/11/22 18:36:03 INFO RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
16/11/22 18:36:04 INFO Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/11/22 18:36:05 INFO Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/11/22 18:36:06 INFO Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/11/22 18:36:07 INFO Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/11/22 18:36:08 INFO Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/11/22 18:36:09 INFO Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/11/22 18:36:10 INFO Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/11/22 18:36:11 INFO Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/11/22 18:36:12 INFO Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
I am using Spark 1.4.1 and have downloaded Hadoop 2.6 into a local folder on the cluster. These are the Spark configuration lines in my application code:

val conf = new SparkConf()
      .setAppName("LSH-Cosine")
      .setMaster("yarn-client") //local[*]
      .set("spark.driver.maxResultSize", "0")
      .set("spark.local.dir", "/local/tmp");
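As an aside, hard-coding the master in SparkConf is usually unnecessary when submitting with spark-submit; the same settings could instead live in conf/spark-defaults.conf so the jar stays deployment-neutral. A sketch of such a file (values taken from the code above):

```
spark.master                 yarn-client
spark.driver.maxResultSize   0
spark.local.dir              /local/tmp
```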
I run the code, packaged as a jar, via a bash script as shown below:

#!/bin/bash -l

#PBS -l walltime=00:45:00

module load Spark/1.4.1
module load Hadoop

cd $PBS_O_WORKDIR

export HADOOP_CONF_DIR=/vsc-hard-mounts/user/311/31182/hadoop-2.6.0-src/hadoop-yarn-project/hadoop-yarn/conf

spark-submit \
  --class com.soundcloud.lsh.MainCerebro \
  --master yarn-client \
  --num-executors 50 \
  --driver-memory 32g \
  --executor-memory 32g \
  --executor-cores 2 \
  cosine-lsh.jar
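As a quick sanity check before submitting, it can help to verify that HADOOP_CONF_DIR actually points at a directory containing the client configuration files Spark's YARN client reads. A minimal sketch (the helper function name is made up):

```shell
# Hypothetical helper: verify a Hadoop client config directory
# contains the files needed by Spark's YARN client.
check_hadoop_conf() {
  dir="$1"
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f"
      return 1
    fi
  done
  echo "ok: $dir"
}
```

It could be called in the job script as `check_hadoop_conf "$HADOOP_CONF_DIR" || exit 1` right before the spark-submit line.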
I tried pointing HADOOP_CONF_DIR at the conf folder inside the Hadoop 2.6 source tree that I downloaded into a local folder. In addition, I changed yarn-site.xml to:

<configuration>

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>127.0.0.1:8032</value>
</property>

</configuration>
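Note that yarn.resourcemanager.hostname expects a bare hostname, not a host:port pair; the client RPC port belongs in yarn.resourcemanager.address, which defaults to ${yarn.resourcemanager.hostname}:8032. A sketch of how that file is usually written, assuming the ResourceManager runs on the submitting node:

```xml
<configuration>
  <!-- hostname only; no port here -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <!-- client RPC endpoint; defaults to ${yarn.resourcemanager.hostname}:8032 -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
</configuration>
```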

I have already tried all of the above based on related Stack Overflow posts. Since many of them do not have a clear answer, I wanted to post this question.


Thanks in advance.

Are you running the Hadoop daemons? Your Spark client does not seem to be able to connect to the ResourceManager.
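One way to confirm the daemons are actually up is to look for them in the output of `jps`. A minimal sketch (the function name is made up; it takes the listing as text so it can be applied to output from any node):

```shell
# Hypothetical check: report which expected Hadoop daemons are absent
# from a `jps` listing passed in as text.
missing_daemons() {
  listing="$1"
  for d in ResourceManager NodeManager NameNode DataNode; do
    echo "$listing" | grep -q "$d" || echo "missing: $d"
  done
}
```

Typical usage on the node that should host the ResourceManager: `missing_daemons "$(jps)"`. Any line it prints names a daemon that still needs to be started (e.g. via start-dfs.sh and start-yarn.sh).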