Apache Spark: Spark BlockManager running on localhost

I have a simple script file that I am trying to execute in the spark-shell, following along with a tutorial. My namenode and Mesos master are at 172.24.51.171, and my own IP address is 172.24.51.142. I saved the script lines to a file and then launched it with the following command:
/opt/spark-1.3.0-bin-hadoop2.4/bin/spark-shell -i WordCount.scala
All of my remote executors die with errors similar to the following:
15/04/08 14:30:39 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to localhost/127.0.0.1:48554
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:87)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:89)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:594)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:592)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:592)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:586)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.org$apache$spark$broadcast$TorrentBroadcast$$anonfun$$getRemote$1(TorrentBroadcast.scala:126)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply(TorrentBroadcast.scala:136)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply(TorrentBroadcast.scala:136)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1152)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:48554
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
This failure occurs after I run the errors.count() command. Earlier in my shell session, after the new SparkContext is created, I see the following lines:
15/04/08 14:31:18 INFO NettyBlockTransferService: Server created on 48554
15/04/08 14:31:18 INFO BlockManagerMaster: Trying to register BlockManager
15/04/08 14:31:18 INFO BlockManagerMasterActor: Registering block manager localhost:48554 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 48554)
15/04/08 14:31:18 INFO BlockManagerMaster: Registered BlockManager
My guess is that Spark is recording the BlockManager's address as localhost:48554 and then sending that address to all of the executors, which then try to talk to their own localhost:48554 instead of the driver's IP address on port 48554. Why is Spark using localhost as the BlockManager's address instead of spark.driver.host?
Additional information
Try setting SPARK_LOCAL_IP (on the command line) or spark.local.ip through the sparkConf object. You could also try providing the Spark master address with the --master parameter when invoking the spark-shell (or adding spark.master in spark-defaults.conf). I had a similar issue (see my post), where the BlockManager appears to listen on localhost when the context is created dynamically in the shell.
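The suggestions above can be combined into a single launch command; this is only a sketch, using the master and driver addresses mentioned in this question (adjust them for your environment, and note that the Mesos port 5050 is the usual default, not something stated in the thread):

```shell
# Bind Spark's services to the driver's real interface instead of
# whatever "localhost" resolves to on this machine.
export SPARK_LOCAL_IP=172.24.51.142

# Pass the master explicitly, and set spark.driver.host so the
# BlockManager registers with an address the executors can reach.
/opt/spark-1.3.0-bin-hadoop2.4/bin/spark-shell \
  --master mesos://172.24.51.171:5050 \
  --conf spark.driver.host=172.24.51.142 \
  -i WordCount.scala
```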
Logs:
- when using the original context (listening on the hostname): BlockManagerInfo: Added broadcast_1_piece0 in memory on ubuntu64server2:33301
- when creating a new context (listening on localhost): BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:40235
I had to connect to a Cassandra cluster, and was able to query it by providing spark.cassandra.connection.host in spark-defaults.conf and importing com.datastax.spark.connector._ in the spark-shell.

Hi maasg -- that had no effect; according to the configuration page, spark.local.ip is not part of the configuration in Spark 1.2.1 or Spark 1.3.0. See the link in my question: "when the context is created dynamically in the shell, the BlockManager listens on localhost". That was my conclusion as well, but it seems like counterintuitive behavior. Setting spark.master in spark-defaults.conf solved this issue for me.
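The fix that worked for the commenter can be captured in spark-defaults.conf; a minimal sketch, assuming the Mesos master from this thread and a hypothetical Cassandra contact point (the thread does not give the actual Cassandra address, and 5050 is the conventional Mesos port):

```
# /opt/spark-1.3.0-bin-hadoop2.4/conf/spark-defaults.conf
spark.master                      mesos://172.24.51.171:5050
spark.cassandra.connection.host   172.24.51.171
```

With spark.master set here, a SparkContext created inside the shell picks up the cluster address even when it is constructed dynamically, which is the scenario where the BlockManager was observed binding to localhost.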