Warning: file_get_contents(/data/phpspider/zhask/data//catemap/6/multithreading/4.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Multithreading giraph.numInputThreads执行“的时间”;“输入超步”;它';使用1个或8个线程也是一样的,这怎么可能呢?_Multithreading_Amazon Web Services_Amazon Ec2_Giraph - Fatal编程技术网

Multithreading giraph.numInputThreads执行“的时间”;“输入超步”;它';使用1个或8个线程也是一样的,这怎么可能呢?

Multithreading giraph.numInputThreads执行“的时间”;“输入超步”;它';使用1个或8个线程也是一样的,这怎么可能呢?,multithreading,amazon-web-services,amazon-ec2,giraph,Multithreading,Amazon Web Services,Amazon Ec2,Giraph,我正在通过维基百科(西班牙语版)网站进行BFS搜索。我将文件转换成一个可以用Giraph读取的文件 使用1个worker,1 GB的文件需要452秒。我用这个命令执行了Giraph: /home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote -vif ar.edu.info.unlp.

我正在通过维基百科(西班牙语版)网站进行BFS搜索。我将文件转换成一个可以用Giraph读取的文件

使用1个worker,1 GB的文件需要452秒。我用这个命令执行了Giraph:

/home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote -vif ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat -vip /user/hduser/input/grafo-wikipedia.txt -vof ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat -op /user/hduser/output/caminosNavegacionales -w 1 -yh 120000 -ca giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true
容器日志:

16/08/24 21:17:02 INFO master.BspServiceMaster: generateVertexInputSplits: Got 8 input splits for 1 input threads
16/08/24 21:17:02 INFO master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
16/08/24 21:17:02 INFO master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
16/08/24 21:17:02 INFO yarn.GiraphYarnTask: [STATUS: task-0] MASTER_ZOOKEEPER_ONLY checkWorkers: Done - Found 1 responses of 1 needed to start superstep -1
16/08/24 21:17:02 INFO netty.NettyClient: Using Netty without authentication.
16/08/24 21:17:02 INFO netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
16/08/24 21:17:02 INFO partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
...
16/08/24 21:25:40 INFO netty.NettyClient: stop: Halting netty client
16/08/24 21:25:40 INFO netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing resources now.
16/08/24 21:25:43 INFO netty.NettyClient: stop: Netty client halted
16/08/24 21:25:43 INFO netty.NettyServer: stop: Halting netty server
16/08/24 21:25:43 INFO netty.NettyServer: stop: Start releasing resources
16/08/24 21:25:44 INFO bsp.BspService: process: cleanedUpChildrenChanged signaled
16/08/24 21:25:47 INFO netty.NettyServer: stop: Netty server halted
16/08/24 21:25:47 INFO bsp.BspService: process: masterElectionChildrenChanged signaled
16/08/24 21:25:47 INFO master.MasterThread: setup: Took 0.898 seconds.
16/08/24 21:25:47 INFO master.MasterThread: input superstep: Took 452.531 seconds.
16/08/24 21:25:47 INFO master.MasterThread: superstep 0: Took 64.376 seconds.
16/08/24 21:25:47 INFO master.MasterThread: superstep 1: Took 1.591 seconds.
16/08/24 21:25:47 INFO master.MasterThread: shutdown: Took 6.609 seconds.
16/08/24 21:25:47 INFO master.MasterThread: total: Took 526.006 seconds.
正如你们所看到的,第一行告诉我们,input superstep只使用一个线程执行。在完成输入超步时,花了492秒

我做了另一个测试,使用giraph.numInputThreads=8,尝试用8个线程执行输入超步:

/home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote -vif ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat -vip /user/hduser/input/grafo-wikipedia.txt -vof ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat -op /user/hduser/output/caminosNavegacionales -w 1 -yh 120000 -ca giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true,giraph.numInputThreads=8
结果如下:

    16/08/24 21:54:00 INFO master.BspServiceMaster: generateVertexInputSplits: Got 8 input splits for 8 input threads
16/08/24 21:54:00 INFO master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
16/08/24 21:54:00 INFO master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
...

16/08/24 22:10:07 INFO master.MasterThread: setup: Took 0.093 seconds.
16/08/24 22:10:07 INFO master.MasterThread: input superstep: Took 891.339 seconds.
16/08/24 22:10:07 INFO master.MasterThread: superstep 0: Took 66.635 seconds.
16/08/24 22:10:07 INFO master.MasterThread: superstep 1: Took 1.837 seconds.
16/08/24 22:10:07 INFO master.MasterThread: shutdown: Took 6.605 seconds.
16/08/24 22:10:07 INFO master.MasterThread: total: Took 966.512 seconds.
所以,我的问题是,Giraph怎么可能在没有输入线程的情况下使用452秒,在有输入线程的情况下使用891秒?应该正好相反,对吧


用于此操作的群集为1个主群集和1个从群集,均为AWS上的r3.8XL EC2实例。

问题与HDFS访问有关。有8个线程访问同一资源,而该资源只能以顺序方式访问。为了获得最佳性能,giraph.numInputThreads应为2或1。

问题与HDFS访问有关。有8个线程访问同一资源,而该资源只能以顺序方式访问。为了获得最佳性能,giraph.numInputThreads应为2或1