Performance of the C++ MPI parcelport in HPX

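I am running some simple tests of remote communication in HPX, built with the MPI-based parcelport, and I am seeing problems with both communication bandwidth and latency.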

The tests are run with the following simple code:

#include <iostream>
#include <chrono>
#include <cstddef>
#include <vector>
#include <hpx/hpx_main.hpp>
#include <hpx/include/components.hpp>
#include <hpx/include/actions.hpp>
#include <hpx/include/iostreams.hpp>

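// Component that owns a vector of doubles on the locality where it is created.
class block
      : public hpx::components::component_base<block>
    {
    public:

    block(std::size_t size) : data_(size,1){ }
    // Returns a copy of the data; called remotely in the bandwidth test.
    std::vector<double> get_data(){return data_;}
    // Trivial action used to time a remote round trip.
    int pingpong(){return 1;}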

    HPX_DEFINE_COMPONENT_ACTION(block, get_data, get_data_action);
    HPX_DEFINE_COMPONENT_ACTION(block, pingpong, pingpong_action);

    private:    
    std::vector<double> data_; 

   };

typedef hpx::components::component<block>   block_type;
typedef block::get_data_action          block__get_data_action;
typedef block::pingpong_action          block__pingpong_action;

HPX_REGISTER_COMPONENT(block_type, block);
HPX_REGISTER_ACTION(block::get_data_action, block__get_data_action);
HPX_REGISTER_ACTION(block::pingpong_action, block__pingpong_action);


////////////////////////////////////////////////////////////////////
int main(){
    std::vector<hpx::id_type> locs = hpx::find_all_localities();

    std::size_t minsize=1e3;
    std::size_t maxsize=1e8;
    std::size_t ntries = 100;

    block__get_data_action     act_data;
    block__pingpong_action     act_pingpong;

    for(std::size_t size = minsize; size<=maxsize; size*=2){
        hpx::id_type remote_block = hpx::new_<block_type>(locs[1], size).get();
        double Mb_size=size*sizeof(double)/1.e6;

        hpx::cout << "Size = " << Mb_size << " MB.";  

        //---------------- Bandwidth ------------------

        double seconds_bandwidth=0;
        std::vector<double>  buffer(size);
        for(std::size_t i=0; i<ntries; i++){
            auto t = std::chrono::high_resolution_clock::now();
            buffer = act_data(remote_block);
            auto elapsed = std::chrono::high_resolution_clock::now() - t;
            seconds_bandwidth+=std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()/1.e6;
        }   
        seconds_bandwidth/=double(ntries);
        hpx::cout << "\t Bandwidth = " << Mb_size/seconds_bandwidth << " MB/s.";     

        //---------------- PingPong ------------------

        double microseconds_pingpong=0;
        int intbuffer=0;
        for(std::size_t i=0; i<ntries; i++){
            auto t = std::chrono::high_resolution_clock::now();
            intbuffer=act_pingpong(remote_block);
            auto elapsed = std::chrono::high_resolution_clock::now() - t;
            microseconds_pingpong+=std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        }
        microseconds_pingpong/=double(ntries);
        hpx::cout << "\t PingPong = " << microseconds_pingpong << " microseconds. " << hpx::endl;
    }

  return 0;
}
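
What I find strange is:

  • The bandwidth is very unstable, oscillating from around 100 MB/s to over 1000 MB/s.
  • The bandwidth is very low relative to what I expect on this network. A pure MPI implementation of the same example confirms this: it delivers at least 10 times the bandwidth.
  • The ping-pong time is really high, about 20 times higher than the pure MPI ping-pong. Note that here the ping-pong time is the time needed to invoke a remote action that returns an integer, whereas the pure MPI implementation consists of sending and receiving an integer.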
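
The instability in the first point is easier to diagnose when every sample is kept rather than only a running sum, because a mean over 100 calls can be dominated by a few slow outliers. The following is a minimal sketch of that idea, not part of the original test: `summarize` and `time_samples` are hypothetical helper names, and the callable passed to `time_samples` stands in for a remote call such as `act_data(remote_block)`.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Summary of a set of timing samples, in seconds.
struct timing_summary {
    double min;
    double median;
    double mean;
};

// Hypothetical helper: reduce raw samples to min / median / mean.
// Takes the vector by value because it sorts its own copy.
// Precondition: samples is not empty.
inline timing_summary summarize(std::vector<double> samples) {
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    double sum = 0.0;
    for (double s : samples) sum += s;
    return {samples.front(), median, sum / static_cast<double>(n)};
}

// Hypothetical helper: call f() ntries times and time each call
// with a monotonic clock, keeping fractional seconds.
template <typename F>
std::vector<double> time_samples(F&& f, std::size_t ntries) {
    std::vector<double> samples;
    samples.reserve(ntries);
    for (std::size_t i = 0; i < ntries; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        f();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(t1 - t0).count());
    }
    return samples;
}
```

If the minimum and median are stable across message sizes while the mean jumps around, the oscillation probably comes from occasional slow calls rather than from the transfer itself.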
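So I have the following questions:

  • Is low bandwidth normal behaviour for the MPI parcelport in HPX? If so, why? If not, what could be the cause?
  • Is this a fair way to measure the ping-pong time with HPX? Is it comparable to the pure MPI implementation?
  • Is the high ping-pong time normal?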
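If you need more information, for example about how HPX was compiled and configured, do not hesitate to ask. I only started working with HPX a few days ago, so I may have made mistakes in my code or done something non-optimal (sorry about that).

Many thanks.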

Size = 0.008 MB.         Bandwidth = 30.2058 MB/s.       PingPong = 157.67 microseconds. 
Size = 0.016 MB.         Bandwidth = 75.5929 MB/s.       PingPong = 143.98 microseconds. 
Size = 0.032 MB.         Bandwidth = 143.639 MB/s.       PingPong = 153.12 microseconds. 
Size = 0.064 MB.         Bandwidth = 256.966 MB/s.       PingPong = 142 microseconds. 
Size = 0.128 MB.         Bandwidth = 343.744 MB/s.       PingPong = 148.17 microseconds. 
Size = 0.256 MB.         Bandwidth = 389.371 MB/s.       PingPong = 143.38 microseconds. 
Size = 0.512 MB.         Bandwidth = 618.589 MB/s.       PingPong = 153.1 microseconds. 
Size = 1.024 MB.         Bandwidth = 821.764 MB/s.       PingPong = 148.94 microseconds. 
Size = 2.048 MB.         Bandwidth = 1003.29 MB/s.       PingPong = 146.17 microseconds. 
Size = 4.096 MB.         Bandwidth = 201.063 MB/s.       PingPong = 158.39 microseconds. 
Size = 8.192 MB.         Bandwidth = 91.1075 MB/s.       PingPong = 153.49 microseconds. 
Size = 16.384 MB.        Bandwidth = 1655.55 MB/s.       PingPong = 147.72 microseconds. 
Size = 32.768 MB.        Bandwidth = 407.986 MB/s.       PingPong = 151.03 microseconds. 
Size = 65.536 MB.        Bandwidth = 427.471 MB/s.       PingPong = 149.75 microseconds. 
Size = 131.072 MB.       Bandwidth = 295.531 MB/s.       PingPong = 147.37 microseconds. 
Size = 262.144 MB.       Bandwidth = 513.221 MB/s.       PingPong = 146.4 microseconds. 
Size = 524.288 MB.       Bandwidth = 708.265 MB/s.       PingPong = 147.14 microseconds.
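
One way to read the small-message rows above: each timed `get_data` call also pays the fixed cost of a remote action round trip, which the ping-pong column puts at roughly 150 microseconds. Under a simple model, transfer time = latency + size / peak bandwidth, that fixed cost caps the bandwidth that can be measured for small messages. The sketch below implements only this model; the function name `effective_bandwidth_mb_s` is hypothetical, and any peak bandwidth passed to it is an assumption, not a measured value.

```cpp
// Effective bandwidth (MB/s) under the model
//   time = latency + size / peak_bandwidth.
// size_mb:    message size in MB (1 MB = 1e6 bytes, as in the test)
// latency_us: fixed per-call cost in microseconds
// peak_mb_s:  assumed asymptotic link bandwidth in MB/s
inline double effective_bandwidth_mb_s(double size_mb,
                                       double latency_us,
                                       double peak_mb_s) {
    const double seconds = latency_us * 1e-6 + size_mb / peak_mb_s;
    return size_mb / seconds;
}
```

With a latency of 150 microseconds, a 0.008 MB message cannot exceed 0.008 / 150e-6 ≈ 53 MB/s however fast the link is, which is the same order of magnitude as the 30 MB/s measured for the smallest size. The model does not account for the drops at larger sizes, such as 91 MB/s at 8.192 MB.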