
MPI crashes with non-blocking send and receive


I wrote a solver with MPI, but when the number of tasks on the cluster is larger than 80 it crashes or hangs. Sometimes it crashes and sometimes it hangs, depending on changes to the test code and to the number of tasks. At first I thought there might be a memory leak before the failure point that caused the failure, but after some testing I found that it crashes even if I only do a simple data transfer at the very beginning of the solver. This time it simply crashed, without hanging.

My questions are:

  • Is there a mistake in the subroutine TestMPI() that causes the crash?

  • If the answer to the first question is no, what are the possible causes of this crash?

Thanks!

I hook the solver up as follows; the data-transfer function is:

    void TestMPI()
    {
        int check = 1;
        MPI_Bcast(&check, 1, MPI_INT, 0, MPI_COMM_WORLD);  
    
        int N_region = 0;
        MPI_Comm_size(MPI_COMM_WORLD, &N_region);
        int mpi_Rank=0;
        MPI_Comm_rank(MPI_COMM_WORLD, &mpi_Rank);
    
        MPI_Request * reqSend_test = new MPI_Request [N_region];
        MPI_Request * reqRecv_test = new MPI_Request [N_region];
        MPI_Status * status_test = new MPI_Status [N_region];
    
        MPI_Status statRecv;
    
        int cnt_test(0);
        double * Read_test = new double[N_region];
        double * Send_test = new double[N_region];
        for (int ii=0; ii<N_region; ii++)
        {
            if (ii == mpi_Rank)
                continue;
            int tag = (ii) * N_region + mpi_Rank;
            MPI_Irecv(&Read_test[ii],1,MPI_DOUBLE,ii,tag,MPI_COMM_WORLD,&reqRecv_test[ii]);
            cnt_test++;
        }
    
        cnt_test = 0;
        for (int ii=0; ii<N_region; ii++)
        {
            if (ii == mpi_Rank)
                continue;
    
            Send_test[ii] = mpi_Rank;
            int tag = (mpi_Rank)*N_region + ii;
            MPI_Isend(&Send_test[ii],1,MPI_DOUBLE,ii,tag,MPI_COMM_WORLD,&reqSend_test[ii]);
            cnt_test++;
        }
    
        //MPI_Waitall(N_region-1, reqSend_test, status_test);
    
        char fname [80];
        sprintf(fname, "TestMPI_result%d", mpi_Rank);
        FILE * stream = fopen(fname,"w");
    
        fprintf(stream, "After MPI_Waitall send\n");
        fflush(stream);
    
        for (int ii=0; ii<N_region; ii++)
        {
            if (ii == mpi_Rank)
                continue;
            MPI_Wait(&reqSend_test[ii], &statRecv);
            fprintf(stream, "After Wait Send %d\n", ii);
            fflush(stream);
        }
    
        for (int ii=0; ii<N_region; ii++)
        {
            if (ii == mpi_Rank)
                continue;
            MPI_Wait(&reqRecv_test[ii], &statRecv);
            fprintf(stream, "After Wait Recv %d\n", ii);
            fflush(stream);
        }
    
        fprintf(stream, "After Start Test\n");
        fflush(stream);
    
        //MPI_Waitall(N_region-1, reqRecv_test, status_test);
    
        MPI_Bcast(&check, 1, MPI_INT, 0, MPI_COMM_WORLD);  
    
        fprintf(stream, "After MPI_Bcast\n");
        fflush(stream);
    
        fclose(stream);
    }
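
For reference, here is a minimal sketch (my own illustration, not part of the original post) of the same all-to-all exchange with all requests packed into one contiguous array. Written this way, a single MPI_Waitall completes every send and receive; the commented-out MPI_Waitall(N_region-1, ...) calls in the code above would not be safe as written, because reqSend_test[mpi_Rank] and reqRecv_test[mpi_Rank] are never initialized and, on most ranks, the last initialized request would not be waited on. The function name TestMPI_packed is hypothetical.

    #include <mpi.h>
    #include <vector>

    // Sketch: same pairwise exchange as TestMPI(), but with the requests stored
    // contiguously so one MPI_Waitall covers everything, and with std::vector
    // instead of raw new[] so the buffers are released automatically.
    void TestMPI_packed()
    {
        int N_region = 0, mpi_Rank = 0;
        MPI_Comm_size(MPI_COMM_WORLD, &N_region);
        MPI_Comm_rank(MPI_COMM_WORLD, &mpi_Rank);

        std::vector<double> Send_buf(N_region), Recv_buf(N_region);
        std::vector<MPI_Request> reqs;                 // receives first, then sends
        reqs.reserve(2 * (N_region - 1));

        // Post all receives. Tag convention as in the original: sender*N_region + receiver.
        for (int ii = 0; ii < N_region; ii++)
        {
            if (ii == mpi_Rank)
                continue;
            MPI_Request r;
            MPI_Irecv(&Recv_buf[ii], 1, MPI_DOUBLE, ii,
                      ii * N_region + mpi_Rank, MPI_COMM_WORLD, &r);
            reqs.push_back(r);
        }

        // Post all sends with the matching tags.
        for (int ii = 0; ii < N_region; ii++)
        {
            if (ii == mpi_Rank)
                continue;
            Send_buf[ii] = mpi_Rank;
            MPI_Request r;
            MPI_Isend(&Send_buf[ii], 1, MPI_DOUBLE, ii,
                      mpi_Rank * N_region + ii, MPI_COMM_WORLD, &r);
            reqs.push_back(r);
        }

        // One call completes all outstanding requests.
        MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(), MPI_STATUSES_IGNORE);
    }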
    

Comment: This is quite likely the same problem you ran into before. The same advice applies: create a minimal reproducible example and edit the original question to include it.

Reply: @zulan, thanks! I would like to delete the first question, since they are the same problem.

Memory leak: the following are never released:

  • reqSend_test
  • reqRecv_test
  • status_test
  • Read_test
  • Send_test

I don't see what the point of these two lines is:

    int check = 1;
    MPI_Bcast(&check, 1, MPI_INT, 0, MPI_COMM_WORLD);
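
A minimal sketch of the cleanup this refers to (my own addition, not from the original answer): each array allocated with new[] in TestMPI() needs a matching delete[] before the function returns, for example:

    // Hypothetical cleanup at the end of TestMPI(); pairs each new[] with a delete[].
    delete [] reqSend_test;
    delete [] reqRecv_test;
    delete [] status_test;
    delete [] Read_test;
    delete [] Send_test;

Using std::vector instead of raw new[], as in the packed-request sketch above, avoids the leak entirely because the storage is released automatically when the function returns.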
    
        int main(int argc, char* argv[])
        {
    
            int ISV_LIC = 17143112; // For Platform Computing mpi initialization
            MPI_Initialized(&ISV_LIC);
    
            int threadingUsed = 0;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &threadingUsed);
    
            int mpi_Rank=0;
            MPI_Comm_rank(MPI_COMM_WORLD, &mpi_Rank);
    
            if (mpi_Rank == 0)
            {
                std::cerr << "solverok" << std::endl;
                std::cerr.flush();
                AnsDebug(ACHAR("3dtds-main"), 1, ACHAR("After solverok\n"));
            }
    
            AString simulationDir_master;
            AString simulationDir;
            AString tempDir;
    
            simulationDir_master=argv[argc-3]; 
            ANSOFT_CHDIR(simulationDir_master.ANS_ANSI().Str()); //simulationDir_master;
    
            mpi_BcastEnvVariables();
    
            bool no_temp_dir_override;
            no_temp_dir_override = false;
            for (int ii=0; ii<argc; ii++)
            {
                if (strcmp(argv[ii], "no_temp_dir_override") == 0)
                {
                    no_temp_dir_override = true;
                    break;
                }
            }
    
            AString Dbname;
            Dbname=argv[5];
    
            AString versionedProductName;
            versionedProductName = argv[argc-4];  
    
            AString InstallDir;
            char InstallDir_c[MAX_PATH];
            ANSOFT_GETCWD(InstallDir_c, MAX_PATH);
            InstallDir=AString(InstallDir_c);
    
            if (mpi_Rank != 0)
            {           
                if (!no_temp_dir_override)
                {
                    tempDir=argv[argc-1];
                }else{
                    RegistryAccessNgMaxwell reg;
                    tempDir = reg.GetTempDirectory_Reg(versionedProductName,InstallDir);
                    tempDir =  tempDir.Left(tempDir.size()-1);
                }
    
        #ifndef NDEBUG
                    tempDir = "E:/temp";
        #endif  
                CreateSimulationDir(mpi_Rank, tempDir, Dbname, ACHAR("maxwell"), simulationDir);
                ANSOFT_CHDIR(simulationDir.ANS_ANSI().Str());
            }
    
            char fname [80];
            sprintf(fname, "RecordMPI%d",mpi_Rank);
            FILE * stream = fopen(fname,"w");
    
            fprintf(stream, "Before testMPI\n");
            fflush(stream);
    
            TestMPI();
    
            fprintf(stream, "After testMPI\n");
            fflush(stream);
    
            fclose(stream);
    
            MPI_Finalize();
        }
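
As an aside, main() requests MPI_THREAD_MULTIPLE but never inspects threadingUsed, the thread-support level the library actually granted. A minimal, hypothetical standalone check (my addition, not part of the original post) could look like this; the thread-level constants are ordered, so a simple comparison suffices:

        #include <mpi.h>
        #include <iostream>

        // Hypothetical check: verify the thread level MPI_Init_thread actually provided.
        int main(int argc, char* argv[])
        {
            int provided = 0;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
            if (provided < MPI_THREAD_MULTIPLE)
                std::cerr << "MPI granted thread level " << provided
                          << " instead of MPI_THREAD_MULTIPLE" << std::endl;
            MPI_Finalize();
            return 0;
        }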
    
The TestMPI_result file written by one of the ranks contains:

    After MPI_Waitall send
    After Wait Send 0
    After Wait Send 1
    After Wait Send 2
    After Wait Send 3
    After Wait Send 4
    After Wait Send 5
    After Wait Send 6
    After Wait Send 7
    After Wait Send 8
    After Wait Send 9
    After Wait Send 10
    After Wait Send 11
    After Wait Send 12
    After Wait Send 13
    After Wait Send 14
    After Wait Send 15
    After Wait Send 16
    After Wait Send 17
    After Wait Send 18
    After Wait Send 19
    After Wait Send 20
    After Wait Send 21
    After Wait Send 22
    After Wait Send 23
    After Wait Send 24
    After Wait Send 25
    After Wait Send 26
    After Wait Send 27
    After Wait Send 28
    After Wait Send 29
    After Wait Send 30
    After Wait Send 31
    After Wait Send 32
    After Wait Send 33
    After Wait Send 34
    After Wait Send 35
    After Wait Send 36
    After Wait Send 37
    After Wait Send 38
    After Wait Send 39
    After Wait Send 40
    After Wait Send 41
    After Wait Send 42
    After Wait Send 43
    After Wait Send 44
    After Wait Send 45
    After Wait Send 46
    After Wait Send 47
    After Wait Send 48
    After Wait Send 49
    After Wait Send 50
    After Wait Send 51
    After Wait Send 52
    After Wait Send 53
    After Wait Send 54
    After Wait Send 55
    After Wait Send 56
    After Wait Send 57
    After Wait Send 58
    After Wait Send 59
    After Wait Send 60
    After Wait Send 61
    After Wait Send 62
    After Wait Send 63
    After Wait Send 64
    After Wait Send 65
    After Wait Send 66
    After Wait Send 67
    After Wait Send 68
    After Wait Send 69
    After Wait Send 70
    After Wait Send 71
    After Wait Send 72
    After Wait Send 73
    After Wait Send 74
    After Wait Send 75
    After Wait Send 76
    After Wait Send 77
    After Wait Send 78
    After Wait Send 80
    After Wait Send 81
    After Wait Send 82
    After Wait Send 83
    After Wait Send 84
    After Wait Send 85
    After Wait Send 86
    After Wait Send 87
    After Wait Send 88
    After Wait Send 89
    After Wait Send 90
    After Wait Send 91
    After Wait Send 92
    After Wait Send 93
    After Wait Send 94
    After Wait Send 95
    After Wait Send 96
    After Wait Send 97
    After Wait Send 98
    After Wait Send 99
    After Wait Send 100
    After Wait Send 101
    After Wait Send 102
    After Wait Send 103
    After Wait Send 104
    After Wait Recv 0
    After Wait Recv 1
    After Wait Recv 2
    After Wait Recv 3
    After Wait Recv 4
    After Wait Recv 5
    After Wait Recv 6
    After Wait Recv 7
    After Wait Recv 8
    After Wait Recv 9
    After Wait Recv 10
    After Wait Recv 11
    After Wait Recv 12
    After Wait Recv 13
    After Wait Recv 14
    After Wait Recv 15
    After Wait Recv 16
    After Wait Recv 17
    After Wait Recv 18
    After Wait Recv 19
    After Wait Recv 20
    After Wait Recv 21
    After Wait Recv 22
    After Wait Recv 23
    After Wait Recv 24
    After Wait Recv 25
    After Wait Recv 26
    After Wait Recv 27
    After Wait Recv 28
    After Wait Recv 29
    After Wait Recv 30
    After Wait Recv 31