使用超立方体结构在C中实现简单的并行合并排序_C_Multithreading_Parallel Processing_Hypercube

使用超立方体结构在C中实现简单的并行合并排序

c multithreading parallel-processing

使用超立方体结构在C中实现简单的并行合并排序,c,multithreading,parallel-processing,hypercube,C,Multithreading,Parallel Processing,Hypercube,我试图写一个简单的并行合并排序算法来帮助我理解超立方体通信的概念。我有一个C程序，它生成一个随机整数数组，并分配线程对数据进行排序。每个线程对其数据子列表进行串行排序，然后与其他线程执行比较交换操作以合并子列表。下面是线程函数 void *th_fun(void* args) { int i, j; int my_id = *((int*)args); int chunk_size = N_DATA/N_THREADS; i

我试图写一个简单的并行合并排序算法来帮助我理解超立方体通信的概念。我有一个C程序，它生成一个随机整数数组，并分配线程对数据进行排序。每个线程对其数据子列表进行串行排序，然后与其他线程执行比较交换操作以合并子列表。下面是线程函数

void *th_fun(void* args)
{
    int i, j;
    int my_id             = *((int*)args);

    int chunk_size        = N_DATA/N_THREADS;
    int start_index       = my_id*chunk_size;

    //set up the memory this thread will use
    th_mem[my_id].my_data = &data[start_index];
    th_mem[my_id].partner_data = malloc(sizeof(int)*chunk_size);
    for(i = 0; i < chunk_size; i++)
        th_mem[my_id].partner_data [i] = 0;

    //sort my data serially, should use a mergesort
    qsort((void*)th_mem[my_id].my_data, chunk_size, sizeof(int), compare);

    //wait for all threads to finish sorting
    barrier_simple();

    //compare-exchange accross hypercube
    int partner_id, op;
    int logth = log(N_THREADS)/log(2);
    for(i = 0; i < logth; i++)
    {
        partner_id = my_id ^ (int)pow(2,logth-i-1);

        //send my data to partner
        for(j = 0; j < chunk_size; j++)
        {
            th_mem[partner_id].partner_data[j] = th_mem[my_id].my_data[j];
        }

        //wait for all data to be sent
        barrier_simple();

        //get the lowest or highest elements between my_data, partner_data
        int x = my_id & (int)pow(2,logth-i-1);
        op = (x > 0 ? 1:-1);
        int* output = merge(th_mem[my_id].my_data, th_mem[my_id].partner_data, 
                            chunk_size, op);

        //save results
        for(j = 0; j < chunk_size; j++)
        {
            th_mem[my_id].my_data[j] = output[j];
        }
        free(output);

        //wait for all data to be merged
        barrier_simple();
    }

    free(th_mem[my_id].partner_data);
    pthread_exit(args);
}

正如您所见，并行排序并不完全正确（除了性能方面的考虑，ofc）。我已经做了很多gdb运行，无法确定为什么超立方体交换不起作用。任何帮助都将不胜感激。我的推荐信是

./parallel_merge 12 4
Unsorted:
0:924 674 704 
1:495 554 645 
2:979 285 119 
3:878 80 620 

Serial sort: 
0:80 119 285 
1:495 554 620 
2:645 674 704 
3:878 924 979 

Parallel sort: 
0:80 119 285 
1:495 554 674 
2:620 645 704 
3:878 924 979