How to quickly regroup/pivot data with CUDA C?




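In an SIR infection simulation, a person can be susceptible (S), infected (I), or recovered (R) from the disease. From time t to t+1, the infection state of person i in population P may evolve as follows (left table):

(Here, persons 2 and 9 become infected, while person 6 recovers. Note that a person's state can only progress in one direction: S → I → R.)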

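Now I also want to group people by their state, as in the right-hand table, much like an Excel pivot table. After each time step of the simulation I need to update the groupings, and ideally each "group array" would be sorted (listing people in ascending order of index).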

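So how can I recompute/update these 3 group arrays as fast as possible at each time step? (I have considered atomic operations, but I've been advised to avoid them where possible because they are slow.)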

The simulation will be implemented in CUDA C, with each thread mapped to one person in population P.

Many thanks.
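Because every person is in exactly one of only three states, the pivot the question describes is a stable counting sort: count each state, take an exclusive prefix sum of the counts to get each group's start, then scatter person indices in ascending order so each group stays sorted. The sketch below is a sequential host-side reference, not a GPU solution; the name `pivot_by_state` and the encoding 0 = S, 1 = I, 2 = R are assumptions (the encoding matches `h_P0`/`h_P1` in the answer's code). It can be useful for checking GPU output. On the GPU the same steps can in principle be expressed as per-state counts and prefix sums, which is one way to avoid atomics, but that mapping is not shown here.

```cuda
#include <vector>

// Hypothetical host-side reference: stable counting sort of person indices by state.
// Assumes every entry of `state` is 0 (S), 1 (I) or 2 (R).
// On return, `order` lists person indices grouped by state, ascending within each
// group, and start[k] is the position in `order` where group k begins.
void pivot_by_state(const std::vector<int> &state,
                    std::vector<int> &order, std::vector<int> &start)
{
    const int num_states = 3;

    // 1. Histogram: how many people are in each state.
    std::vector<int> count(num_states, 0);
    for (int s : state) count[s]++;

    // 2. Exclusive prefix sum of the counts gives each group's start position.
    start.assign(num_states, 0);
    for (int k = 1; k < num_states; k++)
        start[k] = start[k - 1] + count[k - 1];

    // 3. Scatter in increasing person index, so each group comes out sorted.
    order.assign(state.size(), 0);
    std::vector<int> next = start;  // next free slot in each group
    for (int i = 0; i < (int)state.size(); i++)
        order[next[state[i]]++] = i;
}
```

With the state at time t from the answer's code, {0, 0, 1, 2, 1, 1, 0, 2, 0, 0}, this yields order 0 1 6 8 9 2 4 5 3 7 and start 0 5 8, the same grouping as the "Sorted" and "Ranges" tables. Unlike the answer's -1 convention, an absent state here simply gets an empty range (its start equals the next group's start, or N for the last group).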

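#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

#define N 10  // population size in this example

// After sorting, each state occupies one contiguous run of `sorted`.
// Every thread checks one position and records where a new run begins.
__global__
static void find_groups(int *locs, int *sorted, int num)
{
    // Flatten the 2D grid into one global thread index.
    int bid = blockIdx.y * gridDim.x + blockIdx.x;
    int tid = bid * blockDim.x + threadIdx.x;

    if (tid < num) {
        int curr = sorted[tid];
        // A group starts where the state differs from the previous element.
        if (tid == 0 || curr != sorted[tid - 1]) locs[curr] = tid;
    }
}

int main()
{
    // States at time t (P0) and t+1 (P1): 0 = S, 1 = I, 2 = R.
    int h_P0[N] = {0, 0, 1, 2, 1, 1, 0, 2, 0, 0};
    int h_P1[N] = {0, 1, 1, 2, 1, 2, 0, 2, 1, 0};

    thrust::host_vector<int> th_P0(h_P0, h_P0 + N);
    thrust::host_vector<int> th_P1(h_P1, h_P1 + N);

    thrust::device_vector<int> td_P0 = th_P0;
    thrust::device_vector<int> td_P1 = th_P1;

    // Person indices 0..N-1, to be permuted into group order.
    thrust::device_vector<int> td_S0(N);
    thrust::device_vector<int> td_S1(N);

    thrust::sequence(td_S0.begin(), td_S0.end());
    thrust::sequence(td_S1.begin(), td_S1.end());

    // Sort the states and carry the indices along. The stable sort keeps the
    // indices ascending within each group; the state vectors are sorted in place.
    thrust::stable_sort_by_key(td_P0.begin(), td_P0.end(), td_S0.begin());
    thrust::stable_sort_by_key(td_P1.begin(), td_P1.end(), td_S1.begin());

    // Start position of each group (S, I, R); -1 means the state is absent.
    thrust::device_vector<int> td_l0(3, -1);
    thrust::device_vector<int> td_l1(3, -1);

    int threads = 256;
    int total_blocks = (N + threads - 1) / threads;
    // gridDim.x may be limited to 65535, so spread large grids over y.
    int blocks_y = (total_blocks + 65534) / 65535;
    int blocks_x = (total_blocks + blocks_y - 1) / blocks_y;
    dim3 blocks(blocks_x, blocks_y);

    int *d_l0 = thrust::raw_pointer_cast(td_l0.data());
    int *d_l1 = thrust::raw_pointer_cast(td_l1.data());
    int *d_P0 = thrust::raw_pointer_cast(td_P0.data());
    int *d_P1 = thrust::raw_pointer_cast(td_P1.data());

    find_groups<<<blocks, threads>>>(d_l0, d_P0, N);
    find_groups<<<blocks, threads>>>(d_l1, d_P1, N);

    // Copy the results back to the host and print them.
    thrust::host_vector<int> h_S0 = td_S0, h_S1 = td_S1;
    thrust::host_vector<int> h_l0 = td_l0, h_l1 = td_l1;

    printf("t:   ");
    for (int i = 0; i < N; i++) printf("%d ", h_S0[i]);
    printf(" starts: %d %d %d\n", h_l0[0], h_l0[1], h_l0[2]);
    printf("t+1: ");
    for (int i = 0; i < N; i++) printf("%d ", h_S1[i]);
    printf(" starts: %d %d %d\n", h_l1[0], h_l1[1], h_l1[2]);

    return 0;
}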

If you need the full code (including the printing code), please visit

I'm not sure this is enough, but if I've missed anything, do let me know.

EDIT

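Changed the code to handle the case where a class is missing: the start-position vectors are now initialized with -1, so a start position of -1 means that class does not appear in that iteration.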
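The -1 convention from the EDIT can be turned into explicit ranges like those in the "Ranges" table. A minimal host-side sketch, assuming the three-entry start vector has already been copied back to the host (the name `starts_to_ranges` is hypothetical): because the sorted array lists the groups in state order, a present group ends just before the start of the next present group, or at the last position for the final one. Walking the starts from the back handles absent (-1) groups without special cases.

```cuda
#include <utility>
#include <vector>

// Hypothetical helper: convert per-state start positions (-1 = state absent)
// into inclusive [first, last] ranges of the sorted array of length n.
// Absent states get the pair (-1, -1).
std::vector<std::pair<int, int>> starts_to_ranges(const std::vector<int> &start, int n)
{
    std::vector<std::pair<int, int>> ranges(start.size(), std::make_pair(-1, -1));
    int end = n;  // one past the last position of the group being closed
    for (int k = (int)start.size() - 1; k >= 0; k--) {
        if (start[k] == -1) continue;  // state does not occur in this time step
        ranges[k] = std::make_pair(start[k], end - 1);
        end = start[k];  // the previous present group ends just before this one
    }
    return ranges;
}
```

For example, starts 0 5 8 with N = 10 give [0-4], [5-7], [8-9], and starts 0 -1 6 give [0-5], an absent I group, and [6-9].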

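Would a single sorted vector, together with the starting position of each group, be an acceptable solution?

Could you give an example of what you mean by that?

Thanks for the example. Using the question's 1-based person numbering, for t it would be 1 2 7 9 10 3 5 6 4 8, followed by another vector, 1 6 9, giving the starting positions of S, I and R. For t+1 it would be 1 7 10 2 3 5 9 4 6 8, followed by another vector with 1 4 8 as the starting positions. If needed, we can also use a mapping function from t to t+1, such as 1 4 2 7 3 5 6 9 8 10.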

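Yes, @Pavan. That would be an informationally equivalent, and therefore acceptable, solution.

Thanks, much appreciated. But what if P0 and P1 reside on the device in the first place? How do I get Thrust to operate on vectors that are already on the GPU?

@MiloChen If you already have the vectors on the GPU, use thrust::device_vector<int>(d_ptr, d_ptr + N).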
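When the data already lives in memory allocated with cudaMalloc, one way to let Thrust operate on it is to wrap the raw pointer in a thrust::device_ptr using thrust::device_pointer_cast. Thrust treats a plain int* as a host pointer, so the wrapping is what makes it dispatch to the device. The sketch below (the name `group_raw` is hypothetical) runs the answer's sequence/stable_sort_by_key steps directly on that memory, with no device_vector at all.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Hypothetical helper: group person indices by state when both arrays were
// allocated with cudaMalloc. d_state is sorted in place (its original order is
// lost, as in the answer's code); d_order must hold n ints and receives the
// person indices grouped by state, ascending within each group.
void group_raw(int *d_state, int *d_order, int n)
{
    // Wrap the raw pointers so Thrust knows they point to device memory.
    thrust::device_ptr<int> state = thrust::device_pointer_cast(d_state);
    thrust::device_ptr<int> order = thrust::device_pointer_cast(d_order);

    thrust::sequence(order, order + n);                   // 0, 1, ..., n-1
    thrust::stable_sort_by_key(state, state + n, order);  // group indices by state
}
```

If a device_vector copy is actually wanted, constructing it from the wrapped pointers, thrust::device_vector<int> v(state, state + n), performs a device-to-device copy.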

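Thanks. Also, how do I pass a device vector to another function, or give it global scope?

You can use raw_pointer_cast to get the raw device pointer out of the device vector, and use it like a regular device pointer once processing is done.

Sure, but if I use raw_pointer_cast to get the raw device pointer out and keep modifying the data through it, can I still continue to use the original thrust::device_vector container, or will the two get "out of sync"?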
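A minimal sketch of the raw_pointer_cast point, assuming the vector is passed to the other function by reference (the names `mark_recovered` and `recover_all` are hypothetical). The pointer returned by thrust::raw_pointer_cast(v.data()) refers to the vector's own device storage, not to a copy, so writes made through it are writes to the vector itself. The pointer does become invalid if the vector is destroyed or resized, because a resize may reallocate the storage.

```cuda
#include <thrust/device_vector.h>
#include <thrust/memory.h>

// Illustrative kernel: each thread marks one person as recovered (2 = R).
__global__ void mark_recovered(int *state, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) state[i] = 2;
}

// Hypothetical helper: takes the device_vector by reference and launches a
// raw CUDA kernel on its storage. No copy is made, so v itself is modified.
void recover_all(thrust::device_vector<int> &v)
{
    int n = (int)v.size();
    if (n == 0) return;  // a launch with zero blocks would be an error

    int *raw = thrust::raw_pointer_cast(v.data());
    mark_recovered<<<(n + 255) / 256, 256>>>(raw, n);
    cudaDeviceSynchronize();  // wait for the kernel before the caller uses v
}
```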

Sorted (0-based person indices, grouped by state)
t    t+1
0    0
1    6
6    9
8    1
9    2
2    4
4    8
5    3
3    5
7    7

Ranges (0-based positions in the sorted arrays)
Group   t       t+1
S       [0-4]   [0-2]
I       [5-7]   [3-6]
R       [8-9]   [7-9]