Getting the unique elements of multiple arrays in CUDA

Tags: cuda, thrust


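The problem: the number of arrays is finite, say 2000 arrays, but each array holds only 256 integers. The range of the integers is quite large, say up to 1000000.

I want to get the unique elements of each array, in other words, remove the duplicate elements. I have two solutions:

1. Use Thrust to get the unique elements of each array, so I have to call thrust::unique 2000 times. But each array is very small, so this approach may not perform well.

2. Implement a hash table in a CUDA kernel, using 2000 blocks with 256 threads each, and keep the hash table in shared memory, so that every block produces one array of unique elements.

Neither approach seems professional. Is there an elegant way to solve this problem in CUDA?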

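If you modify your data the way it is done in the linked question, you can use thrust::unique.

For simplicity, let's assume that each array contains per_array elements and that there are array_num arrays in total. Each element lies in the range [0, max_element].

Demo data with per_array=4, array_num=3 and max_element=2 could look like this:

data = {1,0,1,2},{2,2,0,0},{0,0,0,0}

To denote which array each element belongs to, we use the following flags:

flags = {0,0,0,0},{1,1,1,1},{2,2,2,2}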
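The flags follow directly from the position of each element in the flattened data: element k belongs to array k / per_array. A minimal host-side sketch in standard C++ (the helper name make_flags is hypothetical and not part of the answer):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the flags for array_num arrays of per_array
// elements each, stored back to back in one flat vector.
std::vector<std::uint32_t> make_flags(std::size_t array_num, std::size_t per_array)
{
    assert(per_array > 0);
    std::vector<std::uint32_t> flags(array_num * per_array);
    for (std::size_t k = 0; k < flags.size(); ++k) {
        // Integer division maps a flat index to the index of its array.
        flags[k] = static_cast<std::uint32_t>(k / per_array);
    }
    return flags;
}
```

Calling make_flags(3, 4) reproduces the flags of the demo data.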

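To get the unique elements of each array of the segmented dataset, we need to perform the following steps:

1. Transform data so that the elements of each array i lie within a unique range [i*2*max_element, i*2*max_element+max_element]:

data = data + flags*2*max_element
data = {1,0,1,2},{6,6,4,4},{8,8,8,8}

2. Sort the transformed data:

data = {0,1,1,2},{4,4,6,6},{8,8,8,8}

3. Apply thrust::unique_by_key using data as keys and flags as values:

data  = {0,1,2}{4,6}{8}
flags = {0,0,0}{1,1}{2}

4. Transform data back to the original values:

data  = data - flags*2*max_element
data  = {0,1,2}{0,2}{0}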
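To make the four steps easy to try without a GPU, here is a host-only sketch that uses the C++ standard library instead of Thrust. It is an illustration under assumptions, not the answer's implementation: the function name and signature are made up, the array index is recovered by division instead of being carried along as flags, and a guard for max_element == 0 is added because an offset of 2*max_element would otherwise be zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side version of the four steps.
// Returns the unique values of all arrays, concatenated in array order;
// seg_out receives the array index of every returned value.
std::vector<std::uint64_t> unique_per_array(std::vector<std::uint64_t> data,
                                            std::size_t per_array,
                                            std::uint64_t max_element,
                                            std::vector<std::uint64_t>& seg_out)
{
    assert(per_array > 0 && data.size() % per_array == 0);

    // Offset between the value ranges of consecutive arrays. The text uses
    // 2*max_element; the max(..., 1) guard is an addition for max_element == 0.
    const std::uint64_t stride = std::max<std::uint64_t>(2 * max_element, 1);

    // 1. shift every array into its own value range
    for (std::size_t k = 0; k < data.size(); ++k) {
        data[k] += (k / per_array) * stride;
    }

    // 2. one global sort; arrays stay separated because their ranges are disjoint
    std::sort(data.begin(), data.end());

    // 3. one global unique pass removes the duplicates inside every array
    data.erase(std::unique(data.begin(), data.end()), data.end());

    // 4. recover the array index and the original value
    seg_out.clear();
    for (std::uint64_t& v : data) {
        seg_out.push_back(v / stride);
        v %= stride;
    }
    return data;
}
```

The point of the design is that one sort and one unique pass over the whole dataset replace thousands of small per-array calls, which is what makes the same idea attractive on the GPU.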

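The maximum value of max_element is limited by the size of the integer type used to represent data, because the largest transformed value is (array_num-1)*2*max_element + max_element. If the type is an unsigned integer with n bits:

max_max_element(n, array_num) = floor(2^n / (2*(array_num-1)+1))

Given array_num=2000, you get the following limits for 32-bit and 64-bit unsigned integers:

max_max_element(32, 2000) = 1074010
max_max_element(64, 2000) = 4612839228234446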
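The limit can be checked with a few lines of standard C++. The sketch below is an assumption-laden illustration: the template is hypothetical, and it divides the largest value representable in the type (2^n - 1) rather than 2^n, which avoids overflow for 64-bit types and gives the same rounded-down result whenever array_num > 1, since the divisor is then an odd number greater than 1.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: largest max_element for which the largest shifted
// value, (array_num-1)*2*max_element + max_element, still fits into the
// unsigned integer type T.
template <typename T>
T max_max_element(std::uint64_t array_num)
{
    assert(array_num > 0);
    const std::uint64_t divisor = 2 * (array_num - 1) + 1;
    return static_cast<T>(std::numeric_limits<T>::max() / divisor);
}
```

For example, max_max_element<std::uint32_t>(2000) gives the 32-bit limit for 2000 arrays; if the values in your data exceed it, a wider integer type is needed for the transformed data.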

The following code implements the steps above:

unique_per_array.cu

#include <thrust/device_vector.h>
#include <thrust/extrema.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <thrust/sort.h>
#include <thrust/unique.h>
#include <thrust/copy.h>

#include <iostream>
#include <iterator>
#include <cstdint>

#define PRINTER(name) print(#name, (name))
template <template <typename...> class V, typename T, typename ...Args>
void print(const char* name, const V<T,Args...> & v)
{
    std::cout << name << ":\t";
    thrust::copy(v.begin(), v.end(), std::ostream_iterator<T>(std::cout, "\t"));
    std::cout << std::endl;
}

int main()
{ 
    typedef uint32_t Integer;

    const std::size_t per_array = 4;
    const std::size_t array_num = 3;

    const std::size_t total_count = array_num * per_array;

    Integer demo_data[] = {1,0,1,2,2,2,0,0,0,0,0,0};

    thrust::device_vector<Integer> data(demo_data, demo_data+total_count);    

    PRINTER(data);

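    // if max_element is known for your problem,
    // you don't need the following operation
    // (the 2*max_element offset used below assumes max_element >= 1;
    // if every value is 0, the arrays are not separated from each other)
    Integer max_element = *(thrust::max_element(data.begin(), data.end()));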
    std::cout << "max_element=" << max_element << std::endl;

    using namespace thrust::placeholders;

    // create the flags

    // could be a smaller integer type as well
    thrust::device_vector<uint32_t> flags(total_count);

    thrust::counting_iterator<uint32_t> flags_cit(0);

    thrust::transform(flags_cit,
                      flags_cit + total_count,
                      flags.begin(),
                      _1 / per_array);
    PRINTER(flags);


    // 1. transform data into unique ranges  
    thrust::transform(data.begin(),
                      data.end(),
                      thrust::counting_iterator<Integer>(0),
                      data.begin(),
                      _1 + (_2/per_array)*2*max_element);
    PRINTER(data);

    // 2. sort the transformed data
    thrust::sort(data.begin(), data.end());
    PRINTER(data);

    // 3. eliminate duplicates per array
    auto new_end = thrust::unique_by_key(data.begin(),
                                         data.end(),
                                         flags.begin());

    uint32_t new_size = new_end.first - data.begin();
    data.resize(new_size);
    flags.resize(new_size);

    PRINTER(data);
    PRINTER(flags);

    // 4. transform data back
    thrust::transform(data.begin(),
                      data.end(),
                      flags.begin(),
                      data.begin(),
                      _1 - _2*2*max_element);

    PRINTER(data);

}    

One more thing:
thrust::unique* has been improved in another version of Thrust. If you want better performance, you may want to try that version.
I think this can help you do that.
You can use thrust::unique inside a functor passed to thrust::for_each. The linked post outlines the general approach.

Is the range of the 256 integers limited in some way (e.g. min=0, max=2^16)?

@m.s. No, the range of the integers is large: min=0, max=1000000 (or larger). So it is not possible to use the integers as array indices.
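For comparison, the per-array alternative mentioned in the comments handles every array independently. The sketch below is only a host-side analogue written with the C++ standard library; on the device, the loop body would presumably be what each functor invocation does. The function name and the choice to return per-array counts are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts and deduplicates each array in place.
// Returns, per array, how many unique elements now sit at its front;
// the elements behind them are left in an unspecified state.
std::vector<std::size_t> unique_each(std::vector<int>& data, std::size_t per_array)
{
    assert(per_array > 0 && data.size() % per_array == 0);
    std::vector<std::size_t> counts;
    for (std::size_t begin = 0; begin < data.size(); begin += per_array) {
        auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
        auto last = first + static_cast<std::ptrdiff_t>(per_array);
        std::sort(first, last);
        counts.push_back(static_cast<std::size_t>(std::unique(first, last) - first));
    }
    return counts;
}
```

Unlike the offset-based approach, this variant places no limit on max_element, at the cost of many small sorts instead of one large one.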