CUDA: How to dynamically set the size of a device_vector in Thrust set operations?


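I have two sets, A and B. The result of my operation should contain the elements that are in A but not in B. I use thrust::set_difference to do this. However, the size of the result C has to be set before the operation; otherwise it ends up with extra zeros at the end, as shown here: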

A=
1 2 3 4 5 6 7 8 9 10 
B=
1 2 8 11 7 4 
C=
3 5 6 9 10 0 0 0 0 0

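How can I set the size of the result C dynamically so that the output is C = 3 5 6 9 10? In the real problem, I do not know the required size of the result device_vector in advance.

My code: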

#include <iostream>
#include <iterator>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/sequence.h>
#include <thrust/set_operations.h>
#include <thrust/sort.h>


void remove_common_elements(thrust::device_vector<int> A, thrust::device_vector<int> B, thrust::device_vector<int>& C)
{
    thrust::sort(thrust::device, A.begin(), A.end());
    thrust::sort(thrust::device, B.begin(), B.end());

    thrust::set_difference(thrust::device, A.begin(), A.end(), B.begin(), B.end(), C.begin());
}

int main(int argc, char * argv[])
{
    thrust::device_vector<int> A(10);
    thrust::sequence(thrust::device, A.begin(), A.end(), 1);  // A = 1, 2, ..., 10

    thrust::device_vector<int> B(6);
    B[0]=1; B[1]=2; B[2]=8; B[3]=11; B[4]=7; B[5]=4;

    // C is sized to the upper bound A.size(); unused slots stay zero.
    thrust::device_vector<int> C(A.size());

    std::cout << "A=" << std::endl;
    thrust::copy(A.begin(), A.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    std::cout << "B=" << std::endl;
    thrust::copy(B.begin(), B.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    remove_common_elements(A, B, C);

    std::cout << "C=" << std::endl;
    thrust::copy(C.begin(), C.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    return 0;
}

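In the general case (i.e. across the various Thrust algorithms), there is often no way to know the output size in advance, only its upper bound. The usual approach is to pass a result vector whose size is the upper bound of the possible output size. As you have already said, in many cases the actual size of the output cannot be known beforehand, and Thrust has no particular magic to solve this. After the operation you will know the size of the result, and if the extra zeros are a problem for some reason, the result can be copied to a new vector. I cannot think of a reason why they would be a problem in general, other than that they take up allocated space.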

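If this is highly objectionable, one possibility (copied from a response by Jared Hoberock) is to run the algorithm twice: the first time with a discard_iterator for the output data, and the second time with a real iterator pointing to an actual vector allocation of the required size. During the first pass, the discard_iterator is used to count the size of the actual result data, even though the data is not stored anywhere. Quoting Jared directly:

In the first phase, pass a discard_iterator as the output iterator. You can compare the discard_iterator returned as the result to compute the size of the output. In the second phase, call the algorithm for real and output to an array sized using the result of the first phase.

The technique is demonstrated in the set_operations.cu example [0,1]: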

[0]

[1]

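thrust::set_difference returns an iterator to the end of the result range.

If you only want to change the logical size of C to the number of result elements, you can simply erase the range that follows the result range: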

void remove_common_elements(thrust::device_vector<int> A, 
thrust::device_vector<int> B, thrust::device_vector<int>& C)
{

    thrust::sort(thrust::device, A.begin(), A.end());
    thrust::sort(thrust::device, B.begin(), B.end());

    auto C_end = thrust::set_difference(thrust::device, A.begin(), A.end(), B.begin(), B.end(), C.begin());
    C.erase(C_end, C.end());
}


If you need to limit peak memory usage, the technique described in the other answer is the only option.