Thrust: sort_by_key performance when using zip_iterator


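Question

I am using sort_by_key with the values passed through a zip_iterator. This sort_by_key is called many times, and after a certain number of iterations it becomes 10 times slower! What is the cause of this drop in performance?

Symptoms

I use sort_by_key to sort 3 vectors, one of which acts as the key vector: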
struct Segment
{
  int v[2];
};

thrust::device_vector<int> keyVec;
thrust::device_vector<int> valVec;
thrust::device_vector<Segment> segVec;

// ... code which fills these vectors ...

// Sort the keys; the zipped (valVec, segVec) pairs are permuted along with them
thrust::sort_by_key( keyVec.begin(), keyVec.end(),
                     thrust::make_zip_iterator( thrust::make_tuple( valVec.begin(), segVec.begin() ) ) );

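This hand-written sort (the index sort and move kernel listed at the end of this post) takes 0.03 seconds, and that performance is consistent across all iterations, unlike the degradation seen with sort_by_key and zip_iterator.

To time each loop accurately, you need to call cudaThreadSynchronize at the end of each loop. The timings of the first couple of loops may not be the actual timings you want.

Pavan: I am calling cudaThreadSynchronize before I record the time, and I am using the Windows high-resolution timer API.

Is this still a problem with Thrust 1.6?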
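The synchronization advice from the comments can be sketched roughly as follows. This is only an illustration, not the asker's benchmark: the function name timeSortByKeyMs is hypothetical, std::chrono stands in for the Windows high-resolution timer mentioned above, and cudaDeviceSynchronize is used because cudaThreadSynchronize has been deprecated in its favor in later CUDA releases.

```cuda
#include <chrono>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Hypothetical helper: wall-clock milliseconds taken by one sort_by_key call.
double timeSortByKeyMs( thrust::device_vector<int>& keys,
                        thrust::device_vector<int>& vals )
{
  // Drain any earlier GPU work so it is not billed to this measurement.
  cudaDeviceSynchronize();
  const auto start = std::chrono::steady_clock::now();

  thrust::sort_by_key( keys.begin(), keys.end(), vals.begin() );

  // Wait for the sort to finish on the device before stopping the clock.
  cudaDeviceSynchronize();
  const auto stop = std::chrono::steady_clock::now();

  return std::chrono::duration<double, std::milli>( stop - start ).count();
}
```

Calling this once per iteration and logging the result would show whether the slowdown belongs to the sort itself. As the comment above notes, the first couple of measurements may be unrepresentative, so they are usually best discarded.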
thrust::device_vector<int> indexVec( keyVec.size() );
thrust::sequence( indexVec.begin(), indexVec.end() );

// Sort the keys and indexes
thrust::sort_by_key( keyVec.begin(), keyVec.end(), indexVec.begin() );

thrust::device_vector<int> valVec2( keyVec.size() );
thrust::device_vector<Segment> segVec2( keyVec.size() );

// Use index array and move vectors to destination
moveKernel<<< x, y >>>(
  toRawPtr( indexVec ),
  indexVec.size(),
  toRawPtr( valVec ),
  toRawPtr( segVec ),
  toRawPtr( valVec2 ),
  toRawPtr( segVec2 ) );

// Swap back into original vectors
valVec.swap( valVec2 );
segVec.swap( segVec2 );
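The post does not show the bodies of moveKernel and toRawPtr, nor the launch configuration written as x, y. The sketch below is one plausible way to fill them in, under stated assumptions: the kernel signature is inferred from the call above, toRawPtr is assumed to wrap thrust::raw_pointer_cast, and the wrapper launchMoveKernel and the block size of 256 are hypothetical choices made for illustration.

```cuda
#include <cstddef>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>

struct Segment
{
  int v[2];
};

// Assumed helper: raw device pointer to a device_vector's storage.
template <typename T>
T* toRawPtr( thrust::device_vector<T>& vec )
{
  return thrust::raw_pointer_cast( vec.data() );
}

// Assumed kernel: output slot i receives the input element whose original
// position is stored in the sorted index array at i.
__global__ void moveKernel( const int* index, std::size_t n,
                            const int* valIn, const Segment* segIn,
                            int* valOut, Segment* segOut )
{
  const std::size_t i =
      static_cast<std::size_t>( blockIdx.x ) * blockDim.x + threadIdx.x;
  if ( i < n )
  {
    const int src = index[i];
    valOut[i] = valIn[src];
    segOut[i] = segIn[src];
  }
}

// Hypothetical wrapper that picks a launch configuration: one thread per
// element, 256 threads per block.
void launchMoveKernel( thrust::device_vector<int>& indexVec,
                       thrust::device_vector<int>& valVec,
                       thrust::device_vector<Segment>& segVec,
                       thrust::device_vector<int>& valVec2,
                       thrust::device_vector<Segment>& segVec2 )
{
  const std::size_t n = indexVec.size();
  if ( n == 0 ) return;  // launching zero blocks is an error

  const unsigned int threads = 256;
  const unsigned int blocks =
      static_cast<unsigned int>( ( n + threads - 1 ) / threads );

  moveKernel<<< blocks, threads >>>( toRawPtr( indexVec ), n,
                                     toRawPtr( valVec ), toRawPtr( segVec ),
                                     toRawPtr( valVec2 ), toRawPtr( segVec2 ) );
}
```

The output vectors must already hold indexVec.size() elements before the launch. thrust::gather performs the same index-driven copy for a single vector, so two gather calls (one for valVec, one for segVec) might replace the custom kernel if avoiding hand-written CUDA is preferred.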