CUDA parallelization

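I'm having trouble parallelizing an operation on an array of numbers with CUDA.

For example, suppose we have an array M containing the numbers (1, 2, 3, 4, 5).

If I remove the number 2 from the array and shift everything after it to the left, the resulting array would be (1, 3, 4, 5, 5),

where M[1] = M[2], M[2] = M[3], M[3] = M[4].

My question is: how can we do this in parallel in CUDA? When we parallelize it there may be a race condition in which the slot holding the number 2 (M[1]) is not the first one to be updated. If M[2] is shifted first, the resulting array becomes (1, 4, 4, 5, 5). Is there any way to handle this? I'm new to CUDA, so I'm not sure what to do.

My current code is as follows: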

__global__ void gpu_shiftSeam(int *MCEnergyMat, int *seam, int width, int height, int currow)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int index = i + width * j;
    if (i ...
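
The race described in the question comes from reading and writing the same array: one thread may overwrite M[2] before another thread has copied the old M[2] into M[1]. One common way around this is to read from the input and write to a separate output buffer, so that no thread ever reads a slot another thread writes. The sketch below is plain C++ so that it runs without a GPU; the function name `shift_left_out_of_place` and its parameters are made up for illustration, and each loop iteration stands in for one GPU thread. It is meant as an illustration of the idea, not as the asker's kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: drops the element at index `removed` by shifting the
// tail one slot to the left, writing into a separate output buffer.
// Each loop iteration plays the role of one GPU thread with index i.
std::vector<int> shift_left_out_of_place(const std::vector<int>& in,
                                         std::size_t removed)
{
    assert(removed < in.size());
    const std::size_t n = in.size();
    std::vector<int> out(n);

    // Iterate in reverse on purpose: because reads come only from `in` and
    // writes go only to `out`, the result does not depend on execution order.
    for (std::size_t k = n; k-- > 0;) {
        const std::size_t i = k;
        if (i < removed) {
            out[i] = in[i];        // left of the gap: unchanged
        } else if (i + 1 < n) {
            out[i] = in[i + 1];    // at or right of the gap: take the right neighbour
        } else {
            out[i] = in[i];        // last slot keeps its old value
        }
    }
    return out;
}
```

With the input (1, 2, 3, 4, 5) and `removed = 1`, this produces (1, 3, 4, 5, 5), matching the result expected in the question. In a kernel, the loop body would become the work of thread `i`, and the two buffers would be separate device allocations.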

As talonmies pointed out in his comment, this sort of thing is called "stream compaction". Here's how you would do it with Thrust:
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/remove.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>

int main()
{
  int data[5] = {1,2,3,4,5};
  thrust::device_vector<int> d_vec(data, data + 5);

  // new_end points to the end of the sequence after 2 has been thrown out
  thrust::device_vector<int>::iterator new_end = 
    thrust::remove(d_vec.begin(), d_vec.end(), 2);

  // erase everything after the new end
  d_vec.erase(new_end, d_vec.end());

  // prove that it worked
  thrust::host_vector<int> h_vec = d_vec;

  std::cout << "result: ";
  thrust::copy(h_vec.begin(), h_vec.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;

  return 0;
}
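
To make the term "stream compaction" concrete, here is a rough sketch of the usual flag, scan, and scatter formulation. It is written in plain C++ so it runs anywhere, and the function name `compact_without` is hypothetical. It is not a description of how `thrust::remove` is implemented internally. Steps 1 and 3 are independent per element, which is what makes them map to one thread per element on a GPU; step 2 is a prefix sum, which is written as a sequential loop here but for which parallel GPU implementations exist.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a copy of `in` with every occurrence of
// `value` removed, using the flag / exclusive-scan / scatter pattern.
std::vector<int> compact_without(const std::vector<int>& in, int value)
{
    const std::size_t n = in.size();

    // Step 1: flag the elements to keep (independent per element).
    std::vector<int> keep(n);
    for (std::size_t i = 0; i < n; ++i) {
        keep[i] = (in[i] != value) ? 1 : 0;
    }

    // Step 2: exclusive prefix sum of the flags. pos[i] is the output index
    // of element i if it is kept. Sequential here; a parallel scan on a GPU.
    std::vector<std::size_t> pos(n);
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        pos[i] = running;
        running += static_cast<std::size_t>(keep[i]);
    }

    // Step 3: scatter the kept elements to their final positions. Every kept
    // element has a unique pos[i], so no two writes collide.
    std::vector<int> out(running);
    for (std::size_t i = 0; i < n; ++i) {
        if (keep[i]) {
            out[pos[i]] = in[i];
        }
    }
    return out;
}
```

For the input (1, 2, 3, 4, 5) and `value = 2`, the flags are (1, 0, 1, 1, 1), the scanned positions are (0, 1, 1, 2, 3), and the output is (1, 3, 4, 5). Because each kept element is written to its own precomputed slot, the ordering problem from the question does not arise.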

Stream compaction is a solved problem on the GPU. There are a number of robust, off-the-shelf CUDA implementations you can use, including one that has shipped with the CUDA toolkit for several years. Why not use one of those?