
OpenCL bincount


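I'm trying to implement a bincount operation in OpenCL. It allocates an output buffer and uses the indices in x to accumulate weights into the bin with the same index (assume num_bins == max(x) + 1, so every index has a bin). It is equivalent to the following Python code: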

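import numpy as np

def bincount(x, weight, num_bins):
    out = np.zeros(num_bins, dtype=weight.dtype)  # one zeroed bin per index
    for i in range(len(x)):
        out[x[i]] += weight[i]                     # repeated indices accumulate
    return out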
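For checking the kernel on the host, the loop above has vectorized NumPy equivalents. The sketch below reuses the question's small test arrays and contrasts `np.add.at` and `np.bincount`, which both accumulate repeated indices, with the tempting `out[x] += weight`, which does not and therefore loses updates in much the same way the kernel does.

```python
import numpy as np

x = np.arange(5, dtype=np.int32).repeat(2)   # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
weight = np.arange(10, dtype=np.int32)       # [0, 1, 2, ..., 9]
num_bins = int(x.max()) + 1                  # 5

# Unbuffered scatter-add: every occurrence of a repeated index is accumulated.
out_add_at = np.zeros(num_bins, dtype=np.int32)
np.add.at(out_add_at, x, weight)             # -> [1, 5, 9, 13, 17]

# np.bincount computes the same sums; with weights it returns float64.
out_bincount = np.bincount(x, weights=weight, minlength=num_bins).astype(np.int32)

# Buffered fancy-index update: for each repeated index only one
# contribution survives, so this does NOT match the loop.
out_buffered = np.zeros(num_bins, dtype=np.int32)
out_buffered[x] += weight
```

`np.add.at` is usually the closer model of the kernel's intent, since it keeps the integer dtype and the exact scatter-add semantics.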
What I have so far:

import pyopencl as cl
import numpy as np

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

prg = cl.Program(ctx, """
__kernel void bincount(__global int *res_g, __global const int* x_g, __global const int* weight_g)
{
  int gid = get_global_id(0);
  res_g[x_g[gid]] += weight_g[gid];
}
""").build()

# test
x = np.arange(5, dtype=np.int32).repeat(2) # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
x_g = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR, hostbuf=x)

weight = np.arange(10, dtype=np.int32) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
weight_g = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR, hostbuf=weight)

res_g = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, 4 * 5)

prg.bincount(queue, [10], None, res_g, x_g, weight_g)

# transfer back to cpu
res_np = np.empty(5).astype(np.int32)
cl.enqueue_copy(queue, res_np, res_g)
res_np
Output:

array([1, 3, 5, 7, 9], dtype=int32)
Expected output:

array([1, 5, 9, 13, 17], dtype=int32)
How can I accumulate into elements whose index appears more than once?
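The observed `[1, 3, 5, 7, 9]` is consistent with a lost-update race: `res_g[x_g[gid]] += weight_g[gid]` is a separate load, add and store, and nothing orders those steps across work items. The sketch below replays one possible interleaving on the host, assuming (this is an assumption; the real schedule is unspecified) that the buffer happened to hold zeros and that every work item loads before any work item stores. The function name `lockstep_scatter_add` is hypothetical.

```python
import numpy as np


def lockstep_scatter_add(out, x, weight):
    """Hypothetical replay of `out[x[i]] += weight[i]` run by many work items,
    assuming every load completes before any store (one possible interleaving)."""
    out = out.copy()
    loaded = [out[xi] for xi in x]                    # 1) every work item loads its bin
    summed = [v + w for v, w in zip(loaded, weight)]  # 2) adds happen in private registers
    for xi, s in zip(x, summed):                      # 3) stores: the last writer wins
        out[xi] = s
    return out


x = np.arange(5, dtype=np.int32).repeat(2)   # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
weight = np.arange(10, dtype=np.int32)       # [0, 1, 2, ..., 9]

raced = lockstep_scatter_add(np.zeros(5, dtype=np.int32), x, weight)
sequential = np.zeros(5, dtype=np.int32)
for xi, w in zip(x, weight):                 # the question's reference loop
    sequential[xi] += w
# raced      -> [1, 3, 5, 7, 9]   (first update to each bin is overwritten)
# sequential -> [1, 5, 9, 13, 17]
```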

Edit

The above is a contrived example. In my actual application, x will be the indices produced by a sliding-window algorithm:

x = np.array([ 0,  1,  2,  4,  5,  6,  8,  9, 10,  1,  2,  3,  5,  6,  7,  9, 10,
              11,  4,  5,  6,  8,  9, 10, 12, 13, 14,  5,  6,  7,  9, 10, 11, 13,
              14, 15,  8,  9, 10, 12, 13, 14, 16, 17, 18,  9, 10, 11, 13, 14, 15,
              17, 18, 19, 20, 21, 22, 24, 25, 26, 28, 29, 30, 21, 22, 23, 25, 26,
              27, 29, 30, 31, 24, 25, 26, 28, 29, 30, 32, 33, 34, 25, 26, 27, 29,
              30, 31, 33, 34, 35, 28, 29, 30, 32, 33, 34, 36, 37, 38, 29, 30, 31,
              33, 34, 35, 37, 38, 39], dtype=np.int32)

weight = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,
                   0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
                   0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
                   1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,
                   0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0], dtype=np.int32)
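When x is reshaped to (2, 3, 2, 3, 3), a pattern becomes more apparent, but I'm struggling to see how the approach given by @doqtor can be applied here, and in particular whether it generalizes easily.

The expected output is: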

array([1, 1, 0, 0, 2, 2, 0, 0, 3, 3, 0, 0, 2, 2, 0, 0, 1, 1, 0, 0, 1, 1,
       0, 0, 2, 2, 0, 0, 3, 3, 0, 0, 2, 2, 0, 0, 1, 1, 0, 0], dtype=int32)
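The question notes that reshaping x to (2, 3, 2, 3, 3) makes a pattern visible. One reading of that pattern, which is an assumption and not stated in the question, is that x holds the flattened indices of 3x3 windows slid with stride 1 over two 5x4 grids stored back to back. Under that assumption, the sketch below regenerates x, builds the weights (1 at every third position, 0 elsewhere, as in the listing), and computes the expected output on the host with `np.bincount`. The names `B`, `H`, `W`, `K` are hypothetical.

```python
import numpy as np

# Hypothetical reconstruction: x[b, r, c, kr, kc] = b*H*W + (r + kr)*W + (c + kc),
# i.e. shape (2, 3, 2, 3, 3) = (grid, window_row, window_col, offset_row, offset_col).
B, H, W, K = 2, 5, 4, 3            # two 5x4 grids, 3x3 windows
OUT_H, OUT_W = H - K + 1, W - K + 1  # 3 x 2 window positions per grid

b, r, c, kr, kc = np.meshgrid(
    np.arange(B), np.arange(OUT_H), np.arange(OUT_W),
    np.arange(K), np.arange(K), indexing="ij")
x = (b * H * W + (r + kr) * W + (c + kc)).astype(np.int32).ravel()

# The weights in the question are 1 at every third position, 0 elsewhere.
weight = (np.arange(x.size) % 3 == 0).astype(np.int32)

# Host-side reference result; bincount returns float64 when weights are given.
expected = np.bincount(x, weights=weight, minlength=B * H * W).astype(np.int32)
```

Because several windows overlap, the same bin is hit by several elements that are far apart in x, which is why a fixed "two adjacent elements per work item" split does not carry over to this input.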

The problem is that the OpenCL buffer the weights are accumulated into is never initialized (zeroed). Fix:

res_np = np.zeros(5, dtype=np.int32)
# READ_WRITE rather than WRITE_ONLY: the kernel reads res_g before adding to it
res_g = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR, hostbuf=res_np)

prg.bincount(queue, [10], None, res_g, x_g, weight_g)

# transfer back to cpu
cl.enqueue_copy(queue, res_np, res_g)
which returns the correct result:
[1 5 9 13 17]

====== Update ======

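As @Kevin noticed, there is also a race condition here. If the indices follow a known pattern, it can be avoided without any synchronization, for example by having one work item process every 2 elements: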

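__kernel void bincount(__global int *res_g, __global const int* x_g, __global const int* weight_g)
{
  // One work item per pair of elements; launch with a global size of len(x) / 2.
  // Race-free here only because equal indices in x are adjacent, so both
  // updates to a bin happen inside the same work item.
  int gid = get_global_id(0);
  for(int x = gid*2; x < gid*2+2; ++x)
      res_g[x_g[x]] += weight_g[x];
}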
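The pairwise kernel relies on duplicate indices being adjacent, which does not hold for the sliding-window x. A common general fix, swapped in here and not the one proposed in the answer, is to make each update atomic with OpenCL's built-in `atomic_add` on 32-bit global integers (core since OpenCL 1.1). The sketch below assumes int32 data; the kernel name `bincount_atomic` and the helper's signature are hypothetical, and pyopencl is imported inside the function only so the kernel string can be inspected without an OpenCL installation.

```python
import numpy as np

# OpenCL C kernel: one work item per input element, atomic update per bin.
BINCOUNT_ATOMIC_SRC = """
__kernel void bincount_atomic(__global int *res_g,
                              __global const int *x_g,
                              __global const int *weight_g)
{
    int gid = get_global_id(0);
    // atomic_add does the load, add and store as one indivisible operation,
    // so two work items hitting the same bin cannot lose an update.
    atomic_add(&res_g[x_g[gid]], weight_g[gid]);
}
"""


def bincount_atomic(ctx, queue, x, weight, num_bins):
    """Hypothetical helper: weighted bincount of int32 data on an OpenCL device."""
    import pyopencl as cl  # imported lazily; only needed when actually running

    mf = cl.mem_flags
    x = np.ascontiguousarray(x, dtype=np.int32)
    weight = np.ascontiguousarray(weight, dtype=np.int32)
    res = np.zeros(num_bins, dtype=np.int32)  # bins must start at zero

    x_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
    weight_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=weight)
    # READ_WRITE: atomic_add both reads and writes res_g.
    res_g = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=res)

    prg = cl.Program(ctx, BINCOUNT_ATOMIC_SRC).build()
    prg.bincount_atomic(queue, (x.size,), None, res_g, x_g, weight_g)
    cl.enqueue_copy(queue, res, res_g)  # blocking copy back to the host
    return res
```

With `ctx = cl.create_some_context()` and `queue = cl.CommandQueue(ctx)`, calling `bincount_atomic(ctx, queue, x, weight, 40)` on the sliding-window arrays should reproduce the question's expected output. Atomics on heavily shared bins can serialize, so throughput depends on how often indices collide.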

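Thanks, but I still get the same wrong result. I should add that I'm running OpenCL 3.0.

This problem is tiny; I really doubt the OpenCL version matters here. Make sure the last 5 non-empty lines of your code are replaced with the code from the answer.

I made sure the output buffer is initialized to zero, and the result is still the same. I'm not sure, but I think the whole problem has to do with work-item synchronization. I launch 10 work items, one per element of x, but when two of them write to the same index, could they conflict so that one write is overwritten by the "slowest" thread?

Yes, you're right, there is a race condition here. For this particular example, you can have each pair of elements processed by a single work item, launched with a global size of 5:
prg.bincount(queue, [5], None, res_g, x_g, weight_g)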
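Another race-free option that needs neither atomics nor a known index pattern is to invert the loop: launch one work item per output bin and let each scan all of x, summing the weights whose index matches its bin. Only one work item ever writes a given `res_g` element, so no synchronization is needed and the buffer need not be pre-zeroed; the cost is O(num_bins * len(x)) reads, which is acceptable for inputs as small as these. This is a swapped-in technique, not the answer's; the kernel name `bincount_gather` and the helper below are hypothetical, and the NumPy function only emulates on the host what each work item computes.

```python
import numpy as np

# OpenCL C kernel: one work item per output bin ("gather" instead of "scatter").
BINCOUNT_GATHER_SRC = """
__kernel void bincount_gather(__global int *res_g,
                              __global const int *x_g,
                              __global const int *weight_g,
                              const int n)
{
    int bin = get_global_id(0);
    int acc = 0;
    for (int i = 0; i < n; ++i)
        if (x_g[i] == bin)
            acc += weight_g[i];
    res_g[bin] = acc;  // only this work item ever writes res_g[bin]
}
"""


def bincount_gather_reference(x, weight, num_bins):
    """Host-side NumPy emulation of what work item `bin` computes in the kernel."""
    x = np.asarray(x)
    weight = np.asarray(weight)
    return np.array([weight[x == b].sum() for b in range(num_bins)], dtype=np.int32)
```

On the device it would be launched as `prg.bincount_gather(queue, (num_bins,), None, res_g, x_g, weight_g, np.int32(len(x)))`; scalar kernel arguments are passed as sized NumPy scalars, as pyopencl expects.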