Android: atomic operations on floats in a compute shader

I am using a compute shader to compute a sum (of type float), as follows:

#version 320 es
layout(local_size_x = 640, local_size_y = 480, local_size_z = 1) in;
layout(binding = 0) buffer OutputData {
    float sum[];
};
uniform sampler2D texture_1;
void main()
{
    // Map this invocation's 2D ID to a normalized texture coordinate
    vec2 texcoord = vec2(float(gl_LocalInvocationID.x) / 640.0, float(gl_LocalInvocationID.y) / 480.0);
    float val = textureLod(texture_1, texcoord, 0.0).r;
    // Synchronization is needed here: every invocation reads and writes sum[0]
    sum[0] = sum[0] + val;
    // Goal: the sum of all val in the first channel of texture_1
}
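I know there are atomic operations such as atomicAdd(), but they do not support float parameters, and barrier() does not seem to solve my problem either.
Maybe I could encode the float as an int, or is there a simpler way to solve this?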

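Atomics generally perform very poorly, especially under heavy contention from many threads accessing the same location in parallel, so I would not recommend them for this use case.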

To keep things parallel here, you really need some kind of multi-pass reduction strategy. In pseudocode, something like this:

array_size = N
data = input_array

while array_size > 1:
   spawn pass with M = array_size/2 threads
   thread i (0 <= i < M): out[i] = data[2*i] + data[2*i+1]
   array_size = M
   data = out
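This is a simple 2:1 reduction, so it takes O(log2(N)) passes, but you can reduce more per pass to cut the memory bandwidth spent on intermediate storage. For a GPU using a texture as input, 4:1 is a pretty good choice (you can use textureGather, or even a simple linear filter, to load multiple samples in a single texture operation).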
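
To make the pass structure concrete, here is a minimal CPU-side sketch in Python (not GLSL) of the multi-pass reduction described above. It only illustrates the data flow: each list element produced in a pass stands in for one GPU thread's output, and on a real device each pass would be a separate dispatch. The names `reduce_pass` and `multipass_sum`, the all-ones test input, and the round-up handling of sizes that do not divide evenly are assumptions for illustration, not part of the original answer.

```python
def reduce_pass(data, factor):
    """One pass: simulated thread i sums `factor` adjacent inputs."""
    # Round the thread count up so a ragged tail is still summed (assumption:
    # the pseudocode above only covers sizes that divide evenly).
    threads = (len(data) + factor - 1) // factor
    return [sum(data[i * factor:(i + 1) * factor]) for i in range(threads)]


def multipass_sum(values, factor=2):
    """Repeat reduce_pass until one element remains.

    Returns (total, number_of_passes).
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    data = list(values)
    if not data:
        return 0.0, 0
    passes = 0
    while len(data) > 1:
        data = reduce_pass(data, factor)  # output of this pass feeds the next
        passes += 1
    return data[0], passes


# Hypothetical input: a 640x480 image whose red channel is 1.0 everywhere.
texels = [1.0] * (640 * 480)
total_2, passes_2 = multipass_sum(texels, factor=2)  # 2:1 reduction
total_4, passes_4 = multipass_sum(texels, factor=4)  # 4:1 reduction
print(total_2, passes_2, total_4, passes_4)
```

For this 640x480 input the 2:1 version needs 19 passes and the 4:1 version needs 10, which is the trade-off the answer points at: a wider reduction per pass means fewer round trips through intermediate storage.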