Metal kernel shader -- fade implementation

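I haven't written many Metal kernel shaders yet. Here is a fledgling "fade" shader between two RGBX-32 images, using a tween value from 0.0 to 1.0 to go from inBuffer1 (0.0) to inBuffer2 (1.0).

Is there anything I'm missing here? It strikes me that this may be terribly inefficient.

My first thought was to do the subtraction and multiplication with vector data types (e.g. char4), which might be better, but the results would surely be undefined (because some components would be negative).

Also, is there any advantage to using MTLTexture rather than MTLBuffer objects?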

kernel void fade_Kernel(device const uchar4  *inBuffer1  [[ buffer(0) ]],
                        device const uchar4  *inBuffer2  [[ buffer(1) ]],
                        device const float   *tween      [[ buffer(2) ]],
                        device uchar4        *outBuffer  [[ buffer(3) ]],
                        uint gid [[ thread_position_in_grid ]])
{
    const float t = tween[0];
    uchar4 pixel1 = inBuffer1[gid];
    uchar4 pixel2 = inBuffer2[gid];

    // these values will be negative
    short r=(pixel2.r-pixel1.r)*t;  
    short g=(pixel2.g-pixel1.g)*t;
    short b=(pixel2.b-pixel1.b)*t;

    outBuffer[gid]=uchar4(pixel1.r+r,pixel1.g+g,pixel1.b+b,0xff);
}

First, you should probably declare the tween parameter as:

constant float &tween [[ buffer(2) ]],

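The constant address space is more appropriate for a value like this, which is the same for every invocation of the function (and is not indexed by grid position or anything similar). Also, making it a reference rather than a pointer tells the compiler that you won't be indexing other elements of the "array" that a pointer might point into.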

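Finally, there is a mix() function that performs exactly the sort of computation you are doing here. It only accepts floating-point types, so the pixels are converted to float and back. You could replace the body of the function with: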

uchar4 pixel1 = inBuffer1[gid];
uchar4 pixel2 = inBuffer2[gid];

outBuffer[gid] = uchar4(uchar3(mix(float3(pixel1.rgb), float3(pixel2.rgb), tween)), 0xff);
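Taken together, the two suggestions above might give a complete kernel along the following lines. This is only a sketch: it keeps the question's buffer layout and names, assumes tween is already in the range 0.0 to 1.0, and assumes the dispatch launches exactly one thread per pixel (there is no bounds check).

```metal
#include <metal_stdlib>
using namespace metal;

// Sketch: linear fade between two RGBX-32 images stored in buffers.
kernel void fade_Kernel(device const uchar4 *inBuffer1 [[ buffer(0) ]],
                        device const uchar4 *inBuffer2 [[ buffer(1) ]],
                        constant float      &tween     [[ buffer(2) ]],
                        device uchar4       *outBuffer [[ buffer(3) ]],
                        uint gid [[ thread_position_in_grid ]])
{
    const uchar4 pixel1 = inBuffer1[gid];
    const uchar4 pixel2 = inBuffer2[gid];

    // mix() needs floating-point operands, so convert the RGB channels first.
    // In float, a negative difference (pixel2 < pixel1) is not a problem.
    const float3 blended = mix(float3(pixel1.rgb), float3(pixel2.rgb), tween);

    // Converting back to uchar3 truncates toward zero, like the original code.
    outBuffer[gid] = uchar4(uchar3(blended), 0xff);
}
```

On the host side, the tween value can still be supplied through buffer index 2; only the declaration inside the shader changes.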

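As to whether it would be better to use textures, that depends somewhat on what you plan to do with the result after running this kernel. If you are going to do texture-like things with it anyway, it is probably better to use textures throughout. In fact, it might be better to use a drawing operation with blending rather than a compute kernel. After all, this kind of blending is something GPUs have to do all the time, so that path is likely to be fast. You will have to test the performance of each approach.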

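If you are working with images, using MTLTexture is more efficient than using MTLBuffer. It is also better to use "half" than "uchar". I learned this directly from an Apple engineer at WWDC this year.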

kernel void alpha(texture2d<half, access::read>  inTexture2  [[texture(0)]],
    texture2d<half, access::read>  inTexture1  [[texture(1)]],
    texture2d<half, access::write> outTexture [[texture(2)]],
    const device float& tween [[ buffer(3) ]],
    uint2 gid [[thread_position_in_grid]]) 
{
    // Check if the pixel is within the bounds of the output texture
    if((gid.x >= outTexture.get_width()) || (gid.y >= outTexture.get_height())) {
        // Return early if the pixel is out of bounds
        return;
    }
    half4 color1  = inTexture1.read(gid);
    half4 color2  = inTexture2.read(gid);
    outTexture.write(half4(mix(color1.rgb, color2.rgb, half(tween)), color1.a), gid);
}

Thank you, Ken. Once again you've been a big help. Oddly, "mix" doesn't seem to be part of the implementation before Metal 2. Looking at the Metal Shading Language documentation, I can call "saturate" but not "mix" -> No matching function for call to 'mix'.

My mistake. mix() only works with floating-point types. I've edited my answer to convert back and forth. The conversion was implicit in your original code. You may also want to wrap the call to mix() in a call to round(), although your original code truncated just as my new code does.

Thanks Ken. If alpha is to be taken into account, I imagine outBuffer[gid] = uchar4(mix(float4(pixel1), float4(pixel2), tween)) would work well too.
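Following up on the rounding and alpha remarks in the comments, a variant could blend all four channels and round to nearest instead of truncating. The kernel name fade_Kernel_rounded and the saturate() clamp on tween are additions for illustration, not part of the original answer; as before, one thread per pixel is assumed.

```metal
#include <metal_stdlib>
using namespace metal;

// Sketch: four-channel fade with round-to-nearest conversion back to bytes.
kernel void fade_Kernel_rounded(device const uchar4 *inBuffer1 [[ buffer(0) ]],
                                device const uchar4 *inBuffer2 [[ buffer(1) ]],
                                constant float      &tween     [[ buffer(2) ]],
                                device uchar4       *outBuffer [[ buffer(3) ]],
                                uint gid [[ thread_position_in_grid ]])
{
    // Assumption: clamp tween to [0, 1] so the blend stays within 0...255.
    const float t = saturate(tween);

    const float4 a = float4(inBuffer1[gid]);
    const float4 b = float4(inBuffer2[gid]);

    // round() avoids the downward bias that plain truncation introduces.
    outBuffer[gid] = uchar4(round(mix(a, b, t)));
}
```

For RGBX data the fourth byte is blended along with the colour channels; if both inputs store 0xff there, the output does too.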