OpenCL kernel not vectorized
I am trying to build a kernel that does parallel string search. To this end I tend to use a finite state machine. The transition table of the FSM is passed in the kernel argument states. The code:
__kernel void Find ( __constant char *text,
                     const int offset,
                     const int tlenght,
                     __constant char *characters,
                     const int clength,
                     const int maxlength,
                     __constant int *states,
                     const int statesdim) {
    private char c;
    private int state;
    private const int id = get_global_id(0);
    if (id < (tlenght - maxlength)) {
        private int cIndex, sd, s, k;
        for (int i = 0; i < maxlength; i++) {
            c = text[i + offset];
            cIndex = -1;
            for (int j = 0; j < clength; j++) {
                if (characters[j] == c) {
                    cIndex = j;
                }
            }
            if (cIndex == -1) {
                state = 0;
                break;
            } else {
                s = states[state + cIndex * statesdim];
            }
            if (state <= 0) break;
        }
    }
}
The result is:
Using default instruction set architecture.
Intel OpenCL CPU device was found!
Device name: Pentium(R) Dual-Core CPU T4400 @ 2.20GHz
Device version: OpenCL 1.1 (Build 31360.31426)
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
Build started
Kernel <Find> was not vectorized
Done.
Build succeeded!
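The statement
X = states[state+cIndex*statesdim];
cannot be vectorized because, across threads, the index does not necessarily access consecutive memory locations.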
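Note that in the first kernel the target variable s is not written back to global memory. The compiler may therefore optimize the code and remove the s = states[state+cIndex*statesdim]; statement. So it looks as though your statement has been vectorized, but it has not.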