Python Beignet OpenCL PyGPU issue
I am trying to use OpenCL as the backend for Theano on Kubuntu 17.04, but I have run into some problems I cannot resolve.

Since I am using an Intel Broadwell processor (i7-5557U, if that helps), I downloaded a copy of the Beignet source (1.3.1) and all of its dependencies, and ran make && make install. As far as I can tell this worked, because the ./utest_run command reports 100% success.

clinfo provides a full set of information (which I believe is correct) about this processor's OpenCL capabilities. Before I installed Beignet, it did not show any support.

Using the conda package manager I added Keras (2.0.6), Theano (0.9.0) and pygpu (0.6.9). Keras and Theano seem to work fine, because a Python script I adapted from the fast.ai course does what it should when using the CPU (obviously very slowly). In addition, a simple test script taken from ye olde internet (posted on pastebin for reference) says the CPU path works fine.
To now use the OpenCL backend, I added a ~/.theanorc file with the following contents:

  [global]
  floatX = float32
  device = opencl0:0
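A minimal sketch for confirming that a configuration like the one above parses into the intended section and option names. It only uses the standard-library INI parser and does not import Theano; the function name read_theano_settings and the sample text are illustrative assumptions, and the code is written for Python 3 (on Python 2.7 the module is called ConfigParser).

```python
import configparser  # named ConfigParser on Python 2.7


def read_theano_settings(text):
    """Parse .theanorc-style text and return the [global] options as a dict."""
    parser = configparser.ConfigParser()
    # Keep option names such as floatX case-sensitive instead of lowercasing them.
    parser.optionxform = str
    parser.read_string(text)
    if not parser.has_section("global"):
        return {}
    return dict(parser["global"])


# Sample text mirroring the file contents described in the question.
sample = """
[global]
floatX = float32
device = opencl0:0
"""

settings = read_theano_settings(sample)
print(settings)
```

A misspelled or translated section header (anything other than global) would make this return an empty dict, which is a quick way to spot a file that is being silently ignored.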
Now, when I run the pastebin script mentioned above, I get the following error:

  ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
  Traceback (most recent call last):
    File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 164, in <module>
      use(config.device)
    File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 151, in use
      init_dev(device)
    File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 60, in init_dev
      sched=config.gpuarray.sched)
    File "pygpu/gpuarray.pyx", line 634, in pygpu.gpuarray.init
    File "pygpu/gpuarray.pyx", line 584, in pygpu.gpuarray.pygpu_init
    File "pygpu/gpuarray.pyx", line 1057, in pygpu.gpuarray.GpuContext.__cinit__
  GpuArrayException: clGetPlatformIDs(0, NULL, &nump): Unknown error
A simpler test,

  DEVICE="opencl0:0" python -c "import pygpu; pygpu.test()"

throws the same error as above.
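Since the failing call is clGetPlatformIDs, one way to narrow things down is to make that same call directly from the Python interpreter that fails, bypassing Theano and pygpu. The following is a minimal sketch using ctypes; the helper name probe_opencl_platforms is an assumption for illustration, and the library that ctypes locates is not necessarily the same copy that pygpu loads (for example, a conda environment may ship its own).

```python
import ctypes
import ctypes.util


def probe_opencl_platforms(library_name="OpenCL"):
    """Call clGetPlatformIDs(0, NULL, &count) and return (status, count).

    Returns None if no OpenCL loader library can be found or loaded from
    this interpreter, or if it does not export clGetPlatformIDs.
    """
    path = ctypes.util.find_library(library_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        fn = lib.clGetPlatformIDs
    except (OSError, AttributeError):
        return None
    # cl_int clGetPlatformIDs(cl_uint, cl_platform_id *, cl_uint *)
    fn.restype = ctypes.c_int32
    fn.argtypes = [ctypes.c_uint32, ctypes.c_void_p,
                   ctypes.POINTER(ctypes.c_uint32)]
    count = ctypes.c_uint32(0)
    status = fn(0, None, ctypes.byref(count))
    return status, count.value


result = probe_opencl_platforms()
if result is None:
    print("no OpenCL loader library found from this interpreter")
else:
    print("clGetPlatformIDs status=%d, platforms=%d" % result)
```

A status of 0 is CL_SUCCESS. A status of -1001 is CL_PLATFORM_NOT_FOUND_KHR, which an ICD loader returns when it finds no vendor drivers; seeing that here while clinfo lists two platforms would suggest that this interpreter is using a different loader or a different ICD search path than clinfo does.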
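I think the problem stems from Beignet rather than pygpu, but I do not know how to find the root of it, since the clinfo output looks fine. I have done quite a bit of research, but this does not seem to be something people are doing, as there is practically no documentation, blog posts or what-have-you on it. Any ideas?

(For the record, given the machine I have, I am fairly sure this will not give me much of a speedup, but according to some reading it should make things at least 1.5 to 2 times faster, so it is still worth my while to try to track this down.)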
My clinfo output:
Number of platforms 2
Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 1.2 beignet 1.3
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
Platform Extensions function suffix Intel
Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 2.0 pocl 0.13, LLVM 3.8.1
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix POCL
Platform Name Intel Gen OCL Driver
Number of devices 1
Device Name Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
Device Vendor Intel
Device Vendor ID 0x8086
Device Version OpenCL 1.2 beignet 1.3
Driver Version 1.3
Device OpenCL C Version OpenCL C 1.2 beignet 1.3
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 48
Max clock frequency 1000MHz
Device Partition (core)
Max number of sub-devices 1
Supported partition types None, None, None
Max work item dimensions 3
Max work item sizes 512x512x512
Max work group size 512
Preferred work group size multiple 16
Preferred / native vector sizes
char 16 / 8
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 8 (cl_khr_fp16)
float 4 / 4
double 0 / 2 (n/a)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 4294967296 (4GiB)
Error Correction support No
Max memory allocation 2147483648 (2GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 8192
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4096 bytes
Pitch alignment for 2D image buffers 1 bytes
Max 2D image size 8192x8192 pixels
Max 3D image size 8192x8192x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 65536 (64KiB)
Max constant buffer size 134217728 (128MiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 80ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing cl_khr_fp16
Platform Name Portable Computing Language
Number of devices 1
Device Name pthread-Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Device Vendor GenuineIntel
Device Vendor ID 0x8086
Device Version OpenCL 2.0 pocl
Driver Version 0.13
Device OpenCL C Version OpenCL C 2.0
Device Type CPU, Default
Device Profile FULL_PROFILE
Max compute units 4
Max clock frequency 3400MHz
Device Partition (core)
Max number of sub-devices 4
Supported partition types equally, by counts
Max work item dimensions 3
Max work item sizes 4096x4096x4096
Max work group size 4096
Preferred work group size multiple 8
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (n/a)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 17862586368 (16.64GiB)
Error Correction support No
Max memory allocation 17862586368 (16.64GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 0 bytes
Global 0 bytes
Local 0 bytes
Max size for global variable 0
Preferred total size of global vars 0
Global Memory cache type Read/Write
Global Memory cache size 32768
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 1116411648 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 32768x32768 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args <printDeviceInfo:106: get CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS : error -30>
Max number of pipe args 16
Max active pipe reservations 1
Max pipe packet size 1024
Local memory type Global
Local memory size 17862586368 (16.64GiB)
Max constant buffer size 17862586368 (16.64GiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution No
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 16384 (16KiB)
Max size 262144 (256KiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir cl_khr_int64 cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel Gen OCL Driver
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [Intel]
clCreateContext(NULL, ...) [default] Success [Intel]
clCreateContext(NULL, ...) [other] Success [POCL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel Gen OCL Driver
Device Name Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel Gen OCL Driver
Device Name Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.11
ICD loader Profile OpenCL 2.1
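The ICD loader reported above discovers platforms through small .icd text files, each naming a vendor library. On Linux these conventionally live in /etc/OpenCL/vendors. As a minimal sketch, assuming that default directory (the helper name list_icd_entries is illustrative), the registered entries can be listed like this and compared with the two platforms clinfo shows:

```python
import glob
import os


def list_icd_entries(vendor_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name in vendor_dir to the library name or path it contains."""
    entries = {}
    for icd_path in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(icd_path) as handle:
            entries[os.path.basename(icd_path)] = handle.read().strip()
    return entries


# Prints nothing if the directory is missing or holds no .icd files.
for name, target in list_icd_entries().items():
    print(name, "->", target)
```

If the entries look right system-wide, a further thing to check is whether the Python environment in use ships its own OpenCL loader that searches a different directory, which this sketch can also inspect by passing that directory as vendor_dir.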