Python Beignet OpenCL PyGPU problem

Tags: python, opencl, theano, beignet

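I am trying to use OpenCL as the backend for Theano on Kubuntu 17.04, but I am running into problems I cannot resolve.

Since I am using an Intel Broadwell processor (i7-5557U, if that helps), I downloaded a copy of the Beignet source (1.3.1) along with all of its dependencies and ran make && make install. As far as I can tell this worked, because:

  • The ./utest_run command reports 100% success.
  • clinfo gives a full set of information about the processor's OpenCL capabilities (which I believe is correct). Before installing Beignet, it did not show any support.

Next, I downloaded a copy of Anaconda (4.4) and used the conda package manager to add Keras (2.0.6), Theano (0.9.0), and pygpu (0.6.9). Keras and Theano seem to work fine: a Python script I adapted from the fast.ai course does what it should when using the CPU (although, obviously, very slowly). In addition, a simple test script taken from ye olde internet reports that the CPU path is working (posted to pastebin, for reference).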

To now use the OpenCL backend, I added a ~/.theanorc file with the following contents:

    [global]
    floatX=float32
    device=opencl0:0
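
For illustration only, the same two settings can be produced from Python. This is a minimal sketch using the standard-library configparser; the helper names render_theanorc and to_theano_flags are hypothetical, and the second helper assumes the comma-separated key=value form that Theano accepts in the THEANO_FLAGS environment variable.

```python
import configparser
import io


def render_theanorc(settings):
    """Render a dict of options as the text of a [global] section."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the case of keys such as floatX
    parser["global"] = settings
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


def to_theano_flags(settings):
    """Render the same options as a comma-separated key=value string."""
    return ",".join(f"{key}={value}" for key, value in settings.items())


# The two options from the question.
settings = {"floatX": "float32", "device": "opencl0:0"}

print(render_theanorc(settings))
print(to_theano_flags(settings))
```

Passing the options through the environment for a single run is a convenient way to switch between the CPU and OpenCL devices without editing the file each time.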
    

Now, when I run the aforementioned pastebin script, I get the following error:

    ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
    Traceback (most recent call last):
      File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 164, in <module>
        use(config.device)
      File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 151, in use
        init_dev(device)
      File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 60, in init_dev
        sched=config.gpuarray.sched)
      File "pygpu/gpuarray.pyx", line 634, in pygpu.gpuarray.init
      File "pygpu/gpuarray.pyx", line 584, in pygpu.gpuarray.pygpu_init
      File "pygpu/gpuarray.pyx", line 1057, in pygpu.gpuarray.GpuContext.__cinit__
    GpuArrayException: clGetPlatformIDs(0, NULL, &nump): Unknown error
    

The simpler test

    DEVICE="opencl0:0" python -c "import pygpu;pygpu.test()"

throws the same error as above.
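
Since the call that fails is clGetPlatformIDs(0, NULL, &nump), one way to narrow things down might be to issue that same call from inside the Anaconda interpreter, without pygpu in between. The sketch below does this with ctypes. The function name probe_platform_count is hypothetical, and it assumes that ctypes.util.find_library("OpenCL") resolves to the same loader library that pygpu ends up using, which may not be the case.

```python
import ctypes
import ctypes.util


def probe_platform_count(library=None):
    """Call clGetPlatformIDs(0, NULL, &n) directly through ctypes.

    Returns (status, count), or None when no OpenCL library can be loaded.
    """
    name = library or ctypes.util.find_library("OpenCL")
    if name is None:
        return None
    try:
        cl = ctypes.CDLL(name)
    except OSError:
        return None
    fn = getattr(cl, "clGetPlatformIDs", None)
    if fn is None:
        return None
    # cl_int clGetPlatformIDs(cl_uint, cl_platform_id *, cl_uint *)
    fn.argtypes = [ctypes.c_uint32, ctypes.c_void_p,
                   ctypes.POINTER(ctypes.c_uint32)]
    fn.restype = ctypes.c_int32
    count = ctypes.c_uint32(0)
    status = fn(0, None, ctypes.byref(count))
    return status, count.value


result = probe_platform_count()
if result is None:
    print("no OpenCL library could be loaded by this interpreter")
else:
    status, count = result
    # A status of 0 is CL_SUCCESS; anything else is an OpenCL error code.
    print("clGetPlatformIDs status:", status, "platforms:", count)
```

If this reports a non-zero status or zero platforms while clinfo lists two, the difference would point at the environment the interpreter runs in rather than at Beignet itself.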

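I think the problem stems from Beignet rather than pygpu, but I do not know how to track it down, since the clinfo output looks fine. I have done a fair amount of research, but this does not seem to be something people are doing: there is practically no documentation, blog posts, or what-have-you on it. Any ideas?

(For the record, given the computer I have, I am fairly sure this will not give me much of a speed boost. According to some reading, though, it should make things at least 1.5 to 2 times faster, so it is still worth it to me to track this down.)

My clinfo output, for posterity: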

    Number of platforms                               2
      Platform Name                                   Intel Gen OCL Driver
      Platform Vendor                                 Intel
      Platform Version                                OpenCL 1.2 beignet 1.3
      Platform Profile                                FULL_PROFILE
      Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
      Platform Extensions function suffix             Intel
    
      Platform Name                                   Portable Computing Language
      Platform Vendor                                 The pocl project
      Platform Version                                OpenCL 2.0 pocl 0.13, LLVM 3.8.1
      Platform Profile                                FULL_PROFILE
      Platform Extensions                             cl_khr_icd
      Platform Extensions function suffix             POCL
    
      Platform Name                                   Intel Gen OCL Driver
    Number of devices                                 1
      Device Name                                     Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
      Device Vendor                                   Intel
      Device Vendor ID                                0x8086
      Device Version                                  OpenCL 1.2 beignet 1.3
      Driver Version                                  1.3
      Device OpenCL C Version                         OpenCL C 1.2 beignet 1.3
      Device Type                                     GPU
      Device Profile                                  FULL_PROFILE
      Max compute units                               48
      Max clock frequency                             1000MHz
      Device Partition                                (core)
        Max number of sub-devices                     1
        Supported partition types                     None, None, None
      Max work item dimensions                        3
      Max work item sizes                             512x512x512
      Max work group size                             512
      Preferred work group size multiple              16
      Preferred / native vector sizes                 
        char                                                16 / 8       
        short                                                8 / 8       
        int                                                  4 / 4       
        long                                                 2 / 2       
        half                                                 0 / 8        (cl_khr_fp16)
        float                                                4 / 4       
        double                                               0 / 2        (n/a)
      Half-precision Floating-point support           (cl_khr_fp16)
        Denormals                                     No
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 No
        Round to infinity                             No
        IEEE754-2008 fused multiply-add               No
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Single-precision Floating-point support         (core)
        Denormals                                     No
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 No
        Round to infinity                             No
        IEEE754-2008 fused multiply-add               No
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Double-precision Floating-point support         (n/a)
      Address bits                                    32, Little-Endian
      Global memory size                              4294967296 (4GiB)
      Error Correction support                        No
      Max memory allocation                           2147483648 (2GiB)
      Unified memory for Host and Device              Yes
      Minimum alignment for any data type             128 bytes
      Alignment of base address                       1024 bits (128 bytes)
      Global Memory cache type                        Read/Write
      Global Memory cache size                        8192
      Global Memory cache line                        64 bytes
      Image support                                   Yes
        Max number of samplers per kernel             16
        Max size for 1D images from buffer            65536 pixels
        Max 1D or 2D image array size                 2048 images
        Base address alignment for 2D image buffers   4096 bytes
        Pitch alignment for 2D image buffers          1 bytes
        Max 2D image size                             8192x8192 pixels
        Max 3D image size                             8192x8192x2048 pixels
        Max number of read image args                 128
        Max number of write image args                8
      Local memory type                               Local
      Local memory size                               65536 (64KiB)
      Max constant buffer size                        134217728 (128MiB)
      Max number of constant args                     8
      Max size of kernel argument                     1024
      Queue properties                                
        Out-of-order execution                        No
        Profiling                                     Yes
      Prefer user sync for interop                    Yes
      Profiling timer resolution                      80ns
      Execution capabilities                          
        Run OpenCL kernels                            Yes
        Run native kernels                            Yes
        SPIR versions                                 1.2
      printf() buffer size                            1048576 (1024KiB)
      Built-in kernels                                __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
      Device Available                                Yes
      Compiler Available                              Yes
      Linker Available                                Yes
      Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing cl_khr_fp16
    
      Platform Name                                   Portable Computing Language
    Number of devices                                 1
      Device Name                                     pthread-Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
      Device Vendor                                   GenuineIntel
      Device Vendor ID                                0x8086
      Device Version                                  OpenCL 2.0 pocl
      Driver Version                                  0.13
      Device OpenCL C Version                         OpenCL C 2.0
      Device Type                                     CPU, Default
      Device Profile                                  FULL_PROFILE
      Max compute units                               4
      Max clock frequency                             3400MHz
      Device Partition                                (core)
        Max number of sub-devices                     4
        Supported partition types                     equally, by counts
      Max work item dimensions                        3
      Max work item sizes                             4096x4096x4096
      Max work group size                             4096
      Preferred work group size multiple              8
      Preferred / native vector sizes                 
        char                                                16 / 16      
        short                                                8 / 8       
        int                                                  4 / 4       
        long                                                 2 / 2       
        half                                                 8 / 8        (n/a)
        float                                                4 / 4       
        double                                               2 / 2        (cl_khr_fp64)
      Half-precision Floating-point support           (n/a)
      Single-precision Floating-point support         (core)
        Denormals                                     No
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 No
        Round to infinity                             No
        IEEE754-2008 fused multiply-add               No
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Double-precision Floating-point support         (cl_khr_fp64)
        Denormals                                     No
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 No
        Round to infinity                             No
        IEEE754-2008 fused multiply-add               No
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Address bits                                    64, Little-Endian
      Global memory size                              17862586368 (16.64GiB)
      Error Correction support                        No
      Max memory allocation                           17862586368 (16.64GiB)
      Unified memory for Host and Device              Yes
      Shared Virtual Memory (SVM) capabilities        (core)
        Coarse-grained buffer sharing                 Yes
        Fine-grained buffer sharing                   Yes
        Fine-grained system sharing                   No
        Atomics                                       Yes
      Minimum alignment for any data type             128 bytes
      Alignment of base address                       1024 bits (128 bytes)
      Preferred alignment for atomics                 
        SVM                                           0 bytes
        Global                                        0 bytes
        Local                                         0 bytes
      Max size for global variable                    0
      Preferred total size of global vars             0
      Global Memory cache type                        Read/Write
      Global Memory cache size                        32768
      Global Memory cache line                        64 bytes
      Image support                                   Yes
        Max number of samplers per kernel             16
        Max size for 1D images from buffer            1116411648 pixels
        Max 1D or 2D image array size                 2048 images
        Max 2D image size                             32768x32768 pixels
        Max 3D image size                             2048x2048x2048 pixels
        Max number of read image args                 128
        Max number of write image args                128
        Max number of read/write image args           <printDeviceInfo:106: get CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS : error -30>
      Max number of pipe args                         16
      Max active pipe reservations                    1
      Max pipe packet size                            1024
      Local memory type                               Global
      Local memory size                               17862586368 (16.64GiB)
      Max constant buffer size                        17862586368 (16.64GiB)
      Max number of constant args                     8
      Max size of kernel argument                     1024
      Queue properties (on host)                      
        Out-of-order execution                        No
        Profiling                                     Yes
      Queue properties (on device)                    
        Out-of-order execution                        Yes
        Profiling                                     Yes
        Preferred size                                16384 (16KiB)
        Max size                                      262144 (256KiB)
      Max queues on device                            1
      Max events on device                            1024
      Prefer user sync for interop                    Yes
      Profiling timer resolution                      1ns
      Execution capabilities                          
        Run OpenCL kernels                            Yes
        Run native kernels                            Yes
        SPIR versions                                 1.2
      printf() buffer size                            1048576 (1024KiB)
      Built-in kernels                                
      Device Available                                Yes
      Compiler Available                              Yes
      Linker Available                                Yes
      Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir cl_khr_int64 cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
    
    NULL platform behavior
      clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel Gen OCL Driver
      clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [Intel]
      clCreateContext(NULL, ...) [default]            Success [Intel]
      clCreateContext(NULL, ...) [other]              Success [POCL]
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
        Platform Name                                 Intel Gen OCL Driver
        Device Name                                   Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
        Platform Name                                 Intel Gen OCL Driver
        Device Name                                   Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
    
    ICD loader properties
      ICD loader Name                                 OpenCL ICD Loader
      ICD loader Vendor                               OCL Icd free software
      ICD loader Version                              2.2.11
      ICD loader Profile                              OpenCL 2.1
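
To make a dump like the one above easier to scan, the platform and device names can be pulled out with a few lines of Python. This is only a sketch: parse_clinfo is a hypothetical helper, SAMPLE is a hand-trimmed excerpt of the output above, and the parser assumes the human-readable layout shown here (key and value separated by two or more spaces), which other clinfo versions may not follow.

```python
import re

# Hand-trimmed excerpt of the clinfo output above.
SAMPLE = """\
Number of platforms                               2
  Platform Name                                   Intel Gen OCL Driver
  Platform Name                                   Portable Computing Language
  Platform Name                                   Intel Gen OCL Driver
Number of devices                                 1
  Device Name                                     Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
  Device Type                                     GPU
  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     pthread-Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
  Device Type                                     CPU, Default
NULL platform behavior
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
"""


def parse_clinfo(text):
    """Map each platform name to a list of {'name', 'type'} device dicts."""
    platforms = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("NULL platform behavior"):
            break  # later sections repeat names and would add duplicates
        parts = re.split(r"\s{2,}", stripped, maxsplit=1)
        if len(parts) != 2:
            continue
        key, value = parts
        if key == "Platform Name":
            current = value
            platforms.setdefault(current, [])
        elif key == "Device Name" and current is not None:
            platforms[current].append({"name": value, "type": None})
        elif key == "Device Type" and current is not None and platforms[current]:
            platforms[current][-1]["type"] = value
    return platforms


for platform, devices in parse_clinfo(SAMPLE).items():
    for device in devices:
        print(platform, "|", device["name"], "|", device["type"])
```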
    