OpenCV 3.4 C++: CUDA acceleration takes more time than the CPU

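I am testing OpenCV GPU acceleration with CUDA, but the GPU is slower than the CPU. Is this specific to the median filter, or am I doing something wrong in my code? Why is the pure processing time on the GPU higher than on the CPU?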

Output:

Device 0:  "GeForce GT 330M"  1023Mb, sm_12 (not Fermi), 
48 cores, Driver/Runtime ver.6.50/6.50
Size of the Image: 512 x 512
GPU Time Includes up&download Times: 8531/100 = 85ms
GPU Time Includes only 'apply': 8307/100 = 83ms
CPU Time: 1855/100 = 18ms

Code (the full listing appears at the end of this post):
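Comments:

Try taking a look at this; your GPU is listed among the legacy GPUs. When making comparisons, also look at this and this for other issues to consider. The speedup you get is not the same for every function. Some functions gain only a little, while others become dramatically faster.

The CUDA GPU you are running is one of the smallest and slowest CUDA GPUs ever made. Perhaps the better question is: why would you expect the GPU to be faster?

@Talonmes Now everything is clear :) What are the main criteria for a fast CUDA GPU? The number of CUDA cores, the frame buffer, the clock speed, or something else?

Yes, the number of CUDA cores is a factor, but it goes together with clock speed and memory speed. So yes, 48 CUDA cores is not many (I have a 1050 Ti with 768 CUDA cores, and it is not the best on the market). Median filtering is also not something that should be expected to do especially well on a GPU.

Your first statement is not correct. The GPU is listed and supports CUDA, although only up to CUDA 6.5. Since the OP's post clearly shows CUDA 6.5 in use, I see no evidence to support your claim.

@RobertCrovella Thanks for your comment. I have removed the incorrect statement. My second link mainly deals with the GPU initialization time issue, but OpenCV GPU code can sometimes be slower even apart from initialization time.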
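#include <iostream>
#include <QElapsedTimer>
#include <opencv2/opencv.hpp>
#include <opencv2/cudafilters.hpp>

using namespace cv;
using namespace std;

void CPUvsGPU()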
{
    QElapsedTimer timer;
    Mat cSrc;
    Mat cGray;
    cuda::GpuMat gGray;
    cuda::printShortCudaDeviceInfo(cuda::getDevice());
    cSrc = imread("baboon.jpg");
    cout << "Size of the Image: " << cSrc.size << endl;

    cvtColor(cSrc, cGray, COLOR_BGR2GRAY);

    gGray.upload(cGray);

    Mat cOut(cGray.size(), CV_8U);
    cuda::GpuMat gOut(gGray.size(), CV_8U);

    Ptr <cuda::Filter> mf;
    mf = cuda::createMedianFilter(CV_8UC1,9);

    mf->apply(gGray, gOut);//don't measure first operation's time on GPU

    timer.start();
    for (int i = 0; i<100 ; i++)
    {
        gGray.upload(cGray);
        mf->apply(gGray, gOut);
        gOut.download(cOut);
    }
    cout << "GPU Time Includes up&download Times: " << timer.elapsed() << "/100 = " << timer.elapsed()/100 <<"ms" << endl;

    timer.start();
    for (int i = 0; i<100 ; i++)
        mf->apply(gGray, gOut);
    cout << "GPU Time Includes only 'apply': " << timer.elapsed() << "/100 = " << timer.elapsed()/100 <<"ms" << endl;

    timer.start();
    for (int i = 0; i<100 ; i++)
        medianBlur(cGray,cOut,9);
    cout << "CPU Time: " << timer.elapsed() << "/100 = " << timer.elapsed()/100 <<"ms" << endl;
}
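
The three timing loops above repeat the same pattern: restart the timer, run 100 iterations, then divide the elapsed milliseconds as integers. A minimal sketch of a reusable helper is shown below. It swaps `QElapsedTimer` for `std::chrono::steady_clock` so that it depends only on the standard library, runs warm-up calls that are not timed, and returns a fractional average. The name `averageMs` and its parameters are hypothetical, and the sketch assumes the callable returns only after its work has completed.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not from the original post): runs `work` a few times
// untimed as warm-up, then times `iterations` runs and returns the average
// duration of one run in milliseconds.
double averageMs(const std::function<void()>& work,
                 int iterations,
                 int warmupRuns = 1)
{
    if (iterations <= 0) {
        return 0.0;  // nothing to measure
    }

    // Warm-up runs are excluded so one-time setup cost is not counted.
    for (int i = 0; i < warmupRuns; ++i) {
        work();
    }

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();

    // duration<double, std::milli> keeps the fractional part of the result.
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Inside `CPUvsGPU()` this could be used as, for example, `averageMs([&] { mf->apply(gGray, gOut); }, 100)` for the GPU filter and `averageMs([&] { medianBlur(cGray, cOut, 9); }, 100)` for the CPU filter, so that both measurements follow the same procedure.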