C++: My GPU-accelerated OpenCV code is slower than plain OpenCV

I copied two examples from the book "Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA" to compare CPU and GPU performance.

1st code:

cv::Mat src = cv::imread("D:/Pics/Pen.jpg", 0); // Pen.jpg is a 4096*4096 grayscale image.
cv::Mat result_host1, result_host2, result_host3, result_host4, result_host5;
// Measure initial time ticks
int64 work_begin = getTickCount();
cv::threshold(src, result_host1, 128.0, 255.0, cv::THRESH_BINARY);
cv::threshold(src, result_host2, 128.0, 255.0, cv::THRESH_BINARY_INV);
cv::threshold(src, result_host3, 128.0, 255.0, cv::THRESH_TRUNC);
cv::threshold(src, result_host4, 128.0, 255.0, cv::THRESH_TOZERO);
cv::threshold(src, result_host5, 128.0, 255.0, cv::THRESH_TOZERO_INV);
// Measure time ticks after the work has finished
int64 delta = getTickCount() - work_begin;
// Timer frequency (ticks per second)
double freq = getTickFrequency();
double work_fps = freq / delta;

std::cout

I can think of two reasons why the CPU version would be faster even without the memory operations:

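1. In the 2nd and 3rd code versions you declare the result GpuMats but never actually initialize them. The initialization then happens inside the threshold method through a call to GpuMat.create, which means 80 MB of GPU memory is allocated on every run. You can see the "performance improvement" by initializing the result GpuMats once and then reusing them. With the original 3rd code I get the following results (GeForce RTX 2080):

Time: 0.010208 FPS: 97.9624

When I change the code to: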

...
d_result1.create(h_img1.size(), CV_8UC1);
d_result2.create(h_img1.size(), CV_8UC1);
d_result3.create(h_img1.size(), CV_8UC1);
d_result4.create(h_img1.size(), CV_8UC1);
d_result5.create(h_img1.size(), CV_8UC1);
d_img1.upload(h_img1);
//Measure initial time ticks
int64 work_begin = getTickCount();
cv::cuda::threshold(d_img1, d_result1, 128.0, 255.0, cv::THRESH_BINARY);
cv::cuda::threshold(d_img1, d_result2, 128.0, 255.0, cv::THRESH_BINARY_INV);
cv::cuda::threshold(d_img1, d_result3, 128.0, 255.0, cv::THRESH_TRUNC);
cv::cuda::threshold(d_img1, d_result4, 128.0, 255.0, cv::THRESH_TOZERO);
cv::cuda::threshold(d_img1, d_result5, 128.0, 255.0, cv::THRESH_TOZERO_INV);
...
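I get the following results (2x better):

Time: 0.00503374 FPS: 198.659

While preallocating the result GpuMats gives a significant performance increase, the same modification to the CPU version does not.

2. The K2100M is not a very powerful GPU (576 cores at 665 MHz). Given that OpenCV is probably (depending on how you compiled it) using multithreading and SIMD instructions under the hood for the CPU version (2.90 GHz, 8 virtual cores), the results are not that surprising.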
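
The 80 MB figure in the first point can be checked with simple arithmetic: five result images of 4096 x 4096 pixels each, at one byte per pixel for CV_8UC1. A minimal sketch follows. The helper name result_bytes is hypothetical, and the calculation ignores any row padding the GPU allocator may add, so the real allocation could be somewhat larger.

```cpp
#include <cstddef>

// Hypothetical helper: bytes needed for `count` single-channel 8-bit images
// (CV_8UC1) of size width x height, ignoring any row padding.
constexpr std::size_t result_bytes(std::size_t width, std::size_t height,
                                   std::size_t count) {
    return width * height * count;  // 1 byte per pixel
}

constexpr std::size_t kMiB = 1024 * 1024;

// Five 4096 x 4096 result images add up to exactly 80 MiB.
static_assert(result_bytes(4096, 4096, 5) == 80 * kMiB,
              "five 4096x4096 8-bit images occupy 80 MiB");
```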

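Edit: Profiling the application with NVIDIA Nsight Systems gives a better picture of the penalty of the GPU memory operations. In the profile, allocating and freeing the memory alone takes 10.5 ms, while the thresholding itself takes only 5 ms.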
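
The general pattern behind this advice is to do one-time setup, such as allocations, before the timer starts, so that only the repeated work is measured. Below is a rough sketch of that pattern using only the C++ standard library. The function average_seconds and its parameters are hypothetical names and not part of OpenCV, and the sketch assumes that work blocks until its results are ready.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `setup` once outside the timed region, then
// returns the average wall-clock time in seconds of one call to `work`.
double average_seconds(const std::function<void()>& setup,
                       const std::function<void()>& work,
                       int iterations) {
    assert(iterations > 0);
    setup();  // one-time allocations happen here and are not timed
    const auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();  // only the repeated work is measured
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - begin;
    return elapsed.count() / iterations;
}
```

In the setting of this thread, setup would hold the d_result*.create(...) calls and the upload, and work would hold the five cv::cuda::threshold calls.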

Output of the 2nd code:

Performance of Thresholding on GPU:
Time: 0.599032
FPS: 1.66936

Output of the 3rd code:

Performance of Thresholding on GPU:
Time: 0.136095
FPS: 7.34779

Summary of the three versions (Time in seconds):

            1st         2nd         3rd
            CPU         GPU         GPU
Time (s):   0.0475497   0.599032    0.136095
FPS:        21.0306     1.66936     7.34779
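
The Time and FPS rows are reciprocals of each other, because the timing code derives both from the same tick delta: time is delta / freq and FPS is freq / delta. A small sketch of that relationship follows. The function names are hypothetical, and the 10 MHz tick frequency in the checks is an assumed value chosen only to reproduce the CPU figures above; the real value returned by getTickFrequency() depends on the platform.

```cpp
#include <cstdint>

// Hypothetical helpers mirroring the arithmetic of the timing code.
// Both expect a positive tick delta and a positive timer frequency.

// Elapsed seconds for a tick delta at a given frequency (ticks per second).
constexpr double seconds_from_ticks(std::int64_t delta_ticks,
                                    double ticks_per_second) {
    return static_cast<double>(delta_ticks) / ticks_per_second;
}

// Runs per second: the reciprocal of the elapsed time of one run.
constexpr double fps_from_ticks(std::int64_t delta_ticks,
                                double ticks_per_second) {
    return ticks_per_second / static_cast<double>(delta_ticks);
}

// With an assumed 10 MHz timer, 475497 ticks correspond to the CPU run:
// about 0.0475497 s and about 21.03 FPS.
static_assert(fps_from_ticks(475497, 1e7) > 21.0305 &&
              fps_from_ticks(475497, 1e7) < 21.0307,
              "FPS is the reciprocal of the elapsed time");
```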
*********************************************************
NVIDIA Quadro K2100M

Micro architecture: Kepler

Compute capability version: 3.0

CUDA Version: 10.1
*********************************************************
Laptop: HP ZBook

CPU: Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz

RAM: 8.00 GB

OS: Windows 7, 64-bit, Ultimate, Service Pack 1
*********************************************************