Resizing images faster in OpenCV Python

python opencv image-processing

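I have a large number of image files (5M+) in one folder. The images come in different sizes, and I want to resize all of them to 128x128.

I call the following function in a loop to do the resizing in Python with OpenCV: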

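import glob
import cv2
from tqdm import tqdm

def read_image(img_path):
    # Read the image from disk and resize it to 128x128
    img = cv2.imread(img_path)
    img = cv2.resize(img, (128, 128))
    return img

for file in tqdm(glob.glob('train-images//*.jpg')):
    img = read_image(file)
    cv2.imwrite(file, img)  # overwrites the original file in place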

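But this takes more than 7 hours to complete. Is there any way to speed it up?

Could I use dask, or something else, to process the images in parallel and get this done efficiently? If so, how?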

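If these images are stored on a magnetic hard drive, you may well find that you are limited by read/write speed (many small reads and writes are very slow on spinning disks).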
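One way to check whether the disk or the CPU is the bottleneck is to time the read, resize and write stages separately on a small sample of files. The following is only a rough sketch: `time_stages`, `read_bytes`, `resize_jpeg_bytes` and `write_bytes` are hypothetical helper names, the `.128.jpg` output suffix is an assumption, and the operating system's file cache can make repeated reads look faster than they are on a cold disk.

```python
import time

def time_stages(paths, read, process, write):
    """Accumulate wall-clock seconds spent in each stage over all paths."""
    totals = {"read": 0.0, "process": 0.0, "write": 0.0}
    for path in paths:
        t0 = time.perf_counter()
        raw = read(path)          # disk -> memory
        t1 = time.perf_counter()
        out = process(raw)        # decode + resize + encode (CPU work)
        t2 = time.perf_counter()
        write(path, out)          # memory -> disk
        t3 = time.perf_counter()
        totals["read"] += t1 - t0
        totals["process"] += t2 - t1
        totals["write"] += t3 - t2
    return totals

def read_bytes(path):
    """Read the raw, still-encoded file contents."""
    with open(path, "rb") as f:
        return f.read()

def resize_jpeg_bytes(raw, size=(128, 128)):
    """Decode, resize and re-encode entirely in memory with OpenCV."""
    import cv2            # imported lazily so time_stages itself needs only the stdlib
    import numpy as np
    img = cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), cv2.IMREAD_COLOR)
    if img is None:
        raise ValueError("could not decode image")
    ok, buf = cv2.imencode(".jpg", cv2.resize(img, size))
    if not ok:
        raise ValueError("could not encode image")
    return buf.tobytes()

def write_bytes(path, data, suffix=".128.jpg"):
    """Write next to the original under a hypothetical suffix (originals are kept)."""
    with open(path + suffix, "wb") as f:
        f.write(data)
```

Calling `time_stages(sample_paths, read_bytes, resize_jpeg_bytes, write_bytes)` on a few hundred files should give a first impression: if "read" and "write" dominate, more CPU cores are unlikely to help much.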

Otherwise, you can always throw the problem at a processing pool to make use of multiple cores:

from multiprocessing.dummy import Pool
from multiprocessing.sharedctypes import Value
from ctypes import c_int
import time, cv2, os

wdir = r'C:\folder full of large images'
os.chdir(wdir)

def read_imagecv2(img_path, counter):
    # print(img_path)
    img = cv2.imread(img_path)
    if img is not None: #cv2.imread returns None when a file cannot be decoded; skip those
        img = cv2.resize(img, (128, 128))
        cv2.imwrite('resized_'+img_path, img) #write the image in the worker (I didn't want to overwrite my images)
    with counter.get_lock(): #processing pools give no way to check up on progress, so we make our own
        counter.value += 1

if __name__ == '__main__':
    # start 4 worker threads (multiprocessing.dummy.Pool is a thread pool with the Pool API)
    with Pool(processes=4) as pool: #this should be the same as your processor cores (or less)
        counter = Value(c_int, 0) #using sharedctypes with mp.dummy isn't needed anymore, but we already wrote the code once...
        chunksize = 4 #making this larger might improve speed (less important the longer a single function call takes)
        result = pool.starmap_async(read_imagecv2, #function to send to the worker pool
                                    ((file, counter) for file in os.listdir(os.getcwd()) if file.endswith('.jpg')),  #generator to fill in function args
                                    chunksize) #how many jobs to submit to each worker at once
        while not result.ready(): #print out progress to indicate program is still working.
            #with counter.get_lock(): #you could lock here but you're not modifying the value, so nothing bad will happen if a write occurs simultaneously
            #just don't `time.sleep()` while you're holding the lock
            print("\rcompleted {} images   ".format(counter.value), end='')
            time.sleep(.5)
        print('\nCompleted all images')

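Because cv2 does not play nicely with multiprocessing, we can use threads instead of processes by replacing multiprocessing.Pool with multiprocessing.dummy.Pool. Many OpenCV functions release the GIL anyway, so we should still see the computational benefit of using several cores at once. This also removes some overhead, since threads are lighter than processes. After some investigation, I have not found an image library that plays nicely with processes. They all seem to fail when trying to pickle a function to send to the child processes (which is how work items are sent to child processes for computation).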
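The same thread-based idea can also be written with the standard library's concurrent.futures.ThreadPoolExecutor, which is swapped in here for multiprocessing.dummy.Pool purely as an illustration. This is a minimal sketch: `resize_one` and `run_in_threads` are hypothetical names, and submitting one future per file keeps every future in memory, so for millions of files you would probably want to feed the pool in slices.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def resize_one(src_path, dst_path, size=(128, 128)):
    """Resize one image with OpenCV; return True on success."""
    import cv2  # imported lazily so run_in_threads has no hard dependency on OpenCV
    img = cv2.imread(src_path)
    if img is None:  # unreadable or corrupt file
        return False
    return bool(cv2.imwrite(dst_path, cv2.resize(img, size)))

def run_in_threads(jobs, worker, max_workers=4):
    """Run worker(*job) for every job on a thread pool.

    Returns (succeeded, failed); a job counts as failed when the worker
    returns a falsy value or raises an exception.
    """
    succeeded = failed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, *job) for job in jobs]
        for future in as_completed(futures):
            try:
                if future.result():
                    succeeded += 1
                else:
                    failed += 1
            except Exception:
                failed += 1
    return succeeded, failed
```

A call might look like `run_in_threads(((p, 'resized_' + p) for p in names), resize_one)`, where `names` is whatever list of file names you already have. Unlike the progress counter above, failures are reported in the return value instead of being silently dropped.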

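If you absolutely want to do this in Python, please just disregard my answer. If you are interested in getting the job done simply and quickly, read on.

I would suggest GNU Parallel if you have lots of things to do in parallel, and even more so as CPUs become "fatter", with more cores, rather than gaining higher clock rates (GHz).

At its simplest, you can use ImageMagick straight from the command line on Linux, macOS and Windows to resize a bunch of images like this: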

magick mogrify -resize 128x128\! *.jpg

If you have hundreds of images, you are better off running that in parallel, which would be:

parallel magick mogrify -resize 128x128\! ::: *.jpg

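If you have millions of images, the expansion of *.jpg will overflow your shell's command buffer, so you can feed the image names in on stdin instead of passing them as parameters: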

find . -iname \*.jpg -print0 | parallel -0 -X --eta magick mogrify -resize 128x128\!

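There are two "tricks" here:

I use find ... -print0 together with parallel -0 to null-terminate the filenames, so there are no problems with spaces in them.

I use parallel -X, which means that, rather than starting a whole new mogrify process for each image, GNU Parallel works out how many filenames mogrify can accept and hands them over in batches of that size.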
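For readers who would rather drive ImageMagick from Python, the batching idea behind parallel -X can be sketched roughly as follows. This is an illustration under assumptions, not a replacement for GNU Parallel: `batched`, `mogrify_command` and `run_batches` are hypothetical names, the batch size is a fixed count rather than something derived from the system's command-line length limit, and the batches run one after another rather than in parallel.

```python
import subprocess
from itertools import islice

def batched(names, batch_size):
    """Yield lists of at most batch_size names from any iterable."""
    it = iter(names)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def mogrify_command(batch, geometry="128x128!"):
    """Build the same ImageMagick call as above for one batch of files.

    The arguments are passed as a list, so no shell is involved: spaces in
    filenames are harmless and the "!" needs no backslash.
    """
    return ["magick", "mogrify", "-resize", geometry, *batch]

def run_batches(names, batch_size=500):
    """Start one mogrify process per batch instead of one per image."""
    for batch in batched(names, batch_size):
        subprocess.run(mogrify_command(batch), check=True)
```

Because `batched` accepts any iterable, the names can come from a lazy source such as a directory walk, so the full list of millions of files never has to be held in memory at once.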

I commend both of these tools to you.

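While the ImageMagick parts of the answer above work on Windows, I don't use Windows myself and I am unsure about running GNU Parallel there. I think it may run under git-bash and/or Cygwin. You could try asking a separate question; questions are free.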

As for the ImageMagick part, I think you can get a listing of all the JPEG filenames into a file with this command:

DIR /S /B *.JPG > filenames.txt

Then I think you can probably process them (not in parallel) like this:

magick mogrify -resize 128x128\! @filenames.txt


And if you find out how to run GNU Parallel on Windows, you can probably process them in parallel with something like this:

parallel --eta -a filenames.txt magick mogrify -resize 128x128\!

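Just to be sure, do you intend to overwrite the originals?

@Aaron, yes, I am trying to overwrite the files. I will try it and let you know the results.

The code runs without any errors and shows the tqdm progress bar, but it reaches 100% within 1 second, and I suspect the files are not being resized.

@SreeramTP This seems to be a known issue: given that, I will rewrite the code for threads instead of processes (which is easier anyway; it is just that, because of the GIL, threads usually don't allow much of a performance gain).

@SreeramTP I might also suggest replacing opencv with something like skimage or PIL. They each have their merits, and you may find that one of them works better than the other here.

I added what little I know about Windows.

GNU Parallel is tested at least once a year on git bash and CygWin. Basic functionality works. If advanced functionality does not work, please file a bug report.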