Python: correctly using multiprocessing to download images
I wrote the following function and tested it in the Python shell, where the images downloaded successfully. However, when I run it as a script, no images are downloaded.
import os
import requests
from time import time
import uuid
from multiprocessing.pool import ThreadPool

main_file_name = 'test1.csv'
my_set = set()
with open(main_file_name, 'r') as f:  # read image urls
    for row in f:
        my_set.add(row.split(',')[2].strip())

def get_url(entry):
    path = str(uuid.uuid4()) + ".jpg"
    if not os.path.exists(path):
        r = requests.get(entry, stream=True)
        if r.status_code == 200:
            with open(path, 'wb') as f:
                for chunk in r:
                    f.write(chunk)

start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
print(f"Elapsed Time: {time() - start}")
I double-checked: it runs in the shell. Is something missing in the script? `results` is of class multiprocessing.pool.IMapUnorderedIterator.

A good way to make sure the URLs actually download is to loop over the results:
start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
for _ in results:
    pass
print(f"Elapsed Time: {time() - start}")
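The point of that loop is that imap_unordered returns its iterator immediately, and the pool's worker threads are daemon threads, so if the main thread exits right away the downloads are cut short. A minimal sketch of the same pattern, using a hypothetical slow_double function as a stand-in for the network download:

```python
from multiprocessing.pool import ThreadPool
import time

def slow_double(x):
    time.sleep(0.1)  # stand-in for a slow network download
    return x * 2

pool = ThreadPool(8)
it = pool.imap_unordered(slow_double, range(5))
# The call above returns immediately; if the main thread exited
# here, the daemon worker threads would be killed mid-task.
results = sorted(it)  # iterating blocks until every task completes
pool.close()
pool.join()
print(results)  # [0, 2, 4, 6, 8]
```

Draining the iterator (here via sorted()) is what forces the main thread to wait for all tasks.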
Another way to achieve this, i.e. to make sure the main thread doesn't exit before the downloads finish, is to use time.sleep:
from time import sleep

start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
sleep(10)  # make sure this amount is enough to finish downloading
print(f"Elapsed Time: {time() - start}")
The reason the script doesn't work is that it ends immediately after starting results. python3 -i test.py (or simply copy-pasting the code into a shell) works because the script is not terminated (the main thread stays alive), so the images have time to download. Try moving the get_url() definition to the top, right after the imports, and putting everything else, indented, inside an if __name__ == '__main__': guard. For an explanation of why the if __name__ == '__main__' part is needed, see the "Safe importing of main module" section of the multiprocessing docs.

@martineau that didn't fix it; in any case I get no errors, the print statement executes, but no images are downloaded.

Good answer, but I'd suggest changing one thing: don't use sleep like that. The proper way to wait for the threads to exit is to call join: pool = ThreadPool(8); results = pool.imap(get_url, my_set); pool.close(); pool.join(). The only downside here is that multiprocessing.pool.ThreadPool.join doesn't accept a timeout.