Python: correctly using multiprocessing to download images
I wrote the following function and tested it in the Python shell, where the images downloaded successfully. However, when I run it as a script, no images are downloaded.
import os
import requests
from time import time
import uuid
from multiprocessing.pool import ThreadPool

main_file_name = 'test1.csv'
my_set = set()
with open(main_file_name, 'r') as f:  # read image urls
    for row in f:
        my_set.add(row.split(',')[2].strip())

def get_url(entry):
    path = str(uuid.uuid4()) + ".jpg"
    if not os.path.exists(path):
        r = requests.get(entry, stream=True)
        if r.status_code == 200:
            with open(path, 'wb') as f:
                for chunk in r:
                    f.write(chunk)

start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
print(f"Elapsed Time: {time() - start}")
I double-checked: it runs in the shell. Is something missing in the script? `results` is of class multiprocessing.pool.IMapUnorderedIterator.

A good way to make sure the URLs actually download is to loop over the results:
start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
for _ in results:
    pass
print(f"Elapsed Time: {time() - start}")
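The point of that loop is that imap_unordered returns its iterator immediately, and the pool's worker threads are daemon threads, so if the main thread exits right away the downloads are cut short. A minimal sketch of the same pattern, using a hypothetical slow_double function as a stand-in for the network download:

```python
from multiprocessing.pool import ThreadPool
import time

def slow_double(x):
    time.sleep(0.1)  # stand-in for a slow network download
    return x * 2

pool = ThreadPool(8)
it = pool.imap_unordered(slow_double, range(5))
# The call above returns immediately; if the main thread exited
# here, the daemon worker threads would be killed mid-task.
results = sorted(it)  # iterating blocks until every task completes
pool.close()
pool.join()
print(results)  # [0, 2, 4, 6, 8]
```

Draining the iterator (here via sorted()) is what forces the main thread to wait for all tasks.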
Another way to achieve this, i.e. to make sure the main thread doesn't exit before the downloads finish, is to use time.sleep:
from time import sleep

start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
sleep(10)  # make sure this amount is enough to finish downloading
print(f"Elapsed Time: {time() - start}")
The reason the script doesn't work is that it ends immediately after starting results. python3 -i test.py (or simply copy-pasting the code into a shell) works because the script is not terminated (the main thread stays alive), so the images have time to download. Try moving the get_url() definition to the top, right after the imports, and putting everything else, indented, inside an if __name__ == '__main__': guard. For an explanation of why the if __name__ == '__main__' part is needed, see the "Safe importing of main module" section of the multiprocessing docs.

@martineau that didn't fix it; in any case I get no errors, the print statement executes, but no images are downloaded.

Good answer, but I'd suggest changing one thing: don't use sleep like that. The proper way to wait for the threads to exit is to call join: pool = ThreadPool(8); results = pool.imap(get_url, my_set); pool.close(); pool.join(). The only downside here is that multiprocessing.pool.ThreadPool.join doesn't accept a timeout.