Python: How can I make sure that only 4 instances of a function are running (using multiprocessing)?

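I am using Python 3.6.6 with the requests and bs4 packages to download and parse some content. I am now downloading some larger files (>1 GB), and with only one connection this is fairly slow, so I would like to speed things up by running several downloads at the same time.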

The important code:

import requests

def download(dir, link, name):
    # stream the response so the whole file is never held in memory
    r = requests.get(link, stream=True)
    with open(f'{dir}/{name}', 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

files = [{'link':'http://...','filename':'somename.7z'}]
download_dir = '~/Downloads'

for file in files:
    # do some things to check if file['link'] is valid and that the file doesn't already exist
    download(download_dir, file['link'], file['filename'])

What I want to do is run the body of the loop in parallel; more precisely, I want the loop body to be running 4 times at once.

My first attempt at this used multiprocessing.Pool.map, as follows:

import multiprocessing
import requests

def download(dir, link, name):
    r = requests.get(link, stream=True)
    with open(f'{dir}/{name}', 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

files = [{'link':'http://...','filename':'somename.7z'}]
download_dir = '~/Downloads'

# bundle each file with the target directory, since map() passes one argument
datas = [{'file': f, 'dir': download_dir} for f in files]

def worker(data):
    file = data['file']
    download_dir = data['dir']
    # do some things to check if file['link'] is valid and that the file doesn't already exist
    download(download_dir, file['link'], file['filename'])

pool = multiprocessing.Pool(4)
pool.map(worker, datas)

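Unfortunately that did not work: it started more than 4 downloads at once. I assume it used 4 threads, but every time one thread hit a network limit while the older ones were not downloading any further, it simply started another worker.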

To force my program to do what I want, I tried the following hacky approach:

import multiprocessing
import requests

def download(dir, link, name):
    r = requests.get(link, stream=True)
    with open(f'{dir}/{name}', 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

files = [{'link':'http://...','filename':'somename.7z'}]
download_dir = '~/Downloads'

def worker(file, download_dir):
    # do some things to check if file['link'] is valid and that the file doesn't already exist
    download(download_dir, file['link'], file['filename'])

index = 0
while index < len(files):
    pool = multiprocessing.Pool(4)  # a fresh pool for every batch of 4

    for _ in range(4):
        if index < len(files):  # check exists because index is incremented in the inner for loop
            pool.apply_async(worker, (files[index], download_dir,))
        index += 1

    pool.close()
    pool.join()

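However, pool.close() does not wait for all submitted tasks to finish; it aborts the downloads instead, and it also does not seem to let tasks that were submitted to the pool resume after being put on hold.

What is the correct way to do this?
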
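Your code has a number of logic and style errors. The biggest one is that you create a new pool inside the loop over the input list! There should be only one pool.

if (index < len(files)) is always true, because you are inside while index < len(files).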

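pool = multiprocessing.Pool(processes=size) is poor style; whenever possible, you should use it together with with (also known as a "context manager"). This removes the need for explicit pool.close() and pool.join() calls. Note that leaving the with block calls pool.terminate(), so wait for your results inside the block. Like this: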
with multiprocessing.Pool(processes=size) as pool:
    result = pool.apply_async(...)
    result.get()  # wait here: leaving the block terminates the pool
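A minimal sketch of the with pattern described above. It assumes a thread-backed pool (`multiprocessing.pool.ThreadPool`, which exposes the same interface as `multiprocessing.Pool`) and a hypothetical `square` task, so that it runs without any network access. The names `square`, `pending` and `squares` are illustrative and not from the original post.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    """Hypothetical stand-in for a real task such as a download."""
    return x * x


with ThreadPool(processes=4) as pool:
    # Submit everything first; apply_async returns immediately.
    pending = [pool.apply_async(square, (n,)) for n in range(8)]
    # Collect results inside the block: exiting it terminates the pool,
    # so unfinished tasks would otherwise be lost.
    squares = [p.get() for p in pending]

print(squares)
```

Calling `get()` on each result before the block ends is what replaces the explicit close() and join() pair here.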

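Numeric for loops are considered poor style in Python, because they are so easy to avoid. If you simply use for url in url_list: or something similar, you do not need boilerplate like index = index + 1.
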
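To illustrate the point about index boilerplate, here is a small sketch that builds the same list of file names twice, once with a manual counter and once by iterating directly. The two entries in `files` and their `example.invalid` links are made-up placeholders, not values from the question.

```python
# Hypothetical input in the same shape as the question's `files` list.
files = [
    {'link': 'http://example.invalid/a', 'filename': 'a.7z'},
    {'link': 'http://example.invalid/b', 'filename': 'b.7z'},
]

# Index-based version: needs a counter, a bounds check and an increment.
index = 0
by_index = []
while index < len(files):
    by_index.append(files[index]['filename'])
    index = index + 1

# Direct iteration: no counter and no bounds check.
direct = [file['filename'] for file in files]

# enumerate() covers the cases where the position is still needed.
numbered = [(i, file['filename']) for i, file in enumerate(files)]

print(by_index == direct, numbered)
```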
Finally, you do not need a loop at all, because pool.map() (and pool.map_async()) exist and can do everything in a single function call.

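Thank you for taking the time to answer, but this does not solve my problem: using pool.map() just makes Python start too many downloads, instead of waiting for my download function to finish before starting another one.

@usbpc102: Once you have fixed the errors I pointed out here, please post the updated code at the bottom of your question. Once we see your updated code we may be able to improve it further, but I think that once you apply the suggestions in this answer, your code will be very close to what you need.

I updated my post to explain in more detail why I am doing it this way.

Can you explain what you mean by "a thread hit a network limit"? Is an exception raised, or is the process just blocked while waiting for IO to complete? Is there an error message? How did you recognize this? What is the network limit (number of connections, firewall, bandwidth, etc.)?

Sorry, I was not clear about that. I assume it hits blocking IO, because the code runs on a Google Cloud VM with a download speed of at least 500 Mbit/s, but when I look at the amount of data transferred by a single thread it is about 10 Mbit/s. So I assume what is happening is that for chunk in r.iter_content(chunk_size=1024) has to wait until it gets 1024 bytes, and then the thread pool starts another one to run in parallel.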
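The chunk size mentioned in the last comment can at least be put into numbers. The sketch below is only an illustration under stated assumptions: in-memory `io.BytesIO` streams stand in for the HTTP response, and `copy_stream` is a hypothetical helper, not part of requests. It counts how many loop iterations a 1 MiB payload needs at two chunk sizes; it does not measure network throughput and does not show what caused the 10 Mbit/s rate.

```python
import io


def copy_stream(src, dst, chunk_size):
    """Copy src to dst in chunk_size pieces; return the number of chunks."""
    chunks = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        chunks += 1
    return chunks


payload = bytes(1024 * 1024)  # 1 MiB of zero bytes as stand-in data

small = copy_stream(io.BytesIO(payload), io.BytesIO(), 1024)
large = copy_stream(io.BytesIO(payload), io.BytesIO(), 64 * 1024)
print(small, large)  # 1024 iterations versus 16
```

With requests, the equivalent knob is the `chunk_size` argument of `iter_content`; whether a larger value helps in the asker's setup is untested here.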