Python: using a thread pool correctly with a generator


While processing a CSV file in Python 2.7, I can't get a ThreadPool to work with a generator. Here is some sample code that illustrates the problem:

from multiprocessing.dummy import Pool as ThreadPool
import time

def getNextBatch():
    # Reads lines from a huge CSV and yields them as required.
    for i in range(5):
        yield i

def processBatch(batch):
    # This simulates a slow network request.
    time.sleep(1)
    print("Processed Batch " + str(batch))

# We use 4 threads to attempt to alleviate the bottleneck caused by network I/O.
threadPool = ThreadPool(processes = 4)

batchGenerator = getNextBatch()

for batch in batchGenerator:
    threadPool.map(processBatch, (batch,))

threadPool.close()
threadPool.join()
When I run this, I get the expected output:

Processed Batch 0
Processed Batch 1
Processed Batch 2
Processed Batch 3
Processed Batch 4

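The problem is that the lines appear with a 1-second delay between each print. In effect, my script runs sequentially instead of using multiple threads as I intended.

The goal is for all of the printed statements to show up after about 1 second, rather than one statement per second over 5 seconds.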

This is your problem:

for batch in batchGenerator:
    threadPool.map(processBatch, (batch,))
When I try

threadPool.map(processBatch,batchGenerator)

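it works as expected (though the output is not printed in order). The for loop hands the thread pool one batch at a time, and each `map` call blocks until that single batch is done. So it finishes one batch, then moves on to the next, and so on.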
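As a rough sketch of the fix described above, the following passes the generator to the pool in one call instead of calling `map` once per batch. The names `get_next_batch`, `process_batch`, `run_with_map`, and `run_with_apply_async` are hypothetical, the sleep is shortened to 0.2 s, and the worker returns a value instead of printing; none of that comes from the original post. The second function is an alternative that keeps the per-batch loop by submitting with `apply_async`, which does not block.

```python
from multiprocessing.dummy import Pool as ThreadPool
import time


def get_next_batch(count=5):
    # Hypothetical stand-in for the CSV reader: yields batch numbers lazily.
    for i in range(count):
        yield i


def process_batch(batch):
    # Hypothetical stand-in for the slow network request (0.2 s, not 1 s).
    time.sleep(0.2)
    return batch * 10


def run_with_map(workers=4):
    pool = ThreadPool(processes=workers)
    try:
        # A single map call over the whole generator lets the pool spread
        # the batches across its threads; results come back in input order.
        return pool.map(process_batch, get_next_batch())
    finally:
        pool.close()
        pool.join()


def run_with_apply_async(workers=4):
    pool = ThreadPool(processes=workers)
    try:
        # apply_async returns immediately, so the loop queues every batch
        # before any result is collected.
        pending = [pool.apply_async(process_batch, (batch,))
                   for batch in get_next_batch()]
        return [result.get() for result in pending]
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    start = time.time()
    print(run_with_map())
    print("map took %.2f s" % (time.time() - start))
```

With 5 batches and 4 threads, the work takes roughly two sleep intervals rather than five, since the fifth batch has to wait for a free thread. Note that `map` reads the entire generator into a list before dispatching, which may matter if the CSV is very large.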