Python multiprocessing: Pool.map() doesn't seem to call the function at all

Tags: python, windows, multithreading

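I'm fairly new to multithreading, so apologies if this is basic. I have a function that OCRs image files, and I would like to run that task in parallel. The function returns nothing; it only saves the OCR'd text to a file. The code is as follows: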

start_time = time.time()
path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test'
listfiles = os.listdir(path)

filterfiles = [p for p in listfiles if p[-4:] == '.tif']

pool = Pool(processes=2)

result = pool.map(OCRimage,filterfiles)

pool.close()
pool.join()

print("--- %s seconds ---" % (time.time() - start_time))
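
When I run the code, it seems to get stuck at pool.map(). I let it run for 30 minutes, which is far longer than the non-parallel run takes, and it did not produce a single output. I also tested my function OCRimage, and it never seems to enter the function even once (I put print(1) as the first line of OCRimage). Can anyone help? Thanks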

Cameron

Edit (added the OCRimage function):

The OCRimage function looks like this:

def OCRimage(f):
    #This runs the magick bash script which splits a multi-image tif into multiple single image tiffs
    process = subprocess.Popen(["magick", path + "\\" + f, path + "\\temp\\%d.tif"], shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(process.communicate()[0])

    #finds the number of pages for each tiff file (this might not be necessary, but the list-all-files-in-directory Python command could return files in arbitrary order)
    max1 = -1
    for filename in os.listdir(path+'\\temp'):    
        if (max1 < int(filename[0:-4])):
            max1 = int(filename[0:-4])
    max1 = max1 + 1

    text = ""
    for each in range(0,max1):
        im = Image.open(path + "\\temp\\"+ str(each) + ".tif")
        text = text + pytesseract.image_to_string(im)
    with open(path + "\\result\\OCR-"+f[0:-4]+".txt", 'w') as file:
        file.write(text)    

    for f in os.listdir(path+'\\temp'):
        os.remove(path + '\\temp\\' + f)

Edit 3:

Running OCRimage(f) on its own works fine. Instead of the multithreaded code, I used the following:

path = 'C:\\Users\\RNCZF01\\Documents\\Cameron-Fen\\Economics-Projects\\Patent-project\\similarity\\Patents\\OCR-test'
for p in os.listdir(path):
    OCRimage(p)
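
A minimal test script, in which OCRimage only prints its argument and the pool is created under an if __name__ == '__main__' guard, produces the expected output. This seems to indicate that the problem must be in the OCRimage function (for the real problem, see the Windows section below).

Output: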

file_name = image000.tif
file_name = image001.tif
file_name = image002.tif
file_name = image003.tif
file_name = image004.tif

I suggest the following changes to the beginning of OCRimage:

def OCRimage(file_name):
    print("file_name = %s" % file_name)
    # os.path.join takes the components as separate arguments, not a list
    src = os.path.join(path, file_name)
    dst = os.path.join(path, 'temp', '%d.tif')
    command_list = ['magick', src, dst]
    # This runs the magick command, which splits a multi-image tif into
    # multiple single-image tiffs
    process = subprocess.Popen(command_list,
                               shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    output, errors = process.communicate()
    if process.returncode != 0:
        print("Image processing failed for %s: %s" % (file_name, errors))
        return
    # The rest of your code goes here

It is important to verify that the subprocess's return code is zero. If it is not zero, you really need to look at the errors string.
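
A self-contained sketch of that check, written so it runs without ImageMagick installed. The child command is just the current Python interpreter exiting with a chosen status, and run_command is a hypothetical helper name, not something from the question. The sketch passes the argument list without shell=True, which is an assumption about what is wanted here rather than a required change.

```python
import subprocess
import sys


def run_command(command_list):
    """Run a command and return (returncode, stdout, stderr).

    run_command is a hypothetical helper used for illustration only.
    """
    process = subprocess.Popen(command_list,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    # communicate() waits for the child to exit and drains both pipes,
    # so returncode is set by the time it returns.
    output, errors = process.communicate()
    return process.returncode, output, errors


if __name__ == '__main__':
    # Stand-in for the magick call: a child that writes to stderr and fails.
    failing = [sys.executable, '-c',
               'import sys; sys.stderr.write("boom"); sys.exit(3)']
    code, output, errors = run_command(failing)
    if code != 0:
        print("Command failed with code %d: %s" % (code, errors))
```

Keeping stderr on its own pipe, rather than merging it into stdout, means the error text can be reported separately from any normal output.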

Windows

Running the code on Windows raises the following exception:

RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main
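
Why this happens, as far as I understand it: Windows has no fork(), so multiprocessing starts each worker as a fresh Python interpreter that imports the main module again. Any pool creation sitting at module level is therefore re-executed inside every worker, which is what the error message is complaining about. Below is a sketch of a script structured around the guard; ocr_image_stub and the file names are placeholders standing in for the real OCRimage and the directory listing.

```python
from multiprocessing import Pool


def ocr_image_stub(file_name):
    # Placeholder for the real OCRimage. It must live at module level so
    # worker processes can find it by name when they import this module.
    return "processed %s" % file_name


def main():
    # Placeholder names; the real script would build this from os.listdir.
    filterfiles = ["image%03d.tif" % n for n in range(3)]
    pool = Pool(processes=2)
    try:
        results = pool.map(ocr_image_stub, filterfiles)
    finally:
        pool.close()
        pool.join()
    for line in results:
        print(line)


# Only the process that was started directly runs main(); workers that
# re-import this module skip it.
if __name__ == '__main__':
    main()
```

Everything that should run once (timing, listing files, creating the pool) goes inside main(); only imports and function definitions stay at module level.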

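Instead of printing to stdout, try writing to an output file :)

Do you think printing to stdout would fail for some reason? The rest of the code does not write the OCR text files to the output directory either.

Without a way to reproduce this it is hard to help; please try to create a minimal, complete, reproducible example. Show that filterfiles is not empty, show the code of OCRimage (even if it only prints), and so on.

Writing to a file behaves the same as printing to stdout: nothing happens, and no file is created.

So the thing is, OCRimage works fine when I don't use multithreading, so at least to me the problem is that result = pool.map(OCRimage, filterfiles) does not work, even if I reduce OCRimage(f) to return f**2. I'm using Python 2.7.

Did you run the minimal script in my answer? Does it produce the expected output?

It does not. I tried your example. I think it is a Windows problem, because my code runs fine on the Linux cluster it was designed for.

What is the output of python --version on your Windows machine?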

Minimal test script (OCRimage reduced to a single print):

from multiprocessing import Pool

def OCRimage(file_name):
    print("file_name = %s" % file_name)

def main():
    filterfiles = ["image%03d.tif" % n for n in range(5)]
    pool = Pool(processes=2)
    result = pool.map(OCRimage, filterfiles)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()
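
One thing that may bite once the pool actually starts: OCRimage as posted writes every file's pages into the same temp directory and then deletes everything in it, so two workers running at once could read or remove each other's pages. A hedged sketch of one way around that gives each call its own scratch directory via tempfile.mkdtemp. Here with_private_tempdir and fake_split are hypothetical names, and fake_split merely stands in for the magick and pytesseract steps.

```python
import os
import shutil
import tempfile


def with_private_tempdir(file_name, work):
    """Call work(file_name, temp_dir) with a directory no other worker uses."""
    temp_dir = tempfile.mkdtemp(prefix="ocr-")
    try:
        return work(file_name, temp_dir)
    finally:
        # Remove only this call's scratch directory.
        shutil.rmtree(temp_dir, ignore_errors=True)


def fake_split(file_name, temp_dir):
    # Stand-in for splitting a multi-page tif: writes two dummy page files.
    for page in range(2):
        with open(os.path.join(temp_dir, "%d.tif" % page), "w") as handle:
            handle.write(file_name)
    return sorted(os.listdir(temp_dir))


if __name__ == '__main__':
    print(with_private_tempdir("image000.tif", fake_split))
```

In the real function, the magick destination pattern and the page loop would point at temp_dir instead of the shared temp folder.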