Python 3.x: multiprocessing queue for files in a folder


The question is how to process files correctly with multiprocessing in Python 3.7 while crawling a directory recursively.

My code is as follows:

from multiprocessing import Pool


def f(directoryout, directoryoutfailed, datafile, filelist_failed, imagefile, rootpath, extension, debug):
    # […] some processing
    pass


if __name__ == '__main__':
    import csv
    import os
    debug = 0
    timeout = 20

    if debug == 0:
        folder              = '/home/debian/Desktop/environments/dedpul/files/fp/'
        datafile            = 'fpdata.csv' # file with results
        directoryout        = 'fp_out' # out directory for debugging
        directoryoutfailed  = 'fp_out_failed' # out directory for wrongly processed files in debugging mode
        filelist            = 'filelist.csv' # list of processed files
        filelist_failed     = 'filelist_failed.csv' # list of wrongly processed files

    counter = 0

    pool = Pool(processes=4)
    for root, subFolders, files in os.walk(folder):
        for imagefile in files:
            rootpath = root+'/'
            fullpath = root+'/'+imagefile
            extension = os.path.splitext(imagefile)[1]
            imagefilesplit = imagefile.split('.')[0]
            counter += 1

            print('\033[93m ## ',counter,' ## \033[0m',rootpath)

            # Skip the file if its full path is already listed in filelist.
            fileexist = 0
            with open(filelist) as csv_file:
                csv_reader = csv.reader(csv_file, delimiter=',')
                for row in csv_reader:
                    if row[0] == fullpath:
                        fileexist = 1
            if fileexist == 1:
                print('    File was processed, skipping...')
                continue

            # Record the file as processed before handing it to the pool.
            with open(filelist, mode='a') as csv_file:
                writer = csv.writer(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
                writer.writerow([fullpath])

            # print(directoryout,directoryoutfailed,datafile,filelist_failed,imagefile,rootpath,extension,debug)
            # pool.apply blocks until f() returns, so files are processed one at a time.
            res = pool.apply(f, (directoryout,directoryoutfailed,datafile,filelist_failed,imagefile,rootpath,extension,debug))
    pool.close()
    pool.join()

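First, when I use pool.apply_async, it uses all cores, but it does not process function f() correctly. With pool.apply() it works, but only on a single core.
Second, as you can see, I crawl the list of files in the folder recursively inside a loop. If a file is found to have been processed already, the loop should continue with the next one. Should I do this check in the __main__ section, or should I move it into f()? If so, how do I exchange information between processes while they are running (each file takes a few seconds)?

Third, function f() is independent: it processes the image file, appends the result to fpdata.csv (or appends the name of a badly processed file to filelist_failed.csv), and then simply finishes without any problem, so no actual return value is needed. I only need to launch this function with multiprocessing.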
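On the second point, one option is to keep the "already processed" check in the parent process but read filelist only once, instead of reopening it for every file. The following is a minimal sketch under that assumption; the helper names load_processed and pending_files are hypothetical and not part of the original code. Paths are built as root + '/' + imagefile so that they match the entries the original loop writes to filelist.

```python
import csv
import os


def load_processed(filelist):
    """Read the CSV of already-processed paths into a set (empty if missing)."""
    if not os.path.exists(filelist):
        return set()
    with open(filelist, newline='') as csv_file:
        return {row[0] for row in csv.reader(csv_file) if row}


def pending_files(folder, processed):
    """Yield (root, imagefile) pairs under folder whose path is not in processed."""
    for root, _sub_folders, files in os.walk(folder):
        for imagefile in sorted(files):
            # Same path format as the original loop, so old entries still match.
            fullpath = root + '/' + imagefile
            if fullpath not in processed:
                yield root, imagefile
```

With this layout the worker function never needs to know which files were skipped, so no information has to be exchanged between processes for the check itself.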

What am I doing wrong? Should I use the

with Pool(processes=4) as pool:

statement?
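For reference, a small sketch of what the with form could look like. Leaving the with block calls pool.terminate(), so every result has to be collected before the block ends. The function work is a hypothetical stand-in for f(), and run_all is an illustrative name, not something from the original code.

```python
from multiprocessing import Pool


def work(path):
    """Hypothetical stand-in for f(): returns the path and its length."""
    return path, len(path)


def run_all(paths, processes=4):
    """Process every path in a pool and return the results as a dict."""
    with Pool(processes=processes) as pool:
        # Leaving the with block calls pool.terminate(), so all results
        # must be consumed before the block ends; dict() does that here.
        return dict(pool.imap_unordered(work, paths))


if __name__ == '__main__':
    print(run_all(['a.png', 'bb.png', 'ccc.png']))
```

imap_unordered is used here only because the order of results does not matter for independent files; pool.map would work the same way in this sketch.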

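Before asking this question I went through a large number of answers, but apparently this kind of file processing is hard to find, even in the Python manual.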

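What is the specific problem with pool.apply_async?

Processing one image usually takes a few seconds, and pool.apply works fine (I mean, it saves the data, and so on). With pool.apply_async it only runs function f(), shows the information about the files, but does no processing and shows none of the information printed by f(). It runs through all the files very quickly.

What if you save all the return values from apply_async in a list, then iterate over the list and call ret.get() to wait for all of them to finish? Does it work properly then?

I tested it with a smaller sample of files. So it first loops over all objects and then starts the multiprocessing with function f().

As soon as you call apply_async, the task starts running in the pool processes. It does not wait for you to call res.get() before it starts. Calling res.get() on each item in the list only makes the code wait for all the background work to finish before it tries to close/join the pool.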
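The suggestion from the comments can be sketched as follows. This is an illustration under assumptions, not the asker's actual code: f is a hypothetical one-argument stand-in for the real function, and process_all is an invented name. One possible explanation for "nothing gets processed", which the thread does not confirm, is that f() raises an exception inside the worker; with apply_async such an exception stays hidden until get() is called on the result, at which point it is re-raised in the parent.

```python
from multiprocessing import Pool


def f(imagefile):
    """Hypothetical stand-in for the real f(); fails on empty names."""
    if not imagefile:
        raise ValueError('empty file name')
    return imagefile.upper()


def process_all(imagefiles, processes=4):
    """Submit every file with apply_async, then wait for each result."""
    pool = Pool(processes=processes)
    # Each task is handed to the pool as soon as apply_async is called.
    results = [(name, pool.apply_async(f, (name,))) for name in imagefiles]
    pool.close()  # no more tasks will be submitted
    done, failed = [], []
    for name, res in results:
        try:
            # get() blocks until the task finishes and re-raises any
            # exception that f() raised in the worker process.
            done.append(res.get(timeout=20))
        except Exception as exc:
            failed.append((name, repr(exc)))
    pool.join()
    return done, failed


if __name__ == '__main__':
    print(process_all(['a.png', '', 'b.png']))
```

Collecting the failures in a list like this makes worker errors visible in the parent process instead of losing them silently.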