Python 3.x: Python 3 multiprocessing or multithreading for a for loop


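I have a large ML-based package made up of several (4) modules, each of which reads and writes its own I/O in sequence. I also have a number of input files (the count varies). I understand the difference between threads and processes, but I still can't tell which one is better suited here. The dummy structure looks like this: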

import os

import module1
import module2
import module3
import module4


for fl in list_of_files:
    tmp_path = os.path.join('tmp', fl)  # path of the folder that holds all tmp files for this input
    os.makedirs(tmp_path, exist_ok=True)  # os.path.join only builds the path; this creates the folder
    module1.do_stuff(fl)
    module2.do_stuff(tmp_path)  # input here is the output of module1
    module3.do_stuff(tmp_path)  # input is the output of module2
    module4.do_stuff(tmp_path)  # input here is the output of module3
aggregate_results('tmp/')  # takes all outputs from module4 and combines them into a single file

Now my question is: does it make sense to split the work per file, like this?

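import os
import multiprocessing.dummy as mp  # same Pool API as multiprocessing, but the workers are threads

import module1
import module2
import module3
import module4


def small_proc(fl):
    tmp_path = os.path.join('tmp', fl)
    os.makedirs(tmp_path, exist_ok=True)
    module1.do_stuff(fl)
    module2.do_stuff(tmp_path)
    module3.do_stuff(tmp_path)
    module4.do_stuff(tmp_path)


p = mp.Pool(len(list_of_files))  # one worker thread per file
p.map(small_proc, list_of_files)  # run the whole four-module chain for each file
p.close()
p.join()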
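A minimal sketch of the same per-file split using a process pool instead of threads, assuming the four `do_stuff` calls are CPU-bound Python work that holds the GIL (if they mostly wait on disk, or call into libraries that release the GIL, the thread pool from `multiprocessing.dummy` may already be enough). The stage functions `stage1` through `stage4` and the helpers `process_one` and `run_per_file` are hypothetical stand-ins for `module1.do_stuff` through `module4.do_stuff`. Each file still runs its four stages in order; different files run in parallel, and the pool is capped at `os.cpu_count()` rather than one worker per file.

```python
import os
from concurrent.futures import ProcessPoolExecutor


# Hypothetical stand-ins for module1.do_stuff .. module4.do_stuff.
# They return strings so the chaining is visible; the real stages write to tmp/<file>.
def stage1(fl):
    return f"{fl}:s1"


def stage2(x):
    return f"{x}:s2"


def stage3(x):
    return f"{x}:s3"


def stage4(x):
    return f"{x}:s4"


def process_one(fl):
    """Run all four stages for one file, in order, inside a single worker process."""
    out = stage1(fl)
    out = stage2(out)
    out = stage3(out)
    return stage4(out)


def run_per_file(files, max_workers=None):
    """Process files in parallel, one file per task."""
    if max_workers is None:
        # Cap the pool at the CPU count instead of one worker per file.
        max_workers = max(1, min(len(files), os.cpu_count() or 1))
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with `files`.
        return list(pool.map(process_one, files))


if __name__ == "__main__":
    print(run_per_file(["a.txt", "b.txt", "c.txt"]))
    # ['a.txt:s1:s2:s3:s4', 'b.txt:s1:s2:s3:s4', 'c.txt:s1:s2:s3:s4']
```

Because each file writes to its own `tmp/<file>` folder, the per-file jobs share no state, which is what makes this split safe; the aggregation step still runs once after the pool has finished. With `ProcessPoolExecutor` the worker function has to be defined at module top level so it can be pickled, and the pool should be created under the `if __name__ == "__main__":` guard on platforms that spawn new processes.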

Or should I split it by execution order instead, since we can safely run each module's loop with an offset (if that makes sense to anyone)?
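One way to read "run each module's loop with an offset" is a pipeline: while module 2 works on file 1, module 1 can already start on file 2. A minimal sketch with one thread per stage connected by `queue.Queue`, assuming the only dependency between stages is that each file's output is passed forward; `make_stage`, `run_pipeline` and the lambda stages are hypothetical. With a single worker per stage and FIFO queues, results come out in input order. For CPU-bound stages the same shape can be built with `multiprocessing.Process` and `multiprocessing.Queue`, using picklable top-level functions instead of lambdas and a picklable sentinel such as `None`.

```python
import queue
import threading

SENTINEL = object()  # end-of-stream marker; a unique object so a stage may legitimately return None


def make_stage(func, inbox, outbox):
    """Build a thread that applies `func` to every item from `inbox` and forwards the result."""
    def worker():
        while True:
            item = inbox.get()
            if item is SENTINEL:
                outbox.put(SENTINEL)  # tell the next stage to shut down too
                return
            outbox.put(func(item))
    return threading.Thread(target=worker)


def run_pipeline(files, stages):
    """Run `stages` as a pipeline: stage k works on file i while stage k+1 works on file i-1."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [make_stage(func, queues[i], queues[i + 1]) for i, func in enumerate(stages)]
    for t in threads:
        t.start()
    for fl in files:
        queues[0].put(fl)
    queues[0].put(SENTINEL)

    # Collect finished items from the last queue until the sentinel arrives.
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical stages standing in for module1..module4.do_stuff.
    stages = [lambda x: x + ":s1", lambda x: x + ":s2", lambda x: x + ":s3", lambda x: x + ":s4"]
    print(run_pipeline(["a.txt", "b.txt"], stages))
    # ['a.txt:s1:s2:s3:s4', 'b.txt:s1:s2:s3:s4']
```

The throughput of this layout is bounded by its slowest stage: if one module is much slower than the others, a file still leaves the pipeline only once per run of that module, whereas the per-file split can run the slow module on several files at the same time. Error handling is omitted; an exception inside a stage ends that thread without forwarding the sentinel, and `run_pipeline` would then block forever.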

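Have you tried it? Did it work?

Yes, it works, but it has been running for a while and there seems to be a lot of overhead. Just one of the modules (module 3) takes about 2 hours per file, so in the end it may still pay off in total time.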
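Before choosing between the two splits, it may help to measure where the time actually goes. A minimal sketch, assuming each stage can be called as a plain function on one file; `timed_stages` and the `sleep`-based stand-in stages are hypothetical. It runs the files sequentially and sums wall-clock time per stage with `time.perf_counter()`; comparing that sequential total with the wall time of the pooled run gives a rough idea of how much of the slowdown comes from pool overhead or contention rather than from the stages themselves.

```python
import time
from collections import defaultdict


def timed_stages(files, stages):
    """Run (name, func) stages in order for each file; return total seconds spent per stage."""
    totals = defaultdict(float)
    for fl in files:
        out = fl
        for name, func in stages:
            start = time.perf_counter()
            out = func(out)
            totals[name] += time.perf_counter() - start
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical stand-ins: "module3" is deliberately the slow stage.
    def fast(x):
        time.sleep(0.01)
        return x

    def slow(x):
        time.sleep(0.1)
        return x

    stages = [("module1", fast), ("module2", fast), ("module3", slow), ("module4", fast)]
    for name, secs in timed_stages(["a.txt", "b.txt"], stages).items():
        print(f"{name}: {secs:.2f} s")
```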