Passing a dictionary with modifiable elements to processes in Python

python multithreading

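I am trying to use the Process class of the multiprocessing library to run my code in parallel for better performance.

The skeleton of the code is to create a dictionary for each thread to work on; once they are done, these dictionaries are summed up and saved to a file. The resources are created like this: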

histos = {}
for i in range(number_of_threads):
    histos[i] = {}
    histos[i]['all'] =      ...  # ROOT.TH1F objects
    histos[i]['kinds_of'] = ...  # ROOT.TH1F objects
    histos[i]['keys'] =     ...  # ROOT.TH1F objects

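Then, inside the processes, each thread works with its own histos[thread_number] object and fills the ROOT.TH1F objects it contains. My problem, however, is that if I start the threads like this: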

from multiprocessing import Process

proc = {}
for i in range(Nthreads):
    it0 = i * n_entries // Nthreads        # just dividing up the workload
    it1 = (i + 1) * n_entries // Nthreads  # integer division keeps the indices whole
    proc[i] = Process(target=RecoAndRecoFix, args=(i, it0, it1, ch, histos))
    # args: i is the thread id (index), it0 and it1 are indices for the workload,
    # ch is a variable that is read-only, and histos is what we defined before,
    # and the contained TH1Fs are what the threads put their output into.
    # The RecoAndRecoFix function works inside with histos[i], thus only accessing
    # the ROOT.TH1F objects that are unique to it. Each thread works with its own histos[i] object.
    proc[i].start()

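then the threads can indeed access their histos[i] objects, but cannot write to them. To be precise, when I call Fill() on the TH1F histograms, no data gets filled, because the objects cannot be written to, since they are not shared variables.

So I found that I should use multiprocessing.Array() to create an array that the threads can both read and write, like this: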

from multiprocessing import Array

typecoder = {}
histos = Array(typecoder, number_of_threads)  # raises TypeError: unhashable type: 'dict'
for i in range(number_of_threads):
    histos[i] = {}
    histos[i]['all'] =      ...  # ROOT.TH1F objects
    histos[i]['kinds_of'] = ...  # ROOT.TH1F objects
    histos[i]['keys'] =     ...  # ROOT.TH1F objects

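However, it does not accept a dictionary as the type. It does not work and fails with TypeError: unhashable type: 'dict'.

So what is the best way to solve this? What I need is to pass each thread one instance of every entry stored in the dictionary ('all', 'kinds_of', 'keys'), so that it can work on them on its own. The threads must be able to write to the resources they receive.

Thanks for your help, and sorry if I am overlooking something trivial. I have written threaded code before, just not in Python.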

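What is missing is the distinction between "processes" and "threads". You use the two terms interchangeably in your post, but your approach only works with threads, not with processes.

Threads all share memory. They all reference the same dictionary, so they can use it to communicate with each other and with the parent.

Processes have separate memory. Each one gets its own copy of the dictionary. If they want to communicate, they have to do so by some other means (for example, a queue). On the other hand, this means they get the safety of separation.

A further complication in Python is the "GIL". Threads mostly take turns on the same Python interpreter and run serially; they only run in parallel while doing I/O, accessing the network, or using libraries that release the GIL for their heavy work (numpy, image processing, etc.). Processes, meanwhile, get full parallelism.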
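The difference between shared and separate memory can be seen in a small experiment. This is only a sketch: a plain list stands in for a ROOT.TH1F, the names `fill` and `run` are made up for illustration, and the "fork" start method is assumed (a Unix-like system). With the "spawn" start method the calls at the bottom would additionally need an `if __name__ == '__main__':` guard, which is included here.

```python
import multiprocessing as mp
import threading


def fill(store, key):
    # Stand-in for histogram filling: mutate the worker's entry in place.
    store[key]['all'].append(1.0)


def run(kind):
    """Run fill() in a thread or a process and return what the parent sees."""
    store = {0: {'all': []}}
    if kind == 'thread':
        worker = threading.Thread(target=fill, args=(store, 0))
    else:
        ctx = mp.get_context('fork')  # assumption: Unix-like OS
        worker = ctx.Process(target=fill, args=(store, 0))
    worker.start()
    worker.join()
    return store[0]['all']


if __name__ == '__main__':
    print(run('thread'))   # the thread modified the parent's dictionary
    print(run('process'))  # the child modified only its own copy
```

The thread's change is visible in the parent, while the process leaves the parent's dictionary empty, which matches the "Fill() runs but nothing is filled" symptom from the question.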

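The Python multiprocessing module has a Manager class, which provides dictionaries that can be shared across threads and processes.

For examples, see the multiprocessing documentation.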
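As a rough illustration of the Manager approach, here is a minimal sketch. Plain lists stand in for the histograms, `fill` and `run` are hypothetical names, and the "fork" start method is assumed. Note that a manager dictionary only notices changes made through the proxy itself, so a nested plain dict has to be reassigned after it is modified. Whether ROOT.TH1F objects survive the pickling round trip this requires is not tested here.

```python
import multiprocessing as mp


def fill(shared, worker_id, values):
    # Modify a local copy, then assign it back through the proxy.
    # Mutating shared[worker_id] in place would be lost, because the
    # nested dict is an ordinary object, not a managed one.
    local = shared[worker_id]
    local['all'] = local['all'] + list(values)
    shared[worker_id] = local


def run(n_workers=2):
    ctx = mp.get_context('fork')  # assumption: Unix-like OS
    with ctx.Manager() as manager:
        shared = manager.dict()
        for i in range(n_workers):
            shared[i] = {'all': []}
        procs = [
            ctx.Process(target=fill, args=(shared, i, [float(i)] * 3))
            for i in range(n_workers)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Copy into a regular dict before the manager shuts down.
        return shared.copy()


if __name__ == '__main__':
    print(run())
```

Every access goes through the manager's server process, so this is convenient for small results but likely slow for per-event histogram filling.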

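It is more complicated than that. The dictionary holds objects whose methods may be called, which updates the objects. Those updates need to be reflected back in the main process, so the objects would need to be "managed" objects as well.

Thanks, switching from Process to Thread did allow the threads to use the dictionary passed in args. However, it seems to be much slower: the processes pop up and start immediately, whereas the threads seem to start very slowly. Also, strange index errors show up at random in the target function, and I don't understand why. The processes ran reliably, there was just no output. Are index errors to be expected with threads? All I did was change proc[i] = Process(...) to proc[i] = threading.Thread(...). Should I change anything else to accommodate the switch to threads?

Yes: (a) threads will mostly share the same Python interpreter serially, and (b) processes get the safety of separation. If your job is mostly Python code (with little I/O, networking, or numpy work), switch back to processes and pass the results back through a multiprocessing queue or something similar.

Thanks for all the help! The final solution was indeed to revert to processes, but instead of trying to pass partial results back to the parent process, each process now saves them to its own ROOT output file. Afterwards, another script collects these saved files and stacks them. Everything works now, thanks again for the explanation :)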
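The queue suggestion from the comments could look roughly like the following. This is a sketch under assumptions, not the asker's code: `worker` is a hypothetical stand-in for RecoAndRecoFix, a list of four bin counts replaces the ROOT histograms, and the "fork" start method is assumed.

```python
import multiprocessing as mp


def worker(worker_id, first, last, queue):
    # Hypothetical stand-in for RecoAndRecoFix: count entries into 4 bins.
    bins = [0] * 4
    for entry in range(first, last):
        bins[entry % 4] += 1
    # Send the partial result back to the parent.
    queue.put((worker_id, bins))


def run(n_entries=10, n_workers=2):
    ctx = mp.get_context('fork')  # assumption: Unix-like OS
    queue = ctx.Queue()
    procs = []
    for i in range(n_workers):
        first = i * n_entries // n_workers
        last = (i + 1) * n_entries // n_workers
        p = ctx.Process(target=worker, args=(i, first, last, queue))
        p.start()
        procs.append(p)
    # Read one result per worker before joining, so no child blocks
    # while its data is still waiting to be consumed.
    partial = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    # Sum the partial histograms bin by bin.
    return [sum(column) for column in zip(*(bins for _, bins in partial))]


if __name__ == '__main__':
    print(run())
```

The approach the asker finally chose has the same shape: each worker produces a private partial result, and a single later step merges them. Writing one output file per process simply replaces the queue with the file system.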