“多处理”;叉子;使用Python 3.6.1的Linux/Intel Xeon上的上下文块?
问题描述“多处理”;叉子;使用Python 3.6.1的Linux/Intel Xeon上的上下文块?,python,linux,python-3.x,multiprocessing,fork,Python,Linux,Python 3.x,Multiprocessing,Fork,问题描述 我稍微调整了代码(见下文)。但是,当在Linux上运行此脚本时(因此命令行:python script\u name.py),它将为所有作业打印jobs running:x,但之后似乎就卡住了。但是,当我使用spawn方法(mp.set_start_方法('spawn'))时,它工作正常,并立即开始打印计数器变量的值(请参见侦听器方法) 问题 Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte
我稍微调整了代码(见下文)。但是,当在Linux上运行此脚本时(因此命令行:
python script\u name.py
),它将为所有作业打印jobs running:x
,但之后似乎就卡住了。但是,当我使用spawn方法(mp.set_start_方法('spawn')
)时,它工作正常,并立即开始打印计数器
变量的值(请参见侦听器
方法)
问题Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 20
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 2596.501
BogoMIPS: 5193.98
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-19
- 为什么它只在生成进程时工作李>
- 如何调整代码,使其与
配合使用?(因为它可能更快)fork
import io
import csv
import multiprocessing as mp
NEWLINE = '\n'
def file_searcher(file_path):
parsed_file = csv.DictReader(io.open(file_path, 'r', encoding='utf-8'), delimiter='\t')
manager = mp.Manager()
q = manager.Queue()
pool = mp.Pool(mp.cpu_count())
# put listener to work first
watcher = pool.apply_async(listener, (q,))
jobs = []
for row in parsed_file:
print('jobs running: ' + str(len(jobs) + 1))
job = pool.apply_async(worker, (row, q))
jobs.append(job)
# collect results from the workers through the pool result queue
for job in jobs:
job.get()
#now we are done, kill the listener
q.put('kill')
pool.close()
pool.join()
def worker(genome_row, q):
complete_data = []
#data processing
#ftp connection to retrieve data
#etc.
q.put(complete_data)
return complete_data
def listener(q):
'''listens for messages on the q, writes to file. '''
f = io.open('output.txt', 'w', encoding='utf-8')
counter = 0
while 1:
m = q.get()
counter +=1
print(counter)
if m == 'kill':
break
for x in m:
f.write(x + NEWLINE)
f.flush()
f.close()
if __name__ == "__main__":
file_searcher('path_to_some_tab_del_file.txt')
处理器信息Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 20
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 2596.501
BogoMIPS: 5193.98
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-19
Linux内核版本3.10.0-514.26.2.el7.x86_64
Python版本
Python 3.6.1 :: Continuum Analytics, Inc.
日志
我按照@yacc的建议添加了代码,这将给出以下日志:
[server scripts]$ python main_v3.py
[INFO/SyncManager-1] child process calling self.run()
[INFO/SyncManager-1] created temp directory /tmp/pymp-2a9stjh6
[INFO/SyncManager-1] manager serving at '/tmp/pymp-2a9stjh6/listener-jxwseclw'
[DEBUG/MainProcess] requesting creation of a shared 'Queue' object
[DEBUG/SyncManager-1] 'Queue' callable returned object with id '7f0842da56a0'
[DEBUG/MainProcess] INCREF '7f0842da56a0'
[DEBUG/MainProcess] created semlock with handle 139673691570176
[DEBUG/MainProcess] created semlock with handle 139673691566080
[DEBUG/MainProcess] created semlock with handle 139673691561984
[DEBUG/MainProcess] created semlock with handle 139673691557888
[DEBUG/MainProcess] added worker
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-2] INCREF '7f0842da56a0'
[DEBUG/MainProcess] added worker
[INFO/ForkPoolWorker-2] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-4] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-4] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-3] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-3] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-6] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-5] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-6] child process calling self.run()
[INFO/ForkPoolWorker-5] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-7] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-8] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-7] child process calling self.run()
[INFO/ForkPoolWorker-8] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-9] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-9] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-10] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-10] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-11] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-11] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-12] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-12] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-13] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-13] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-14] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-14] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-15] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-15] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-16] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-16] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-17] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-17] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-18] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-18] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-19] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-19] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/ForkPoolWorker-20] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-20] child process calling self.run()
jobs running: 1
jobs running: 2
jobs running: 3
jobs running: 4
[DEBUG/ForkPoolWorker-21] INCREF '7f0842da56a0'
[INFO/ForkPoolWorker-21] child process calling self.run()
jobs running: 5
jobs running: 6
jobs running: 7
[DEBUG/ForkPoolWorker-2] INCREF '7f0842da56a0'
jobs running: 8
written to file
jobs running: 9
jobs running: 10
[DEBUG/ForkPoolWorker-2] thread 'MainThread' does not own a connection
[DEBUG/ForkPoolWorker-2] making connection to manager
jobs running: 11
jobs running: 12
jobs running: 13
jobs running: 14
jobs running: 15
[DEBUG/SyncManager-1] starting server thread to service 'ForkPoolWorker-2'
jobs running: 16
jobs running: 17
jobs running: 18
jobs running: 19
[DEBUG/ForkPoolWorker-4] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-3] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-5] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-6] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-7] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-8] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-10] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-9] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-11] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-13] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-14] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-12] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-15] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-16] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-18] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-17] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-20] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-19] INCREF '7f0842da56a0'
[DEBUG/ForkPoolWorker-21] INCREF '7f0842da56a0'
正如@jxh所暗示的,fork和spawn之间的差异非常重要。在第17.2.1.2节中指出,区别在于:forking保留了环境和stdin/out之类的东西,而spawn只是创建了一个新的过程。我认为您的环境中可能存在导致worker函数出现问题的内容,可能是您对其他处理的注释背后的代码。产卵给你一个干净的板岩,事情在这些条件下运行良好 为了确定发生了什么,我会让每个工作人员打印诊断消息,可能会写入每个工作人员特有的文件中。每次要编写消息时打开/关闭该文件,以便更新/刷新内容
Fork不应该比spawn快,因为Fork需要将环境信息复制到新进程。在任何情况下,这只是启动成本,我认为这是最小的,因为工作者需要做一些计算或I/O绑定的工作,您希望并行化 你能提供关于Linux、Python、MP软件包和硬件/处理器版本的详细信息吗?谢谢,至少我的问题现在更具体了,因为我知道你的日志@yaccIt有点尴尬,但至少你有一个关于繁殖的工作。我建议升级到3.6.2,如果仍然出现这种情况,请在Python上打开一张罚单。这对于未来版本的修复可能很重要。这意味着你的“机器”实际上是虚拟的,而不是裸机。这意味着存在一个硬件抽象层,它假装是Linux的机器,而实际上是一个运行在主机操作系统上的进程
VMware
是一家创建虚拟机监控程序的软件公司,虚拟机监控程序是在主机操作系统上运行的软件,用于向虚拟机提供虚拟硬件。由于fork
必须共享来自父系统的资源,Linux很可能会设置锁。在真正的硬件上,这个锁会很便宜,但在虚拟机上,不清楚这个锁的虚拟版本会有多便宜。