Python multiprocessing: error when using more than one process

I am trying (without success) to parallelize a loop with multiprocessing. Here is my Python code:

from MMTK import *
from MMTK.Trajectory import Trajectory, TrajectoryOutput, SnapshotGenerator
from MMTK.Proteins import Protein, PeptideChain
import numpy as np

filename = 'traj_prot_nojump.nc'

trajectory = Trajectory(None, filename)
universe = trajectory.universe
proteins = universe.objectList(Protein)
chain = proteins[0][0]

def calpha_2dmap_mult(t = range(0,len(trajectory))):
    dist = []
    global trajectory
    universe = trajectory.universe
    proteins = universe.objectList(Protein)
    chain = proteins[0][0]
    traj = trajectory[t]
    dt = 1000 # calculate distance every 1000 steps
    for n, step in enumerate(traj):
        if n % dt == 0:
            universe.setConfiguration(step['configuration'])
            for i in np.arange(len(chain)-1):
                for j in np.arange(len(chain)-1):
                    dist.append(universe.distance(chain[i].peptide.C_alpha,
                                                  chain[j].peptide.C_alpha))
    return(dist)

dist1 = calpha_2dmap_mult(range(1000,2000))
dist2 = calpha_2dmap_mult(range(2000,3000))

# Multiprocessing
from multiprocessing import Pool, cpu_count

pool = Pool(processes=2)
dist_pool = [pool.apply(calpha_2dmap_mult, args=(t,)) for t in [range(1000,2000), range(2000,3000)]]

print(dist_pool[0]==dist1)
print(dist_pool[1]==dist2)
With Pool(processes=1) the code works as expected, but as soon as I request more than one process it crashes with the following error:

python: posixio.c:286: px_pgin: Assertion `*posp == ((off_t)(-1)) || *posp == lseek(nciop->fd, 0, 1)' failed.

Any suggestions would be much appreciated ;-)

I suspect this is because of:

trajectory = Trajectory(None, filename)

You open the file only once, at the start. You could simply pass the file name to the multiprocessing target function and open the file there.

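If you run this code on OS X or any other Unix-like system, multiprocessing uses fork to create the child processes.

When a process forks, its file descriptors are shared with the parent process. As far as I know, the trajectory object holds a reference to a file descriptor.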
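
The sharing described above can be observed with a small sketch that uses only the standard library. It assumes a Unix-like system (os.fork is not available on Windows); the scratch file and the function name offset_after_child_read are hypothetical stand-ins, not part of MMTK. A child reads a few bytes through the inherited descriptor, and the parent's offset moves as well. This is at least consistent with the failed assertion above, which compares a remembered position with lseek(nciop->fd, 0, 1).

```python
import os
import tempfile


def offset_after_child_read(nbytes=4):
    """Return the parent's file offset before and after a forked child reads."""
    # Hypothetical scratch file standing in for the trajectory file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
        path = tmp.name

    fd = os.open(path, os.O_RDONLY)
    try:
        before = os.lseek(fd, 0, os.SEEK_CUR)
        pid = os.fork()  # Unix only
        if pid == 0:
            # Child: read through the inherited descriptor, then exit at once
            # without running the parent's cleanup code.
            try:
                os.read(fd, nbytes)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # Parent: the offset has moved although the parent never read anything.
        after = os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.unlink(path)
    return before, after


if __name__ == "__main__":
    print(offset_after_child_read())  # (0, 4)
```

With several workers reading through one shared descriptor, each read moves the position under every other process, which is why a library that caches the position can trip over it.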

To fix this, you should put

trajectory = Trajectory(None, filename)

inside calpha_2dmap_mult, so that each child process opens the file on its own.
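
The pattern suggested above, passing the file name and opening the file inside the worker, might look like the following sketch. It uses a plain binary file rather than an MMTK trajectory, so read_chunk and read_in_parallel are hypothetical names chosen for illustration, and the fork start method is requested explicitly to match the discussion (Unix only).

```python
import multiprocessing as mp
import os
import tempfile


def read_chunk(args):
    """Open the file inside the worker, so each process owns its descriptor."""
    path, start, length = args
    with open(path, "rb") as handle:  # opened after the fork, not before
        handle.seek(start)
        return handle.read(length)


def read_in_parallel(path, chunks, processes=2):
    """Read (start, length) chunks of `path` with a pool of worker processes."""
    ctx = mp.get_context("fork")  # the start method discussed above; Unix only
    with ctx.Pool(processes=processes) as pool:
        return pool.map(read_chunk, [(path, s, n) for s, n in chunks])


if __name__ == "__main__":
    # Hypothetical scratch file standing in for the trajectory file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abcdefghij")
    try:
        print(read_in_parallel(tmp.name, [(0, 5), (5, 5)]))
    finally:
        os.unlink(tmp.name)
```

Only the path and the chunk boundaries cross the process boundary; no open handle is inherited or pickled, so the workers cannot disturb each other's file position.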

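Here is the new script (listed at the end of this page), which runs with several processes but shows no performance improvement:
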
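The time spent computing the distances is "the same" without (70.1 s) and with (70.2 s) multiprocessing! I was not necessarily expecting a fourfold speed-up, but I expected at least some improvement.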

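It sounds as though reading the netCDF file over NFS may be the problem. Is traj_prot_nojump.nc stored on NFS? Of the two linked references, the latter suggests copying the file to local storage first.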

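Thanks for your comments (@John and @Wynand). I now see that several processes can be used... but the performance does not improve at all! The new script will be written in the next answer!

The trick is to use pool.apply_async instead of pool.apply to get the expected performance: pool.apply blocks until its result is ready, so the calls in the list comprehension run one after another. See [explanation].
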
from MMTK import *
from MMTK.Trajectory import Trajectory, TrajectoryOutput, SnapshotGenerator
from MMTK.Proteins import Protein, PeptideChain
import numpy as np
import time

filename = 'traj_prot_nojump.nc'


trajectory = Trajectory(None, filename)
universe = trajectory.universe
proteins = universe.objectList(Protein)
chain = proteins[0][0]

def calpha_2dmap_mult(trajectory = trajectory, t = range(0,len(trajectory))):
    dist = []
    universe = trajectory.universe
    proteins = universe.objectList(Protein)
    chain = proteins[0][0]
    traj = trajectory[t]
    dt = 1000 # calculate distance every 1000 steps
    for n, step in enumerate(traj):
        if n % dt == 0:
            universe.setConfiguration(step['configuration'])
            for i in np.arange(len(chain)-1):
                for j in np.arange(len(chain)-1):
                    dist.append(universe.distance(chain[i].peptide.C_alpha,
                                                  chain[j].peptide.C_alpha))
    return(dist)

c0 = time.time()
dist1 = calpha_2dmap_mult(trajectory, range(0,11001))
#dist1 = calpha_2dmap_mult(trajectory, range(0,11001))
c1 = time.time() - c0
print(c1) 


# Multiprocessing
from multiprocessing import Pool, cpu_count

pool = Pool(processes=4)
c0 = time.time()
dist_pool = [pool.apply(calpha_2dmap_mult, args=(trajectory, t,)) for t in
             [range(0,2001), range(3000,5001), range(6000,8001),
              range(9000,11001)]]
c1 = time.time() - c0
print(c1)


dist1 = np.array(dist1)
dist_pool = np.array(dist_pool)
dist_pool = dist_pool.flatten()
print(np.all((dist_pool == dist1)))
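
The script above submits its four tasks with pool.apply, which waits for each result before the next call is made. A minimal sketch of the difference from pool.apply_async follows; it assumes a Unix-like system, and slow_square is a hypothetical stand-in for one chunk of the distance calculation, not the MMTK code itself.

```python
import multiprocessing as mp
import time


def slow_square(x, delay=0.2):
    """Stand-in for one chunk of the distance calculation."""
    time.sleep(delay)
    return x * x


def run_blocking(pool, values):
    # pool.apply waits for each result before the next call is submitted,
    # so the tasks run one after another.
    return [pool.apply(slow_square, args=(v,)) for v in values]


def run_async(pool, values):
    # pool.apply_async returns an AsyncResult immediately; all tasks are
    # submitted first and collected afterwards with get().
    pending = [pool.apply_async(slow_square, args=(v,)) for v in values]
    return [p.get() for p in pending]


def timed(func, *args):
    """Return the result of func(*args) and the elapsed wall-clock time."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ctx = mp.get_context("fork")  # Unix only
    with ctx.Pool(processes=4) as pool:
        print(timed(run_blocking, pool, range(4)))
        print(timed(run_async, pool, range(4)))
```

Both functions return the same list; only the elapsed time should differ, since the blocking version keeps a single worker busy at a time. In the script above, the equivalent change would be to replace pool.apply with pool.apply_async and call get() on each returned object.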