MPI: start ipcluster by submitting only one batch job

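I have access to a large computing cluster that uses MPI and SLURM. I have successfully run parallel IPython (ipcluster) through the Slurm job submission system.

For example, I can start it with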

ipcluster start -n 320  --profile=slurm

It successfully submits two jobs, one to start the controller and one to start the engines:

2016-07-12 11:41:37.055 [IPClusterStart] Starting ipcluster with [daemon=False]
2016-07-12 11:41:37.057 [IPClusterStart] Creating pid file: /draco/u/USERNAME/.ipython/profile_slurm/pid/ipcluster.pid
2016-07-12 11:41:37.057 [IPClusterStart] Starting Controller with SlurmControllerLauncher
2016-07-12 11:41:37.079 [IPClusterStart] Job submitted with job id: u'9908'
2016-07-12 11:41:38.080 [IPClusterStart] Starting 320 Engines with SlurmEngineSetLauncher
2016-07-12 11:41:38.103 [IPClusterStart] Job submitted with job id: u'9909'
2016-07-12 11:42:08.129 [IPClusterStart] Engines appear to have started successfully
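For context, the two separate jobs in the log come from the profile's launcher settings. Below is a minimal sketch of the relevant lines in profile_slurm/ipcluster_config.py, assuming an ipyparallel release of that era (5.x/6.x) in which the short name 'Slurm' selects SlurmControllerLauncher and SlurmEngineSetLauncher (the class names shown in the log). Any other options the profile sets are omitted.

```python
# profile_slurm/ipcluster_config.py (sketch; get_config() is provided by
# IPython's config loader when it executes this file)
c = get_config()

# Submit the controller as its own Slurm job (SlurmControllerLauncher).
c.IPClusterStart.controller_launcher_class = 'Slurm'

# Submit all engines together as a second Slurm job (SlurmEngineSetLauncher).
c.IPClusterEngines.engine_launcher_class = 'Slurm'
```

Because each launcher runs its own sbatch submission, the two jobs are queued independently and can start at different times.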

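The first problem is the message "Engines appear to have started successfully", which is not true: in many cases the engine job has to wait in the queue for a while before it runs, because it requests far more resources.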
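Judging by the timestamps (11:41:38 to 11:42:08), the message is printed about 30 seconds after the engine job is submitted, whether or not any engine has actually registered. A client can check registration itself before sending work. The sketch below assumes the controller is reachable through the slurm profile; wait_until_registered is a hypothetical helper, and len(rc.ids) is the number of engines currently registered with an ipyparallel Client. Newer ipyparallel releases (7.x and later) also provide Client.wait_for_engines for this purpose.

```python
import time
from typing import Callable


def wait_until_registered(get_count: Callable[[], int], expected: int,
                          timeout: float, poll_interval: float = 5.0) -> bool:
    """Poll get_count() until it reports at least `expected` engines.

    Returns True once enough engines are registered, or False if `timeout`
    seconds pass first. get_count is any zero-argument callable, e.g.
    lambda: len(rc.ids) for an ipyparallel Client `rc` (hypothetical helper).
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep between polls, but never past the deadline.
        time.sleep(max(0.0, min(poll_interval, deadline - time.monotonic())))


if __name__ == "__main__":
    # Hypothetical usage with the profile from the question; assumes the
    # controller job is already running so the Client can connect.
    import ipyparallel as ipp

    rc = ipp.Client(profile="slurm")
    ok = wait_until_registered(lambda: len(rc.ids), expected=320,
                               timeout=3 * 3600)
    print("all engines registered" if ok else "timed out waiting for engines")
```

Note that this only helps with the misleading message; it cannot keep the controller alive if its own walltime runs out first.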
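This leads to my actual problem: if I request 2 hours, the single-core controller job starts immediately, but the engine job waits in the queue for, say, 1 hour. After the engines have computed for 1 hour, the controller hits its 2-hour limit and is killed, while the engines stay alive.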
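The timing here is simple interval arithmetic: each job runs for at most its walltime after its own queue wait, and useful work requires both jobs to be running. A small sketch (overlap_hours is a hypothetical helper) that reproduces the numbers from the example:

```python
def overlap_hours(walltime_h: float, controller_wait_h: float,
                  engine_wait_h: float) -> float:
    """Hours during which the controller job and the engine job both run.

    Each job starts after its own queue wait and then runs for at most
    walltime_h hours (both jobs request the same walltime).
    """
    controller_start, engine_start = controller_wait_h, engine_wait_h
    controller_end = controller_start + walltime_h
    engine_end = engine_start + walltime_h
    # Length of the intersection of the two run intervals, or 0 if disjoint.
    overlap = min(controller_end, engine_end) - max(controller_start, engine_start)
    return float(max(0.0, overlap))


print(overlap_hours(2, 0, 1))  # separate jobs, engines queued for 1 hour
print(overlap_hours(2, 1, 1))  # one job: both share the same queue wait
```

overlap_hours(2, 0, 1) gives 1.0 useful hour, after which the engines sit idle for the last hour of their allocation. If controller and engines share one job, they share one queue wait, so overlap_hours(2, 1, 1) gives the full 2.0 hours.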
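Is there a way to make all of this happen in a single job, where one process is the controller and the others are engines? That way they would start at (almost) the same time.
I know I could request more time for the controller job, but that does not seem like a clean solution to me.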
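Edit:
I just stumbled upon a solution that puts both into a single submission script. It would still be nice to do this through ipcluster, somehow telling it to run only this one script. That adds a bit of overhead, but it would be nice to always keep the same syntax: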

ipcluster start -n N --profile=whatever
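A single submission script of the kind described in the edit could look roughly like the output of the sketch below. single_job_script is a hypothetical helper that renders the script with Python string formatting. The layout is an assumption to adapt to the cluster, not what ipcluster itself generates: one extra task for the controller, ipcontroller in the background with --ip='*' so engines on other nodes can connect, a fixed sleep while it writes its connection files, then srun ipengine for the engines.

```python
def single_job_script(n_engines: int, walltime: str, profile: str) -> str:
    """Render a Slurm batch script running controller and engines in one job.

    Hypothetical helper: one extra task is requested for the controller,
    which runs in the background on the batch host; srun then starts
    n_engines ipengine processes inside the same allocation.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --ntasks={n_engines + 1}",
        f"#SBATCH --time={walltime}",
        "",
        "# Controller: listen on all interfaces so remote engines can connect.",
        f"ipcontroller --profile={profile} --ip='*' &",
        "# Crude wait for the controller to write its connection files.",
        "sleep 30",
        "",
        "# Engines: one ipengine per remaining task, all in this allocation.",
        f"srun --ntasks={n_engines} ipengine --profile={profile} &",
        "",
        "# Keep the job alive until the background processes exit",
        "# or the walltime is reached.",
        "wait",
    ]
    return "\n".join(lines) + "\n"


print(single_job_script(320, "02:00:00", "slurm"))
```

The printed script would be saved to a file and submitted with sbatch. Since controller and engines then live and die with the same allocation, they also share one walltime.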