Parallel processing: my job keeps getting killed


I am trying to run a task on SGE, but it keeps getting killed. I am not sure which parameter in my script I should change.

My submit.sh script:

===========

#$ -l mem_free=32G
#$ -l h_rt=48:00:00

## softx will require 8 processors
softx myprogram.sh
==========

I submit it to SGE with:

qsub -q long.q submit.sh
What should I change?

Details of the killed job and the queue defaults are shown below.

qacct -j 740
==============================================================

qname        long.q
hostname     node02.local
department   defaultdepartment
jobname      submit.sh
jobnumber    740
taskid       undefined
account      sge
priority     0

granted_pe   NONE
slots        1
failed       37  : qmaster enforced h_rt, h_cpu, or h_vmem limit
exit_status  137                  (Killed)
ru_wallclock 1588s
ru_utime     0.110s
ru_stime     0.190s
ru_maxrss    5.520KB
ru_ixrss     0.000B
ru_ismrss    0.000B
ru_idrss     0.000B
ru_isrss     0.000B
ru_minflt    25267
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   176
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     351
ru_nivcsw    95
cpu          10096.930s
mem          429.730GBs
io           76.911GB
iow          0.000s
maxvmem      8.635GB
arid         undefined
ar_sub_time  undefined
ar_sub_time  undefined

category     -q long.q -l h_rt=172800,mem_free=32G
=====


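The queue has an h_vmem limit of 8g, which is enforced on every job regardless of what the job requests. The job was killed after roughly half an hour (ru_wallclock 1588s), so the 48-hour h_rt limit should not be the problem. The maxvmem reported for the job (8.635GB) exceeds the queue limit. You need to talk to your cluster administrator about how to submit this kind of job, or change the problem so that it uses less virtual memory.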
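
The comparison behind this reasoning can be sketched in a few lines of shell. This is only an illustration, not part of SGE: the helper names to_bytes and check_limit are hypothetical, the figures are copied by hand from the qacct output and the limits quoted above, and every unit suffix is treated as a 1024-based multiple, which is a simplification (SGE's own suffix rules may differ by letter case).

```shell
#!/bin/sh
# Sketch: compare figures from the qacct output against the limits
# discussed in the answer. All helper names here are hypothetical.

# Convert a size such as "8.635GB" or "8g" to bytes.
# Assumption: K/M/G suffixes are all treated as 1024-based multiples.
to_bytes() {
    awk -v s="$1" 'BEGIN {
        n = s + 0                       # leading numeric part, e.g. 8.635
        u = toupper(s)
        sub(/^[0-9.]+/, "", u)          # keep only the unit suffix
        m = 1
        if (u ~ /^K/) m = 1024
        else if (u ~ /^M/) m = 1024 * 1024
        else if (u ~ /^G/) m = 1024 * 1024 * 1024
        printf "%.0f\n", n * m
    }'
}

# Print "over" if the first value exceeds the second, otherwise "within".
check_limit() {
    used=$(to_bytes "$1")
    limit=$(to_bytes "$2")
    awk -v u="$used" -v l="$limit" \
        'BEGIN { if (u + 0 > l + 0) print "over"; else print "within" }'
}

check_limit 8.635GB 8g      # maxvmem vs. the queue h_vmem limit
check_limit 1588 172800     # ru_wallclock vs. h_rt, both in seconds
```

With the values from job 740, the first call reports that memory is over the limit and the second that the run time is well within it, which matches the conclusion that h_vmem, not h_rt, triggered the kill.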

Have you tried submitting the job without specifying h_rt?