Parallel processing: my job gets killed
I am trying to run a job on SGE, but it keeps getting killed. I am not sure which parameter I should change in my script.

My submit.sh script:
#$ -l mem_free=32G
#$ -l h_rt=48:00:00
## softx will require 8 processors
softx myprogram.sh
I submit it to SGE with:
qsub -q long.q submit.sh
What should I change?

Details of the killed job and the queue defaults are shown below:
qacct -j 740
==============================================================
qname long.q
hostname node02.local
department defaultdepartment
jobname submit.sh
jobnumber 740
taskid undefined
account sge
priority 0
granted_pe NONE
slots 1
failed 37 : qmaster enforced h_rt, h_cpu, or h_vmem limit
exit_status 137 (Killed)
ru_wallclock 1588s
ru_utime 0.110s
ru_stime 0.190s
ru_maxrss 5.520KB
ru_ixrss 0.000B
ru_ismrss 0.000B
ru_idrss 0.000B
ru_isrss 0.000B
ru_minflt 25267
ru_majflt 0
ru_nswap 0
ru_inblock 0
ru_oublock 176
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 351
ru_nivcsw 95
cpu 10096.930s
mem 429.730GBs
io 76.911GB
iow 0.000s
maxvmem 8.635GB
arid undefined
ar_sub_time undefined
ar_sub_time undefined
category -q long.q -l h_rt=172800,mem_free=32G
=====
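The queue has an h_vmem limit of 8g, which is enforced on jobs regardless of what they request. Because the job was killed after about half an hour (ru_wallclock 1588s), the h_rt limit of 48 hours should not be the problem. The maxvmem reported for the job (8.635GB) exceeds the queue limit. You will need to talk to your cluster administrator about how to submit this kind of job, or change the problem so that it uses less virtual memory.

Have you tried submitting the job without specifying h_rt?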
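To make the comparison in the answer concrete, here is a minimal sketch that converts the two memory figures to bytes and checks whether the reported maxvmem is above the queue's h_vmem. The helper names `to_bytes` and `check_vmem` are hypothetical and not part of SGE. The multipliers follow the convention I understand sge_types(1) to describe (lowercase k/m/g are powers of 1000, uppercase K/M/G are powers of 1024). Treating the `GB` printed by qacct as 1024^3 is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not SGE commands.

# Convert a memory string such as "8g", "8G" or "8.635GB" to bytes.
# Assumption: lowercase suffix = powers of 1000, uppercase = powers of 1024.
to_bytes() {
    awk -v v="$1" 'BEGIN {
        num = v
        sub(/[A-Za-z]+$/, "", num)             # strip the unit suffix
        unit = substr(v, length(num) + 1, 1)   # first letter of the suffix
        mult = 1
        if      (unit == "k") mult = 1000
        else if (unit == "K") mult = 1024
        else if (unit == "m") mult = 1000 ^ 2
        else if (unit == "M") mult = 1024 ^ 2
        else if (unit == "g") mult = 1000 ^ 3
        else if (unit == "G") mult = 1024 ^ 3
        printf "%.0f\n", num * mult
    }'
}

# Print "over" if the used memory exceeds the limit, otherwise "ok".
check_vmem() {
    local used limit
    used=$(to_bytes "$1")
    limit=$(to_bytes "$2")
    if [ "$used" -gt "$limit" ]; then
        echo "over"
    else
        echo "ok"
    fi
}

# maxvmem from the qacct output vs. the queue limit quoted in the answer
check_vmem 8.635GB 8g   # prints "over"
```

The job is over the limit under either reading of the units, which is consistent with the failure code 37 and exit status 137 in the accounting record.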