
Torque PBS jobs are being sent to the debug queue

Stack Overflow user
Asked 2016-04-09 07:04:41
Answers: 1 | Views: 212 | Followers: 0 | Votes: 0

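In my new job I manage a cluster that uses Torque as its resource manager and Maui as its scheduler.

At the moment I'm facing a recurring problem: one particular user's jobs always get sent to the debug queue. Here is the list of active queues on the system: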

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
debug              --      --    00:20:00   --    0   0 12   E R
intel              --      --       --      --    0   0 --   E R
medium             --      --    72:00:00   --    0   0 12   E R
bighuge            --      --       --      --    0   0 --   E R
long               --      --       --      --    0   0 12   E R
                                               ----- -----
                                                   0     0

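The jobs this user submits request walltimes measured in hours, so I don't understand why they are being sent to the debug queue.

In addition, here is the tracejob output: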

04/08/2016 15:46:48  S    enqueuing into intel, state 1 hop 1
04/08/2016 15:46:48  S    dequeuing from intel, state QUEUED
04/08/2016 15:46:48  S    enqueuing into debug, state 1 hop 1
04/08/2016 15:46:48  S    Job Queued at request of dawn@cm01, owner = dawn@cm01, job name = run01_submit.script, queue =
                          debug
04/08/2016 15:46:49  S    Job Run at request of root@cm01
04/08/2016 15:46:49  S    child reported success for job after 0 seconds (dest=n20), rc=0
04/08/2016 15:46:49  S    preparing to send 'b' mail for job 15631.cm01 to dawn@cm01 (---)
04/08/2016 15:46:49  S    Not sending email: User does not want mail of this type.
04/08/2016 15:46:49  S    obit received - updating final job usage info
04/08/2016 15:46:49  S    job exit status 1 handled
04/08/2016 15:46:49  S    preparing to send 'e' mail for job 15631.cm01 to dawn@cm01 (Exit_status=1
04/08/2016 15:46:49  S    Not sending email: User does not want mail of this type.
04/08/2016 15:46:49  S    Exit_status=1 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb
                          resources_used.walltime=00:00:00
04/08/2016 15:46:49  S    on_job_exit task assigned to job
04/08/2016 15:46:49  S    req_jobobit completed
04/08/2016 15:46:49  S    JOB_SUBSTATE_EXITING
04/08/2016 15:46:49  S    JOB_SUBSTATE_STAGEOUT
04/08/2016 15:46:49  S    about to copy stdout/stderr/stageout files
04/08/2016 15:46:49  S    JOB_SUBSTATE_STAGEOUT
04/08/2016 15:46:49  S    JOB_SUBSTATE_STAGEDEL
04/08/2016 15:46:49  S    JOB_SUBSTATE_EXITED
04/08/2016 15:46:49  S    JOB_SUBSTATE_COMPLETE
04/08/2016 15:50:54  S    Request invalid for state of job COMPLETE
04/08/2016 15:51:00  S    Request invalid for state of job COMPLETE
04/08/2016 15:51:49  S    dequeuing from debug, state COMPLETE

The current workaround is to manually change the queue assigned to the job with the qalter command.

Any ideas?


1 Answer

Stack Overflow user

Answered 2016-05-21 10:10:54

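Since the job goes from the intel queue to debug immediately, I suspect you have automatic routing configured in qmgr or in Maui. If the intel queue is configured as a routing queue, that would explain it.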

Run qmgr -c "print queue intel" to check.
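If you want to check several queues at once, a small script can scan the qmgr output for you. This is only a rough sketch: it assumes the output consists of lines shaped like `set queue <name> <attribute> = <value>`, and the sample text below is hypothetical, not taken from the asker's cluster. The function names are made up for illustration.

```python
def parse_print_queue(text):
    """Collect 'set queue <name> <attr> = <value>' lines into a dict of lists."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: set queue <name> <attr> = <value>
        # ("+=" appends a further value to a list-like attribute).
        if (len(parts) >= 6 and parts[0] == "set" and parts[1] == "queue"
                and parts[4] in ("=", "+=")):
            attrs.setdefault(parts[3], []).append(" ".join(parts[5:]))
    return attrs


def is_routing_queue(attrs):
    """True if the parsed attributes declare queue_type = Route."""
    return any(v.lower() == "route" for v in attrs.get("queue_type", []))


# Hypothetical sample output, for illustration only.
SAMPLE = """\
create queue intel
set queue intel queue_type = Route
set queue intel route_destinations = debug
set queue intel route_destinations += medium
set queue intel enabled = True
set queue intel started = True
"""

attrs = parse_print_queue(SAMPLE)
print("routing queue:", is_routing_queue(attrs))
print("destinations:", attrs.get("route_destinations", []))
```

On a real system you would feed it the text produced by the qmgr command above instead of `SAMPLE`. If debug shows up among the destinations, the routing configuration is the place to look.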

If it isn't a routing queue, you can probably increase the loglevel to get a better view of what is happening in the pbs_server logs.

When I set up a routing queue like that, I get the same kind of tracejob output when I submit a job:

05/20/2016 20:04:05.439 S    enqueuing into route, state 1 hop 1
05/20/2016 20:04:05.440 S    dequeuing from route, state QUEUED
05/20/2016 20:04:05.440 S    enqueuing into test, state 1 hop 1
05/20/2016 20:04:05.737 S    Job Run at request of root@testserver
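To spot this pattern across many jobs, you could pull the queue hops out of the tracejob text. A minimal sketch, assuming the "enqueuing into <queue>," wording shown in the traces in this thread; the three sample lines are copied from the question, and `queue_hops` is just an illustrative helper name.

```python
import re

# Matches the queue name in lines such as
# "... S    enqueuing into intel, state 1 hop 1"
ENQUEUE = re.compile(r"\benqueuing into (\S+?),")


def queue_hops(trace_text):
    """Return the queues a job was enqueued into, in order of appearance."""
    return ENQUEUE.findall(trace_text)


# Lines taken from the tracejob output in the question.
TRACE = """\
04/08/2016 15:46:48  S    enqueuing into intel, state 1 hop 1
04/08/2016 15:46:48  S    dequeuing from intel, state QUEUED
04/08/2016 15:46:48  S    enqueuing into debug, state 1 hop 1
"""

hops = queue_hops(TRACE)
print(" -> ".join(hops))  # prints: intel -> debug
```

More than one hop within the same second, as here, is consistent with the server rerouting the job rather than the user picking the queue.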

Otherwise, check the Maui configuration and logs for clues.

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/36510690
