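Suppose my job ran for a while, went into a suspended state because the machine was overloaded, and after some time started running again and completed. The job has therefore gone through the states RUNNING -> SUSPEND -> RUNNING.
How can I get all the states that a given job has gone through?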
Posted on 2017-01-03 13:09:08
Use bjobs -l if the job has not been cleaned from the system yet.
Use bhist -l otherwise. You may need -n, depending on how old the job is.
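The two commands above can be wrapped in a small helper. This is only a sketch: the function names `lsf_commands` and `show_job` are made up for illustration, and it assumes that `bjobs` and `bhist` are on the PATH and that `-n` takes the number of event log files to search (with 0 meaning all of them).

```python
import subprocess


def lsf_commands(job_id, num_logfiles=None):
    """Build the argv lists for `bjobs -l` and `bhist -l` for one job.

    Hypothetical helper, not part of LSF itself.
    """
    bhist = ["bhist", "-l"]
    if num_logfiles is not None:
        # Assumption: -n is the number of event log files bhist searches,
        # and 0 asks it to search all of them (useful for old jobs).
        bhist += ["-n", str(num_logfiles)]
    return ["bjobs", "-l", str(job_id)], bhist + [str(job_id)]


def show_job(job_id, num_logfiles=None):
    """Run both commands and return their text output joined together."""
    outputs = []
    for argv in lsf_commands(job_id, num_logfiles):
        # check=False: bjobs reports an error for jobs it no longer knows,
        # which is expected here, so keep going and fall through to bhist.
        done = subprocess.run(argv, capture_output=True, text=True, check=False)
        outputs.append(done.stdout or done.stderr)
    return "\n".join(outputs)
```

For an old job, something like `show_job(1168, num_logfiles=0)` would run `bjobs -l 1168` followed by `bhist -l -n 0 1168`.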
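Here is an example of bhist -l output for a job that was suspended because the system load temporarily exceeded the configured threshold, and was resumed later.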
$ bhist -l 1168
Job <1168>, User <mclosson>, Project <default>, Command <sleep 10000>
Fri Jan 20 15:08:40: Submitted from host <hostA>, to
Queue <normal>, CWD <$HOME>, Specified Hosts <hostA>;
Fri Jan 20 15:08:41: Dispatched 1 Task(s) on Host(s) <hostA>, Allocated 1 Slot(
s) on Host(s) <hostA>, Effective RES_REQ <select[type == any] or
der[r15s:pg] >;
Fri Jan 20 15:08:41: Starting (Pid 30234);
Fri Jan 20 15:08:41: Running with execution home </home/mclosson>, Execution CW
D </home/mclosson>, Execution Pid <30234>;
Fri Jan 20 16:19:22: Suspended: Host load exceeded threshold: 1-minute CPU ru
n queue length (r1m)
Fri Jan 20 16:21:43: Running;
Summary of time in seconds spent in various states by Fri Jan 20 16:22:09
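  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  1        0        4267     0        141      0        4409

At 16:19:22 the job was suspended because r1m exceeded its threshold. A little later, at 16:21:43, the job resumed running. The 141 seconds between those two events is the SSUSP time shown in the summary.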
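To get the sequence of states programmatically, the timestamped event lines of the bhist -l output can be scanned. The following is a minimal sketch, not an official LSF interface: the name `job_states`, the keyword-to-state mapping, and the assumption that every event line starts with a timestamp like "Fri Jan 20 15:08:40:" are all taken from the sample output above and may need adjusting for other LSF versions.

```python
import re

# Event lines in the sample start with a timestamp such as "Fri Jan 20 15:08:40:".
EVENT_START = re.compile(
    r"^\s*([A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}): (.*)$"
)

# Assumed mapping from the first word of an event to a coarse state label.
STATE_BY_KEYWORD = {
    "Submitted": "PEND",
    "Running": "RUN",
    "Suspended": "SUSP",
    "Done": "DONE",
    "Exited": "EXIT",
}


def job_states(bhist_text):
    """Return (timestamp, state) pairs, collapsing consecutive repeats."""
    states = []
    for line in bhist_text.splitlines():
        match = EVENT_START.match(line)
        if not match:
            continue  # wrapped continuation line, job header, or summary
        timestamp, event = match.groups()
        first_word = re.match(r"[A-Za-z]+", event)
        state = STATE_BY_KEYWORD.get(first_word.group(0)) if first_word else None
        # Events such as "Dispatched" or "Starting" have no mapping and are skipped.
        if state and (not states or states[-1][1] != state):
            states.append((timestamp, state))
    return states
```

Fed the output of `bhist -l 1168` shown above, this yields the state sequence PEND, RUN, SUSP, RUN together with the time at which each state was entered.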
https://stackoverflow.com/questions/41422581