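I am currently using Slurm in my project and am trying to run a very simple hello world job. I want to redirect stdout and stderr to specific files in a specific location, so I used the following command:
sudo su -c 'sbatch /home/slurm/job.script --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out' slurm
But the options are completely ignored. Slurm just tries (and fails, because it has no permission) to create the output file in the directory where the command was issued. I am using a Debian 10 Vagrant box. My Slurm version is slurm-wlm 18.08.5-2 (from the output of sinfo -V).
The Slurm job file: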
#!/bin/sh
#SBATCH --time=1
srun -l /bin/hostname
srun -l /bin/pwd
srun -l echo "hello world"
The Slurm configuration file:
ClusterName=slurm_cluster # By default ClusterName=linux
ControlMachine=Kitsune
ControlAddr=172.16.0.20
#
SlurmUser=slurm
SlurmdUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=0
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
DebugFlags=NO_CONF_HASH
# LOGGING
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
JobCompType=jobcomp/none
#
# COMPUTE NODES
NodeName=worker1 NodeAddr=172.16.0.21 Port=6818 Procs=1 State=UNKNOWN
#NodeName=worker2 NodeAddr=172.16.0.22 Port=6818 Procs=1 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
Answered on 2021-10-29 13:26:38
Be careful: in this command,
sbatch /home/slurm/job.script --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out
--error and --output are taken as arguments to job.script rather than as options to sbatch. Put the options before the script path instead:
sbatch --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out /home/slurm/job.script
https://stackoverflow.com/questions/69768886
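To see why the order matters: sbatch reads its own options first, then the script path, and hands everything after the path to the script as positional parameters. With no --output given, it falls back to slurm-%j.out in the submission directory, which matches the failed file creation described in the question. The sketch below imitates this in plain sh, without Slurm; show_args is a hypothetical stand-in for job.script, not part of Slurm.

```shell
#!/bin/sh
# Hypothetical stand-in for job.script: report the positional parameters
# it was started with. Not a Slurm command.
show_args() {
    echo "script received $# argument(s)"
    for arg in "$@"; do
        echo "  arg: $arg"
    done
}

# Options written after the script path: they reach the script, not sbatch.
show_args --error=/home/slurm/job%j.out --output=/home/slurm/job%j.out

# Options written before the script path: the script receives nothing.
show_args
```

Another way to avoid the ordering problem is to put the options into the job file itself as `#SBATCH --output=...` and `#SBATCH --error=...` lines, next to the existing `#SBATCH --time=1`.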