
Ubuntu 16.04: Slurm srun fails with Intel MPI?

Asked by an Ask Ubuntu user
on 2017-01-15 06:24:50

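I am trying to set up Slurm on a cluster running Ubuntu 16.04.

I am using Intel MPI, installed on the head node under /opt/intel/impi_5.01.

According to the Slurm MPI guide, I need to export a variable that points to libpmi.so: https://slurm.schedmd.com/mpi_guide.html#intel_mpi

However, I installed Slurm from the Ubuntu slurm-llnl package: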

sudo apt-get install slurm-llnl

so I do not know where libpmi.so is. I searched and found the file below. Is this the one I am looking for?

/usr/lib/x86_64-linux-gnu/libpmi.so
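
For reference, the export step described in the Slurm MPI guide might look like the sketch below. It assumes the variable in question is `I_MPI_PMI_LIBRARY` (the name used in the guide's Intel MPI section) and that the library really lives at the path found above; `PMI_LIB` is just an illustrative shell variable.

```shell
# Path taken from the question; adjust it if the library lives elsewhere.
PMI_LIB=/usr/lib/x86_64-linux-gnu/libpmi.so

# Export the variable only if the library is actually present.
if [ -e "$PMI_LIB" ]; then
    export I_MPI_PMI_LIBRARY="$PMI_LIB"
    echo "I_MPI_PMI_LIBRARY=$I_MPI_PMI_LIBRARY"
else
    echo "libpmi.so not found at $PMI_LIB" >&2
fi
```

On a Debian-based system, `dpkg -S libpmi.so` should show which installed package owns the file, which helps confirm that it comes from Slurm.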

In any case, I exported the variable and tried:

srun -p old -N3 -n24 hostname

It returned:

rolly@head:~$ srun -p old -N3 -n24 hostname
node02
node02
node02
node02
node02
node02
node02
node02
node01
node01
head
head
node01
head
head
head
node01
node01
head
node01
head
head
node01
node01

It looks like it works.
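
A quick way to confirm how the tasks were spread across nodes is to count the hostnames. This is a minimal sketch; `count_per_host` is a hypothetical helper name, not part of Slurm.

```shell
# Read hostnames on stdin and print "<host> <count>" for each distinct host.
count_per_host() {
    sort | uniq -c | awk '{print $2, $1}'
}

# Small sample input; on the cluster, pipe the srun output instead:
#   srun -p old -N3 -n24 hostname | count_per_host
printf 'node02\nnode01\nhead\nnode01\n' | count_per_host
```

Applied to the 24 lines of output above, this should report 8 tasks each on head, node01 and node02.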

But when I run the actual job,

srun -p old -N3 -n24 ~/QE530-CPU/espresso-5.3.0/bin/pw.x

it produces this error:

mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)

I think the error occurs because mpiexec is being invoked with Intel MPI, where mpirun should be used instead.
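
The messages refer to mpd, the process manager used by older MPICH-derived launchers, so it can help to check which launcher binaries are found first on `PATH`. The sketch below only reports locations and changes nothing; `report_launchers` is a hypothetical helper name.

```shell
# Print where each launcher resolves on PATH, or note that it is missing.
report_launchers() {
    for tool in mpiexec mpirun srun; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool -> $path"
        else
            echo "$tool -> not found on PATH"
        fi
    done
}

report_launchers
```

Running it on the head node and on a compute node (for example through `srun`) would show whether the nodes resolve the launchers to the same installation.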

How can I fix this?

Thanks!

1 Answer

Answered by an Ask Ubuntu user on 2017-01-15 15:41:27

I found a solution.

1) sudo apt-get install mpich

2) srun --mpi=pmi2

3) Load the MKL and Intel-related environment variables correctly.
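
Put together, step 2 applied to the command from the question might look like the sketch below. `build_srun_cmd` is a hypothetical helper used only to assemble the command line. Step 3 depends on the local Intel installation and is not shown.

```shell
# Assemble the srun command from the question with the --mpi=pmi2 flag added.
# Arguments: partition, node count, task count, executable.
build_srun_cmd() {
    printf 'srun --mpi=pmi2 -p %s -N%s -n%s %s\n' "$1" "$2" "$3" "$4"
}

# $HOME is used instead of ~ because a tilde is not expanded inside quotes.
cmd=$(build_srun_cmd old 3 24 "$HOME/QE530-CPU/espresso-5.3.0/bin/pw.x")
echo "$cmd"

# If Slurm is installed, list the MPI plugin types this build supports,
# to confirm that pmi2 is among them.
if command -v srun >/dev/null 2>&1; then
    srun --mpi=list
fi
```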

I hope this helps anyone with a similar problem.

Original page content provided by Ask Ubuntu.
Original link:

https://askubuntu.com/questions/872091
