Spawning processes with a hostfile using mpi4py

Stack Overflow user
Asked 2014-12-17 23:46:38
1 answer · 2.5K views · 0 followers · score 1

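I am trying to spawn a set of worker processes on multiple hosts using mpi4py and OpenMPI, but the spawn command seems to ignore my hostfile. I have posted my full test, but here are the key parts: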

Based on a forum discussion, my manager script calls Spawn with the hostfile option:

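import sys
from mpi4py import MPI

# args.worker_count comes from the script's command-line parsing (not shown)
mpi_info = MPI.Info.Create()
mpi_info.Set("hostfile", "worker_hosts")

comm = MPI.COMM_SELF.Spawn(sys.executable,
                           args=['testworker.py'],
                           maxprocs=args.worker_count,
                           info=mpi_info).Merge()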

In the worker_hosts file, I list the nodes of my Scyld Beowulf cluster:

myhead1 slots=2
mycompute1 slots=2
mycompute2 slots=2
mycompute3 slots=2
mycompute4 slots=3
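
As a rough sanity check on the hostfile above, the number of available slots can be compared with the requested worker count before spawning. The sketch below is a simplified parser, not OpenMPI's own: the name `total_slots` is hypothetical, it only understands `host slots=N` lines and `#` comments, and counting a host without `slots=` as one slot is an assumption of this sketch.

```python
def total_slots(hostfile_text):
    """Sum the slots=N entries of a simple OpenMPI-style hostfile."""
    total = 0
    for line in hostfile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        slots = 1  # assumption of this sketch: no slots= means one slot
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        total += slots
    return total


HOSTFILE = """\
myhead1 slots=2
mycompute1 slots=2
mycompute2 slots=2
mycompute3 slots=2
mycompute4 slots=3
"""

print(total_slots(HOSTFILE))  # 11
```

With this file, a worker count of up to 11 would fit without oversubscribing, assuming the slot counts are honored.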

Both the manager and the workers call MPI.Get_processor_name(), but they all report "myhead1". If I use the same hostfile with mpirun, it works:

> mpirun -hostfile worker_hosts -np 3 python -c "from mpi4py import MPI; print MPI.Get_processor_name()"
myhead1
myhead1
mycompute1
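
The worker script itself is not shown in the question. A minimal sketch of what a hypothetical testworker.py might look like follows; the helper `describe` and the fallback to COMM_WORLD are assumptions added for illustration. Because the manager calls Merge() on the spawn intercommunicator, each worker has to call Merge() on its parent communicator too, since the operation is collective.

```python
try:
    from mpi4py import MPI
except ImportError:  # lets the formatting helper be used without MPI installed
    MPI = None


def describe(rank, size, host):
    """Format one status line for a process."""
    return "rank %d of %d on %s" % (rank, size, host)


def main():
    parent = MPI.Comm.Get_parent()
    if parent == MPI.COMM_NULL:
        # Started directly (for example via mpirun), not by Spawn.
        comm = MPI.COMM_WORLD
    else:
        # Matches the manager's Merge(); high=True ranks workers after it.
        comm = parent.Merge(high=True)
    print(describe(comm.Get_rank(), comm.Get_size(),
                   MPI.Get_processor_name()))


if __name__ == "__main__" and MPI is not None:
    main()
```

If the hostfile is honored, the host names printed by the workers should differ across the cluster instead of all reading myhead1.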

If I change the hostfile name to something that does not exist (such as bogus_file), I get an error:

--------------------------------------------------------------------------
Open RTE was unable to open the hostfile:
    bogus_file
Check to make sure the path and filename are correct.
--------------------------------------------------------------------------
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_support_fns.c at line 83
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file rmaps_rr.c at line 82
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_map_job.c at line 88
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 105
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file plm_rsh_module.c at line 1173
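
The error above is raised by Open RTE at launch time. A manager script could also check the path itself before calling Spawn and fail with a shorter message. This is only a sketch: `resolve_hostfile` is a hypothetical helper, and passing an absolute path as the info value is an assumption, not something the question confirms is required.

```python
import os


def resolve_hostfile(path):
    """Return the absolute path of an existing hostfile, or raise IOError."""
    full = os.path.abspath(path)
    if not os.path.isfile(full):
        raise IOError("hostfile not found: %s" % full)
    return full
```

The returned path could then be passed as the value in mpi_info.Set(...) in place of the bare file name.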

So OpenMPI does notice the hostfile option; it just doesn't seem to use it. The hostfile option is listed in the OpenMPI documentation:

Key                   Type      Description
---                   ----      -----------
host                  char *    Host on which the process should be spawned.
                                See the orte_host man page for an
                                explanation of how this will be used.
hostfile              char *    Hostfile containing the hosts on which
                                the processes are to be spawned. See
                                the orte_hostfile man page for an
                                explanation of how this will be used.

How do I specify a hostfile for a spawn request?

1 Answer

Stack Overflow user

Accepted answer

Posted 2014-12-17 23:46:38

I found a more recent version of the OpenMPI documentation, and it gave me the magic option:

Key                    Type     Description
---                    ----     -----------
host                   char *   Host on which the process should be
                                spawned.  See the orte_host man
                                page for an explanation of how this
                                will be used.
hostfile               char *   Hostfile containing the hosts on which
                                the processes are to be spawned. See
                                the orte_hostfile man page for
                                an explanation of how this will be
                                used.
add-host               char *   Add the specified host to the list of
                                hosts known to this job and use it for
                                the associated process. This will be
                                used similarly to the -host option.
add-hostfile           char *   Hostfile containing hosts to be added
                                to the list of hosts known to this job
                                and use it for the associated
                                process. This will be used similarly
                                to the -hostfile option.

If I switch to add-hostfile, it works perfectly:

mpi_info.Set("add-hostfile", "worker_hosts")
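
Put together with the Spawn call from the question, the manager side might look roughly like the sketch below. The function names `spawn_info_items` and `spawn_workers` are hypothetical; the `add-hostfile` and `hostfile` keys are the ones listed in the documentation excerpt above, and the mpi4py import is deferred only so the key-selection helper can be exercised on a machine without MPI.

```python
import sys


def spawn_info_items(hostfile, use_add=True):
    """Pick the info key: add-hostfile (this answer) or hostfile (question)."""
    key = "add-hostfile" if use_add else "hostfile"
    return [(key, hostfile)]


def spawn_workers(worker_script, worker_count, hostfile):
    """Spawn workers and return the merged intracommunicator."""
    # Imported here so spawn_info_items works without MPI installed.
    from mpi4py import MPI

    info = MPI.Info.Create()
    for key, value in spawn_info_items(hostfile):
        info.Set(key, value)
    intercomm = MPI.COMM_SELF.Spawn(sys.executable,
                                    args=[worker_script],
                                    maxprocs=worker_count,
                                    info=info)
    info.Free()
    # Merge is collective: the workers must call Merge on their
    # parent communicator as well.
    return intercomm.Merge()
```

A call such as spawn_workers("testworker.py", 3, "worker_hosts") would then correspond to the manager snippet in the question, with the info key swapped.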

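If you are stuck with an older version of OpenMPI, try launching the manager script with mpirun and the same hostfile. That also worked for me while I was still using the hostfile option.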

mpirun -hostfile worker_hosts -np 1 python testmanager.py
Score: 2
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/27536905
