
Toy program fails using OpenMPI 1.6 but works with Mvapich2

Stack Overflow user
Asked 2015-12-01 20:38:08
1 answer, 2K views, 0 followers, score 4

I am trying to figure out why my OpenMPI 1.6 build does not work. I am using gcc-4.7.2 on CentOS 6.6. Given a toy program (i.e. hello.c):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char * argv[])
{
    int taskID = -1; 
    int NTasks = -1; 

    /* MPI Initializations */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskID);
    MPI_Comm_size(MPI_COMM_WORLD, &NTasks);

    printf("Hello World from Task %i\n", taskID);

    MPI_Finalize();
    return 0;
}

After compiling with mpicc hello.c and running mpirun -np 8 ./a.out, I get the following error:

--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            qmaster02.cluster
  Device name:           mlx4_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4103

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
Hello World from Task 4
Hello World from Task 7
Hello World from Task 5
Hello World from Task 0
Hello World from Task 2
Hello World from Task 3
Hello World from Task 6
Hello World from Task 1
[headnode.cluster:22557] 7 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[headnode.cluster:22557] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
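
The NOTE at the end of the warning can be acted on without editing any file. The following is a minimal sketch, assuming Open MPI's usual convention that an MCA parameter can be supplied through an environment variable named OMPI_MCA_<parameter>; the helper name mpirun_env is made up for illustration. It only hides the message: the default device parameters are still used.

```python
import os

def mpirun_env(base_env=None):
    """Return a copy of the environment with the openib device-params warning disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Parameter name taken from the warning text above; the OMPI_MCA_ prefix
    # is the assumed environment-variable convention for MCA parameters.
    env["OMPI_MCA_btl_openib_warn_no_device_params_found"] = "0"
    return env
```

The returned dictionary could be passed as env= to subprocess.run(["mpirun", "-np", "8", "./a.out"]). The command-line equivalent would be mpirun --mca btl_openib_warn_no_device_params_found 0 -np 8 ./a.out.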

If I build and run this program with mvapich2-2.1 and gcc-4.7.2, I only get the Hello World from Task N lines, without any of these errors/warnings.

Looking at the libraries linked to a.out, I get:

$ ldd a.out 
    linux-vdso.so.1 =>  (0x00007fff05ad2000)
    libmpi.so.1 => /act/openmpi-1.6/gcc-4.7.2/lib/libmpi.so.1 (0x00002b0f8e196000)
    libdl.so.2 => /lib64/libdl.so.2 (0x0000003954800000)
    libm.so.6 => /lib64/libm.so.6 (0x0000003955400000)
    librt.so.1 => /lib64/librt.so.1 (0x0000003955c00000)
    libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003965000000)
    libutil.so.1 => /lib64/libutil.so.1 (0x0000003964c00000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003955000000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003954c00000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003954400000)

If I recompile it with mvapich2:

$ ldd a.out
linux-vdso.so.1 =>  (0x00007fffcdbcb000)
libmpi.so.12 => /act/mvapich2-2.1/gcc-4.7.2/lib/libmpi.so.12 (0x00002af3be445000)
libc.so.6 => /lib64/libc.so.6 (0x0000003954c00000)
libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x000000395e800000)
libibmad.so.5 => /usr/lib64/libibmad.so.5 (0x0000003955400000)
librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x0000003146400000)
libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x0000003955800000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000003956000000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003954800000)
librt.so.1 => /lib64/librt.so.1 (0x0000003955c00000)
libgfortran.so.3 => /act/gcc-4.7.2/lib64/libgfortran.so.3 (0x00002af3beaf6000)
libm.so.6 => /lib64/libm.so.6 (0x00002af3bee0a000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003955000000)
libgcc_s.so.1 => /act/gcc-4.7.2/lib64/libgcc_s.so.1 (0x00002af3bf08e000)
libquadmath.so.0 => /act/gcc-4.7.2/lib64/libquadmath.so.0 (0x00002af3bf2a4000)
/lib64/ld-linux-x86-64.so.2 (0x0000003954400000)
libz.so.1 => /lib64/libz.so.1 (0x00002af3bf4d9000)
libnl.so.1 => /lib64/libnl.so.1 (0x0000003958800000)

What is wrong here? Is this because the InfiniBand libraries are not linked in the OpenMPI case?


1 Answer

Stack Overflow user

Accepted answer

Answered 2015-12-01 23:20:42

Open MPI 1.6 does not ship device parameters for Mellanox ConnectX HCAs with part ID 4103, which is easy to fix. Locate the [Mellanox Hermon] section in $PREFIX/share/openmpi/mca-btl-openib-device-params.ini and append 4103 to the end of the list of part IDs:

[Mellanox Hermon]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 25408,25418,25428,...<skipped>...,26488,4099,4103
use_eager_rdma = 1                                           ^^^^^
mtu = 2048
max_inline_data = 128

Replace $PREFIX with the path to the Open MPI installation. In your case, that would be /act/openmpi-1.6/gcc-4.7.2.
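
If the edit needs to be repeated on several installations, it could be scripted. The following is a minimal sketch, not part of Open MPI: the function name append_part_id is hypothetical, and the sample text is a shortened stand-in for the real file (its vendor and part ID lists are abbreviated for illustration). It edits line by line rather than through an INI parser so that comments and other sections stay untouched.

```python
def append_part_id(ini_text, section, part_id):
    """Return ini_text with part_id appended to vendor_part_id in [section]."""
    out = []
    in_section = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track whether we are inside the requested section.
            in_section = stripped[1:-1] == section
        elif in_section and stripped.startswith("vendor_part_id"):
            key, _, value = line.partition("=")
            ids = [v.strip() for v in value.split(",") if v.strip()]
            # Append only if missing, so running the script twice is harmless.
            if str(part_id) not in ids:
                ids.append(str(part_id))
            line = key.rstrip() + " = " + ",".join(ids)
        out.append(line)
    return "\n".join(out) + "\n"

# Shortened sample, NOT the real file contents.
sample = """[Mellanox Hermon]
vendor_id = 0x2c9,0x15b3
vendor_part_id = 25408,26488,4099
use_eager_rdma = 1
mtu = 2048
"""

print(append_part_id(sample, "Mellanox Hermon", 4103))
```

To apply it for real, one would read $PREFIX/share/openmpi/mca-btl-openib-device-params.ini, keep a backup copy, and write the returned text back.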

Score: 4

Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/34029657
