
Problem with IB verbs on openSUSE Leap 42.2 with a Mellanox ConnectX-3 HCA

Server Fault user

Asked 2017-03-31 23:07:58
1 answer · 978 views · 0 followers · 0 votes

Solved

I'm having trouble setting up the InfiniBand software from the openSUSE repositories.

The same cards work fine with SLES 12 and the Mellanox stack.

After installing everything containing "infiniband" from YaST, I can see that the HCA is up, and diagnostic tools such as ibnodes show the relevant data:

>ibnodes
Ca      : 0x0002c90300a00360 ports 2 "cnode1 HCA-1"
Ca      : 0x0002c90300ea8fd0 ports 1 "helper1 mlx4_0"
Switch  : 0x0008f1050020096c ports 36 "Voltaire 4036 # spine2" enhanced port 0 lid 17 lmc 0

Here, helper1 is the openSUSE machine and cnode1 is a SLES node.

But when it comes to verbs, I get:

>ibv_devinfo
No IB devices found

As a result, I can't get MPI to use InfiniBand.

Am I missing some intermediate layer, or does the libibverbs component need some extra configuration?
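One way to narrow this down is to check each layer in turn, from the kernel HCA driver up to the user-space provider that libibverbs loads at run time. A minimal sketch, assuming a ConnectX-3 (kernel module mlx4_ib) and assuming providers are registered through `*.driver` files in /etc/libibverbs.d, as libibverbs 1.1.x does; the function name `check_verbs_stack` and its optional root-prefix argument (used only to run the checks against a fake directory tree) are hypothetical.

```shell
#!/bin/sh
# Sketch: walk the verbs stack bottom-up and report the first missing layer.
# Assumptions: ConnectX-3 (kernel module mlx4_ib); user-space providers are
# registered via *.driver files in /etc/libibverbs.d (libibverbs 1.1.x style).
# $1 is a hypothetical root prefix for testing; leave it empty for the live system.
check_verbs_stack() {
    root=${1:-}

    # 1. Kernel HCA driver for ConnectX-3
    if [ ! -d "$root/sys/module/mlx4_ib" ]; then
        echo "missing: kernel module mlx4_ib"
        return 1
    fi

    # 2. The kernel has registered at least one verbs device
    if ! ls "$root/sys/class/infiniband/" 2>/dev/null | grep -q .; then
        echo "missing: entry under /sys/class/infiniband"
        return 1
    fi

    # 3. Character device that user space opens (created by ib_uverbs)
    if ! ls "$root"/dev/infiniband/uverbs* >/dev/null 2>&1; then
        echo "missing: /dev/infiniband/uverbs*"
        return 1
    fi

    # 4. User-space provider for mlx4 (assumed to be a "driver mlx4" line)
    if ! grep -qs '^driver mlx4' "$root"/etc/libibverbs.d/*.driver; then
        echo "missing: user-space mlx4 provider for libibverbs"
        return 1
    fi

    echo "ok: all verbs layers present"
}
```

Sourced and run with no argument, `check_verbs_stack` inspects the live system and prints the first layer it cannot find.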

Thanks!

UPD: more output from zypper and lsmod:

Here are the packages installed from the Leap 42.2 repository:

>zypper se verbs

S | Name                    | Summary                                                     | Type
--+-------------------------+-------------------------------------------------------------+--------
i | libibverbs-devel        | Development files for the libibverbs library                | package
  | libibverbs-devel-32bit  | Development files for the libibverbs library                | package
i | libibverbs-devel-static | Static libibverbs library                                   | package
i | libibverbs-runtime      | Tools for the Infiniband Verbs library and manpages         | package
i | libibverbs1             | Infiniband verbs library                                    | package
  | libibverbs1-32bit       | Infiniband verbs library                                    | package
i | libipathverbs-rdmav2    | PathScale InfiniPath HCA Userspace Driver                   | package
i | libusnic_verbs-rdmav2   | Cisco UCS InfiniBand HCA Userspace Driver                   | package
  | texlive-newverbs        | Define new versions of \verb, including short verb versions | package
  | texlive-newverbs-doc    | Documentation for texlive-newverbs                          | package

Loaded modules on openSUSE (where the ibv_* diagnostics cannot find the HCA):

>lsmod | grep ib
ib_ucm                 24576  0
ib_ipoib               98304  0
ib_cm                  49152  3 rdma_cm,ib_ucm,ib_ipoib
ib_uverbs              61440  2 ib_ucm,rdma_ucm
ib_umad                24576  0
iscsi_ibft             16384  0
iscsi_boot_sysfs       20480  1 iscsi_ibft
mlx4_ib               167936  0
ib_sa                  40960  5 rdma_cm,ib_cm,mlx4_ib,rdma_ucm,ib_ipoib
ib_mad                 57344  4 ib_cm,ib_sa,mlx4_ib,ib_umad
ib_core               131072  10 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,ib_mad,ib_ucm,ib_umad,ib_uverbs,ib_ipoib
ib_addr                20480  4 rdma_cm,ib_sa,ib_core,rdma_ucm
mlx4_core             323584  1 mlx4_ib
libahci                36864  1 ahci
libata                270336  2 ahci,libahci
scsi_mod              262144  4 sg,libata,sd_mod,sr_mod    
libcrc32c              16384  1 xfs
snd_usbmidi_lib        36864  1 snd_usb_audio
snd_rawmidi            36864  1 snd_usbmidi_lib
snd                    90112  12 snd_hda_codec_realtek,snd_usb_audio,snd_hwdep,snd_timer,snd_hda_codec_hdmi,snd_pcm,snd_rawmidi,snd_hda_codec_generic,snd_usbmidi_lib,snd_hda_codec,snd_hda_intel,snd_seq_device
usbcore               270336  6 snd_usb_audio,uvcvideo,snd_usbmidi_lib,ehci_hcd,ehci_pci,usbhid

Loaded modules on SLES 12 (where ibv_* works):

>lsmod |grep ib
ib_ucm                 18489  0
ib_ipoib              144838  0
ib_cm                  46900  3 rdma_cm,ib_ucm,ib_ipoib
ib_uverbs              83349  2 ib_ucm,rdma_ucm
ib_umad                22281  6
mlx5_ib               204339  0
mlx5_core             572759  1 mlx5_ib
inet_lro               13400  3 mlx4_en,mlx5_core,ib_ipoib
iscsi_ibft             12862  0
iscsi_boot_sysfs       16051  1 iscsi_ibft
mlx4_ib               208061  0
ib_sa                  37997  5 rdma_cm,ib_cm,mlx4_ib,rdma_ucm,ib_ipoib
ib_mad                 60774  4 ib_cm,ib_sa,mlx4_ib,ib_umad
ib_core               159115  12 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,mlx5_ib,ib_mad,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
ib_addr                19098  3 rdma_cm,ib_core,rdma_ucm
ib_netlink             14070  3 rdma_cm,iw_cm,ib_addr
mlx4_core             374829  2 mlx4_en,mlx4_ib
mlx_compat             14630  18 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,ib_mad,ib_ucm,ib_netlink,ib_addr,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
libahci                36105  1 ahci
libata                235807  2 ahci,libahci
scsi_mod              244354  3 sg,libata,sd_mod

Also, grepping for verbs only:

OpenSUSE

>lsmod | grep verbs
ib_uverbs              61440  2 ib_ucm,rdma_ucm
ib_core               131072  10 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,ib_mad,ib_ucm,ib_umad,ib_uverbs,ib_ipoib

SLES

>lsmod | grep verbs
ib_uverbs              83349  2 ib_ucm,rdma_ucm
ib_core               159115  12 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,mlx5_ib,ib_mad,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
mlx_compat             14630  18 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,ib_mad,ib_ucm,ib_netlink,ib_addr,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib

UPD2: As Derek Mitchell wrote, on SLES + Mellanox OFED I can see the openibd service:

>service openibd status
openibd.service - openibd - configure Mellanox devices
   Loaded: loaded (/usr/lib/systemd/system/openibd.service; enabled)
   Active: active (exited) since Thu 2017-03-23 17:45:38 MSK; 1 weeks 2 days ago
     Docs: file:/etc/infiniband/openib.conf
 Main PID: 678 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/openibd.service

Leap 42.2 has no such service; there is an rdma service instead:

service rdma status
* rdma.service - Initialize the iWARP/InfiniBand/RDMA stack in the kernel
   Loaded: loaded (/usr/lib/systemd/system/rdma.service; disabled; vendor preset: disabled)
   Active: active (exited) since Sat 2017-04-01 19:23:45 MSK; 2min 45s ago
     Docs: file:/etc/rdma/rdma.conf
  Process: 601 ExecStart=/usr/sbin/rdma-init-kernel (code=exited, status=0/SUCCESS)
 Main PID: 601 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 512)
   CGroup: /system.slice/rdma.service

Apr 01 19:23:37 helper1 systemd[1]: Starting Initialize the iWARP/InfiniBand/RDMA stack in the kernel...
Apr 01 19:23:45 helper1 rdma-init-kernel[601]: /sys/class/infiniband /
Apr 01 19:23:45 helper1 rdma-init-kernel[601]: /
Apr 01 19:23:45 helper1 systemd[1]: Started Initialize the iWARP/InfiniBand/RDMA stack in the kernel.

Either way, ibv_devinfo still can't find the ConnectX-3 card.

So the problem turned out to be the main Leap 42.2 repository, which does not include the libmlx4-rdmav2 package; mlx4 is the driver for ConnectX-3 HCAs.
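This diagnosis can be cross-checked against the `zypper se verbs` listing in the first update: it shows the PathScale (libipathverbs-rdmav2) and Cisco (libusnic_verbs-rdmav2) providers installed, but no mlx4 provider at all. A minimal sketch that reads such a table on stdin and succeeds only if the named package is marked installed; it assumes zypper's default `S | Name | Summary | Type` layout, and `pkg_installed` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if package $1 is marked installed in a `zypper se` table on stdin.
# Assumes zypper's default layout "S | Name | Summary | Type", where the
# status column starts with "i" for installed packages.
pkg_installed() {
    awk -F'|' -v want="$1" '
        {
            status = $1; name = $2
            gsub(/ /, "", status)   # strip column padding
            gsub(/ /, "", name)
            if (name == want && status ~ /^i/) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage: `zypper se libmlx4 | pkg_installed libmlx4-rdmav2 || echo "mlx4 provider not installed"`. `rpm -q libmlx4-rdmav2` answers the same question directly; the table parser is shown only because that is the output already posted in the question.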

After adding the Factory repository

zypper addrepo http://download.opensuse.org/repositories/OFED:Factory/openSUSE_Leap_42.2/OFED:Factory.repo

installing libmlx4-rdmav2, and downgrading all the other InfiniBand packages to the Factory versions, I got a working ibv_devinfo:

>ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.36.5000
        node_guid:                      0002:c903:00ed:3ed0
        sys_image_guid:                 0002:c903:00ed:3ed3
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       MT_1100120019
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 3
                        port_lid:               19
                        port_lmc:               0x00
                        link_layer:             InfiniBand
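The fix described in this update can be collected into one script. This is a sketch under assumptions, not the author's exact procedure: the alias `OFED_Factory` is a guess at what zypper assigns from the .repo file (confirm with `zypper lr`), and `zypper dup --from` is one way to move the already-installed InfiniBand packages onto the Factory builds, downgrading where needed. Each command is echoed before it runs; with `DRY_RUN` set, commands are only printed. The names `run` and `fix_mlx4_verbs` are hypothetical.

```shell
#!/bin/sh
# Sketch of the fix for openSUSE Leap 42.2 with a ConnectX-3 (mlx4) HCA.
REPO_URL=http://download.opensuse.org/repositories/OFED:Factory/openSUSE_Leap_42.2/OFED:Factory.repo
# Assumption: alias zypper takes from the .repo file; verify with `zypper lr`.
REPO_ALIAS=${REPO_ALIAS:-OFED_Factory}

# Echo each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

fix_mlx4_verbs() {
    run zypper addrepo "$REPO_URL" || return
    run zypper refresh || return
    # User-space mlx4 provider missing from the main Leap 42.2 repository
    run zypper install --from "$REPO_ALIAS" libmlx4-rdmav2 || return
    # Switch the remaining installed packages to the Factory builds;
    # dist-upgrade mode permits downgrades
    run zypper dup --from "$REPO_ALIAS" || return
    # Verify: the HCA should now be listed
    run ibv_devinfo
}
```

After sourcing, `DRY_RUN=1 fix_mlx4_verbs` prints the plan; running it as root without `DRY_RUN` executes it. Review the transaction zypper proposes at the `dup` step before confirming.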
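To confirm from a script (for example before launching an MPI job) that verbs sees the HCA and its port is up, the default `ibv_devinfo` output can be reduced to one line per port. A minimal sketch, written against the field layout shown above (`hca_id:`, `port:`, `state:`); `port_states` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: summarise default `ibv_devinfo` output as "<hca_id> <port> <state>" lines.
# Exits 1 if no port was listed, e.g. when only "No IB devices found" was printed.
port_states() {
    awk '
        $1 == "hca_id:" { hca = $2 }                     # start of a device section
        $1 == "port:"   { port = $2 }                    # start of a port subsection
        $1 == "state:"  { print hca, port, $2; n++ }     # e.g. PORT_ACTIVE
        END { exit (n ? 0 : 1) }
    '
}
```

Usage: `ibv_devinfo 2>&1 | port_states || echo "no HCA visible to verbs"`. With the output above this prints `mlx4_0 1 PORT_ACTIVE`; on the original openSUSE setup it would take the failure branch.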

1 Answer

Server Fault user

Posted 2017-04-01 14:57:38

I did this for my Voltaire 410-4 4EX card (mthca) on Leap 42.1:

zypper install opensm ibutils ibutils-devel infiniband-diags infiniband-diags-devel libibcm1 libibverbs-devel libibverbs-runtime ibacm libmthca-rdmav2 rdma tvflash libibnetdisc5 ibsim qperf

Then:

systemctl enable openibd

systemctl start openibd

Votes: -1
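The answer enables openibd, but according to UPD2 in the question, Leap 42.2 has no openibd unit and ships rdma.service instead (installed disabled). A hedged sketch that picks whichever of the two units exists, reading `systemctl list-unit-files --no-legend` output on stdin; `pick_ib_unit` is a hypothetical helper, and preferring openibd when both are present is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: choose the unit that brings up the InfiniBand stack.
# Prints openibd.service (Mellanox OFED) if listed, else rdma.service
# (the unit Leap 42.2 ships); exits 1 if neither is listed.
pick_ib_unit() {
    awk '
        $1 == "openibd.service" { ofed = 1 }
        $1 == "rdma.service"    { distro = 1 }
        END {
            if (ofed) { print "openibd.service" }
            else if (distro) { print "rdma.service" }
            else { exit 1 }
        }
    '
}
```

Usage: `unit=$(systemctl list-unit-files --no-legend | pick_ib_unit) && systemctl enable "$unit" && systemctl start "$unit"`. Starting the service does not by itself install a user-space provider: in the question, ibv_devinfo still failed with rdma.service active until libmlx4-rdmav2 was installed.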
Original page content provided by Server Fault; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://serverfault.com/questions/841904
