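I am trying to install CUDA 11.1, both the runtime API and the driver for my GPU.
I am running Ubuntu 18.04 on x86_64. I have tried to upgrade my CUDA runtime to 11.1, but have not been able to. The driver was updated, but my runtime API was not.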
nvidia-smi
shows that I have upgraded to 11.0, but
nvcc -V
shows version 10.0.130 installed for the runtime API.
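The two tools answer different questions: the "CUDA Version" field in the nvidia-smi banner is the highest CUDA version the installed driver supports, while nvcc -V reports the toolkit that is actually on PATH. A minimal sketch for pulling out both numbers side by side; the function names are hypothetical, and the real tools are queried only if they exist.

```shell
# Hypothetical helpers: extract the two version numbers being compared.

# "... CUDA Version: 11.0 |" from the nvidia-smi banner -> 11.0
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# "Cuda compilation tools, release 10.0, V10.0.130" -> 10.0.130
toolkit_version() {
    sed -n 's/.*release [0-9.]*, V\([0-9.]*\).*/\1/p' | head -n 1
}

# Only query the real tools when they are on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver supports CUDA up to: $(nvidia-smi | driver_cuda_version)"
fi
if command -v nvcc >/dev/null 2>&1; then
    echo "toolkit (nvcc) version: $(nvcc -V | toolkit_version)"
fi
```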
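I am following the instructions at https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
I will go through the commands in the order listed in the guide.
Section 2. Pre-installation Actions
lspci | grep -i nvidia results in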
19:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
19:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
19:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
19:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
1a:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
1a:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
1a:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
1a:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
67:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
67:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
67:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
67:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
68:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
68:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
68:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
68:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
uname -m && cat /etc/*release results in
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
gcc --version results in
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
uname -r results in
5.4.0-51-generic
sudo apt-get install linux-headers-$(uname -r) results in
Reading package lists... Done
Building dependency tree
Reading state information... Done
linux-headers-5.4.0-51-generic is already the newest version (5.4.0-51.56~18.04.1).
linux-headers-5.4.0-51-generic set to manually installed.
The following packages were automatically installed and are no longer required:
dkms libaccinj64-10.0 libatomic1:i386 libboost-python1.65.1 libbsd0:i386 libc-ares2 libcublas10.0 libcudnn7 libcufft10.0 libcufftw10.0 libcuinj64-10.0 libcupti-dev libcupti-doc libcupti10.0 libcurand10.0
libcusolver10.0 libcusparse10.0 libdrm-amdgpu1:i386 libdrm-intel1:i386 libdrm-nouveau2:i386 libdrm-radeon1:i386 libdrm2:i386 libedit2:i386 libelf1:i386 libexpat1:i386 libffi6:i386 libgflags2.2 libgl1:i386
libgl1-mesa-dri:i386 libglapi-mesa:i386 libglvnd0:i386 libglx-mesa0:i386 libglx0:i386 libgoogle-glog0v5 libgrpc7 libjs-sphinxdoc libleveldb1v5 libllvm10:i386 liblmdb0 libnppc10.0 libnppial10.0 libnppicc10.0
libnppicom10.0 libnppidei10.0 libnppif10.0 libnppig10.0 libnppim10.0 libnppist10.0 libnppisu10.0 libnppitc10.0 libnpps10.0 libnvblas10.0 libnvgraph10.0 libnvidia-cfg1-450 libnvidia-common-450
libnvidia-compute-450:i386 libnvidia-decode-450 libnvidia-decode-450:i386 libnvidia-encode-450 libnvidia-encode-450:i386 libnvidia-extra-450 libnvidia-extra-450:i386 libnvidia-fbc1-450 libnvidia-fbc1-450:i386
libnvidia-gl-450 libnvidia-gl-450:i386 libnvidia-ifr1-450 libnvidia-ifr1-450:i386 libnvrtc10.0 libnvtoolsext1 libnvvm3 libpciaccess0:i386 libprotobuf18 libprotoc18 libsensors4:i386 libsleef3 libstdc++6:i386
libthrust-dev libvdpau-dev libx11-6:i386 libx11-xcb1:i386 libxau6:i386 libxcb-dri2-0:i386 libxcb-dri3-0:i386 libxcb-glx0:i386 libxcb-present0:i386 libxcb-sync1:i386 libxcb1:i386 libxdamage1:i386 libxdmcp6:i386
libxext6:i386 libxfixes3:i386 libxnvctrl0 libxshmfence1:i386 libxxf86vm1:i386 pkg-config protobuf-compiler python-absl python-astor python-cffi python-configparser python-future python-gast python-grpcio
python-leveldb python-networkx python-pasta python-ply python-protobuf python-pycparser python-pywt python-skimage python-skimage-lib python-termcolor python-typing python-wrapt python3-absl python3-astor
python3-cffi python3-future python3-gast python3-grpcio python3-leveldb python3-markdown python3-networkx python3-pasta python3-ply python3-pycparser python3-pyinotify python3-pywt python3-skimage python3-skimage-lib
python3-tensorflow-serving python3-termcolor python3-werkzeug python3-wrapt screen-resolution-extra xserver-xorg-video-nvidia-450
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 179 not upgraded.
I ran the following commands
sudo /usr/bin/nvidia-uninstall
sudo apt-get --purge remove cuda*
sudo apt-get --purge remove nvidia*
sudo apt-get --purge remove libcuda*
I tried to find
sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
but there was no file with that name in bin, so I do not think the previous CUDA was installed from a runfile.
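One way to check how an existing CUDA was installed is to look for both kinds of traces. A rough sketch, assuming the toolkit lives under /usr/local: a runfile install leaves an uninstaller in the toolkit's bin directory (uninstall_cuda_X.Y.pl as named above, or cuda-uninstaller as the 11.0 installer summary later mentions), while a package install is registered with dpkg. The function name is hypothetical.

```shell
# Hypothetical helper: report which uninstallers a runfile install left behind.
# $1 is the directory that holds the cuda* toolkit directories.
find_runfile_uninstallers() {
    root="${1:-/usr/local}"
    for f in "$root"/cuda*/bin/uninstall_cuda_*.pl "$root"/cuda*/bin/cuda-uninstaller; do
        # An unmatched glob stays literal, so test that the file really exists.
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

find_runfile_uninstallers /usr/local

# Packages installed through apt/dpkg show up here instead.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | grep -Ei 'cuda|nvidia' || echo "no CUDA/NVIDIA packages registered with dpkg"
fi
```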
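I checked nvidia-smi and nvcc -V; both times the command was not found, whereas before both had been found. When I ran the installer, I kept getting a warning message:
Existing package manager installation of the driver found. It is strongly recommended that you remove this before continuing.
So I tried some other ways to remove the CUDA installation.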
sudo apt-get --purge remove cuda-11.0
sudo apt-get --purge remove cuda-11.1
sudo apt-get --purge remove cuda-10.0
sudo apt-get purge nvidia*
sudo apt-get remove --purge cuda-* libcuda* nvidia*
sudo rm /etc/apt/sources.list.d/cuda*
sudo apt remove --autoremove nvidia-cuda-toolkit
sudo dpkg -l | grep nvidia
sudo apt purge cuda
sudo apt purge -y nvidia
sudo apt remove -y nvidia-*
sudo rm /etc/apt/sources.list.d/cuda*
sudo apt autoremove -y && apt autoclean -y
sudo rm -rf /usr/local/cuda*
Section 6. Runfile Installation
6.3. Disabling Nouveau
I ran the following command
touch /etc/modprobe.d/blacklist-nouveau.conf
and added
blacklist nouveau
options nouveau modeset=0
to that file. Then I regenerated the initramfs as the guide instructs (sudo update-initramfs -u on Ubuntu), which resulted in
update-initramfs: Generating /boot/initrd.img-5.4.0-52-generic
Then I tested lsmod | grep nouveau to see whether it printed anything, and it did not.
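The Nouveau step can be verified in two parts: that the config file really contains both lines, and that the module is not loaded. A small sketch of that check; the function name is hypothetical, and the file path is the one used above.

```shell
# Hypothetical check: does a modprobe config file contain both required lines?
nouveau_blacklisted() {
    file="$1"
    # -x matches whole lines only, -q suppresses output.
    grep -qx 'blacklist nouveau' "$file" 2>/dev/null &&
        grep -qx 'options nouveau modeset=0' "$file" 2>/dev/null
}

conf=/etc/modprobe.d/blacklist-nouveau.conf
if nouveau_blacklisted "$conf"; then
    echo "$conf contains both blacklist lines"
else
    echo "$conf is missing one of the two lines (or does not exist)"
fi

# No match from grep means the nouveau module is not currently loaded.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | grep nouveau || echo "nouveau is not loaded"
fi
```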
Then I attempted the installation.
The commands given were
wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run
sudo sh cuda_11.1.0_455.23.05_linux.run
I downloaded the installer and ran sudo sh cuda_11.1.0_455.23.05_linux.run.
This resulted in the message
Installation failed. See log at /var/log/cuda-installer.log for details.
I opened that file, and these are its contents
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 455.23.05
[INFO]: Executing NVIDIA-Linux-x86_64-455.23.05.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 455.23.05 failed, quitting
So it looks like the installation failed on the driver. I do not know what caused this error, since 11.0 had previously been installed on the GPU.
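The CUDA installer log only records that the bundled driver installer exited with a failure code. The NVIDIA driver installer normally keeps its own log at /var/log/nvidia-installer.log, which is usually where the actual reason appears. A minimal sketch that filters both logs down to their error and warning lines; the function name is hypothetical.

```shell
# Hypothetical helper: print only the error and warning lines of a log file,
# prefixed with their line numbers.
show_log_problems() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read $log"; return 0; }
    grep -nE 'ERROR|WARNING' "$log" || echo "no ERROR/WARNING lines in $log"
}

show_log_problems /var/log/cuda-installer.log
# The driver's own installer keeps a separate, more detailed log.
show_log_problems /var/log/nvidia-installer.log
```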
Then I tried installing via the deb package.
The commands given were
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda-repo-ubuntu1804-11-1-local_11.1.0-455.23.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-1-local_11.1.0-455.23.05-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
The last command produced an error; the other commands ran without problems. This is the output of sudo apt-get -y install cuda:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
cuda : Depends: cuda-11-1 (>= 11.1.0) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
While troubleshooting the driver installation, I found that sudo apt install nvidia-450-dev might work, so I tried it, and it succeeded.
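apt reports only the first level of the dependency problem here (cuda needs cuda-11-1). A rough sketch for walking it down: extract the blocked package name from apt's message, then repeat the simulated install on that package to see what it in turn cannot satisfy. The function name is hypothetical; apt-get -s only simulates and changes nothing.

```shell
# Hypothetical helper: list the packages apt refuses to install, taken from
# "Depends: <pkg> ... but it is not going to be installed" lines on stdin.
blocked_dependencies() {
    sed -n 's/.*Depends: \([^ ]*\).*but it is not going to be installed.*/\1/p'
}

if command -v apt-get >/dev/null 2>&1; then
    # -s simulates the install, so nothing is changed on the system.
    apt-get -s install cuda 2>&1 | blocked_dependencies
    # Packages explicitly put on hold, if any.
    apt-mark showhold
fi
```

Running the same simulation on each name it prints (for example apt-get -s install cuda-11-1) should eventually reach the package whose version conflict is the real cause.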
nvidia-smi
shows the following:
Mon Oct 26 18:27:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66 Driver Version: 450.66 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:19:00.0 Off | N/A |
| 22% 31C P8 1W / 250W | 6MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:1A:00.0 Off | N/A |
| 22% 35C P8 4W / 250W | 6MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 208... Off | 00000000:67:00.0 Off | N/A |
| 22% 37C P8 6W / 250W | 6MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 GeForce RTX 208... Off | 00000000:68:00.0 Off | N/A |
| 22% 39C P8 1W / 250W | 26MiB / 11016MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1314 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 1314 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 1314 G /usr/lib/xorg/Xorg 4MiB |
| 3 N/A N/A 1314 G /usr/lib/xorg/Xorg 9MiB |
| 3 N/A N/A 1653 G /usr/bin/gnome-shell 14MiB |
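+-----------------------------------------------------------------------------+
However, the driver reports 11.0, not 11.1.
So I tried installing an older version of CUDA, 11.0 instead of 11.1.
This worked only for the driver, not for the runtime API.
Running nvcc -V gives "bash: /usr/bin/nvcc: No such file or directory"
Then I tried installing 11.0, since the runtime API version should be lower than or equal to the driver version.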
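The rule applied here (toolkit version no higher than the CUDA version the driver reports) can be written as a small version comparison. A sketch assuming a sort that supports -V (GNU coreutils does); the function name is hypothetical, and 11.0 is the value nvidia-smi printed above for driver 450.66.

```shell
# Hypothetical helper: succeed when version $1 <= version $2.
version_le() {
    # sort -V orders version strings; the smaller one comes out first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

driver_max=11.0   # "CUDA Version" shown by nvidia-smi for driver 450.66
for toolkit in 10.0 11.0 11.1; do
    if version_le "$toolkit" "$driver_max"; then
        echo "toolkit $toolkit: within what the driver reports"
    else
        echo "toolkit $toolkit: newer than what the driver reports"
    fi
done
```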
From
https://developer.nvidia.com/cuda-11.0-download-archive
the following commands were given:
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run
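sudo sh cuda_11.0.2_450.51.05_linux.run
After downloading the installer, I ran sudo sh cuda_11.0.2_450.51.05_linux.run
It first warned me that a previous installation exists, probably from the driver installation. I chose to continue, since I would be installing only the toolkit, not the driver. I then selected everything except the driver.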
CUDA Installer │
│ - [ ] Driver │
│ [ ] 450.51.05 │
│ + [X] CUDA Toolkit 11.0 │
│ [X] CUDA Samples 11.0 │
│ [X] CUDA Demo Suite 11.0 │
│ [X] CUDA Documentation 11.0 │
│ Options │
│ Install │
│ │
│ │
After the installation finished, I received this message
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.0/
Samples: Installed in /home/santosh/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-11.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.0/lib64, or, add /usr/local/cuda-11.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-11.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least .00 is required for CUDA 11.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
I added /usr/local/cuda-11.0/bin to PATH and set LD_LIBRARY_PATH to /usr/local/cuda-11.0/lib64.
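For reference, a minimal sketch of that environment setup. It prepends the two directories named in the installer summary and avoids adding them twice; CUDA_HOME and prepend_once are hypothetical names chosen for the example.

```shell
# Hypothetical helper: prepend a directory to a colon-separated list,
# unless the list already contains it.
prepend_once() {
    dir="$1"; list="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *) printf '%s\n' "${dir}${list:+:$list}" ;;
    esac
}

CUDA_HOME=/usr/local/cuda-11.0   # toolkit location from the installer summary
PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

These exports last only for the current shell; putting them in ~/.bashrc (or a similar startup file) makes them apply to new terminals as well.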
Then I tried the post-installation instructions here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#power9-setup
systemctl status nvidia-persistenced results in "Unit nvidia-persistenced.service could not be found."
sudo systemctl enable nvidia-persistenced results in
The unit files have no installation config (WantedBy, RequiredBy, Also, Alias
settings in the [Install] section, and DefaultInstance for template units).
This means they are not meant to be enabled using systemctl.
Possible reasons for having this kind of units are:
1) A unit may be statically enabled by being symlinked from another unit's
.wants/ or .requires/ directory.
2) A unit's purpose may be to act as a helper for some other unit which has
a requirement dependency on it.
3) A unit may be started when needed via activation (socket, path, timer,
D-Bus, udev, scripted systemctl call, ...).
4) In case of template units, the unit is meant to be enabled with some
instance name specified.
I was able to carry out the udev rules instructions without problems; I ran the following commands
sudo cp /lib/udev/rules.d/40-vm-hotadd.rules /etc/udev/rules.d
sudo sed -i '/SUBSYSTEM=="memory", ACTION=="add"/d' /etc/udev/rules.d/40-vm-hotadd.rules
I tried nvcc -V just to check whether the installation had worked to some extent. This time I received the message
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
So I tried that command, and it seemed to install without problems. When I ran nvcc -V again, I received the message
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
This is the version of CUDA I started with.
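Which nvcc answers depends on PATH order: the shell runs the first match, so a packaged /usr/bin/nvcc can answer even when a newer toolkit sits under /usr/local/cuda-11.0/bin. A small sketch that lists every candidate in lookup order; the function name is hypothetical (in bash, type -a nvcc gives similar information).

```shell
# Hypothetical helper: list every executable named $1 on a PATH-style list
# ($2, defaulting to $PATH) in lookup order. The first line printed is the
# one the shell would run. The body runs in a subshell so IFS is not leaked.
all_on_path() (
    name="$1"; list="${2:-$PATH}"
    IFS=:
    for dir in $list; do
        [ -x "$dir/$name" ] && printf '%s\n' "$dir/$name"
    done
    exit 0
)

all_on_path nvcc
```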
Looking at this post
https://forums.developer.nvidia.com/t/cuda-10-installation-problems-on-ubuntu-18-04/68615
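Follow the instructions in the Linux installation guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Get the installer from http://www.nvidia.com/getcuda
Since the wrong driver has already been installed, read the Linux installation guide carefully. Not following it carefully will lead to more trouble.
It seems that the alternative ways of installing the driver and the toolkit (sudo apt install nvidia-450-dev and sudo apt install nvidia-cuda-toolkit) are not recommended, and that the guide should be followed strictly.
However, I did follow the instructions, and the driver could not be installed. Installing the driver does not look impossible, since the alternative command worked to some extent, but the error log gives me no insight into how to install it the official way.
Posted on 2020-10-27 18:45:04
I solved the problem. The hardware shipped with its own CUDA installation files, which I did not know about. Once those were blocked, the installation worked perfectly.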
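The answer does not say which files these were or how they were blocked. One place a vendor-supplied CUDA setup can show up is in extra apt sources or pin files, so a speculative first check is to list the apt configuration files that mention cuda or nvidia. This is only an assumption about where to look, not a description of what the author did; the function name is hypothetical.

```shell
# Hypothetical helper: list files under the given directories that mention
# cuda or nvidia (case-insensitive). Missing directories are skipped.
files_mentioning_cuda() {
    for dir in "$@"; do
        [ -d "$dir" ] && grep -rliE 'cuda|nvidia' "$dir" 2>/dev/null
    done
    return 0
}

# Assumed locations: extra package sources and version pins used by apt.
files_mentioning_cuda /etc/apt/sources.list.d /etc/apt/preferences.d
```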
https://askubuntu.com/questions/1287290