Question: How do I use a GPU on Azure ML with a custom Docker base image built on NVIDIA CUDA?

Stack Overflow user

Asked 2019-10-01 17:31:41

1 answer · 1.1K views · 0 followers · 0 votes

In my Dockerfile, which builds a custom Docker base image, I specify the following base image:

FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04

The Dockerfile for this nvidia/cuda base image can be found here: https://gitlab.com/nvidia/container-images/cuda/blob/master/dist/ubuntu16.04/10.1/devel/cudnn7/Dockerfile

Now, when I log the device to AzureML:

run = Run.get_context()
# setting device on GPU if available, else CPU
run.log("Using device: ", torch.device('cuda' if torch.cuda.is_available() else 'cpu'))

I get

device(type='cpu')

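But I want a GPU, not a CPU. What am I doing wrong?

Edit: I'm not sure exactly what you need, but I can give you the following information: the azureml.core version is 1.0.57. The compute_target is defined as follows: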

def compute_target(ws, cluster_name):
    try:
        cluster = ComputeTarget(workspace=ws, name=cluster_name)
    except ComputeTargetException:
        compute_config=AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',min_nodes=0,max_nodes=4)
        cluster = ComputeTarget.create(ws, cluster_name, compute_config)

The experiment is run as follows:

    ws = workspace(os.path.join("azure_cloud", 'config.json'))
    exp = experiment(ws, name=<name>)
    c_target = compute_target(ws, <name>)
    est = Estimator(source_directory='.',
                   script_params=script_params,
                   compute_target=c_target,
                   entry_script='azure_cloud/azure_training_wrapper.py',
                   custom_docker_image=image_name,
                   image_registry_details=img_reg_details,
                   user_managed = True,
                   environment_variables = {"SYSTEM": "azure_cloud"})

    # run the experiment / train the model
    run = exp.submit(config=est)

The YAML file contains:

dependencies:
  - conda-package-handling=1.3.10
  - python=3.6.2
  - cython=0.29.10
  - scikit-learn==0.21.2
  - anaconda::cloudpickle==1.2.1
  - anaconda::cffi==1.12.3
  - anaconda::mxnet=1.5.0
  - anaconda::psutil==5.6.3
  - anaconda::pycosat==0.6.3
  - anaconda::pip==19.1.1
  - anaconda::six==1.12.0
  - anaconda::mkl==2019.4
  - anaconda::cudatoolkit==10.1.168
  - conda-forge::pycparser==2.19
  - conda-forge::openmpi=3.1.2
  - pytorch::pytorch==1.2.0
  - tensorboard==1.13.1
  - tensorflow==1.13.1
  - tensorflow-estimator==1.13.0
  - pip:
      - pytorch-transformers==1.2.0
      - azure-cli==2.0.72
      - azure-storage-nspkg==3.1.0
      - azureml-sdk==1.0.57
      - pandas==0.24.2
      - tqdm==4.32.1
      - numpy==1.16.4
      - matplotlib==3.1.0
      - requests==2.22.0
      - setuptools==41.0.1
      - ipython==7.8.0
      - boto3==1.9.220
      - botocore==1.12.220
      - cntk==2.7
      - ftfy==5.6
      - gensim==3.8.0
      - horovod==0.16.4
      - keras==2.2.5
      - langdetect==1.0.7
      - langid==1.1.6
      - nltk==3.4.5
      - ptvsd==4.3.2
      - pytest==5.1.2
      - regex==2019.08.19
      - scipy==1.3.1
      - scikit_learn==0.21.3
      - spacy==2.1.8
      - tensorpack==0.9.8

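Edit 2: I tried use_gpu = True and upgrading to azureml-sdk=1.0.65, but neither helped. Some people suggest additionally installing the CUDA driver via apt-get install cuda-drivers, but that does not work, and I cannot build the Docker image with it. Running nvcc --version in the Docker image produces the following output: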

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

So I think that part is fine. The Docker image itself has no GPU, of course, so the nvidia-smi command is not found, and running

python -i

followed by

import torch
print(torch.cuda.is_available())

prints False.
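To narrow down where GPU visibility is lost, the checks above can be collected into one script and run both in the locally built image and inside the AzureML run. This is only a minimal sketch: the function name gpu_diagnostics and the dictionary keys are made up for illustration, and it assumes nothing beyond the standard library plus an optional PyTorch install.

```python
import shutil


def gpu_diagnostics():
    """Collect basic facts about GPU visibility in the current process.

    Hypothetical helper for illustration; not part of azureml or torch.
    """
    info = {}

    # nvidia-smi normally comes from the host driver, so it is usually only
    # on PATH when the container runs with GPU support on a GPU machine.
    info["nvidia_smi_path"] = shutil.which("nvidia-smi")

    try:
        import torch
    except ImportError:
        info["torch_installed"] = False
        return info

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # torch.version.cuda is None when a CPU-only build of PyTorch is installed.
    info["torch_built_with_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    info["device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If torch_built_with_cuda is None, the installed PyTorch build has no CUDA support regardless of the base image. If it is set but cuda_available is False and nvidia_smi_path is None, the process most likely cannot see the host driver.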

docker

gpu

azure-machine-learning-service

1 Answer

Stack Overflow user

Answered 2019-10-02 11:44:11

In the Estimator definition, try adding use_gpu=True:

est = Estimator(source_directory='.',
               script_params=script_params,
               compute_target=c_target,
               entry_script='azure_cloud/azure_training_wrapper.py',
               custom_docker_image=image_name,
               image_registry_details=img_reg_details,
               user_managed = True,
               environment_variables = {"SYSTEM": "azure_cloud"},
               use_gpu=True)

I believe that with azureml-sdk>=1.0.60 this should be inferred from the VM size used, but since you are on 1.0.57, I think it is still required.
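The version rule described here can be written down as a small check. This is a sketch of the answer's stated belief, not of documented SDK behavior: the 1.0.60 threshold is taken from the sentence above, and the helper names parse_version and needs_explicit_use_gpu are hypothetical.

```python
def parse_version(version):
    """Turn a dotted version string such as '1.0.57' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_explicit_use_gpu(sdk_version, inferred_since="1.0.60"):
    """Return True if use_gpu=True should be passed explicitly.

    Assumption (from the answer, unverified): SDK versions at or above
    `inferred_since` infer GPU usage from the VM size.
    """
    return parse_version(sdk_version) < parse_version(inferred_since)


if __name__ == "__main__":
    for version in ("1.0.57", "1.0.65"):
        print(version, needs_explicit_use_gpu(version))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "1.0.9" would sort after "1.0.60".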

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/58181917
