Azure ML deployment does not see the AzureML Environment (wrong version number)

Stack Overflow user
Asked 2020-08-17 21:30:27
1 answer, 1.9K views, 0 followers, 3 votes

I have been following the documentation fairly closely, as outlined here.

I have set up my Azure Machine Learning environment as follows:

from azureml.core import Workspace

# Connect to the workspace
ws = Workspace.from_config()

from azureml.core import Environment
from azureml.core import ContainerRegistry

myenv = Environment(name = "myenv")

myenv.inferencing_stack_version = "latest"  # This will install the inference specific apt packages.

# Docker
myenv.docker.enabled = True
myenv.docker.base_image_registry.address = "myazureregistry.azurecr.io"
myenv.docker.base_image_registry.username = "myusername"
myenv.docker.base_image_registry.password = "mypassword"
myenv.docker.base_image = "4fb3..." 
myenv.docker.arguments = None

# Environment variable (I need Python to look in these folders)
myenv.environment_variables = {"PYTHONPATH":"/root"}

# python
myenv.python.user_managed_dependencies = True
myenv.python.interpreter_path = "/opt/miniconda/envs/myenv/bin/python" 

from azureml.core.conda_dependencies import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-defaults")
myenv.python.conda_dependencies=conda_dep

myenv.register(workspace=ws) # works!

I have a score.py file for inference (it is not relevant to the problem I am having).

I then set up the inference configuration:

from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

I set up my compute cluster and the deployment configuration:

from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException

# Choose a name for your cluster
aks_name = "theclustername" 

# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6_Promo")

    aks_target = ComputeTarget.create(workspace=ws, name=aks_name, provisioning_configuration=prov_config)

    aks_target.wait_for_completion(show_output=True)

from azureml.core.webservice import AksWebservice

# Example
gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=4,
                                                    memory_gb=10)

Everything succeeds up to this point. I then try to deploy the model for inference:

from azureml.core.model import Model

model = Model(ws, name="thenameofmymodel")

# Name of the web service that is deployed
aks_service_name = 'tryingtodeply'

# Deploy the model
aks_service = Model.deploy(ws,
                           aks_service_name,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=aks_target,
                           overwrite=True)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

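The deployment fails, saying it cannot find the environment. More specifically, my environment is at version 11, but the service keeps looking for an environment whose version number is 1 higher than the current one (i.e., version 12):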

Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 0f03a025-3407-4dc1-9922-a53cc27267d4
More information can be found here: 
Error:
{
  "code": "BadRequest",
  "statusCode": 400,
  "message": "The request is invalid",
  "details": [
    {
      "code": "EnvironmentDetailsFetchFailedUserError",
      "message": "Failed to fetch details for Environment with Name: myenv Version: 12."
    }
  ]
}

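I have tried manually editing the environment JSON to match the version that azureml is trying to fetch, but nothing works. Can anyone see what is wrong with this code?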
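To make the off-by-one mismatch easier to spot, the error payload can be parsed and the requested version compared with the registered one. This is only a minimal sketch: `requested_environment` is a hypothetical helper, not part of the Azure ML SDK, and it assumes the message keeps the "Name: ... Version: ..." shape seen in the log above, which may differ between SDK versions.

```python
import json
import re

# Error payload copied from the deployment log above.
ERROR_PAYLOAD = """
{
  "code": "BadRequest",
  "statusCode": 400,
  "message": "The request is invalid",
  "details": [
    {
      "code": "EnvironmentDetailsFetchFailedUserError",
      "message": "Failed to fetch details for Environment with Name: myenv Version: 12."
    }
  ]
}
"""

# Assumption: the message keeps the "Name: <name> Version: <n>" shape seen in this log.
_ENV_PATTERN = re.compile(r"Name:\s*(?P<name>\S+)\s+Version:\s*(?P<version>\d+)")


def requested_environment(payload: str):
    """Return (name, version) of the environment the service failed to fetch, or None."""
    error = json.loads(payload)
    for detail in error.get("details", []):
        if detail.get("code") != "EnvironmentDetailsFetchFailedUserError":
            continue
        match = _ENV_PATTERN.search(detail.get("message", ""))
        if match:
            return match.group("name"), int(match.group("version"))
    return None


registered_version = 11  # the version reported for myenv in the question
name, requested_version = requested_environment(ERROR_PAYLOAD)
print(f"{name}: registered {registered_version}, requested {requested_version}")
```

For payloads without an environment-fetch error (such as a "DeploymentFailed" response), the helper returns None, so the result should be checked before unpacking it.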

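Update

Changing the name of the environment (e.g., my_inference_env) and passing it to InferenceConfig seems to be on the right track. However, the error now changes to: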

Running..........
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: f0dfc13b-6fb6-494b-91a7-de42b9384692
More information can be found here: https://some_long_http_address_that_leads_to_nothing
Error:
{
  "code": "DeploymentFailed",
  "statusCode": 404,
  "message": "Deployment not found"
}

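Solution

The answer from Anders below is indeed correct regarding the use of Azure ML environments. However, the last error I was getting was because I was setting the container image using its digest value (a sha) and NOT the image name and tag (e.g., imagename:tag). Note this line of code in the first block: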

myenv.docker.base_image = "4fb3..." 

There I reference the digest value, but it should be changed to:

myenv.docker.base_image = "imagename:tag"

Once I made that change, the deployment succeeded!
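A small check before deployment could catch this kind of mistake early. The sketch below is a heuristic only: `check_base_image` and its three labels are hypothetical names, not part of the Azure ML SDK, and the rule that a bare digest is a hex string of 12 to 64 characters is an assumption made for illustration. References pinned with `@sha256:` are flagged as digests too, since they are not in the name:tag form used in the fix above.

```python
import re

# Assumption: a bare digest is 12-64 hex characters, optionally prefixed with "sha256:".
_BARE_DIGEST = re.compile(r"^(sha256:)?[0-9a-f]{12,64}$", re.IGNORECASE)


def check_base_image(reference: str) -> str:
    """Classify a base image reference as 'digest', 'name:tag', or 'untagged'."""
    if _BARE_DIGEST.match(reference) or "@" in reference:
        return "digest"
    # Only look for a tag after the last "/", so a registry port
    # (e.g. host:5000/image) is not mistaken for a tag.
    last_segment = reference.rsplit("/", 1)[-1]
    name, sep, tag = last_segment.partition(":")
    if sep and name and tag:
        return "name:tag"
    return "untagged"


# "0f" * 32 is a made-up 64-character hex string standing in for a real digest.
for ref in ("imagename:tag", "0f" * 32):
    print(ref[:16], "->", check_base_image(ref))
```

With a check like this, `myenv.docker.base_image` could be validated before calling `Model.deploy`, for example by raising an error unless the result is "name:tag".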


1 Answer

Stack Overflow user

Accepted answer

Posted on 2020-08-17 22:08:41

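One concept that took me a while to grasp is the split between registering an Azure ML Environment and using one. If you have already registered your env, myenv, and none of its details have changed, there is no need to re-register it with myenv.register(). You can simply fetch the already registered env with Environment.get(), like so: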

myenv = Environment.get(ws, name='myenv', version=11)

My recommendation would be to give your environment a new name, such as "model_scoring_env". Register it once, then pass it to InferenceConfig.

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link: https://stackoverflow.com/questions/63458904
