AWS SageMaker inference endpoint not using all vCPUs

Stack Overflow user

Asked 2021-06-22 12:48:05

1 answer, 798 views, 0 followers, 3 votes

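I have deployed a custom model on a SageMaker inference endpoint (single instance). While load testing, I observed that the CPU utilization metric maxes out at 100%, but according to this post it should max out at #vCPU * 100%. I have confirmed in the CloudWatch logs that the inference endpoint is not using all cores.

So if one prediction call takes 1 second to process, the deployed model can only handle one API call per second. That could rise to 8 calls per second if all vCPUs were used.

Are there any settings in an AWS SageMaker deployment to use all vCPUs and increase concurrency?

Or could we use the Python multiprocessing package inside the inference.py file at deployment time, so that each call arrives on the default core and all computation/prediction is done on whichever other core is free on that instance?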

python

amazon-web-services

machine-learning

amazon-sagemaker

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-06-23 02:40:05

UPDATE

Set three environment variables:

1. ENABLE_MULTI_MODEL set to "true" (make sure it is a string, not a bool). If you use a custom service, also set [SAGEMAKER_HANDLER](https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/src/sagemaker_pytorch_serving_container/torchserve.py#L74) to the Python module path of your custom model handler; otherwise do not define it. Also make sure the model archive is named [model.mar](https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/src/sagemaker_pytorch_serving_container/torchserve.py#L94) before compressing it into a tarball and storing it in S3.
2. TS_DEFAULT_WORKERS_PER_MODEL set to the number of vCPUs.
3. The first environment variable makes sure TorchServe's environment-variable configuration is enabled; the second one relies on that setting and loads the requested number of workers.
4. These settings can be applied by passing an env dictionary argument to the [PyTorch function](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#create-an-estimator). The rest of this answer explains why this works.
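
As a rough illustration of the steps in the list above, the sketch below assembles the env dictionary and hands it to the SageMaker Python SDK. The helper names `torchserve_env` and `build_model`, the worker count of 8 (borrowed from the question's 8-call scenario), and the placeholder `entry_point`, `framework_version`, and `py_version` values are assumptions for illustration, not values from this answer.

```python
def torchserve_env(num_vcpus, handler_module=None):
    """Build the environment dict described above (hypothetical helper).

    Every value is a string: the container receives environment variables as
    text, so the flag must be "true", not the Python bool True.
    """
    env = {
        "ENABLE_MULTI_MODEL": "true",
        "TS_DEFAULT_WORKERS_PER_MODEL": str(num_vcpus),
    }
    if handler_module:
        # Only define this for a custom service; otherwise leave it unset.
        env["SAGEMAKER_HANDLER"] = handler_module
    return env


def build_model(model_data, role, env):
    """Create a PyTorchModel that carries the env dict (needs the sagemaker SDK)."""
    from sagemaker.pytorch import PyTorchModel  # imported lazily on purpose

    return PyTorchModel(
        model_data=model_data,        # S3 URI of the tarball containing model.mar
        role=role,                    # IAM role ARN
        entry_point="inference.py",   # placeholder script name
        framework_version="1.8.1",    # placeholder version
        py_version="py3",             # placeholder version
        env=env,
    )


# 8 workers assumes an instance with 8 vCPUs, as in the question.
env = torchserve_env(8)
print(env)
```

Calling `build_model(...)` followed by `.deploy(...)` would then create the endpoint; adjust the worker count to match the vCPU count of the chosen instance type.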

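From the looks of it, the SageMaker deployment of a PyTorch model described in the SageMaker guide uses this Dockerfile. In this Docker image the entry point is torchserve-entrypoint.py, as set at line 124 of the Dockerfile.
This torchserve-entrypoint.py calls serving.main() from serving.py, which in turn ends up calling into torchserve.py.
Line 34 of torchserve.py defines "/etc/default-ts.properties" as DEFAULT_TS_CONFIG_FILE. That file sets enable_envvars_config=true. The container uses this file if and only if the environment variable ENABLE_MULTI_MODEL is set to "false", as the referenced code shows. If it is set to "true", the container uses /etc/mme-ts.properties instead.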
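
The selection logic described above can be paraphrased in a few lines of Python. This is only a sketch of the behaviour as this answer describes it, not the toolkit's actual code; the function name `select_ts_config` and the assumption that an unset variable behaves like "false" are illustrative.

```python
DEFAULT_TS_CONFIG_FILE = "/etc/default-ts.properties"
MME_TS_CONFIG_FILE = "/etc/mme-ts.properties"


def select_ts_config(environ):
    """Pick the TorchServe properties file from a mapping of environment variables.

    The flag is compared as text, which is why the variable has to be the
    string "true" and not a Python bool.
    """
    if environ.get("ENABLE_MULTI_MODEL", "false") == "true":
        return MME_TS_CONFIG_FILE
    return DEFAULT_TS_CONFIG_FILE


print(select_ts_config({"ENABLE_MULTI_MODEL": "true"}))
print(select_ts_config({}))
```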

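Regarding the question "Are there any settings in AWS Sagemaker deployment to use all vCPUs to increase concurrency?": there are various settings you can use. For the models, you can set default_workers_per_model in config.properties, or TS_DEFAULT_WORKERS_PER_MODEL=$(nproc --all) in the environment variables. Environment variables take precedence.

Beyond that, you can set the number of workers for each model through the management API, but sadly it is not possible to reach the management API in SageMaker. So TS_DEFAULT_WORKERS_PER_MODEL is the best bet. Setting it should make sure all cores are used.

If you are using your own Dockerfile, however, you can add a script to the entrypoint that waits for the model to load and then curls the management API to set the number of workers.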

# load the model (the URL is quoted so the shell does not treat & as a background operator)
curl -X POST "http://localhost:8081/models?url=model_1.mar&batch_size=8&max_batch_delay=50"
# after loading the model it is possible to set min_worker, etc.
curl -v -X PUT "http://localhost:8081/models/model_1?min_worker=1"
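
The same wait-then-scale step could also be written in Python for an entrypoint script. The sketch below uses only the standard library and assumes the management API listens on localhost:8081 as in the commands above; the function names, timeouts, and the use of the `synchronous=true` query parameter are illustrative choices rather than part of this answer.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

MANAGEMENT = "http://localhost:8081"  # management address assumed from the curl commands


def scale_url(model_name, min_worker, base=MANAGEMENT):
    """Build the management API URL that sets the minimum worker count."""
    query = urllib.parse.urlencode({"min_worker": min_worker, "synchronous": "true"})
    return f"{base}/models/{urllib.parse.quote(model_name)}?{query}"


def wait_until_registered(model_name, timeout_s=120.0, poll_s=2.0, base=MANAGEMENT):
    """Poll the describe-model endpoint until it answers 200 or the timeout expires."""
    url = f"{base}/models/{urllib.parse.quote(model_name)}"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or model not registered; retry
        time.sleep(poll_s)
    return False


def set_workers(model_name, min_worker, base=MANAGEMENT):
    """Send the PUT request that scales the model's workers; returns the HTTP status."""
    req = urllib.request.Request(scale_url(model_name, min_worker, base), method="PUT")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status


print(scale_url("model_1", 8))
```

In an entrypoint this would typically run as `if wait_until_registered("model_1"): set_workers("model_1", 8)` once TorchServe has been started in the background.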

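Regarding the other issue, that the logs confirm not all cores are being used: I face the same issue and believe it is a problem in the logging system. Please look at this issue: https://github.com/pytorch/serve/issues/782. The community itself agrees that if the thread count is not set, the log prints 0 by default, even though 2*num_cores is used by default.

For an exhaustive set of all possible configuration options: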

# Reference: https://github.com/pytorch/serve/blob/master/docs/configuration.md
# Variables that can be configured through config.properties and Environment Variables
# NOTE: Variables which can be configured through environment variables **SHOULD** have a
# "TS_" prefix
# debug
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/opt/ml/model
load_models=model_1.mar
# blacklist_env_vars
# default_workers_per_model
# default_response_timeout
# unregister_model_timeout
# number_of_netty_threads
# netty_client_threads
# job_queue_size
# number_of_gpu
# async_logging
# cors_allowed_origin
# cors_allowed_methods
# cors_allowed_headers
# decode_input_request
# keystore
# keystore_pass
# keystore_type
# certificate_file
# private_key_file
# max_request_size
# max_response_size
# default_service_handler
# service_envelope
# model_server_home
# snapshot_store
# prefer_direct_buffer
# allowed_urls
# install_py_dep_per_model
# metrics_format
# enable_metrics_api
# initial_worker_port

# Configuration which are not documented or enabled through environment variables

# When below variable is set true, then the variables set in environment have higher precedence.
# For example, the value of an environment variable overrides both command line arguments and a property in the configuration file. The value of a command line argument overrides a value in the configuration file.
# When set to false, environment variables are not used at all
# use_native_io=
# io_ratio=
# metric_time_interval=
enable_envvars_config=true
# model_snapshot=
# version=
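
The precedence rules stated in the comments of this configuration can be sketched as a small resolver. This is a paraphrase of those comments, not TorchServe's implementation; the function name `resolve` and the assumption that enable_envvars_config is read only from the properties file are illustrative.

```python
def resolve(name, props, cli_args=None, environ=None):
    """Return the effective value of a setting such as 'default_workers_per_model'.

    Order, lowest to highest: properties file, command-line argument,
    TS_-prefixed environment variable (only when enable_envvars_config is true).
    """
    cli_args = cli_args or {}
    environ = environ or {}

    value = props.get(name)
    if name in cli_args:
        value = cli_args[name]

    env_enabled = str(props.get("enable_envvars_config", "false")).lower() == "true"
    env_key = "TS_" + name.upper()
    if env_enabled and env_key in environ:
        value = environ[env_key]
    return value


props = {"enable_envvars_config": "true", "default_workers_per_model": "1"}
environ = {"TS_DEFAULT_WORKERS_PER_MODEL": "8"}
print(resolve("default_workers_per_model", props, environ=environ))
```

With enable_envvars_config set to false in `props`, the same call falls back to the value from the properties file.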

6 votes

Original content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/68083831
