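I deployed a custom model on a SageMaker inference endpoint (single instance). During load testing I observed that CPU utilization maxes out at 100%, although according to this post it should max out at #vCPU * 100%. I have confirmed from the CloudWatch logs that the inference endpoint is not using all cores.
So if one prediction call takes 1 second to process, the deployed model can only handle one API call per second, whereas this could rise to 8 calls per second if all vCPUs were used.
Are there any settings in AWS SageMaker deployment to use all vCPUs to increase concurrency?
Or could we use the multiprocessing Python package inside the inference.py file while deploying, so that each call arrives on the default core and all the computation/prediction is done on whichever other core in the instance is idle?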
Posted on 2021-06-23 02:40:05
Update
1. Set ENABLE_MULTI_MODEL to "true" (make sure it is a string, not a bool), and set [SAGEMAKER_HANDLER](https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/src/sagemaker_pytorch_serving_container/torchserve.py#L74) to the Python module path of your custom model handler if you use a custom service; otherwise do not define it. Also make sure the model is named [model.mar](https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/src/sagemaker_pytorch_serving_container/torchserve.py#L94) before compressing it into a tarball and storing it in S3.
2. Set TS_DEFAULT_WORKERS_PER_MODEL to the number of vCPUs.
3. The first environment variable makes sure the TorchServe env vars are enabled; the second one relies on the first setting and loads the requested number of workers.
4. These settings can be applied by passing an env dictionary argument to the [PyTorch function](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#create-an-estimator).

Below is an explanation of why this works.

Regarding the question "Are there any settings in AWS Sagemaker deployment to use all vCPUs to increase concurrency?": there are various settings you can use for the model. You can set default_workers_per_model in config.properties, or set TS_DEFAULT_WORKERS_PER_MODEL=$(nproc --all) in the environment variables. Environment variables take precedence.
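To make the naming and precedence rule concrete, here is a minimal sketch of the lookup as described in this answer. It is only an illustration, not TorchServe's actual implementation; the helper names `env_name` and `resolve` are hypothetical.

```python
from typing import Mapping, Optional


def env_name(key: str) -> str:
    """Map a config.properties key to its environment-variable name (TS_ prefix)."""
    return "TS_" + key.upper()


def resolve(key: str, props: Mapping[str, str], env: Mapping[str, str]) -> Optional[str]:
    """Illustrative lookup: the environment wins, but only when
    enable_envvars_config is "true" in the properties."""
    use_env = props.get("enable_envvars_config", "false").lower() == "true"
    if use_env and env_name(key) in env:
        return env[env_name(key)]
    # Otherwise fall back to the value from config.properties (or None).
    return props.get(key)


props = {"enable_envvars_config": "true", "default_workers_per_model": "1"}
env = {"TS_DEFAULT_WORKERS_PER_MODEL": "8"}
print(resolve("default_workers_per_model", props, env))
```

With enable_envvars_config left out or set to "false", the same call falls back to the value from the properties file.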
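Besides the per-model default, you can also set the number of workers through the management API, but unfortunately the management API cannot be curled in SageMaker. So TS_DEFAULT_WORKERS_PER_MODEL is the best option. Setting it should ensure that all cores are used.
However, if you are using your own Dockerfile, you can set up a script in the entrypoint that waits for the model to load and then curls the management API to set the number of workers.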
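As a rough sketch of the env-dictionary approach from the list above, the snippet below builds the two variables and passes them through the SageMaker Python SDK. Using `PyTorchModel` here is an assumption (the answer only links to the SDK's PyTorch docs), and `torchserve_env`, `build_model`, the `entry_point` default, and the framework version are hypothetical placeholders to adapt.

```python
def torchserve_env(vcpus: int) -> dict:
    """Build the container environment described in the list above.

    Every value must be a string, including "true".
    """
    if vcpus < 1:
        raise ValueError("vcpus must be a positive integer")
    return {
        "ENABLE_MULTI_MODEL": "true",
        "TS_DEFAULT_WORKERS_PER_MODEL": str(vcpus),
    }


def build_model(model_data: str, role: str, vcpus: int, entry_point: str = "inference.py"):
    """Hypothetical helper: create a PyTorchModel that carries the env above."""
    # Imported lazily so the rest of the sketch runs without the sagemaker package.
    from sagemaker.pytorch import PyTorchModel

    return PyTorchModel(
        model_data=model_data,      # S3 URI of the tarball that contains model.mar
        role=role,                  # IAM role ARN used by the endpoint
        entry_point=entry_point,    # assumption: the inference script from the question
        framework_version="1.8.1",  # assumption: pick the version you actually use
        py_version="py3",
        env=torchserve_env(vcpus),
    )


print(torchserve_env(8))
```

The vCPU count has to be that of the endpoint's instance type, not of the machine that runs the deployment script, which is why it is passed in explicitly rather than read from the local CPU count.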
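# load the model (the URL is quoted so the shell does not interpret "&")
curl -X POST "localhost:8081/models?url=model_1.mar&batch_size=8&max_batch_delay=50"
# after loading the model it is possible to set min_worker, etc
curl -v -X PUT "http://localhost:8081/models/model_1?min_worker=1"

Regarding the other question, about the logs confirming that not all cores are used: I faced the same issue and believe it is a problem in the logging system. See this issue: https://github.com/pytorch/serve/issues/782. The community itself agrees that if the number of threads is not set, the log prints 0 by default, even though 2*num_cores is used by default.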
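Returning to the Docker entrypoint idea (wait for the model to load, then call the management API), a minimal sketch using only the Python standard library could look like the following. The function names, the polling interval, and the timeouts are assumptions; the port and model name follow the curl commands in this answer.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

MANAGEMENT = "http://localhost:8081"  # management_address used in this answer


def scale_url(model: str, min_worker: int, base: str = MANAGEMENT) -> str:
    """Build the PUT URL that asks TorchServe to scale a model's workers."""
    query = urllib.parse.urlencode({"min_worker": min_worker, "synchronous": "true"})
    return f"{base}/models/{urllib.parse.quote(model)}?{query}"


def wait_until_registered(model: str, base: str = MANAGEMENT,
                          timeout_s: float = 120.0, poll_s: float = 2.0) -> bool:
    """Poll GET /models/<model> until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    url = f"{base}/models/{urllib.parse.quote(model)}"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or model not registered yet
        time.sleep(poll_s)
    return False


def set_min_workers(model: str, min_worker: int, base: str = MANAGEMENT) -> int:
    """Send the scaling request and return the HTTP status code."""
    req = urllib.request.Request(scale_url(model, min_worker, base), method="PUT")
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.status


print(scale_url("model_1", 8))
```

In an entrypoint script one would start TorchServe in the background, then call `wait_until_registered("model_1")` followed by `set_min_workers("model_1", n)`, with `n` set to the number of cores in the container.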
For an exhaustive set of all the possible settings:
# Reference: https://github.com/pytorch/serve/blob/master/docs/configuration.md
# Variables that can be configured through config.properties and Environment Variables
# NOTE: Variables which can be configured through environment variables **SHOULD** have a
# "TS_" prefix
# debug
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/opt/ml/model
load_models=model_1.mar
# blacklist_env_vars
# default_workers_per_model
# default_response_timeout
# unregister_model_timeout
# number_of_netty_threads
# netty_client_threads
# job_queue_size
# number_of_gpu
# async_logging
# cors_allowed_origin
# cors_allowed_methods
# cors_allowed_headers
# decode_input_request
# keystore
# keystore_pass
# keystore_type
# certificate_file
# private_key_file
# max_request_size
# max_response_size
# default_service_handler
# service_envelope
# model_server_home
# snapshot_store
# prefer_direct_buffer
# allowed_urls
# install_py_dep_per_model
# metrics_format
# enable_metrics_api
# initial_worker_port
# Configuration which are not documented or enabled through environment variables
# When below variable is set true, then the variables set in environment have higher precedence.
# For example, the value of an environment variable overrides both command line arguments and a property in the configuration file. The value of a command line argument overrides a value in the configuration file.
# When set to false, environment variables are not used at all
# use_native_io=
# io_ratio=
# metric_time_interval=
enable_envvars_config=true
# model_snapshot=
# version=

https://stackoverflow.com/questions/68083831