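I want to automate a SageMaker pipeline so that it can build, train, and deploy models across environments. I am not a data scientist and this area is very new to me, so the struggle is real!
I have set up a pipeline that builds the code correctly, but when it reaches the preprocessing step it fails with the error: No module named 'sagemaker_sklearn_extension'
The preprocess.py script is below: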
from numpy import nan
from sagemaker_sklearn_extension.externals import Header
from sagemaker_sklearn_extension.impute import RobustImputer
from sagemaker_sklearn_extension.preprocessing import NALabelEncoder
from sagemaker_sklearn_extension.preprocessing import RobustStandardScaler
from sagemaker_sklearn_extension.preprocessing import ThresholdOneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Given a list of column names and target column name, Header can return the index
# for given column name
HEADER = Header(
    column_names=[
        '1', '2', '3', '4', '5',
        '6', '7'
    ],
    target_column_name='6'
)

def build_feature_transform():
    """Returns the model definition representing feature processing."""
    # These features can be parsed as numeric.
    numeric = HEADER.as_feature_indices(
        ['1', '2', '3', '4']
    )
    # These features contain a relatively small number of unique items.
    categorical = HEADER.as_feature_indices(
        ['1', '2', '3', '4']
    )
    numeric_processors = Pipeline(
        steps=[
            (
                'robustimputer',
                RobustImputer(strategy='constant', fill_values=nan)
            )
        ]
    )
    categorical_processors = Pipeline(
        steps=[('thresholdonehotencoder', ThresholdOneHotEncoder(threshold=8))]
    )
    column_transformer = ColumnTransformer(
        transformers=[
            ('numeric_processing', numeric_processors, numeric),
            ('categorical_processing', categorical_processors, categorical)
        ]
    )
    return Pipeline(
        steps=[
            ('column_transformer', column_transformer),
            ('robuststandardscaler', RobustStandardScaler())
        ]
    )

def build_label_transform():
    """Returns the model definition representing label processing."""
    return NALabelEncoder()

Below is the pipeline.py script that invokes the preprocessing step:
# processing step for feature engineering
sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name=f"{base_job_prefix}/sklearn-job-preprocess",
    sagemaker_session=sagemaker_session,
    role=role,
)
step_process = ProcessingStep(
    name="PreprocessJobData",
    processor=sklearn_processor,
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code=os.path.join(BASE_DIR, "preprocess.py"),
    job_arguments=["--input-data", input_data],
)

Any help would be greatly appreciated!
Posted on 2021-08-26 16:18:46
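First of all, you can install these extensions from pip, as described here.
However, to use the I/O functionality in the externals module, you will also need to install mlio, which is only available via conda.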
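One way to act on the pip suggestion, sketched here under assumptions, is to install the package at the top of preprocess.py before the `sagemaker_sklearn_extension` imports run. The helper names `pip_install` and `ensure_module` are hypothetical, not part of any SageMaker API. The sketch assumes the PyPI distribution is named `sagemaker-scikit-learn-extension` and that the processing container has network access to a package index. It does not install mlio, so anything that depends on mlio would still need a conda-based setup.

```python
import importlib
import subprocess
import sys


def pip_install(package):
    """Install a package into the environment of the running interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def ensure_module(module_name, pip_name, installer=pip_install):
    """Import module_name, installing pip_name first if the import fails.

    Hypothetical helper: `installer` is injectable so the install step can
    be replaced (for example in tests, or with a private index wrapper).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        installer(pip_name)
        # Make the import system notice packages installed after startup.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

In preprocess.py this would be called once, above the existing imports, for example `ensure_module("sagemaker_sklearn_extension", "sagemaker-scikit-learn-extension")`. Installing at runtime adds startup time to every processing job; building a custom container image with the dependencies preinstalled is the usual alternative when that cost matters.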
https://stackoverflow.com/questions/68933828