
SageMaker pipeline - processing step - SKLearn - missing SKLearn extension

Stack Overflow user
Asked 2021-08-26 06:53:49
Answers 1 · Views 227 · Followers 0 · Votes 0

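I want to automate a SageMaker pipeline so that it can build, train, and deploy models across environments. I am not a data scientist and this field is very new to me, so the struggle is real!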

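I have set up a pipeline that builds the code correctly, but when it reaches the preprocessing step it fails with an error saying there is no module named 'sagemaker_sklearn_extension'.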

The preprocess.py script is below:

from numpy import nan
from sagemaker_sklearn_extension.externals import Header
from sagemaker_sklearn_extension.impute import RobustImputer
from sagemaker_sklearn_extension.preprocessing import NALabelEncoder
from sagemaker_sklearn_extension.preprocessing import RobustStandardScaler
from sagemaker_sklearn_extension.preprocessing import ThresholdOneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Given a list of column names and target column name, Header can return the index
# for given column name
HEADER = Header(
   column_names=[
       '1', '2', '3', '4', '5',
       '6', '7'
   ],
   target_column_name='6'
)


def build_feature_transform():
   """ Returns the model definition representing feature processing."""

   # These features can be parsed as numeric.

   numeric = HEADER.as_feature_indices(
       ['1', '2', '3', '4']
   )

   # These features contain a relatively small number of unique items.

   categorical = HEADER.as_feature_indices(
       ['1', '2', '3', '4']
   )

   numeric_processors = Pipeline(
       steps=[
           (
               'robustimputer',
               RobustImputer(strategy='constant', fill_values=nan)
           )
       ]
   )

   categorical_processors = Pipeline(
       steps=[('thresholdonehotencoder', ThresholdOneHotEncoder(threshold=8))]
   )

   column_transformer = ColumnTransformer(
       transformers=[
           ('numeric_processing', numeric_processors, numeric),
           ('categorical_processing', categorical_processors, categorical),
       ]
   )

   return Pipeline(
       steps=[
           ('column_transformer', column_transformer),
           ('robuststandardscaler', RobustStandardScaler()),
       ]
   )


def build_label_transform():
   """Returns the model definition representing label processing."""

   return NALabelEncoder()

Below is the part of the pipeline.py script that invokes the processing step:

   # Processing step for feature engineering
   sklearn_processor = SKLearnProcessor(
       framework_version="0.23-1",
       instance_type=processing_instance_type,
       instance_count=processing_instance_count,
       base_job_name=f"{base_job_prefix}/sklearn-job-preprocess",
       sagemaker_session=sagemaker_session,
       role=role,
   )
   step_process = ProcessingStep(
       name="PreprocessJobData",
       processor=sklearn_processor,
       outputs=[
           ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
           ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
           ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
       ],
       code=os.path.join(BASE_DIR, "preprocess.py"),
       job_arguments=["--input-data", input_data],
   )

Any help would be greatly appreciated!


1 Answer

Stack Overflow user

Answered 2021-08-26 16:18:46

First, you can install these extensions from pip as described here.

However, to use the I/O functionality in the externals module, you also need to install mlio, which is only available through conda.
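One way to apply the pip suggestion inside a processing job is to install the package at the very top of preprocess.py, before the `sagemaker_sklearn_extension` imports run. The sketch below is only an illustration under some assumptions: the helper names `pip_install` and `ensure_package` are made up for this example, the PyPI name is assumed to be `sagemaker-scikit-learn-extension`, and the processing container is assumed to have network access to PyPI. It does not install mlio, which still needs conda.

```python
import importlib.util
import subprocess
import sys


def pip_install(requirement):
    """Install a requirement with the interpreter that runs this script."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", requirement])


def ensure_package(import_name, requirement, installer=pip_install):
    """Install `requirement` only if `import_name` cannot be imported.

    `import_name` is the top-level module name used in `import` statements;
    `requirement` is the name passed to pip. Both helper names are
    hypothetical and exist only for this sketch.

    Returns True if an install was attempted, False otherwise.
    """
    if importlib.util.find_spec(import_name) is not None:
        # The module is already importable, so nothing needs installing.
        return False
    installer(requirement)
    # Let the import system notice the freshly installed package.
    importlib.invalidate_caches()
    return True
```

In preprocess.py this would be called as `ensure_package("sagemaker_sklearn_extension", "sagemaker-scikit-learn-extension")` on the first lines of the script, so the later `from sagemaker_sklearn_extension...` imports can resolve. Installing at runtime adds startup time to every job; building a custom image with the package preinstalled is the usual alternative if that becomes a problem.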

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/68933828
