How to combine a pipeline and hyperparameter tuning in the training step with the AzureML SDK

Stack Overflow user
Asked 2021-05-02 04:21:17
Answers: 1 · Views: 295 · Followers: 0 · Votes: 1

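Short form: I'm trying to figure out how to run hyperparameter tuning inside a training step (i.e. train_step = PythonScriptStep(...)) of a pipeline. I'm not sure where I should put "config=hyperdrive".

Long form:

General: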

from azureml.core import Environment
from azureml.core.runconfig import RunConfiguration

# Register the environment
diabetes_env.register(workspace=ws)
registered_env = Environment.get(ws, 'diabetes-pipeline-env')

# Create a new runconfig object for the pipeline
run_config = RunConfiguration()

# Use the compute you created above. 
run_config.target = ComputerTarget_Crea

# Assign the environment to the run configuration
run_config.environment = registered_env

Hyperparam:

from azureml.core import Experiment, ScriptRunConfig
from azureml.train.hyperdrive import GridParameterSampling, HyperDriveConfig, PrimaryMetricGoal, choice

script_config = ScriptRunConfig(source_directory=experiment_folder,
                                script='diabetes_training.py',
                                # Add non-hyperparameter arguments - in this case, the training dataset
                                arguments = ['--input-data', diabetes_ds.as_named_input('training_data')],
                                environment=sklearn_env,
                                compute_target = training_cluster)

# Sample a range of parameter values
params = GridParameterSampling(
    {
        # Hyperdrive will try 6 combinations, adding these as script arguments
        '--learning_rate': choice(0.01, 0.1, 1.0),
        '--n_estimators' : choice(10, 100)
    }
)

# Configure hyperdrive settings
hyperdrive = HyperDriveConfig(run_config=script_config, 
                          hyperparameter_sampling=params, 
                          policy=None, # No early stopping policy
                          primary_metric_name='AUC', # Find the highest AUC metric
                          primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, 
                          max_total_runs=6, # Restrict the experiment to 6 iterations
                          max_concurrent_runs=2) # Run up to 2 iterations in parallel

# Run the experiment this way if I only want hyperparameter tuning on its own, without the pipeline
#experiment = Experiment(workspace=ws, name='mslearn-diabetes-hyperdrive')
#run = experiment.submit(config=hyperdrive)
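
To make the "6 combinations" comment concrete: a grid over three learning rates and two estimator counts is just their Cartesian product, which is why max_total_runs is set to 6. The snippet below is a plain-Python illustration of that count only; it does not call Hyperdrive, and the names `search_space` and `grid_combinations` are made up for this sketch.

```python
from itertools import product

# Same values as in the GridParameterSampling definition above.
search_space = {
    '--learning_rate': [0.01, 0.1, 1.0],
    '--n_estimators': [10, 100],
}

def grid_combinations(space):
    """Return every combination of the listed values as a list of dicts."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

combinations = grid_combinations(search_space)
print(len(combinations))  # 3 learning rates x 2 estimator counts
for combo in combinations:
    print(combo)
```

If another value were added to either list, max_total_runs would have to grow accordingly for the grid to be covered in full.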

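Pipeline:

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# Step 1, run the data prep script
prep_step = PythonScriptStep(name = "Prepare Data",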
                                source_directory = experiment_folder,
                                script_name = "prep_diabetes.py",
                                arguments = ['--input-data', diabetes_ds.as_named_input('raw_data'),
                                             '--prepped-data', prepped_data_folder],
                                outputs=[prepped_data_folder],
                                compute_target = ComputerTarget_Crea,
                                runconfig = run_config,
                                allow_reuse = True)

# Step 2, run the training script
train_step = PythonScriptStep(name = "Train and Register Model",
                                source_directory = experiment_folder,
                                script_name = "train_diabetes.py",
                                arguments = ['--training-folder', prepped_data_folder],
                                inputs=[prepped_data_folder],
                                compute_target = ComputerTarget_Crea,
                                runconfig = run_config,
                                allow_reuse = True)
# Construct the pipeline
pipeline_steps = [prep_step, train_step]
pipeline = Pipeline(workspace=ws, steps=pipeline_steps)
print("Pipeline is built.")

# Create an experiment and run the pipeline
# How do I need to change the lines below to use hyperdrive?
experiment = Experiment(workspace=ws, name = 'mslearn-diabetes-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)

I don't know where in the pipeline section I need to put config=hyperdrive.
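
For reference, the AzureML SDK (v1) has a pipeline step class for this case, `HyperDriveStep` in `azureml.pipeline.steps`, which takes the HyperDriveConfig through its `hyperdrive_config` parameter rather than through `experiment.submit(config=...)`. The following is only a minimal sketch under assumptions, not a tested solution and not the answer posted on Stack Overflow: the helper name `build_hyperdrive_pipeline` and the step name are hypothetical, and it assumes the ScriptRunConfig behind `hyperdrive` has been changed so that the training script receives the prepped data folder (for example via `--training-folder`) instead of the raw dataset.

```python
def build_hyperdrive_pipeline(ws, hyperdrive, prep_step, prepped_data_folder):
    """Sketch: use a HyperDriveStep as the training step of the pipeline.

    All four arguments are the objects already defined in the question.
    The function name itself is hypothetical.
    """
    # Imported inside the function so that defining it does not require the SDK.
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import HyperDriveStep

    # The HyperDriveConfig is attached to the step here.
    train_step = HyperDriveStep(
        name="Train and Tune Model",      # hypothetical step name
        hyperdrive_config=hyperdrive,
        inputs=[prepped_data_folder],     # output of prep_step, so it runs first
        allow_reuse=True,
    )
    return Pipeline(workspace=ws, steps=[prep_step, train_step])

# Intended usage, with the names from the question (requires a real workspace):
# pipeline = build_hyperdrive_pipeline(ws, hyperdrive, prep_step, prepped_data_folder)
# experiment = Experiment(workspace=ws, name='mslearn-diabetes-pipeline')
# pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
```

With this shape the submit call stays as it is; only the training step changes.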

1 Answer

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/67352949
