
Is it possible to define how many variables the final model uses in H2O Driverless AI?

Stack Overflow user
Asked on 2021-02-02 13:46:08
1 answer · 46 views · 0 followers · 0 votes

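I am currently exploring the capabilities of H2O DAI. My understanding is that during the feature selection/engineering stage, H2O can choose which variables to use and which transformers to apply to them. However, is there a way to configure H2O DAI to limit the maximum number of features it may use from the provided list? For example, given 100 features, I want H2O DAI to select only 20 of them and apply feature engineering to those. I have tried browsing the user manual but have not found any hints about this so far.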

Thanks very much in advance.

1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-02-02 15:38:41

There are several options for controlling the number of features used:

Code language: toml (config.toml)
# Maximum number of columns selected out of original set of original columns, using feature selection
# The selection is based upon how well target encoding (or frequency encoding if not available) on categoricals and numerics treated as categoricals
# This is useful to reduce the final model complexity. First the best
# [max_orig_cols_selected] are found through feature selection methods and then
# these features are used in feature evolution (to derive other features) and in modelling.
#max_orig_cols_selected = 10000

# Maximum number of numeric columns selected, above which will do feature selection
# same as above (max_orig_cols_selected) but for numeric columns.
#max_orig_numeric_cols_selected = 10000

# Maximum number of non-numeric columns selected, above which will do feature selection on all features and avoid treating numerical as categorical
# same as above (max_orig_numeric_cols_selected) but for categorical columns.
#max_orig_nonnumeric_cols_selected = 300

# Like max_orig_cols_selected, but columns above which add special individual with original columns reduced.
# 
#fs_orig_cols_selected = 500
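Applied to the example in the question (100 input columns, at most 20 kept), a minimal sketch of a config.toml override could look like the following. It assumes `max_orig_cols_selected` behaves as its comment above describes in your Driverless AI version; 20 is only the question's illustrative number. DAI still decides which 20 columns survive, based on its own selection scores, not on a list you supply.

```toml
# Hypothetical config.toml override for the question's example:
# keep at most the 20 best original columns (out of 100) before
# feature evolution and modelling. Uncommented and lowered from the default of 10000.
max_orig_cols_selected = 20
```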
Code language: toml (config.toml)
# Maximum features per model (and each model within the final model if ensemble) kept.
# Keeps top variable importance features, prunes rest away, after each scoring.
# Final ensemble will exclude any pruned-away features and only train on kept features,
# but may contain a few new features due to fitting on different data view (e.g. new clusters)
# Final scoring pipeline will exclude any pruned-away features,
# but may contain a few new features due to fitting on different data view (e.g. new clusters)
# -1 means no restrictions except internally-determined memory and interpretability restrictions.
# Notes:
# * If interpretability > remove_scored_0gain_genes_in_postprocessing_above_interpretability, then
# every GA iteration post-processes features down to this value just after scoring them.  Otherwise,
# only mutations of scored individuals will be pruned (until the final model where limits are strictly applied).
# * If ngenes_max is not also limited, then some individuals will have more genes and features until
# pruned by mutation or by preparation for final model.
# * E.g. to generally limit every iteration to exactly 1 features, one must set nfeatures_max=ngenes_max=1
# and remove_scored_0gain_genes_in_postprocessing_above_interpretability=0, but the genetic algorithm
# will have a harder time finding good features.
# 
#nfeatures_max = -1
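The notes above describe how to make the `nfeatures_max` limit strict on every genetic-algorithm iteration, using a limit of 1 as the example. A sketch of the same pattern with the question's limit of 20 might look like this; it assumes all three keys exist under these names in your version and, as the notes warn, a tight cap makes it harder for the genetic algorithm to find good features.

```toml
# Hypothetical: cap each model at 20 features on every GA iteration,
# mirroring the nfeatures_max = ngenes_max = 1 example from the notes.
nfeatures_max = 20
ngenes_max = 20
remove_scored_0gain_genes_in_postprocessing_above_interpretability = 0
```

Roughly, `max_orig_cols_selected` limits how many original columns are considered, while `nfeatures_max` limits how many (possibly engineered) features each model keeps, so the two can be combined.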

See the config.toml file, or look at the expert settings.

Note that you cannot control which specific features have transformers applied to them.

Votes: 0
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/66004255
