Q: How to get feature importances when using sklearn2pmml

Stack Overflow user

Asked 2017-05-19 01:56:02

2 answers · 880 views · 0 followers · 0 votes

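I have trained a GBDT model named 'GB' and want to export the trained model to a PMML file, but I ran into two problems.

1. If I put the already trained 'GB' model into a PMMLPipeline and export it with sklearn2pmml, like this: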

import sklearn2pmml
from sklearn.ensemble import GradientBoostingClassifier
from sklearn2pmml import PMMLPipeline

GB = GradientBoostingClassifier(n_estimators=100, learning_rate=0.05)
GB.fit(train[list(x_features)], train['Target'])
GB_pipeline = PMMLPipeline([("classifier", GB)])
sklearn2pmml.sklearn2pmml(GB_pipeline, pmml='GB.pmml')
importance = GB.feature_importances_

I get the warning "'active_fields' attribute is not set", and all feature names are lost in the exported PMML file.

2. If I train the model directly inside the PMMLPipeline, I cannot inspect the model's feature importances, because GB_pipeline has no feature_importances_ attribute. Like this:

GB_pipeline = PMMLPipeline([("classifier", GradientBoostingClassifier(n_estimators=100, learning_rate=0.05))])
GB_pipeline.fit(train[list(x_features)], train['Target'])
sklearn2pmml.sklearn2pmml(GB_pipeline, pmml='GB.pmml')

What should I do so that I can both inspect the model's feature importances and keep the feature names in the exported PMML file? Thanks a lot!

python

pmml

2 Answers

Stack Overflow user

Accepted answer

Posted 2017-05-19 07:38:14

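Key points:

Instantiate the classifier outside the pipeline.
Instantiate the (PMML) pipeline and insert this classifier into it.
Fit the pipeline as a whole.
Print the classifier's feature importances and export the pipeline to a PMML document.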

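In your first code example, you fit the classifier on its own, whereas the pipeline should be fitted as a whole; hence the warning, which says that the pipeline's internal state is incomplete. In your second code example, you have no direct reference to the classifier (you can, however, obtain it by "parsing" the last step of the fitted pipeline).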
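Obtaining the classifier from the last step of a fitted pipeline can be as simple as reading the final entry of its steps. The following is a minimal sketch, not the answerer's own code: it assumes plain scikit-learn, using sklearn.pipeline.Pipeline (the class that PMMLPipeline extends) and the bundled Iris data so that it runs without sklearn2pmml or Java. The values n_estimators=10 and random_state=0 are arbitrary choices for the illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# The classifier is created inline, so no separate variable refers to it.
pipeline = Pipeline([
    ("classifier", GradientBoostingClassifier(n_estimators=10, random_state=0))
])
pipeline.fit(X, y)

# Recover the fitted classifier from the last step, by position or by name.
clf = pipeline.steps[-1][1]
same_clf = pipeline.named_steps["classifier"]

importances = clf.feature_importances_
print(importances)
```

Both lookups return the same fitted object, so either one should work on a pipeline that was fitted as a whole.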

A complete example based on the Iris dataset:

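import pandas
from sklearn.ensemble import GradientBoostingClassifier
from sklearn2pmml import sklearn2pmml, PMMLPipeline

iris_df = pandas.read_csv("Iris.csv")

# Instantiate the classifier outside the pipeline to keep a direct reference to it
gbt = GradientBoostingClassifier()
pipeline = PMMLPipeline([
    ("classifier", gbt)
])
# Fit the pipeline as a whole so that its internal state is complete
pipeline.fit(iris_df[iris_df.columns.difference(["Species"])], iris_df["Species"])
# The outside reference now points at the fitted classifier
print(gbt.feature_importances_)
sklearn2pmml(pipeline, "GBTIris.pmml", with_repr = True)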
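The printed array lists importances in column order without names, so it can help to pair each value with its feature name. This is a hedged sketch under stated assumptions: it uses scikit-learn's bundled Iris data (load_iris with as_frame=True, which needs pandas) and a plain sklearn.pipeline.Pipeline instead of Iris.csv and PMMLPipeline, purely so that it runs on its own. The name ranked and the hyperparameter values are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Keep a reference to the classifier outside the pipeline, as in the answer.
gbt = GradientBoostingClassifier(n_estimators=10, random_state=0)
pipeline = Pipeline([("classifier", gbt)])
pipeline.fit(X, y)

# Label each importance with its column name and sort, most important first.
ranked = pd.Series(gbt.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranked)
```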

Votes: 1

Stack Overflow user

Posted 2022-06-17 13:56:30

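If you came here, like me, looking for a way to carry the feature importances from a Python pipeline into the PMML file, I have good news.

I searched the internet and learned that the importances attribute has to be created manually on the RF model in Python so that it gets stored in the PMML.

TL;DR, here is the code: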

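from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# The trick: keep a reference to the model object outside the pipeline
RFModel = RandomForestRegressor()

# Build the pipeline as usual
column_trans = ColumnTransformer([
    ('onehot', OneHotEncoder(drop='first'), ["sex", "smoker", "region"]),
    ('Stdscaler', StandardScaler(), ["age", "bmi"]),
    ('MinMxscaler', MinMaxScaler(), ["children"])
])

pipeline = PMMLPipeline([
    ('col_transformer', column_trans),
    ('model', RFModel)
])

# Fit the pipeline (X is the feature DataFrame and y the target; neither is defined in this snippet)
pipeline.fit(X, y)

# Store the importances in a temporary variable
importances = RFModel.feature_importances_

# Assign them on the model itself (the main part)
RFModel.pmml_feature_importances_ = importances

# Finally, save the model as usual
sklearn2pmml(pipeline, r"path\file.pmml")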

Now you will see the importances in the PMML file!
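One way to check the result is to read the exported XML and list the fields that carry an importance. The sketch below rests on an assumption: PMML's MiningField element has an optional importance attribute, and the export is assumed to place the values there, which is worth confirming against your own file. SAMPLE_PMML is a hypothetical, hand-written fragment (the field name charges and all numbers are invented for illustration), and mining_field_importances is an illustrative helper, not part of sklearn2pmml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment for illustration; not real sklearn2pmml output.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <MiningModel functionName="regression">
    <MiningSchema>
      <MiningField name="charges" usageType="target"/>
      <MiningField name="age" importance="0.5"/>
      <MiningField name="bmi" importance="0.25"/>
    </MiningSchema>
  </MiningModel>
</PMML>"""


def mining_field_importances(pmml_text):
    """Map MiningField names to their importance attribute, where present."""
    result = {}
    for element in ET.fromstring(pmml_text).iter():
        # Tags look like "{namespace}MiningField"; compare the local name only.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "MiningField" and "importance" in element.attrib:
            # Keep the first occurrence if nested models repeat a field name.
            result.setdefault(element.get("name"), float(element.get("importance")))
    return result


found = mining_field_importances(SAMPLE_PMML)
print(found)
```

For a real export, pass the contents of the PMML file (read in binary mode) instead of SAMPLE_PMML. An empty result would suggest that no importances were written.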

Reference: Openscoring

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/44060248
