
Error when writing an XGBoost classifier to PMML with sklearn2pmml

Stack Overflow user
Asked 2020-02-21 22:05:40
Answers: 1 · Views: 1K · Followers: 0 · Votes: 1

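I want to save my XGBoost model as PMML using sklearn2pmml. I'm using Python v3.7.3, Sklearn 0.20.3 and sklearn2pmml v0.53.0. My data is mostly binary, with only 3 continuous columns. I'm running my notebook in Databricks and converting my Spark DataFrame to a pandas DataFrame. Here is the code snippet: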

```python
import xgboost as xgb

from sklearn_pandas import DataFrameMapper
from sklearn.compose import ColumnTransformer

from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.decoration import ContinuousDomain
from sklearn.preprocessing import StandardScaler

# pdf (the pandas DataFrame converted from the Spark DataFrame), continuous_features
# and numericCols are defined earlier in the notebook
X = pdf[continuous_features + numericCols]
y = pdf["Label"]

# Continuous columns: domain decoration + standardization; binary columns: passed through
mapper = DataFrameMapper(
  [([cont_column], [ContinuousDomain(), StandardScaler()]) for cont_column in continuous_features] +
  [([c for c in numericCols], None)] # no transformation
)

clf = xgb.XGBClassifier(objective='multi:softprob',eval_metric='auc',num_class = 2,
                        n_jobs =6,max_delta_step=1, min_child_weight=14, gamma=1.5, subsample = 0.8,
                        colsample_bytree = 0.5, max_depth=10, learning_rate = 0.1)


pipeline = PMMLPipeline([
  ("mapper", mapper),
  ("estimator", clf)
])

pipeline.fit(X,y.values.reshape(-1,))

sklearn2pmml(pipeline, "xgb_V1.pmml", with_repr = True)
```

The pipeline fits the data, and pipeline.score(X,y) and pipeline.predict(X) produce scores and predictions, but when I try to write it to PMML I get the following error:

```
Standard output is empty
Standard error:
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 47 ms.
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
INFO: Converting..
Feb 21, 2020 1:53:30 PM sklearn2pmml.pipeline.PMMLPipeline initTargetFields
WARNING: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.target_fields' is not set. Assuming y as the name of the target field
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class xgboost.compat.XGBoostLabelEncoder)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
	at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
	at sklearn.LabelEncoderClassifier.getLabelEncoder(LabelEncoderClassifier.java:40)
	at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:34)
	at sklearn.ClassifierUtil.getClasses(ClassifierUtil.java:32)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:133)
	at org.jpmml.sklearn.Main.run(Main.java:145)
	at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.preprocessing.LabelEncoder
	at java.lang.Class.cast(Class.java:3369)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
	... 7 more

Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class xgboost.compat.XGBoostLabelEncoder)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
	at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
	at sklearn.LabelEncoderClassifier.getLabelEncoder(LabelEncoderClassifier.java:40)
	at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:34)
	at sklearn.ClassifierUtil.getClasses(ClassifierUtil.java:32)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:133)
	at org.jpmml.sklearn.Main.run(Main.java:145)
	at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.preprocessing.LabelEncoder
	at java.lang.Class.cast(Class.java:3369)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
```

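Based on this issue, https://github.com/jpmml/sklearn2pmml/issues/197, I thought this might be a version incompatibility between Sklearn and sklearn2pmml, but I believe the versions I have installed should be fine. Any idea what's going on? Thanks in advance.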


1 Answer

Stack Overflow user

Accepted answer

Posted 2020-02-21 23:32:08

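This is most likely an XGBoost package version issue. The SkLearn2PMML package expects the label encoder (the XGBClassifier._le attribute) to be a "plain" Scikit-Learn label encoder class (sklearn.preprocessing.(label|_label).LabelEncoder), but in your case it is a different class (xgboost.compat.XGBoostLabelEncoder).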
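One way to confirm this diagnosis before calling sklearn2pmml is to check the fully qualified class name of the fitted estimator's _le attribute against the two LabelEncoder locations named above. A minimal sketch, assuming _le exists on the fitted classifier (it is a private XGBoost attribute and may be absent or renamed in some versions); the helper names are hypothetical, and the check compares names only, so it imports neither xgboost nor sklearn:

```python
# LabelEncoder classes that, per the answer, sklearn2pmml accepts for XGBClassifier._le
EXPECTED_LE_CLASSES = {
    "sklearn.preprocessing.label.LabelEncoder",
    "sklearn.preprocessing._label.LabelEncoder",
}


def label_encoder_class(clf):
    """Return the fully qualified class name of clf._le, or None if it is absent."""
    le = getattr(clf, "_le", None)  # private attribute; may not exist in every version
    if le is None:
        return None
    cls = type(le)
    return f"{cls.__module__}.{cls.__qualname__}"


def has_plain_sklearn_label_encoder(clf):
    """True if clf._le is one of the Scikit-Learn LabelEncoder classes listed above."""
    return label_encoder_class(clf) in EXPECTED_LE_CLASSES
```

In the question's setup this would be called after pipeline.fit(X, y) as has_plain_sklearn_label_encoder(pipeline.named_steps["estimator"]); given the error message, it should return False there, with label_encoder_class reporting xgboost.compat.XGBoostLabelEncoder.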

In which XGBoost package version was xgboost.compat.XGBoostLabelEncoder introduced? It's either something very old or something very new.
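The question lists the Sklearn and sklearn2pmml versions but not the XGBoost version, which is the package this answer suspects. A minimal sketch for reporting all three from the notebook, assuming Python 3.8+ for importlib.metadata (on the asker's Python 3.7.3, the importlib_metadata backport from PyPI provides the same functions); the helper name installed_versions is hypothetical:

```python
# Requires Python 3.8+; on Python 3.7 the importlib_metadata backport offers the same API
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dists=("xgboost", "scikit-learn", "sklearn2pmml")):
    """Map each distribution name to its installed version string, or None if missing."""
    result = {}
    for name in dists:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in this environment
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver}")
```

Including this output in a question or feature request makes it easier to tell which XGBoost release produced the XGBoostLabelEncoder value.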

In any case, please open a feature request with the JPMML-SkLearn project here to get this resolved.

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/60340231
