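I want to save my XGBoost model as PMML using sklearn2pmml. I am using Python 3.7.3, scikit-learn 0.20.3 and sklearn2pmml 0.53.0. My data is mostly binary, with only 3 continuous columns. I run my notebook in Databricks and convert my Spark DataFrame to a pandas DataFrame. The code snippet is below: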
import xgboost as xgb
from sklearn_pandas import DataFrameMapper
from sklearn.compose import ColumnTransformer
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.decoration import ContinuousDomain
from sklearn.preprocessing import StandardScaler

X = pdf[continuous_features + numericCols]
y = pdf["Label"]

mapper = DataFrameMapper(
    [([cont_column], [ContinuousDomain(), StandardScaler()]) for cont_column in continuous_features] +
    [([c for c in numericCols], None)]  # no transformation
)

clf = xgb.XGBClassifier(
    objective='multi:softprob', eval_metric='auc', num_class=2,
    n_jobs=6, max_delta_step=1, min_child_weight=14, gamma=1.5, subsample=0.8,
    colsample_bytree=0.5, max_depth=10, learning_rate=0.1)

pipeline = PMMLPipeline([
    ("mapper", mapper),
    ("estimator", clf)
])
pipeline.fit(X, y.values.reshape(-1,))
sklearn2pmml(pipeline, "xgb_V1.pmml", with_repr=True)
The pipeline fits the data, and pipeline.score(X, y) and pipeline.predict(X) produce scores and predictions, but when I try to write it to PMML I get the following error:
Standard output is empty
Standard error:
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 47 ms.
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
INFO: Converting..
Feb 21, 2020 1:53:30 PM sklearn2pmml.pipeline.PMMLPipeline initTargetFields
WARNING: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.target_fields' is not set. Assuming y as the name of the target field
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class xgboost.compat.XGBoostLabelEncoder)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
at sklearn.LabelEncoderClassifier.getLabelEncoder(LabelEncoderClassifier.java:40)
at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:34)
at sklearn.ClassifierUtil.getClasses(ClassifierUtil.java:32)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:133)
at org.jpmml.sklearn.Main.run(Main.java:145)
at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.preprocessing.LabelEncoder
at java.lang.Class.cast(Class.java:3369)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
... 7 more
Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class xgboost.compat.XGBoostLabelEncoder)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
at sklearn.LabelEncoderClassifier.getLabelEncoder(LabelEncoderClassifier.java:40)
at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:34)
at sklearn.ClassifierUtil.getClasses(ClassifierUtil.java:32)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:133)
at org.jpmml.sklearn.Main.run(Main.java:145)
at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.preprocessing.LabelEncoder
at java.lang.Class.cast(Class.java:3369)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
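Based on this post, https://github.com/jpmml/sklearn2pmml/issues/197, I suspect a version incompatibility between scikit-learn and sklearn2pmml, but I believe the versions I have installed should be fine. Do you know what is going on? Thanks in advance.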
Answered 2020-02-21 23:32:08
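This is probably an XGBoost package version issue. The SkLearn2PMML package expects the label encoder (the XGBClassifier._le attribute) to be a "plain" Scikit-Learn label encoder class (sklearn.preprocessing.(label|_label).LabelEncoder), but in your case it is a different class (xgboost.compat.XGBoostLabelEncoder).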
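One way to confirm which label encoder class a fitted classifier actually carries is to print its fully qualified name. The helper below is only a sketch: the function name describe_label_encoder is hypothetical, and it assumes the encoder lives in the private _le attribute named in the error message, which may differ between XGBoost versions.

```python
def describe_label_encoder(classifier, attr="_le"):
    """Return the fully qualified class name of the classifier's label
    encoder, or None if the attribute is missing or unset.

    `attr` defaults to "_le", the attribute named in the error message;
    it is a private attribute, so treat this as a diagnostic only.
    """
    encoder = getattr(classifier, attr, None)
    if encoder is None:
        return None
    cls = type(encoder)
    return f"{cls.__module__}.{cls.__qualname__}"


# Hypothetical usage with the fitted pipeline from the question:
#   print(describe_label_encoder(pipeline.named_steps["estimator"]))
```

If the printed name is not a scikit-learn LabelEncoder class, that would be consistent with the ClassCastException in the stack trace.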
In which XGBoost package version was this xgboost.compat.XGBoostLabelEncoder class introduced? It is either something very old or something very new.
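Since the question does not state the XGBoost version, it may help to print the versions of all packages involved in one go. This is a minimal sketch, assuming each package exposes a __version__ attribute; the function name installed_versions is hypothetical, and packages that cannot be imported are reported as None.

```python
import importlib


def installed_versions(package_names):
    """Map each package name to its __version__ string.

    Packages that cannot be imported map to None; packages that import
    but expose no __version__ attribute map to "unknown".
    """
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Package names taken from the imports in the question.
    print(installed_versions(["xgboost", "sklearn", "sklearn2pmml", "sklearn_pandas"]))
```

Including this output in a bug report makes it easier to tell whether the installed XGBoost release is the source of the mismatch.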
In any case, please open a feature request with the JPMML-SkLearn project to get this addressed.
https://stackoverflow.com/questions/60340231