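I wrote a custom transformer class that replaces placeholder strings with missing values, so that I could use it in a DataFrameMapper. When I use sklearn2pmml to generate the PMML file, I get an encoding error.
Here is my example transformer: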
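import numpy as np
import pandas as pd
from sklearn.base import TransformerMixin

class DataEncode(TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Treat the "\N" and "-" placeholders as missing values
        X = X.replace("\\N", np.nan)
        X = X.replace("-", np.nan)
        X = X.astype(float)
        return pd.concat([X], axis=1)

DataFrameMapper: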
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import Imputer, StandardScaler  # Imputer exists only in scikit-learn < 0.22
from sklearn2pmml.decoration import ContinuousDomain

mapper = DataFrameMapper([
    (['Sepal.Length'], [DataEncode(), ContinuousDomain(), Imputer(), StandardScaler()]),
    (['Sepal.Width'], [DataEncode(), ContinuousDomain(), Imputer(), StandardScaler()]),
    (['Petal.Length'], [DataEncode(), ContinuousDomain(), Imputer(), StandardScaler()]),
    (['Petal.Width'], [DataEncode(), ContinuousDomain(), Imputer(), StandardScaler()]),
], input_df=True)

Training the model:
from sklearn2pmml.pipeline import PMMLPipeline

# clf is the classifier (its definition is not shown)
gbdt_pipline = PMMLPipeline([
    ('mapper', mapper),
    ('classifier', clf)
])

Generating the PMML file:
from sklearn2pmml import sklearn2pmml
sklearn2pmml(gbdt_pipline, "D:/mlfile/test/test_iris.pmml", with_repr=True, debug=True)

Error:
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-92-8e29dc6f358c> in <module>()
----> 1 sklearn2pmml(gbdt_pipline,"D:/mlfile/test/test_iris.pmml",with_repr=True,debug=True)

D:\anaconda-hh\lib\site-packages\sklearn2pmml\__init__.py in sklearn2pmml(pipeline, pmml, user_classpath, with_repr, debug)
    231             print("Standard output is empty")
    232         if(len(error) > 0):
--> 233             print("Standard error:\n{0}".format(error.decode("UTF-8")))
    234         else:
    235             print("Standard error is empty")

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 4: invalid continuation byte

I have tried many approaches without success, and I wonder whether custom classes are simply not supported. I tried changing the encoding, and I tried converting to PMML via a pkl file, but nothing worked. Thanks for your help!
Posted on 2018-11-26 09:50:42
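The SkLearn2PMML package does support custom transformers and models. However, to make things work across platforms, you need to implement the transformation logic on both the Python side and the Java side. In your case, the Java side is currently missing.
See the SkLearn2PMML-Plugin project for step-by-step instructions.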
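A side note on the traceback itself: the UnicodeDecodeError is raised while sklearn2pmml is printing the converter's standard error, so the actual error message from the Java side never gets shown. The sketch below reproduces that failure mode with made-up bytes (not the real converter output) and a hypothetical helper, `decode_leniently`, to show why strict UTF-8 decoding stops at byte 0xd4 and how a lenient decode would keep the readable part of the message.

```python
# Hypothetical stand-in for the converter's stderr; the real bytes are unknown.
raw = b"abcd\xd4efgh"


def decode_leniently(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for any undecodable byte."""
    return data.decode("utf-8", errors="replace")


failure_position = None
try:
    raw.decode("utf-8")  # strict decoding, as on line 233 of the traceback
except UnicodeDecodeError as exc:
    failure_position = exc.start  # index of the offending byte
    print(exc)

# The lenient decode does not raise, so the rest of the message stays readable.
print(decode_leniently(raw))
```

If the machine uses a non-UTF-8 console code page (an assumption about the setup, which the question does not state), the Java process may be writing its messages in that code page, which would explain the undecodable byte.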
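If the goal is simply to detect and replace invalid numeric values, there is no need to create a custom transformer class, because the default sklearn2pmml.decoration.ContinuousDomain can already do that: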
mapper = DataFrameMapper([
    (['Sepal.Length'], [ContinuousDomain(invalid_value_replacement = float("NaN")), Imputer(), StandardScaler()])
])

https://stackoverflow.com/questions/53476739