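The Python package fancyimpute provides several methods for imputing missing data. I have tried to use SoftImpute; however, SoftImpute does not provide a transform method for use on a test dataset. More precisely, sklearn's SimpleImputer (see the example below) provides fit, transform, and fit_transform methods. SoftImpute, on the other hand, provides only fit_transform, which lets me fit and impute the training data but not transform the test set. I understand that fitting the imputation on both the training and test sets leaks information from the test set into the training set. To avoid this, we need to fit on the training set and transform the test set. Is there a way to impute the test set with SoftImpute using what was fitted on the training set? I would appreciate your thoughts.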
# this example from https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
import numpy as np
from sklearn.impute import SimpleImputer
imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
X_train = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
print(imp_mean.transform(X_train))
# SimpleImputer provides a transform method, so the fitted imputation can be
# applied to the test data, e.g.
# X_test = [...]
# print(imp_mean.transform(X_test))
from fancyimpute import SoftImpute
clf = SoftImpute(verbose=True)
clf.fit_transform(X_train)
# There is no clf.transform to use with the test set, e.g. clf.transform(X_test)

Posted on 2021-01-16 17:29:13
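fancyimpute does not support an inductive mode. What matters here is that the training data is imputed without using the test data. I think you can then use the imputed training data to impute the test data. Example code: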
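import pandas as pd
from fancyimpute import SoftImpute

# assumes train_df and test_df are pandas DataFrames with the same columns
len_train_data = train_df.shape[0]
imputer = SoftImpute()
# impute the training data on its own
X_train_fill_SVD = imputer.fit_transform(train_df)
# keep the original column names so the columns align with test_df in concat
X_train_fill_SVD = pd.DataFrame(X_train_fill_SVD, columns=train_df.columns)
# concatenate the imputed training data and the raw test data
Concat_data = pd.concat((X_train_fill_SVD, test_df), axis=0)
Concat_data = imputer.fit_transform(Concat_data)
Concat_data = pd.DataFrame(Concat_data, columns=train_df.columns)
# fetch the imputed test rows
X_test_fill_SVD = Concat_data.iloc[len_train_data:]

https://stackoverflow.com/questions/61193194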
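
If you want the fit/transform shape from the question, the approach in the answer can be wrapped in a small helper. This is only a sketch under assumptions: `TrainAnchoredImputer` and `ColumnMeanImputer` are hypothetical names and not part of fancyimpute or sklearn. `ColumnMeanImputer` is a stand-in that exposes only fit_transform, so the example runs without fancyimpute installed. Passing `lambda: SoftImpute(verbose=False)` as the factory should work the same way, assuming fit_transform accepts and returns a 2-D array.

```python
import numpy as np


class TrainAnchoredImputer:
    """Hypothetical wrapper: adds fit/transform on top of an imputer
    that only offers fit_transform."""

    def __init__(self, make_imputer):
        # make_imputer: zero-argument callable returning a fresh imputer
        self.make_imputer = make_imputer
        self.train_filled_ = None

    def fit(self, X_train):
        X_train = np.asarray(X_train, dtype=float)
        # Impute the training rows alone, so no test information is used.
        filled = self.make_imputer().fit_transform(X_train)
        self.train_filled_ = np.asarray(filled, dtype=float)
        return self

    def transform(self, X_test):
        X_test = np.asarray(X_test, dtype=float)
        # Stack the already-imputed training rows on top of the raw test
        # rows, impute the stack, and return only the test rows.
        stacked = np.vstack([self.train_filled_, X_test])
        filled = np.asarray(self.make_imputer().fit_transform(stacked))
        return filled[len(self.train_filled_):]


class ColumnMeanImputer:
    """Stand-in imputer (assumption, for illustration only): fills each
    missing value with its column mean and has no transform method."""

    def fit_transform(self, X):
        X = np.array(X, dtype=float)  # copy, so the input is not modified
        col_means = np.nanmean(X, axis=0)
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = col_means[cols]
        return X


X_train = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
X_test = np.array([[np.nan, 8.0]])

wrapper = TrainAnchoredImputer(ColumnMeanImputer).fit(X_train)
X_test_filled = wrapper.transform(X_test)
print(X_test_filled)
```

Note that all test rows passed to one transform call are imputed together, so the value filled into one test row can depend on the other test rows. If that matters for your evaluation, you could call transform on one row at a time, at the cost of refitting for each row.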