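I have a pipeline that runs preprocessing followed by a Random Survival Forest from the scikit-survival package. I am trying to use scikit-survival's as_concordance_index_ipcw_scorer() class from sksurv.metrics.
My pipeline looks like this: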
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('num',
Pipeline(steps=[('imputer',
SimpleImputer(strategy='median')),
('scaler',
StandardScaler())]),
Index(['IntVar1', 'IntVar2', 'IntVar3',
'IntVar4'],
dtype='object')),
('cat',
Pipeline(steps=[('imputer',
SimpleImputer(fill_value='missing',
strategy='constant')),
('onehot',
OneHotEncoder(handle_unknown='ignore',
sparse=False))]),
Index(['CharVar1', 'CharVar2', 'CharVar3'], dtype='object'))])),
('randomsurvivalforest',
RandomSurvivalForest(max_features='sqrt',
min_samples_leaf=0.005,
min_samples_split=0.01, n_estimators=150,
n_jobs=-1, oob_score=True,
random_state=200))])
This is the Python code leading up to the pipeline, and the fitting of the pipeline:
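from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import as_concordance_index_ipcw_scorer
from sksurv.util import Surv
## df is assumed to be a pandas DataFrame that is already loaded in the session
print("Importing global DF")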
print("Creating X & Y set")
X = df.iloc[:,:-2].copy()
y = Surv.from_dataframe("AliveStatus","Target_Age",df.iloc[:,-2:].copy()) ## Creates structured array for Scikit Surv
print("Defining feature categories by data type")
numerical_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns
print("Splitting dataset")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5) #SkLearn splitter
print("Defining preprocessing steps using SciKitLearn pipeline...")
## Pipeline Steps
numeric_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
('onehot', OneHotEncoder(sparse=False,handle_unknown='ignore'))]) ## Use "sparse=False" because Random Forests cannot take sparse matrices, only dense matrices.
preprocessor = ColumnTransformer(
transformers=[
('num', numeric_transformer, numerical_features),
('cat', categorical_transformer, categorical_features)])
## Pipeline defining
print("Defining Random Survival Forest pipeline from SciKit Survival")
rsf = make_pipeline(
preprocessor,
RandomSurvivalForest(n_estimators=150, ## Default 100
min_samples_split=0.01, ## Default 6
min_samples_leaf=0.005, ## Default 3
max_features="sqrt", ## Defaults to none when not defined
n_jobs=-1, ## Use all cores (default None)
oob_score = True,
random_state=200) ## Random State 2020
)
##Fitting & Scoring
print("Fitting dataframe to RSF Pipeline")
rsf.fit(X_train,y_train)
print("Fitting completed.")
After fitting completes, I try to run the following step:
as_concordance_index_ipcw_scorer(rsf).score(X_test,y_test)
I then get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-97-9a92b22d8026> in <module>
----> 1 as_concordance_index_ipcw_scorer(rsf).score(X_test,y_test)
C:\ProgramData\Anaconda3\lib\site-packages\sksurv\metrics.py in score(self, X, y)
788 score : float
789 """
--> 790 estimate = self._do_predict(X)
791 score = self._score_func(
792 survival_train=self._train_y,
C:\ProgramData\Anaconda3\lib\site-packages\sksurv\metrics.py in _do_predict(self, X)
768
769 def _do_predict(self, X):
--> 770 predict_func = getattr(self.estimator_, self._predict_func)
771 return predict_func(X)
772
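AttributeError: 'as_concordance_index_ipcw_scorer' object has no attribute 'estimator_'
One option I tried was specifying only the RSF part of the pipeline, but without success:
as_concordance_index_ipcw_scorer(rsf[1]).score(X_test,y_test)
Any suggestions?
Apologies for the length or for any missing information; I am not familiar with pipelines or scikit-survival, and I wanted to give as much detail as possible.
Thanks.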
Answered on 2022-01-09 23:08:37
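The estimator instance returned by as_concordance_index_ipcw_scorer needs to be fitted itself; the fact that the base estimator has already been fitted does not help in this case.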
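From the source code (a mixin class), fitting one of these wrappers fits the underlying estimator and saves it in a new attribute, estimator_ (the one your error reports as missing), and also saves the training labels. So you might be able to create those attributes directly without ill effect, but you would be straying from the intended usage.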
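To illustrate why an already fitted base estimator is not enough, here is a simplified toy version of the pattern described above. ToyScorerWrapper and MeanModel are hypothetical names invented for this sketch; it is not the scikit-survival source, and the scoring rule (mean absolute error) is only a placeholder.

```python
class MeanModel:
    """Hypothetical base estimator: always predicts the mean of the training labels."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


class ToyScorerWrapper:
    """Hypothetical wrapper: score() only reads attributes that fit() creates."""

    def __init__(self, estimator):
        self.estimator = estimator  # stored as-is; estimator_ does not exist yet

    def fit(self, X, y):
        self._train_y = list(y)                     # training labels are kept for scoring
        self.estimator_ = self.estimator.fit(X, y)  # fitted estimator saved under a new name
        return self

    def score(self, X, y):
        predictions = self.estimator_.predict(X)    # AttributeError if fit() was never called
        errors = [abs(p - t) for p, t in zip(predictions, y)]
        return sum(errors) / len(errors)            # placeholder metric: mean absolute error


X_train, y_train = [[0], [1]], [1.0, 3.0]

# Fitting the base estimator first does not help: the wrapper itself is unfitted.
base = MeanModel().fit(X_train, y_train)
wrapper = ToyScorerWrapper(base)
try:
    wrapper.score([[2]], [3.0])
    failed = False
except AttributeError:
    failed = True

# Fitting the wrapper creates estimator_ and _train_y, so score() now works.
wrapper.fit(X_train, y_train)
result = wrapper.score([[2]], [3.0])
print(failed, result)
```

The first score call fails for the same reason as in the question: the attribute is created by the wrapper's own fit, not by the estimator it wraps.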
https://stackoverflow.com/questions/70628206