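I'm working through a guide on building an ML model that predicts how likely someone was to survive the sinking of the Titanic.
I'm stuck on cell 21. It essentially compares the performance of more than 20 different ML algorithms after splitting the data. The final result should look like this:
[Image: expected result of cell 21, if run correctly]
Cell 21: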
# Machine Learning Algorithm (MLA) Selection and Initialization
MLA = [
    # Ensemble Methods
    ensemble.AdaBoostClassifier(),
    ensemble.BaggingClassifier(),
    ensemble.ExtraTreesClassifier(),
    ensemble.GradientBoostingClassifier(),
    ensemble.RandomForestClassifier(),
    # Gaussian Processes
    gaussian_process.GaussianProcessClassifier(),
    # GLM
    linear_model.LogisticRegressionCV(),
    linear_model.PassiveAggressiveClassifier(),
    linear_model.RidgeClassifierCV(),
    linear_model.SGDClassifier(),
    linear_model.Perceptron(),
    # Naive Bayes
    naive_bayes.BernoulliNB(),
    naive_bayes.GaussianNB(),
    # Nearest Neighbor
    neighbors.KNeighborsClassifier(),
    # SVM
    svm.SVC(probability = True),
    svm.NuSVC(probability = True),
    svm.LinearSVC(),
    # Trees
    tree.DecisionTreeClassifier(),
    tree.ExtraTreeClassifier(),
    # Discriminant Analysis
    discriminant_analysis.LinearDiscriminantAnalysis(),
    discriminant_analysis.QuadraticDiscriminantAnalysis(),
    # xgboost
    XGBClassifier()
]
# Split dataset in cross-validation with this splitter class
# note: this is an alternative to train_test_split
cv_split = model_selection.ShuffleSplit(n_splits = 10, test_size = .3, train_size = .6, random_state = 0)
# run model 10x with a 60/30 split, intentionally leaving out 10%
# Create table to compare MLA metrics
MLA_columns = ['MLA Name', 'MLA Parameters', 'MLA Train Accuracy Mean', 'MLA Test Accuracy Mean',
               'MLA Test Accuracy 3*STD', 'MLA Time']
MLA_compare = pd.DataFrame(columns = MLA_columns)
# Create table to compare MLA predictions
MLA_predict = data1[Target]
# Index through MLA and save performance to table
row_index = 0
for alg in MLA:
    # set name and parameters
    MLA_name = alg.__class__.__name__
    MLA_compare.loc[row_index, 'MLA Name'] = MLA_name
    MLA_compare.loc[row_index, 'MLA Parameters'] = str(alg.get_params())
    # score model with cross validation
    cv_results = model_selection.cross_validate(alg, data1[data1_x_bin], data1[Target], cv = cv_split)
    MLA_compare.loc[row_index, 'MLA Time'] = cv_results['fit_time'].mean()
    print(cv_results.keys())
    MLA_compare.loc[row_index, 'MLA Train Accuracy Mean'] = cv_results['train_score'].mean()
    MLA_compare.loc[row_index, 'MLA Test Accuracy Mean'] = cv_results['test_score'].mean()
    # If this is an unbiased random sample, then +/-3 standard deviations (std) from the mean
    # should statistically capture 99.7% of the subsets.
    # Let's know the worst that can happen!
    MLA_compare.loc[row_index, 'MLA Test Accuracy 3*STD'] = cv_results['test_score'].std()*3
    # Save MLA predictions
    alg.fit(data1[data1_x_bin], data1[Target])
    MLA_predict[MLA_name] = alg.predict(data1[data1_x_bin])
    row_index+=1
# Print and sort table
MLA_compare.sort_values(by = ['MLA Test Accuracy Mean'], ascending = False, inplace = True)
MLA_compare
# MLA_predict

After running it, I get the following error:
dict_keys(['fit_time', 'score_time', 'test_score'])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-21-cbe9dc24e1e0> in <module>
67 MLA_compare.loc[row_index, 'MLA Time'] = cv_results['fit_time'].mean()
68 print(cv_results.keys())
---> 69 MLA_compare.loc[row_index, 'MLA Train Accuracy Mean'] = cv_results['train_score'].mean()
70 MLA_compare.loc[row_index, 'MLA Test Accuracy Mean'] = cv_results['test_score'].mean()
71
KeyError: 'train_score'

As you can see, 'train_score' does not even exist among cv_results.keys().
Posted on 2020-07-05 07:24:13
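According to the sklearn.model_selection.cross_validate documentation, the results only include train_score if you set return_train_score to True. Its default changed from True to False in scikit-learn 0.21, which is why a guide written for an older version fails here. Pass it explicitly, like this:
cv_results = model_selection.cross_validate(alg, data1[data1_x_bin], data1[Target], cv = cv_split, return_train_score=True)

https://stackoverflow.com/questions/62702952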
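To check the behaviour outside the notebook, here is a minimal sketch. It assumes scikit-learn 0.21 or later. The synthetic dataset from make_classification and the single DecisionTreeClassifier are stand-ins I chose for data1 and the MLA list; they are not part of the original guide. Only the ShuffleSplit settings are taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data standing in for data1[data1_x_bin] and data1[Target].
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Same splitter settings as in the question: 10 splits, 60% train, 30% test.
cv_split = ShuffleSplit(n_splits=10, test_size=0.3, train_size=0.6, random_state=0)

# Assumption: one classifier standing in for each entry of the MLA list.
alg = DecisionTreeClassifier(random_state=0)

# Default call: on scikit-learn 0.21+ training scores are not computed.
default_results = cross_validate(alg, X, y, cv=cv_split)
print(sorted(default_results.keys()))

# Explicitly request training scores so that 'train_score' is present.
cv_results = cross_validate(alg, X, y, cv=cv_split, return_train_score=True)
print(sorted(cv_results.keys()))

# The same summary statistics that the question's loop stores per algorithm.
train_mean = cv_results["train_score"].mean()
test_mean = cv_results["test_score"].mean()
test_3std = cv_results["test_score"].std() * 3
print(f"train={train_mean:.3f} test={test_mean:.3f} 3*std={test_3std:.3f}")
```

Computing training scores costs extra prediction time on every split, which is presumably why it is now opt-in. For a comparison table like the one in the question, that overhead is usually acceptable.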