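I have been trying to understand resampling methods in more detail and have implemented them on a small dataset of 1000 rows. The data was split into 800 training rows and 200 validation rows. I used k-fold cross-validation and repeated k-fold cross-validation on the training set to train a KNN model. I have interpreted the results based on my understanding, but I have some doubts about them (see the questions below):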
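For reference, here is a minimal sketch of how repeated k-fold resampling partitions the 800 training rows. It is written in plain Python rather than R. The helper repeated_kfold_indices is hypothetical and is not a caret function. It draws plain random folds, whereas caret stratifies its folds by the outcome class. It does show why every resample in the caret output trains on 720 rows: each of the 10 folds holds out 800 / 10 = 80 rows, and each repeat reshuffles the rows before cutting new folds.

```python
import random

def repeated_kfold_indices(n_rows, n_folds=10, n_repeats=1, seed=0):
    """Yield (train_idx, holdout_idx) pairs for repeated k-fold CV.

    Hypothetical helper: each repeat reshuffles the row indices and cuts
    them into n_folds near-equal folds; every fold is held out once per repeat.
    """
    rng = random.Random(seed)
    rows = list(range(n_rows))
    for _ in range(n_repeats):
        rng.shuffle(rows)
        folds = [rows[i::n_folds] for i in range(n_folds)]
        for k in range(n_folds):
            holdout = folds[k]
            # Training rows are all rows that are not in the held-out fold.
            train = [r for j, fold in enumerate(folds) if j != k for r in fold]
            yield train, holdout

# 10-fold CV repeated 10 times on 800 training rows.
splits = list(repeated_kfold_indices(800, n_folds=10, n_repeats=10))
print(len(splits))                          # 100 resamples
print({len(train) for train, _ in splits})  # {720}
print({len(hold) for _, hold in splits})    # {80}
```

With 1000 or 2000 repeats the same loop simply produces 10 000 or 20 000 resamples, and the reported Accuracy and Kappa are the means over all of them.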
Results: 10-fold CV
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 720, 720, 720, 720, 720, 720, ...
Resampling results across tuning parameters:
k Accuracy Kappa
5 0.6600 0.07010791
7 0.6775 0.09432414
9 0.6800 0.07054371
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 9.
Results: repeated 10-fold CV, 10 repeats
Resampling results across tuning parameters:
k Accuracy Kappa
5 0.670250 0.10436607
7 0.676875 0.09288219
9 0.683125 0.08062622
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 9.
Results: repeated 10-fold CV, 1000 repeats
k Accuracy Kappa
5 0.6680438 0.09473128
7 0.6753375 0.08810406
9 0.6831800 0.07907891
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 9.
Results: repeated 10-fold CV, 2000 repeats
k Accuracy Kappa
5 0.6677981 0.09467347
7 0.6750369 0.08713170
9 0.6826894 0.07772184
Questions:
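K = 9 is the optimal value, since it has the highest Accuracy. However, I don't understand how Kappa should be taken into account when making the final choice of the parameter value.
Posted on 2016-07-17 11:56:07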
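Accuracy and Kappa are simply different classification performance metrics. In short, they differ in that Accuracy does not take possible class imbalance into account when computing the metric, whereas Kappa does, by comparing the observed agreement with the agreement expected by chance given the class frequencies. Therefore, with imbalanced classes it is better to use Kappa. With R caret you can do this through the metric argument of train(). caret::train already reports Kappa automatically, so I suggest you use it.
https://stackoverflow.com/questions/38355373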
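Here is a small sketch of the difference, written in plain Python rather than R. The helpers accuracy, cohen_kappa and best_k are hypothetical illustrations, not caret functions. The confusion-matrix counts are made up and do not come from the question's data. The sketch shows that a classifier that always predicts the majority class reaches 70% accuracy but a Kappa of 0. It then reapplies the "largest value" selection rule from the caret output to two of the resampling tables in the question, once per metric. In caret itself the metric is chosen with train(..., metric = "Kappa").

```python
def accuracy(cm):
    """Share of correct predictions; cm[i][j] counts true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def cohen_kappa(cm):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_e is chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = accuracy(cm)
    row_tot = [sum(cm[i]) for i in range(n)]                      # true class counts
    col_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts
    p_e = sum(row_tot[i] * col_tot[i] for i in range(n)) / total ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 2-class confusion matrices for 200 rows with a 140/60 class split.
always_majority = [[140, 0], [60, 0]]  # predicts class 0 for every row
some_skill = [[120, 20], [30, 30]]
print(accuracy(always_majority), cohen_kappa(always_majority))  # 0.7 0.0
print(accuracy(some_skill), round(cohen_kappa(some_skill), 3))  # 0.75 0.375

# Resampling summaries copied from the caret output in the question: k -> (Accuracy, Kappa).
cv_10fold = {5: (0.6600, 0.07010791), 7: (0.6775, 0.09432414), 9: (0.6800, 0.07054371)}
cv_10x10 = {5: (0.670250, 0.10436607), 7: (0.676875, 0.09288219), 9: (0.683125, 0.08062622)}

def best_k(results, metric):
    """Pick the k with the largest mean value of the chosen metric."""
    column = {"Accuracy": 0, "Kappa": 1}[metric]
    return max(results, key=lambda k: results[k][column])

for name, table in [("10-fold", cv_10fold), ("10-fold x 10", cv_10x10)]:
    print(name, best_k(table, "Accuracy"), best_k(table, "Kappa"))
# 10-fold 9 7
# 10-fold x 10 9 5
```

With these numbers, Accuracy and Kappa disagree on the best k. Which metric is used for selection therefore changes the chosen model here.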