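I'm testing an SVM with a sigmoid kernel on the iris data using sklearn and SVC. It performs very poorly: accuracy is only 25%. I'm using exactly the same code as https://towardsdatascience.com/a-guide-to-svm-parameter-tuning-8bfe6b8a452c (sigmoid section), and I normalized the features, which according to the article should improve performance considerably. However, I can't reproduce his results: accuracy only rises to 33%.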
Other kernels (for example, the linear kernel) give good results (82% accuracy). Could there be a problem with the SVC(kernel = 'sigmoid') function?
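For reference, the normalization used in the code below is a pure rescaling, not standardization: `normalize` works row-wise, so reshaping a feature column into a single row divides the whole column by its overall L2 norm. This is a minimal sketch of that equivalence; the name `manual` is only illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import normalize

iris = datasets.load_iris()
sepal_length = iris.data[:, 0]

# normalize() scales each ROW to unit L2 norm; with the column reshaped
# into one row, every value is divided by the same constant ||sepal_length||.
sepal_length_norm = normalize(sepal_length.reshape(1, -1))[0]

# Hypothetical manual equivalent, for comparison only.
manual = sepal_length / np.linalg.norm(sepal_length)

print(np.allclose(sepal_length_norm, manual))  # True
print(sepal_length_norm.min(), sepal_length_norm.max())  # all values become small
```

Unlike `sklearn.preprocessing.StandardScaler`, this neither centers the feature nor gives it unit variance, so the rescaled values are small and tightly clustered.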
Python code to reproduce the problem:
##sigmoid iris example
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.metrics import classification_report
from sklearn.svm import SVC
iris = datasets.load_iris()
sepal_length = iris.data[:,0]
sepal_width = iris.data[:,1]
#assessing performance of sigmoid SVM
clf = SVC(kernel='sigmoid')
clf.fit(np.c_[sepal_length, sepal_width], iris.target)
pr=clf.predict(np.c_[sepal_length, sepal_width])
pd.DataFrame(classification_report(iris.target, pr, output_dict=True))
from sklearn.metrics.pairwise import sigmoid_kernel
sigmoid_kernel(np.c_[sepal_length, sepal_width])
#normalizing features
from sklearn.preprocessing import normalize
sepal_length_norm = normalize(sepal_length.reshape(1, -1))[0]
sepal_width_norm = normalize(sepal_width.reshape(1, -1))[0]
clf.fit(np.c_[sepal_length_norm, sepal_width_norm], iris.target)
sigmoid_kernel(np.c_[sepal_length_norm, sepal_width_norm])
#assessing performance of sigmoid SVM with normalized features
pr_norm=clf.predict(np.c_[sepal_length_norm, sepal_width_norm])
pd.DataFrame(classification_report(iris.target, pr_norm, output_dict=True))

Answer posted on 2020-12-03 21:34:03
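I figured out what was going on. In sklearn versions before 0.22, the default gamma passed to SVC was "auto"; from 0.22 onward it was changed to "scale". The article's author appears to have been using an earlier version and was therefore implicitly passing gamma="auto" (he mentions that "the current default setting for gamma is 'auto'"). So if you are using a current version of sklearn (0.23.2 at the time of writing), you need to pass gamma='auto' explicitly when instantiating SVC: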
clf = SVC(kernel='sigmoid', gamma='auto')  # explicitly restore the pre-0.22 default
#normalizing features
sepal_length_norm = normalize(sepal_length.reshape(1, -1))[0]
sepal_width_norm = normalize(sepal_width.reshape(1, -1))[0]
clf.fit(np.c_[sepal_length_norm, sepal_width_norm], iris.target)
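To see why the default matters for these particular inputs, the two settings can be computed by hand. This sketch uses the formulas from the SVC documentation ('auto' is 1 / n_features, 'scale' is 1 / (n_features * X.var())); the variable names `gamma_auto` and `gamma_scale` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import normalize

iris = datasets.load_iris()
# Same two normalized features as in the question.
X = np.c_[normalize(iris.data[:, 0].reshape(1, -1))[0],
          normalize(iris.data[:, 1].reshape(1, -1))[0]]
n_features = X.shape[1]

# gamma='auto' (default before 0.22): depends only on the number of features.
gamma_auto = 1.0 / n_features
# gamma='scale' (default from 0.22): also divides by the variance of X.
gamma_scale = 1.0 / (n_features * X.var())

print(gamma_auto)   # 0.5
print(gamma_scale)  # far larger than 0.5 for these low-variance inputs
```

Because the normalized features have a tiny variance, 'scale' yields a gamma several orders of magnitude larger than 'auto' on the same data.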

Now, when you print the classification report:
pr_norm=clf.predict(np.c_[sepal_length_norm, sepal_width_norm])
print(pd.DataFrame(classification_report(iris.target, pr_norm, output_dict=True)))
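#                   0          1          2  accuracy   macro avg  weighted avg
# precision  0.907407   0.650000   0.750000  0.766667    0.769136      0.769136
# recall     0.980000   0.780000   0.540000  0.766667    0.766667      0.766667
# f1-score   0.942308   0.709091   0.627907  0.766667    0.759769      0.759769
# support   50.000000  50.000000  50.000000  0.766667  150.000000    150.000000

The 33% accuracy you were seeing comes from the default gamma now being "scale", i.e. 1 / (n_features * X.var()). The normalized features have a very small variance, so this gamma is very large, the tanh in the sigmoid kernel saturates, and every prediction lands in the same region of the decision space. Since the target is split into three equal classes, predicting a single class gives exactly 33.3% accuracy: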
clf = SVC(kernel='sigmoid')  # gamma defaults to 'scale' in sklearn >= 0.22
#normalizing features
sepal_length_norm = normalize(sepal_length.reshape(1, -1))[0]
sepal_width_norm = normalize(sepal_width.reshape(1, -1))[0]
X = np.c_[sepal_length_norm, sepal_width_norm]
clf.fit(X, iris.target)

pr_norm=clf.predict(X)
print(pd.DataFrame(classification_report(iris.target, pr_norm, output_dict=True)))
#                 0     1          2  accuracy   macro avg  weighted avg
# precision     0.0   0.0   0.333333  0.333333    0.111111      0.111111
# recall        0.0   0.0   1.000000  0.333333    0.333333      0.333333
# f1-score      0.0   0.0   0.500000  0.333333    0.166667      0.166667
# support      50.0  50.0  50.000000  0.333333  150.000000    150.000000

https://stackoverflow.com/questions/65105704
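The saturation explanation for the single-class predictions can be checked directly on the kernel matrix. This is a sketch assuming the same normalized features; note that `sigmoid_kernel` defaults to `coef0=1` while SVC defaults to `coef0=0.0`, so it is passed explicitly here to match SVC.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics.pairwise import sigmoid_kernel
from sklearn.preprocessing import normalize

iris = datasets.load_iris()
X = np.c_[normalize(iris.data[:, 0].reshape(1, -1))[0],
          normalize(iris.data[:, 1].reshape(1, -1))[0]]

gamma_auto = 1.0 / X.shape[1]                  # what gamma='auto' uses
gamma_scale = 1.0 / (X.shape[1] * X.var())     # what gamma='scale' uses

# Sigmoid kernel: tanh(gamma * <x, y> + coef0), with SVC's default coef0 = 0.0.
K_auto = sigmoid_kernel(X, gamma=gamma_auto, coef0=0.0)
K_scale = sigmoid_kernel(X, gamma=gamma_scale, coef0=0.0)

# With gamma_auto the kernel still distinguishes pairs of samples;
# with gamma_scale tanh saturates and the matrix is (numerically) all ones.
print(K_auto.min(), K_auto.max())
print(K_scale.min(), K_scale.max())
```

A kernel matrix that is constant carries no information about which samples are similar, so the fitted decision function cannot separate the classes and the model falls back to predicting one class everywhere.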