Q: SVC classifier with 0% accuracy

Stack Overflow user

Asked on 2017-06-20 07:54:42

1 answer · 665 views · 0 followers · 2 votes

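I am using sklearn's SVC to classify matrices. The data are 95 correlation matrices computed from MRI scans of patients with schizophrenia (50 matrices) and healthy controls (45 matrices). The matrices are fairly large (264*264), so I did not expect perfect results, but 0% accuracy seems really low.

Data: 95 matrices of size 264*264, with values in [-1, 1]

Code

Here is the code: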

import numpy as np
from sklearn.svm import SVC

## Data
# control_matrices: 45 matrices
# patient_matrices: 50 matrices

n_training = 25  # Matrices per class used to train the SVC (25 control and 25 patient)
indices = np.triu_indices(264, 1)  # The matrices are symmetric, so keep only the upper triangle

perm_control = np.random.permutation(45)  # Random permutation to pick random training matrices
contr_matrices = np.asarray(control_matrices)[perm_control]  # np.asarray so that a plain list can be index-permuted
perm_patient = np.random.permutation(50)  # Same for the patient matrices
pat_matrices = np.asarray(patient_matrices)[perm_patient]

x_control = [m[indices] for m in contr_matrices[:n_training]]  # Training data
x_patient = [m[indices] for m in pat_matrices[:n_training]]

test_control = [m[indices] for m in contr_matrices[n_training:]]  # Test data, used once the SVM is trained
test_patient = [m[indices] for m in pat_matrices[n_training:]]

X = np.concatenate((x_control, x_patient))
Y = np.asarray(n_training*[0] + n_training*[1])  # Control: 0 - Patient: 1

# Shuffle the training samples and their labels together
perm = np.random.permutation(2 * n_training)
X = X[perm]
Y = Y[perm]

## Training

clf = SVC()
clf.fit(X, Y)

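Expected results

Since the dimensionality of the data is huge compared with the number of matrices, I expected poor results (slightly better than 50%).

Actual results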

clf.score(np.concatenate((test_control, test_patient)), 20*[0]+25*[1])

>>> 0.0

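The same thing happens every time I run the code (so with different permutations) and for n_training from 10 to 45. However, the SVC does memorize the matrices used for training perfectly (clf.score(X,Y) is 1.0).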
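
For readers who want to reproduce these steps without the MRI data, here is a minimal sketch of the same pipeline: upper-triangle vectorization, a stratified hold-out split, then train and test scores. It assumes a recent scikit-learn that provides `sklearn.model_selection`, and it uses random symmetric matrices as a stand-in, so `fake_matrices`, `vectorize_upper` and the 70/30 split are illustrative choices rather than the original data or code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

n_subjects, n_rois = 95, 264
# Hypothetical stand-in data: random symmetric matrices with entries in [-1, 1].
raw = rng.uniform(-1.0, 1.0, size=(n_subjects, n_rois, n_rois))
fake_matrices = (raw + raw.transpose(0, 2, 1)) / 2.0


def vectorize_upper(matrices):
    """Flatten each symmetric matrix to its strict upper triangle."""
    rows, cols = np.triu_indices(matrices.shape[1], k=1)
    return matrices[:, rows, cols]


X = vectorize_upper(fake_matrices)   # one row of 264 * 263 / 2 values per matrix
y = np.array([0] * 45 + [1] * 50)    # Control: 0 - Patient: 1

# A stratified hold-out split keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = SVC()
clf.fit(X_train, y_train)
train_score = clf.score(X_train, y_train)
test_score = clf.score(X_test, y_test)
print("train: %.2f, test: %.2f" % (train_score, test_score))
```

Comparing `train_score` with `test_score` on data that carries no signal gives a rough idea of what chance-level behaviour looks like for this many features and this few samples.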

Other attempts

The same happens with clf=LinearSVC() and clf=LogisticRegression().

I also tried the following, with exactly the same result:

from sklearn.cross_validation import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC
from nilearn import connectome

connectivity_coefs = connectome.sym_to_vec(matrices, ConnectivityMeasure) 
# This turns the matrices to a list of vectors

Y = 45*[0] + 50*[1]

cv = StratifiedKFold(Y, n_folds=3, shuffle=True)
svc = LinearSVC()

cv_scores = cross_val_score(svc, connectivity_coefs, Y, cv=cv, scoring='accuracy')

print('Score: %1.2f +- %1.2f' % (cv_scores.mean(), cv_scores.std()))

>>> Score: 0.00 +- 0.00

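I also tried simpler data: [0] for the control matrices and [1] for the patient matrices. The SVC works fine there, so I first suspected that the problem was related to the size of the matrices I use (very high dimension, very few samples).

But with matrices = np.random.rand(95,264,264), I get Score: 0.58 +- 0.03.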
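
The cross-validation snippet above imports from `sklearn.cross_validation`, which matches the scikit-learn 0.15.2 used here but was removed in scikit-learn 0.20. As a hedged sketch, the random-matrix check might look like this with `sklearn.model_selection`. The seed and `random_state` are assumptions added for reproducibility, so the score will not match the 0.58 reported above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Random stand-in for the 95 matrices, as in the sanity check above.
matrices = rng.random((95, 264, 264))
rows, cols = np.triu_indices(264, k=1)
features = matrices[:, rows, cols]        # one vector per matrix

labels = np.array(45 * [0] + 50 * [1])    # Control: 0 - Patient: 1

# StratifiedKFold now takes n_splits; the labels go to cross_val_score instead.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
svc = LinearSVC()

cv_scores = cross_val_score(svc, features, labels, cv=cv, scoring="accuracy")
print("Score: %1.2f +- %1.2f" % (cv_scores.mean(), cv_scores.std()))
```

The main API difference is that the fold object no longer receives the labels at construction time, which makes the same `cv` object reusable across datasets.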

Using the full matrices instead of the upper triangle, I still get 0% accuracy.

I don't understand at all what is going on here.

Versions

Windows-8-6.2.9200
Python 3.4.1 |Continuum Analytics, Inc.| (default, May 19 2014, 13:02:30) [MSC v.1600 64 bit (AMD64)]
NumPy 1.9.1
SciPy 0.15.1
Scikit-Learn 0.15.2

Data

Here is the complete code to obtain the matrices I use (the MRI scans come from an open dataset):

from nilearn import datasets
from nilearn import input_data
from nilearn.connectome import ConnectivityMeasure
import numpy as np
from sklearn.svm import SVC, LinearSVC
from sklearn.cross_validation import StratifiedKFold, cross_val_score
from nilearn import connectome


## Atlas for the parcellation and Dataset

power = datasets.fetch_coords_power_2011()
coords = np.vstack((power.rois['x'], power.rois['y'], power.rois['z'])).T
datas = datasets.fetch_cobre(n_subjects=None, verbose=0)

spheres_masker = input_data.NiftiSpheresMasker(
                    seeds=coords, smoothing_fwhm=4, radius=5.,
                    detrend=True, standardize=True,
                    high_pass=0.01, t_r=2, verbose=0)


## Extracting useful IRM

list_time_series = []
i = 0
for fmri_filenames, confounds_file in zip(datas.func, datas.confounds): #Might take a few minutes
    print("Sujet %s" % i)
    if i != 38 and i != 41: #Subjects removed from the study
        conf = np.genfromtxt(confounds_file)
        conf = np.delete(conf, obj = 16, axis = 1) #Remove Global Signal
        conf = np.delete(conf, obj = 0, axis = 0) #Remove labels
        scrub = [i for i in range(150) if conf[i,7]==1]    
        conf = np.delete(conf, obj = 7, axis = 1) #Remove Scrub
        if len(scrub) < 90: #Keep at least 60 non scrub
            time_series = spheres_masker.fit_transform(fmri_filenames, confounds=conf)
            time_series = np.delete(time_series, obj = scrub, axis = 0) #Remove scrub
            list_time_series.append(time_series)
        else:
            list_time_series.append([])
    else:
        list_time_series.append([])
    i+=1


## Computing correlation matrices

N = len(datas.phenotypic)
control_subjects = []
patient_subjects = []
for i in range(N):
    t = list_time_series[i]
    if type(t) != list :
        subject = datas.phenotypic[i]
        if str(subject[4])=='b\'Control\'':
            control_subjects.append(t)
        else:
            patient_subjects.append(t)
control_subjects = np.asarray(control_subjects)            
patient_subjects = np.asarray(patient_subjects)

connect_measure = ConnectivityMeasure(kind='tangent')
control_matrices=connect_measure.fit_transform(control_subjects)
patient_matrices=connect_measure.fit_transform(patient_subjects)

matrices = np.concatenate((control_matrices, patient_matrices))

Alternatively, you can download them here.

Thanks for your help!

python

scikit-learn

1 Answer

Stack Overflow user

Answered on 2017-06-20 09:44:17

Instead of using ["Control"] and ["Patient"] as output labels, assign numeric output labels (for example, 0 for Control and 1 for Patient), since ML algorithms only work with real numbers. So

Y = np.asarray( n_training*["Control"] + n_training*["Patient"] )

should be

Y = np.asarray( n_training*[0] + n_training*[1] )

and

clf.score(np.concatenate((test_control, test_patient)), 20*['Control']+25*['Patient'])

should be

clf.score(np.concatenate((test_control, test_patient)), np.asarray( 20*[0] + 25*[1] ))
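
If the labels start out as strings, one way to get the integer encoding described above is scikit-learn's `LabelEncoder`. This is only a sketch: the small random feature matrix and the variable names `string_labels` and `predicted_names` are made up for illustration. `LabelEncoder` assigns integers in sorted order of the class names, so here "Control" becomes 0 and "Patient" becomes 1.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

n_training = 25
string_labels = np.asarray(n_training * ["Control"] + n_training * ["Patient"])

encoder = LabelEncoder()
Y = encoder.fit_transform(string_labels)   # "Control" -> 0, "Patient" -> 1

# Hypothetical features, only so that the example runs end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(2 * n_training, 10))

clf = SVC()
clf.fit(X, Y)

# Map predictions back to the original names when reporting results.
predicted_names = encoder.inverse_transform(clf.predict(X[:5]))
print(list(encoder.classes_), predicted_names)
```

Keeping the fitted `encoder` around means the test labels can be encoded with the same mapping via `encoder.transform`, which avoids hand-written lists such as `20*[0] + 25*[1]` drifting out of sync with the training labels.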

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/44647076
