
Leave-one-out cross-validation with Sklearn

Stack Overflow user
Asked 2015-04-06 18:03:19
Answers: 1 · Views: 2.5K · Followers: 0 · Votes: 2

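I am trying to test my classifier with cross-validation using Sklearn.

I have 3 classes and 50 samples in total.

  • Class 1 has 5 samples
  • Class 2 has 15 samples
  • Class 3 has 30 samples

The following runs as expected, and is presumably 5-fold cross-validation.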

result = cross_validation.cross_val_score(classifier, X, y, cv=5)

I then tried to do leave-one-out by using cv=50 folds, so I ran the following:

result = cross_validation.cross_val_score(classifier, X, y, cv=50)

Surprisingly, however, it gives the following warnings and error:

/Library/Python/2.7/site-packages/sklearn/cross_validation.py:413: Warning: The least populated class in y has only 5 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=50.
  % (min_labels, self.n_folds)), Warning)
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/_methods.py:67: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "b.py", line 96, in <module>
    scores1 = cross_validation.cross_val_score(classifier, X, y, cv=50)
  File "/Library/Python/2.7/site-packages/sklearn/cross_validation.py", line 1151, in cross_val_score
    for train, test in cv)
  File "/Library/Python/2.7/site-packages/sklearn/externals/joblib/parallel.py", line 653, in __call__
    self.dispatch(function, args, kwargs)
  File "/Library/Python/2.7/site-packages/sklearn/externals/joblib/parallel.py", line 400, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/Library/Python/2.7/site-packages/sklearn/externals/joblib/parallel.py", line 138, in __init__
    self.results = func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/sklearn/cross_validation.py", line 1240, in _fit_and_score
    test_score = _score(estimator, X_test, y_test, scorer)
  File "/Library/Python/2.7/site-packages/sklearn/cross_validation.py", line 1296, in _score
    score = scorer(estimator, X_test, y_test)
  File "/Library/Python/2.7/site-packages/sklearn/metrics/scorer.py", line 176, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/sklearn/base.py", line 291, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/Library/Python/2.7/site-packages/sklearn/neighbors/classification.py", line 147, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "/Library/Python/2.7/site-packages/sklearn/neighbors/base.py", line 332, in kneighbors
    return_distance=return_distance)
  File "binary_tree.pxi", line 1307, in sklearn.neighbors.kd_tree.BinaryTree.query (sklearn/neighbors/kd_tree.c:10506)
  File "binary_tree.pxi", line 226, in sklearn.neighbors.kd_tree.get_memview_DTYPE_2D (sklearn/neighbors/kd_tree.c:2715)
  File "stringsource", line 247, in View.MemoryView.array_cwrapper (sklearn/neighbors/kd_tree.c:24789)
  File "stringsource", line 147, in View.MemoryView.array.__cinit__ (sklearn/neighbors/kd_tree.c:23664)
ValueError: Invalid shape in axis 0: 0.

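Another strange thing: with cv=5 I get no warning at all, but with cv=50 I get the warning above. I thought that a larger cv, although computationally more expensive, should give more accurate results. Is there a gap in my reasoning? Why am I getting the warning and the error?

How can I do leave-one-out cross-validation correctly in this case?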


1 Answer

Stack Overflow user

Accepted answer

Answered 2015-04-06 18:06:41

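By default, cv=5 for classification does stratified 5-fold cross-validation. That means it tries to keep the fraction of samples from each class constant across folds. This likely causes trouble when the number of folds is the same as the number of samples. Which version are you on? That error message is certainly not very helpful.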
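One way to get leave-one-out without going through the stratified splitter is to pass an explicit `LeaveOneOut` object as `cv`. The sketch below is only an illustration: it assumes a recent scikit-learn where the splitters live in `sklearn.model_selection` (not the `sklearn.cross_validation` module from the question), and it uses synthetic data and a `KNeighborsClassifier` as stand-ins for the asker's `X`, `y` and `classifier`.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the asker's data (an assumption):
# 50 samples with class sizes 5 / 15 / 30.
rng = np.random.RandomState(0)
y = np.repeat([0, 1, 2], [5, 15, 30])
X = rng.normal(size=(50, 4)) + y[:, None]  # shift each class so it is learnable

# Stand-in classifier (an assumption; the question does not show its setup).
classifier = KNeighborsClassifier(n_neighbors=3)

# LeaveOneOut does not stratify: each of the 50 splits holds out exactly
# one sample and trains on the remaining 49.
loo = LeaveOneOut()
scores = cross_val_score(classifier, X, y, cv=loo)

print(len(scores))    # one score per held-out sample
print(scores.mean())  # fraction of held-out samples classified correctly
```

Each individual score is either 0 or 1 here, because every test set contains a single sample; only the mean is meaningful as an accuracy estimate.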

By the way, in general I'd recommend using StratifiedShuffleSplit for such a small dataset.
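A minimal sketch of that suggestion, again assuming a recent scikit-learn (`sklearn.model_selection`) and synthetic stand-in data. The values `n_splits=20` and `test_size=0.2` are arbitrary choices for illustration, not something the answer specifies.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the asker's data (an assumption):
# 50 samples with class sizes 5 / 15 / 30.
rng = np.random.RandomState(0)
y = np.repeat([0, 1, 2], [5, 15, 30])
X = rng.normal(size=(50, 4)) + y[:, None]

# 20 random train/test splits; each test set holds 20% of the data and
# keeps the class proportions close to those of the full dataset.
sss = StratifiedShuffleSplit(n_splits=20, test_size=0.2, random_state=0)

# Inspect the class counts in the first test split.
train_idx, test_idx = next(sss.split(X, y))
test_counts = np.bincount(y[test_idx], minlength=3)
print(test_counts)

# The splitter can be passed directly as cv.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=sss)
print(scores.mean(), scores.std())
```

Unlike k-fold, the number of splits here is independent of the class sizes, so the smallest class does not cap how many evaluations can be run.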

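Edit: the current version gives a warning, which should probably be an error:

:399: Warning: The least populated class in y has only 13 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=68. % (min_labels, self.n_folds)), Warning)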
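The constraint stated in that warning can be checked by hand before choosing a fold count. The snippet below is a plain-Python sketch, not scikit-learn code; `max_stratified_folds` is a hypothetical helper name introduced only for this illustration, and the labels mirror the class sizes from the question.

```python
from collections import Counter


def max_stratified_folds(y):
    """Return the size of the least populated class in y.

    A stratified k-fold split needs at least one sample of every class
    per fold, so this is the largest fold count the warning allows.
    """
    return min(Counter(y).values())


# Labels with the class sizes from the question: 5, 15 and 30 samples.
y = [0] * 5 + [1] * 15 + [2] * 30
limit = max_stratified_folds(y)

for n_folds in (5, 50):
    status = "fits" if n_folds <= limit else "exceeds the smallest class"
    print(n_folds, status)
```

This is why cv=5 runs without any warning on the asker's data while cv=50 does not: the smallest class has exactly 5 members.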

Votes: 5

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/29476807
