问TypeError：init()在癌症数据集中为参数'n_splits‘获取了多个值
EN

Stack Overflow用户

提问于 2020-06-17 22:16:10

回答 1查看 153关注 0票数 0

数据集

Id,Cl.thickness,Cell.size,Cell.shape,Marg.adhesion,Epith.c.size,Bare.nuclei,Bl.cromatin,Normal.nucleoli,Mitoses,Class
1000025,5,1,1,1,2,1,3,1,1,benign
1002945,5,4,4,5,7,10,3,2,1,benign

代码如下

import math
import numpy as np
import pandas as pd
#from sklearn.grid_search import GridSearchCV
from sklearn.model_selection import learning_curve,GridSearchCV
from sklearn.linear_model import LogisticRegressionCV
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, cross_val_predict, StratifiedKFold 
from sklearn import preprocessing, metrics, svm, ensemble
from sklearn.metrics import accuracy_score, classification_report
import tabpy_client 
# Breast Cancer dataset
# Citation: Dr. William H. Wolberg, University of Wisconsin Hospitals, Madison 
# https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)

# Read the dataset (Note that the CSV provided for this demo has rows with the missing data removed)
df =  pd.read_csv('breastcancer.csv', header=0)

# Take a look at the structure of the file
df.head(n=4)
# Drop Id column not used in analysis
df.drop(['Id'], 1, inplace=True)

# Use LabelEncoder to convert textual classifications to numeric. 
# We will use the same encoder later to convert them back.
encoder = preprocessing.LabelEncoder()
df['Class'] = encoder.fit_transform(df['Class'])

# You could also do this manually in the following way:
# df['Class'] = df['Class'].map( {'benign': 0, 'malignant': 1} ).astype(int)

# Check the result of the transform
df.head(n=6)
# Split columns into independent/predictor variables vs dependent/response/outcome variable
X = np.array(df.drop(['Class'], 1))
y = np.array(df['Class'])

# Scale the data. We will use the same scaler later for scoring function
scaler = preprocessing.StandardScaler().fit(X)
X = scaler.transform(X)

# 10 fold stratified cross validation
kf = StratifiedKFold(y,n_splits=10, random_state=None, shuffle=True)

# Define the parameter grid to use for tuning the Support Vector Machine
parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

# Pick the goal you're optimizing for e.g. precision if you prefer fewer false-positives
# recall if you prefer fewer false-negatives. For demonstration purposes let's pick several
# Note that the final model selection will be based on the last item in the list
scoringmethods = ['f1','accuracy','precision', 'recall','roc_auc']

为什么n_splits抛出错误

TypeError: __init__() got multiple values for argument 'n_splits'.

n_splits是网格搜索中的参数

python

scikit-learn

回答 1

Stack Overflow用户

发布于 2020-06-17 23:00:42

您不会在构造函数中将数据传递给sklearn模型实例。这是来自https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html:的签名

StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)

您会得到这个特定的错误，因为python将y数组解释为n_splits参数。至于拆分，请查看文档中的方法。

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/62431127

复制

相似问题

问TypeError：init()在癌症数据集中为参数'n_splits‘获取了多个值
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问TypeError：__init__()在癌症数据集中为参数'n_splits‘获取了多个值EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问TypeError：init()在癌症数据集中为参数'n_splits‘获取了多个值
EN