
TypeError: __init__() got multiple values for argument 'n_splits' on a cancer dataset

Stack Overflow user
Asked 2020-06-17 22:16:10
1 answer, 153 views, 0 followers, 0 votes

Dataset

Id,Cl.thickness,Cell.size,Cell.shape,Marg.adhesion,Epith.c.size,Bare.nuclei,Bl.cromatin,Normal.nucleoli,Mitoses,Class
1000025,5,1,1,1,2,1,3,1,1,benign
1002945,5,4,4,5,7,10,3,2,1,benign

The code is as follows

import math
import numpy as np
import pandas as pd
#from sklearn.grid_search import GridSearchCV
from sklearn.model_selection import learning_curve,GridSearchCV
from sklearn.linear_model import LogisticRegressionCV
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, cross_val_predict, StratifiedKFold 
from sklearn import preprocessing, metrics, svm, ensemble
from sklearn.metrics import accuracy_score, classification_report
import tabpy_client 
# Breast Cancer dataset
# Citation: Dr. William H. Wolberg, University of Wisconsin Hospitals, Madison 
# https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)

# Read the dataset (Note that the CSV provided for this demo has rows with the missing data removed)
df =  pd.read_csv('breastcancer.csv', header=0)

# Take a look at the structure of the file
df.head(n=4)
# Drop Id column not used in analysis
df.drop(['Id'], axis=1, inplace=True)

# Use LabelEncoder to convert textual classifications to numeric. 
# We will use the same encoder later to convert them back.
encoder = preprocessing.LabelEncoder()
df['Class'] = encoder.fit_transform(df['Class'])

# You could also do this manually in the following way:
# df['Class'] = df['Class'].map( {'benign': 0, 'malignant': 1} ).astype(int)

# Check the result of the transform
df.head(n=6)
# Split columns into independent/predictor variables vs dependent/response/outcome variable
X = np.array(df.drop(['Class'], axis=1))
y = np.array(df['Class'])

# Scale the data. We will use the same scaler later for scoring function
scaler = preprocessing.StandardScaler().fit(X)
X = scaler.transform(X)

# 10 fold stratified cross validation
kf = StratifiedKFold(y,n_splits=10, random_state=None, shuffle=True)

# Define the parameter grid to use for tuning the Support Vector Machine
parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

# Pick the goal you're optimizing for e.g. precision if you prefer fewer false-positives
# recall if you prefer fewer false-negatives. For demonstration purposes let's pick several
# Note that the final model selection will be based on the last item in the list
scoringmethods = ['f1','accuracy','precision', 'recall','roc_auc']

Why does n_splits raise this error?

TypeError: __init__() got multiple values for argument 'n_splits'. 

n_splits is a parameter in the grid search.


1 Answer

Stack Overflow user

Answered 2020-06-17 23:00:42

You do not pass data to sklearn model instances in the constructor. This is the signature from https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html:

StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)

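You get this particular error because y is passed positionally, so Python binds the y array to the first parameter, n_splits, and the explicit n_splits=10 keyword then supplies a second value for the same parameter. To do the actual splitting, see the methods in the documentation, such as split(X, y).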
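To make this concrete, here is a minimal sketch that first reproduces the error and then constructs the splitter without data. The synthetic X and y are stand-ins for the scaled breast cancer arrays (an assumption for illustration, not the asker's data), and random_state=0 is chosen only to make the sketch repeatable.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data (assumption): 20 samples, 3 features, two balanced classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.array([0, 1] * 10)

# Reproduce the error: y is bound positionally to n_splits,
# and the n_splits=10 keyword then supplies it a second time.
error_message = None
try:
    StratifiedKFold(y, n_splits=10, shuffle=True)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Construct the splitter with configuration only; the data goes to split().
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
folds = list(kf.split(X, y))

# Each fold is a pair of integer index arrays into X and y.
for train_idx, test_idx in folds:
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
```

For the grid search in the question, the splitter object itself can be handed over via the cv argument, for example GridSearchCV(svm.SVC(), parameters, cv=kf), and the data is then supplied later through fit(X, y).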

The original page content is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/62431127
