
Check whether a model is overfitting/underfitting

Stack Overflow user
Asked 2022-08-16 05:07:43
Answers: 2 · Views: 25 · Votes: 0

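I know that to check whether a model is overfitting, we need to get its scores on the training and test datasets and compare them. The question is how to turn this into code. Since this is my first time doing this, I did some searching and came across an old answer here: (verifying overfitting or underfitting).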

Note: in the original answer, I replaced the line `from sklearn.cross_validation import KFold` with `from sklearn.model_selection import KFold`.

However, when I run this, I get the error below. How can I fix it?

X_normalized, y_for_normalized = scaled_df[[ "Part's Z-Height (mm)","Part's Solid Volume (cm^3)","Layer Height (mm)","Printing/Scanning Speed (mm/s)","Part's Orientation (Support's volume) (cm^3)"]], scaled_df [["Climate change (kg CO2 eq.)","Climate change, incl biogenic carbon (kg CO2 eq.)","Fine Particulate Matter Formation (kg PM2.5 eq.)","Fossil depletion (kg oil eq.)","Freshwater Consumption (m^3)","Freshwater ecotoxicity (kg 1,4-DB eq.)","Freshwater Eutrophication (kg P eq.)","Human toxicity, cancer (kg 1,4-DB eq.)","Human toxicity, non-cancer (kg 1,4-DB eq.)","Ionizing Radiation (Bq. C-60 eq. to air)","Land use (Annual crop eq. yr)","Marine ecotoxicity (kg 1,4-DB eq.)","Marine Eutrophication (kg N eq.)","Metal depletion (kg Cu eq.)","Photochemical Ozone Formation, Ecosystem (kg NOx eq.)","Photochemical Ozone Formation, Human Health (kg NOx eq.)","Stratospheric Ozone Depletion (kg CFC-11 eq.)","Terrestrial Acidification (kg SO2 eq.)","Terrestrial ecotoxicity (kg 1,4-DB eq.)"]]


new_model = DecisionTreeRegressor(max_depth=9,
                                  min_samples_split=10,random_state=0)




import numpy as np
from sklearn.metrics import SCORERS
from sklearn.model_selection import KFold

scorer = SCORERS['r2']
cv = KFold(5)
train_scores, test_scores = [], []
for train, test in cv.split(X_normalized):
    new_model.fit(X[train], y[train])
    train_scores.append(scorer(new_model, X[train], y[train]))
    test_scores.append(scorer(new_model, X[test], y[test]))

mean_train_score = np.mean(train_scores)
mean_test_score = np.mean(test_scores)





---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/mm/r4gnnwl948zclfyx12w803040000gn/T/ipykernel_73165/4218536717.py in <module>
      7 train_scores, test_scores = [], []
      8 for train, test in cv.split(X_normalized):
----> 9     new_model.fit(X[train], y[train])
     10     train_scores.append(scorer(new_model, X[train], y[train]))
     11     test_scores.append(scorer(new_model, X[test], y[test]))

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3462             if is_iterator(key):
   3463                 key = list(key)
-> 3464             indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
   3465 
   3466         # take() does not accept boolean indexers

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis)
   1312             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1313 
-> 1314         self._validate_read_indexer(keyarr, indexer, axis)
   1315 
   1316         if needs_i8_conversion(ax.dtype) or isinstance(

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis)
   1372                 if use_interval_msg:
   1373                     key = list(key)
-> 1374                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1375 
   1376             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())

KeyError: "None of [Int64Index([20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,\n            37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,\n            54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,\n            71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,\n            88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],\n           dtype='int64')] are in the [columns]"

2 Answers

Stack Overflow user

Answered 2022-08-16 05:27:08

According to the user guide, you need to call cv.split(X) to get the iterator:

for train, test in cv.split(X):
    regressor.fit(X[train], y[train])
    ...

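Also keep in mind that train and test are arrays of indices. For NumPy arrays this is not a problem (indexing with an array of indices instead of a single index yields the result you expect), but that is not true for a regular list or for indexable objects in general. A pandas DataFrame is one such object: df[train] looks the values up as column labels, which is what raises the KeyError in the question, whereas df.iloc[train] selects rows by position.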
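The difference between the three container types can be sketched with a small toy example. The names `X_arr`, `X_df`, and the hand-written `train` array are hypothetical stand-ins for the question's data and for the index arrays that `KFold.split` yields; this is an illustration, not the asker's actual setup.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's scaled_df columns.
X_arr = np.arange(12).reshape(6, 2)
X_df = pd.DataFrame(X_arr, columns=["a", "b"])
train = np.array([2, 3, 4, 5])  # shaped like the index arrays KFold.split yields

# NumPy: an index array selects rows by position.
rows_from_array = X_arr[train]

# pandas: .iloc selects rows by position.
rows_from_frame = X_df.iloc[train]

# pandas: plain [] with an integer array is read as a list of column labels.
try:
    X_df[train]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# A plain Python list cannot be indexed with an array of indices.
try:
    X_arr.tolist()[train]
    raised_type_error = False
except TypeError:
    raised_type_error = True

print(raised_key_error, raised_type_error)
```

If the data lives in a DataFrame, either switch to `.iloc` inside the loop or convert once with `.to_numpy()` before splitting.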

Votes: 0

Stack Overflow user

Answered 2022-08-16 05:45:57

X_normalized, y_for_normalized = scaled_df[[ "Part's Z-Height (mm)","Part's Solid Volume (cm^3)","Layer Height (mm)","Printing/Scanning Speed (mm/s)","Part's Orientation (Support's volume) (cm^3)"]], scaled_df [["Climate change (kg CO2 eq.)","Climate change, incl biogenic carbon (kg CO2 eq.)","Fine Particulate Matter Formation (kg PM2.5 eq.)","Fossil depletion (kg oil eq.)","Freshwater Consumption (m^3)","Freshwater ecotoxicity (kg 1,4-DB eq.)","Freshwater Eutrophication (kg P eq.)","Human toxicity, cancer (kg 1,4-DB eq.)","Human toxicity, non-cancer (kg 1,4-DB eq.)","Ionizing Radiation (Bq. C-60 eq. to air)","Land use (Annual crop eq. yr)","Marine ecotoxicity (kg 1,4-DB eq.)","Marine Eutrophication (kg N eq.)","Metal depletion (kg Cu eq.)","Photochemical Ozone Formation, Ecosystem (kg NOx eq.)","Photochemical Ozone Formation, Human Health (kg NOx eq.)","Stratospheric Ozone Depletion (kg CFC-11 eq.)","Terrestrial Acidification (kg SO2 eq.)","Terrestrial ecotoxicity (kg 1,4-DB eq.)"]]


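import numpy as np
from sklearn.metrics import get_scorer  # replaces the SCORERS dict, which newer scikit-learn releases no longer provide
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

new_model = DecisionTreeRegressor(max_depth=9,
                                  min_samples_split=10, random_state=0)

scorer = get_scorer('r2')

cv = KFold(5)
train_scores, test_scores = [], []

for train, test in cv.split(X_normalized):
    # train and test are positional row indices, so DataFrame rows are selected with .iloc
    X_train, y_train = X_normalized.iloc[train], y_for_normalized.iloc[train]
    X_test, y_test = X_normalized.iloc[test], y_for_normalized.iloc[test]
    new_model.fit(X_train, y_train)
    train_scores.append(scorer(new_model, X_train, y_train))
    test_scores.append(scorer(new_model, X_test, y_test))

# Average R^2 over the five folds
mean_train_score = np.mean(train_scores)
mean_test_score = np.mean(test_scores)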
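As an alternative to the manual loop, scikit-learn's `cross_validate` can return both training and test scores per fold, which avoids indexing the data by hand. The sketch below is a minimal illustration on synthetic data; `X_demo` and `y_demo` are hypothetical stand-ins and would be replaced by `X_normalized` and `y_for_normalized`.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical); substitute the real feature and target frames.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=100)

model = DecisionTreeRegressor(max_depth=9, min_samples_split=10, random_state=0)

# return_train_score=True adds a "train_score" entry next to "test_score".
results = cross_validate(
    model, X_demo, y_demo,
    cv=KFold(5), scoring="r2", return_train_score=True,
)

mean_train = results["train_score"].mean()
mean_test = results["test_score"].mean()
gap = mean_train - mean_test
print(f"train R^2={mean_train:.3f}, test R^2={mean_test:.3f}, gap={gap:.3f}")
```

A training score that is much higher than the test score is commonly read as a sign of overfitting, while low scores on both suggest underfitting. How large a gap is acceptable depends on the problem.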
Votes: 0
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73368997
