Question: Checking whether a model is overfitting/underfitting

Stack Overflow user

Asked on 2022-08-16 05:07:43

Answers: 2 · Views: 25 · Followers: 0 · Votes: 0

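I know that to check whether a model is overfitting, we need to compute its score on both the training set and the test set and compare the two. The question is how to do that in code. Since this is my first time, I searched around and came across an old answer here (on checking for over- or underfitting).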

Note: the original answer imported KFold with from sklearn.cross_validation import KFold (that module has since been removed from scikit-learn), so I replaced that line with: from sklearn.model_selection import KFold
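For comparison, scikit-learn can also collect training and test scores in a single call: sklearn.model_selection.cross_validate with return_train_score=True. This is a swapped-in alternative to the manual KFold loop, not what the linked answer uses, and the make_regression data below is a synthetic stand-in (an assumption) for scaled_df, which is not shown in the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the question's data (assumption: 5 features, 3 targets).
X, y = make_regression(n_samples=100, n_features=5, n_targets=3, noise=10.0, random_state=0)

model = DecisionTreeRegressor(max_depth=9, min_samples_split=10, random_state=0)

# return_train_score=True makes cross_validate also score each fold's training split.
results = cross_validate(model, X, y, cv=KFold(5), scoring="r2", return_train_score=True)

mean_train = np.mean(results["train_score"])
mean_test = np.mean(results["test_score"])
print(f"mean train R^2: {mean_train:.3f}, mean test R^2: {mean_test:.3f}")
```

A training score well above the test score points toward overfitting; both being low points toward underfitting.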

However, when I run the code below, I get the following error. How can I fix it?

X_normalized, y_for_normalized = scaled_df[[ "Part's Z-Height (mm)","Part's Solid Volume (cm^3)","Layer Height (mm)","Printing/Scanning Speed (mm/s)","Part's Orientation (Support's volume) (cm^3)"]], scaled_df [["Climate change (kg CO2 eq.)","Climate change, incl biogenic carbon (kg CO2 eq.)","Fine Particulate Matter Formation (kg PM2.5 eq.)","Fossil depletion (kg oil eq.)","Freshwater Consumption (m^3)","Freshwater ecotoxicity (kg 1,4-DB eq.)","Freshwater Eutrophication (kg P eq.)","Human toxicity, cancer (kg 1,4-DB eq.)","Human toxicity, non-cancer (kg 1,4-DB eq.)","Ionizing Radiation (Bq. C-60 eq. to air)","Land use (Annual crop eq. yr)","Marine ecotoxicity (kg 1,4-DB eq.)","Marine Eutrophication (kg N eq.)","Metal depletion (kg Cu eq.)","Photochemical Ozone Formation, Ecosystem (kg NOx eq.)","Photochemical Ozone Formation, Human Health (kg NOx eq.)","Stratospheric Ozone Depletion (kg CFC-11 eq.)","Terrestrial Acidification (kg SO2 eq.)","Terrestrial ecotoxicity (kg 1,4-DB eq.)"]]


new_model = DecisionTreeRegressor(max_depth=9,
                                  min_samples_split=10,random_state=0)




import numpy as np
from sklearn.metrics import SCORERS
from sklearn.model_selection import KFold

scorer = SCORERS['r2']
cv = KFold(5)
train_scores, test_scores = [], []
for train, test in cv.split(X_normalized):
    new_model.fit(X[train], y[train])
    train_scores.append(scorer(new_model, X[train], y[train]))
    test_scores.append(scorer(new_model, X[test], y[test]))

mean_train_score = np.mean(train_scores)
mean_test_score = np.mean(test_scores)





---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/mm/r4gnnwl948zclfyx12w803040000gn/T/ipykernel_73165/4218536717.py in <module>
      7 train_scores, test_scores = [], []
      8 for train, test in cv.split(X_normalized):
----> 9     new_model.fit(X[train], y[train])
     10     train_scores.append(scorer(new_model, X[train], y[train]))
     11     test_scores.append(scorer(new_model, X[test], y[test]))

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3462             if is_iterator(key):
   3463                 key = list(key)
-> 3464             indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
   3465 
   3466         # take() does not accept boolean indexers

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis)
   1312             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1313 
-> 1314         self._validate_read_indexer(keyarr, indexer, axis)
   1315 
   1316         if needs_i8_conversion(ax.dtype) or isinstance(

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis)
   1372                 if use_interval_msg:
   1373                     key = list(key)
-> 1374                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1375 
   1376             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())

KeyError: "None of [Int64Index([20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,\n            37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,\n            54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,\n            71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,\n            88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],\n           dtype='int64')] are in the [columns]"
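The traceback shows the cause: X is a pandas DataFrame, and DataFrame.__getitem__ with an array of integers treats them as column labels, not row positions. Below is a minimal reproduction on a small made-up DataFrame (the column names a and b are placeholders), followed by the positional .iloc indexer, which selects rows the way the cross-validation loop expects.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame; its column labels are strings, so integer keys match no column.
df = pd.DataFrame({"a": range(6), "b": range(10, 16)})
rows = np.array([2, 3, 4])  # the kind of integer index array KFold.split yields

raised = False
try:
    df[rows]  # interpreted as a list of column labels
except KeyError as err:
    raised = True
    print("KeyError:", err)

# .iloc selects rows by integer position, which is what the CV loop needs.
subset = df.iloc[rows]
print(subset)
```

The same applies to y: if it is a DataFrame or Series, use y.iloc[train] as well.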

python

2 Answers

Stack Overflow user

Answered on 2022-08-16 05:27:08

According to the user guide, you need to call cv.split(X) to get the iterator:

for train, test in cv.split(X):
    regressor.fit(X[train], y[train])
    ...

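Also keep in mind that train and test are arrays of indices. For NumPy arrays this is not a problem (indexing with an array of indices instead of a single index gives the result you expect), but that is not the case for a regular list or other general indexable objects, such as a pandas DataFrame, where X[train] looks the indices up as column labels; use X.iloc[train] there.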
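To illustrate the point about index arrays, here is a small sketch on toy data (not from the answer): KFold.split yields NumPy integer arrays, which a NumPy array accepts as a fancy index and a plain Python list rejects. Converting a DataFrame with .to_numpy() is one way to get NumPy-style positional indexing back.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

data = np.arange(10) * 10           # NumPy array: [0, 10, ..., 90]
as_list = data.tolist()             # same values as a plain list
frame = pd.DataFrame({"x": data})   # same values in a DataFrame

# First fold without shuffling: test = [0, 1], train = [2, ..., 9].
train, test = next(KFold(5).split(frame))

print(data[test])                   # fancy indexing works on a NumPy array

list_failed = False
try:
    as_list[test]                   # lists only accept ints or slices
except TypeError as err:
    list_failed = True
    print("TypeError:", err)

print(frame.to_numpy()[test])       # convert first, then index positionally
```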

Votes: 0

Stack Overflow user

Answered on 2022-08-16 05:45:57

X_normalized, y_for_normalized = scaled_df[[ "Part's Z-Height (mm)","Part's Solid Volume (cm^3)","Layer Height (mm)","Printing/Scanning Speed (mm/s)","Part's Orientation (Support's volume) (cm^3)"]], scaled_df [["Climate change (kg CO2 eq.)","Climate change, incl biogenic carbon (kg CO2 eq.)","Fine Particulate Matter Formation (kg PM2.5 eq.)","Fossil depletion (kg oil eq.)","Freshwater Consumption (m^3)","Freshwater ecotoxicity (kg 1,4-DB eq.)","Freshwater Eutrophication (kg P eq.)","Human toxicity, cancer (kg 1,4-DB eq.)","Human toxicity, non-cancer (kg 1,4-DB eq.)","Ionizing Radiation (Bq. C-60 eq. to air)","Land use (Annual crop eq. yr)","Marine ecotoxicity (kg 1,4-DB eq.)","Marine Eutrophication (kg N eq.)","Metal depletion (kg Cu eq.)","Photochemical Ozone Formation, Ecosystem (kg NOx eq.)","Photochemical Ozone Formation, Human Health (kg NOx eq.)","Stratospheric Ozone Depletion (kg CFC-11 eq.)","Terrestrial Acidification (kg SO2 eq.)","Terrestrial ecotoxicity (kg 1,4-DB eq.)"]]


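import numpy as np
from sklearn.metrics import get_scorer
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

new_model = DecisionTreeRegressor(max_depth=9, min_samples_split=10, random_state=0)

# SCORERS['r2'] is deprecated (and removed in recent scikit-learn); get_scorer returns the same scorer.
scorer = get_scorer('r2')

cv = KFold(5)
train_scores, test_scores = [], []

for train, test in cv.split(X_normalized):
    # X_normalized and y_for_normalized are DataFrames, so select rows by position with .iloc;
    # plain [train] would look the indices up as column labels and raise the KeyError above.
    new_model.fit(X_normalized.iloc[train], y_for_normalized.iloc[train])
    train_scores.append(scorer(new_model, X_normalized.iloc[train], y_for_normalized.iloc[train]))
    test_scores.append(scorer(new_model, X_normalized.iloc[test], y_for_normalized.iloc[test]))

mean_train_score = np.mean(train_scores)
mean_test_score = np.mean(test_scores)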
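A self-contained sketch of how this train/test comparison can be used to read off over- or underfitting: it repeats the KFold scoring loop for a few max_depth values. Assumptions: make_regression data wrapped in DataFrames stands in for scaled_df, the depth values are arbitrary, and min_samples_split=2 (not the question's 10) is used so the unrestricted tree can memorize the training folds.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.metrics import get_scorer
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the question's DataFrames (assumption: 5 features, 3 targets).
X_arr, y_arr = make_regression(n_samples=200, n_features=5, n_targets=3, noise=10.0, random_state=0)
X_df = pd.DataFrame(X_arr, columns=[f"feat_{i}" for i in range(5)])
y_df = pd.DataFrame(y_arr, columns=[f"target_{i}" for i in range(3)])

scorer = get_scorer("r2")
cv = KFold(5)


def mean_cv_scores(model, X, y):
    """Return (mean train R^2, mean test R^2) over the KFold splits."""
    train_scores, test_scores = [], []
    for train, test in cv.split(X):
        # .iloc selects rows by position; X[train] would look up column labels.
        model.fit(X.iloc[train], y.iloc[train])
        train_scores.append(scorer(model, X.iloc[train], y.iloc[train]))
        test_scores.append(scorer(model, X.iloc[test], y.iloc[test]))
    return np.mean(train_scores), np.mean(test_scores)


results = {}
for depth in (1, 3, 9, None):  # None lets the tree grow until its leaves are pure
    model = DecisionTreeRegressor(max_depth=depth, min_samples_split=2, random_state=0)
    results[depth] = mean_cv_scores(model, X_df, y_df)
    train_r2, test_r2 = results[depth]
    print(f"max_depth={depth}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}, gap={train_r2 - test_r2:.3f}")
```

A large gap between the training and test scores (typical for the unrestricted tree, which fits its training folds perfectly) suggests overfitting, while low scores on both sides (as with the very shallow tree) suggest underfitting.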

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/73368997
