Interpreting LightGBM regression CV results

Stack Overflow user

Asked 2021-04-08 06:09:41

Answers: 1, Views: 208, Followers: 0, Votes: 1

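I have gone through the documentation but could not find an answer to my question, so I am hoping someone here knows. Here is some sample code: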

import lightgbm as lgb

N_FOLDS = 5

model = lgb.LGBMClassifier()
default_params = model.get_params()

# overwriting a param
default_params['objective'] = 'regression'

# train_set is an lgb.Dataset built from the asker's data (not shown)
cv_results = lgb.cv(default_params, train_set, num_boost_round=100000, nfold=N_FOLDS,
                    early_stopping_rounds=100, metrics='rmse', seed=50, stratified=False)

I get back a dictionary like this, with 6 values under each key:

{'rmse-mean': [635.2078190031074,
  632.0847253839236,
  629.6661071275558,
  627.9721515847672,
  626.6712284533291,
  625.293530527769],
 'rmse-stdv': [197.5088741303537,
  198.66960690389863,
  199.56134068525006,
  200.25929541235243,
  200.8251430042537,
  201.50213772830526]}

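At first I thought the values in this dictionary corresponded to the RMSE of each fold (5 in this case), but that does not seem to be so. The values appear to be in order of decreasing RMSE.

Does anyone know what each value corresponds to?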

python

machine-learning

regression

cross-validation

lightgbm

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-04-11 21:51:52

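The values correspond not to folds but to the CV result of each boosting round: the mean RMSE across all test folds, together with its standard deviation. This is easy to see if we run just 5 rounds and print the result of each round: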

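import lightgbm as lgb
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version

N_FOLDS = 5  # same value as in the question

X, y = load_boston(return_X_y=True)
train_set = lgb.Dataset(X, label=y)

params = {'learning_rate': 0.05, 'num_leaves': 4, 'subsample': 0.5}

# verbose_eval and early_stopping_rounds are lgb.cv arguments in LightGBM 3.x;
# 4.x replaces them with the lgb.log_evaluation and lgb.early_stopping callbacks.
cv_results = lgb.cv(params, train_set, num_boost_round=5, nfold=N_FOLDS, verbose_eval=True,
                    early_stopping_rounds=None, metrics='rmse', seed=50, stratified=False)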

[LightGBM] [Info] Total Bins 1251
[LightGBM] [Info] Number of data points in the train set: 404, number of used features: 13
[LightGBM] [Info] Start training from score 22.585149
[LightGBM] [Info] Start training from score 22.109406
[LightGBM] [Info] Start training from score 22.579703
[LightGBM] [Info] Start training from score 22.784158
[LightGBM] [Info] Start training from score 22.599010
[1] cv_agg's rmse: 8.86903 + 0.88135
[2] cv_agg's rmse: 8.58355 + 0.860252
[3] cv_agg's rmse: 8.31477 + 0.842578
[4] cv_agg's rmse: 8.06201 + 0.82627
[5] cv_agg's rmse: 7.8268 + 0.800053

import pandas as pd
pd.DataFrame(cv_results)

    rmse-mean   rmse-stdv
0   8.869030    0.881350
1   8.583552    0.860252
2   8.314774    0.842578
3   8.062014    0.826270
4   7.826800    0.800053
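To make the aggregation concrete, here is a minimal sketch in plain Python that does not call LightGBM. The per-fold numbers are made up for illustration, and the use of the population standard deviation is an assumption about how the library aggregates folds. It shows why each list has one entry per boosting round rather than one per fold:

```python
from statistics import fmean, pstdev

# Hypothetical per-fold validation RMSE (illustrative values only):
# one row per fold, one column per boosting round.
fold_rmse = [
    [9.0, 8.0, 7.0, 6.5],
    [7.0, 6.0, 5.0, 4.5],
    [8.0, 7.0, 6.0, 5.5],
]

def aggregate_rounds(per_fold):
    """Return the per-round mean and standard deviation across folds."""
    rounds = list(zip(*per_fold))  # regroup the values by boosting round
    return {
        "rmse-mean": [fmean(r) for r in rounds],
        # Population std (ddof=0) is assumed here.
        "rmse-stdv": [pstdev(r) for r in rounds],
    }

agg = aggregate_rounds(fold_rmse)
# 3 folds and 4 rounds give 4 entries per key, not 3.
print(len(agg["rmse-mean"]), agg["rmse-mean"])
```

With 3 folds and 4 rounds, each key holds 4 values, which mirrors the 5-row table above coming from 5 boosting rounds.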

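In your post you set early_stopping_rounds = 100 and used the default learning rate of 0.1, which may be a bit high depending on your data. Most likely the mean RMSE stopped improving after round 6, so early stopping ended the run and only those 6 rounds were returned.

With the same example as above, setting early_stopping_rounds = 100 stops training once the metric has gone 100 rounds without improving, and the returned results end at the best round, 100 rounds before the stop: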

cv_results = lgb.cv(params, train_set, num_boost_round = 2000, nfold = N_FOLDS, 
verbose_eval  = True,early_stopping_rounds = 100, metrics = 'rmse',
seed = 50, stratified=False)

[...]
[1475]  cv_agg's rmse: 3.20605 + 0.50213
[1476]  cv_agg's rmse: 3.20616 + 0.501997
[1477]  cv_agg's rmse: 3.20607 + 0.501998
[1478]  cv_agg's rmse: 3.20636 + 0.501865
[1479]  cv_agg's rmse: 3.20631 + 0.501905
[1480]  cv_agg's rmse: 3.20633 + 0.501731
[1481]  cv_agg's rmse: 3.20659 + 0.501494
[1482]  cv_agg's rmse: 3.2068 + 0.502046
[1483]  cv_agg's rmse: 3.20687 + 0.50213
[1484]  cv_agg's rmse: 3.20701 + 0.502265
[1485]  cv_agg's rmse: 3.20717 + 0.502096
[1486]  cv_agg's rmse: 3.2072 + 0.501779
[1487]  cv_agg's rmse: 3.20722 + 0.501613
[1488]  cv_agg's rmse: 3.20718 + 0.501308
[1489]  cv_agg's rmse: 3.20701 + 0.501232

pd.DataFrame(cv_results).shape
(1389, 2)
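The relationship between the last logged round (1489) and the number of returned rows (1389) can be reproduced with a small simulation. This is a sketch of the stopping rule as described above, not LightGBM's own code. The function name, the score list, and the patience of 3 are all hypothetical:

```python
def run_with_early_stopping(scores, patience):
    """Simulate early stopping on a lower-is-better metric.

    Returns (rounds_run, kept), where kept is the score history
    truncated at the best round.
    """
    best_round, best_score = 0, float("inf")
    for i, score in enumerate(scores, start=1):
        if score < best_score:
            best_round, best_score = i, score
        elif i - best_round >= patience:
            # No improvement for `patience` rounds: stop, keep up to the best round.
            return i, scores[:best_round]
    # Ran out of rounds without triggering the rule; still keep up to the best round.
    return len(scores), scores[:best_round]

# Hypothetical metric history: improves until round 5, then drifts upward.
scores = [10.0, 9.0, 8.0, 7.0, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5]
patience = 3
rounds_run, kept = run_with_early_stopping(scores, patience)
print(rounds_run, len(kept))  # rounds run vs. rounds returned
```

Here the run stops `patience` rounds after the best round, and the kept history is that much shorter than the number of rounds run, just as 1489 - 100 = 1389 in the output above.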

If you want the RMSE estimate for your model, take the last value of rmse-mean, which is the score at the best round.

Votes: 2
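As a sketch of that last step, the helper below reads the final entry from a result dictionary shaped like the one in the question. The function name `best_cv_score` is hypothetical. The fallback to keys prefixed with `valid ` is an assumption about newer LightGBM releases, so check the keys your version actually returns:

```python
def best_cv_score(cv_results, metric="rmse"):
    """Return (best_round, mean, stdv) from an lgb.cv-style result dict."""
    # Key naming is an assumption: 'rmse-mean' as shown in this thread,
    # 'valid rmse-mean' as believed to be used by newer releases.
    for prefix in ("", "valid "):
        mean_key = f"{prefix}{metric}-mean"
        if mean_key in cv_results:
            means = cv_results[mean_key]
            stdvs = cv_results[f"{prefix}{metric}-stdv"]
            # The lists end at the best round, so the last entry is the estimate.
            return len(means), means[-1], stdvs[-1]
    raise KeyError(f"no '{metric}-mean' key found in cv_results")

# Values copied from the question.
cv_results = {
    "rmse-mean": [635.2078190031074, 632.0847253839236, 629.6661071275558,
                  627.9721515847672, 626.6712284533291, 625.293530527769],
    "rmse-stdv": [197.5088741303537, 198.66960690389863, 199.56134068525006,
                  200.25929541235243, 200.8251430042537, 201.50213772830526],
}

best_round, mean_rmse, stdv_rmse = best_cv_score(cv_results)
print(f"best round {best_round}: RMSE {mean_rmse:.2f} +/- {stdv_rmse:.2f}")
```

The length of the list doubles as the number of boosting rounds to use when refitting on the full training set.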

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/66994779
