Python XGBoost on imbalanced classes

Stack Overflow user
Asked 2017-10-12 19:58:00
2 answers · 1.6K views · 0 followers · 1 vote

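I am currently working on a machine learning project that aims to predict a binary class (negative: 0, positive: 1). The dataset is imbalanced: positives make up only 0.1% of the rows.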

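I am running an xgboost model with the Gini coefficient as the performance metric. The problem is that, during boosting, it takes a large number of rounds before the score starts to improve.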

For example:

[Fold 1/2]
[0] train-gini:-0.048192    validation-gini:-0.042979
Multiple eval metrics have been passed: 'validation-gini' will be used for early stopping.

Will train until validation-gini hasn't improved in 200 rounds.
[10]    train-gini:-0.048192    validation-gini:-0.042979
[20]    train-gini:-0.048192    validation-gini:-0.042979
[30]    train-gini:-0.048192    validation-gini:-0.042979
[40]    train-gini:-0.048192    validation-gini:-0.042979
[50]    train-gini:-0.048192    validation-gini:-0.042979
[60]    train-gini:-0.048192    validation-gini:-0.042979
[70]    train-gini:-0.048192    validation-gini:-0.042979
[80]    train-gini:-0.048192    validation-gini:-0.042979
[90]    train-gini:0.197521 validation-gini:0.114222
[100]   train-gini:0.247692 validation-gini:0.150601
[110]   train-gini:0.2742   validation-gini:0.169023
[120]   train-gini:0.278983 validation-gini:0.168095
[130]   train-gini:0.316636 validation-gini:0.19118
[140]   train-gini:0.347296 validation-gini:0.191045
[150]   train-gini:0.368581 validation-gini:0.20094
[160]   train-gini:0.374773 validation-gini:0.20906
[170]   train-gini:0.398815 validation-gini:0.215193
[180]   train-gini:0.426088 validation-gini:0.220467
[190]   train-gini:0.439271 validation-gini:0.22249
[200]   train-gini:0.455897 validation-gini:0.226621
[210]   train-gini:0.469989 validation-gini:0.229512
[220]   train-gini:0.485784 validation-gini:0.233432
[230]   train-gini:0.496734 validation-gini:0.23747
[240]   train-gini:0.503718 validation-gini:0.241804
[250]   train-gini:0.51102  validation-gini:0.241841
[260]   train-gini:0.523444 validation-gini:0.244312
[270]   train-gini:0.530968 validation-gini:0.245467
[280]   train-gini:0.538703 validation-gini:0.247433
[290]   train-gini:0.546911 validation-gini:0.244196
[300]   train-gini:0.553623 validation-gini:0.244161
[310]   train-gini:0.561385 validation-gini:0.245099
[320]   train-gini:0.571532 validation-gini:0.244787
[330]   train-gini:0.578088 validation-gini:0.246146
[340]   train-gini:0.585054 validation-gini:0.245624
[350]   train-gini:0.591924 validation-gini:0.245463
[360]   train-gini:0.596331 validation-gini:0.247517
[370]   train-gini:0.600661 validation-gini:0.249465
[380]   train-gini:0.606264 validation-gini:0.249034
[390]   train-gini:0.611768 validation-gini:0.249182
[400]   train-gini:0.617176 validation-gini:0.248239
[410]   train-gini:0.621629 validation-gini:0.249248
[420]   train-gini:0.626766 validation-gini:0.24975
[430]   train-gini:0.631587 validation-gini:0.247824
[440]   train-gini:0.636737 validation-gini:0.246586
[450]   train-gini:0.641735 validation-gini:0.246552
[460]   train-gini:0.649765 validation-gini:0.246332
[470]   train-gini:0.654319 validation-gini:0.243546
[480]   train-gini:0.659301 validation-gini:0.241965
[490]   train-gini:0.665632 validation-gini:0.242562
[500]   train-gini:0.669333 validation-gini:0.241306
[510]   train-gini:0.673625 validation-gini:0.240314
[520]   train-gini:0.678935 validation-gini:0.239846
[530]   train-gini:0.683851 validation-gini:0.240029
[540]   train-gini:0.685694 validation-gini:0.240691
[550]   train-gini:0.689285 validation-gini:0.239974
[560]   train-gini:0.691698 validation-gini:0.239079
[570]   train-gini:0.694017 validation-gini:0.239407
Stopping. Best iteration:
[373]   train-gini:0.60227  validation-gini:0.24996

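As the log shows, the train and validation scores only start to improve after round 80. The same thing happens when I change the seed used for the split, although the round at which the score starts to rise changes.

Has anyone run into this kind of problem?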

Cheers, Astrus
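
For readers who want to reproduce the setup: xgboost has no built-in metric named gini, so the train-gini and validation-gini columns in the log presumably come from a custom metric. The asker's implementation is not shown. The sketch below is one common definition, the normalized Gini computed as 2 * AUC - 1. The function names are hypothetical, and the tie handling (average ranks) is an assumption that may differ from what produced the log.

```python
def auc_from_scores(labels, scores):
    """Rank-based AUC (Mann-Whitney U), using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the group of rows that share the same score.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def normalized_gini(labels, scores):
    """Normalized Gini coefficient, defined here as 2 * AUC - 1."""
    return 2.0 * auc_from_scores(labels, scores) - 1.0


def gini_eval(preds, dmatrix):
    """Custom-metric callback shape: (predictions, DMatrix) -> (name, value)."""
    return "gini", normalized_gini(list(dmatrix.get_label()), list(preds))
```

Depending on the xgboost version, a callback like gini_eval is passed to xgb.train as feval (older releases) or custom_metric (newer ones), together with maximize=True so that early stopping treats higher values as better. With this tie-aware definition, a model that outputs the same score for every row gets a Gini of exactly 0.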


2 Answers

Stack Overflow user

Answered 2018-02-06 19:13:37

No, I have not. But with only 0.1% positives, you might want to try setting xgboost's scale_pos_weight : float parameter.

Maybe it will solve the problem. I would use:

scale_pos_weight = 1000
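
As a rough check on that value, the xgboost documentation suggests sum(negative instances) / sum(positive instances) as a typical starting point for scale_pos_weight. The sketch below computes that ratio from a label list. The helper name and the toy labels are hypothetical, and the parameter dict is only an illustration, not the asker's configuration.

```python
def suggest_scale_pos_weight(labels):
    """Return (number of negatives) / (number of positives) for 0/1 labels."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive labels; the ratio is undefined")
    return n_neg / n_pos


# Toy labels with the question's 0.1% positive rate (made up for illustration).
labels = [1] * 10 + [0] * 9990
weight = suggest_scale_pos_weight(labels)

# Illustrative parameter dict; only scale_pos_weight comes from the answer above.
params = {
    "objective": "binary:logistic",
    "scale_pos_weight": weight,
}
```

At a 0.1% positive rate the ratio comes out at 999, so the suggested value of 1000 is in line with that heuristic.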
Votes: 1

Stack Overflow user

Answered 2018-02-08 23:29:14

Have you tried changing eval_metric to logloss or error, as described in the xgboost documentation?
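
To illustrate why a different metric can behave differently in the early rounds, here is a minimal sketch that computes logloss by hand for two constant predictors. Both rank every row identically, so a rank-based metric such as Gini cannot tell them apart, yet their logloss differs by a wide margin. The data is made up to match the question's 0.1% positive rate. Whether this explains the flat rounds in the log is an assumption; the thread does not confirm it.

```python
import math


def binary_logloss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical labels: 1 positive in 1000 rows.
labels = [1] + [0] * 999

# Constant prediction of 0.5 versus a constant prediction at the base rate.
loss_at_half = binary_logloss(labels, [0.5] * len(labels))
loss_at_base_rate = binary_logloss(labels, [0.001] * len(labels))

# Illustrative parameter dict. When several metrics are listed, xgboost uses
# the last one for early stopping.
params = {
    "objective": "binary:logistic",
    "eval_metric": ["error", "logloss"],
}
```

Here loss_at_half equals ln 2 (about 0.693), while loss_at_base_rate is below 0.01, so logloss registers progress as predictions move toward the base rate even when the ranking of rows has not changed.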

Votes: 1
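The original content of this page is provided by Stack Overflow. The Chinese translation was supported by Tencent Cloud's Xiaowei IT-domain engine.
Original link: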

https://stackoverflow.com/questions/46709026
