
LightGBM model returns the same score with different parameters

Stack Overflow user
Asked on 2020-06-18 03:11:18
Answers: 1 · Views: 370 · Followers: 0 · Votes: 2

I'm trying to train a LightGBM model on the Kaggle Iowa housing dataset, and I wrote a small script that randomly tries different parameters within given ranges. I'm not sure what's wrong with my code, but the script returns the same score for different parameters, which shouldn't happen. I tried the same script with CatBoost and it works fine, so I'm guessing the problem is on the LightGBM side.

Code:

import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from random import choice, randrange, uniform 

complete_train = pd.read_csv("train.csv", encoding="UTF-8", index_col="Id")
complete_test = pd.read_csv("test.csv", encoding="UTF-8", index_col="Id")

def encode_impute(*datasets):
    for dataset in datasets:
        for column in dataset.columns:
            dataset[column].fillna(-999, inplace=True)
            if dataset[column].dtype == "object":
                dataset[column] = dataset[column].astype("category", copy=False)

encode_impute(complete_train, complete_test)

X = complete_train.drop(columns="SalePrice")
y = complete_train["SalePrice"]

X_train, X_valid, y_train, y_valid = train_test_split(X, y)

def objective():

    while True:

        params = {
            "boosting_type": choice(["gbdt", "goss", "dart", "rf"]),
            "num_leaves": randrange(10000),
            "learning_rate": uniform(0.01, 1),
            "subsample_for_bin": randrange(100000000),
            "min_data_in_leaf": randrange(100000000),
            "reg_alpha": uniform(0, 1),
            "reg_lambda": uniform(0, 1),
            "feature_fraction": uniform(0, 1),
            "bagging_fraction": uniform(0, 1),
            "bagging_freq": randrange(1, 100)}

        if params["boosting_type"] == "goss":
            params["bagging_fraction"] = 1.0

        model = LGBMRegressor().set_params(**params)
        model.fit(X_train, y_train)

        predictions = model.predict(X_valid)
        error_rate = mean_absolute_error(y_valid, predictions)

        print(f"Score = {error_rate} with parameters: {params}", "\n" * 5)

objective()

A sample of the output I get:

Score = 55967.70375930444 with parameters: {'boosting_type': 'gbdt', 'num_leaves': 6455, 'learning_rate': 0.2479700848039991, 'subsample_for_bin': 83737077, 'min_data_in_leaf': 51951103, 'reg_alpha': 0.1856001984332697, 'reg_lambda': 0.7849262049058852, 'feature_fraction': 0.10550627738309537, 'bagging_fraction': 0.2613298736131875, 'bagging_freq': 96}

Score = 55967.70375930444 with parameters: {'boosting_type': 'dart', 'num_leaves': 9678, 'learning_rate': 0.28670432435369037, 'subsample_for_bin': 24246091, 'min_data_in_leaf': 559094, 'reg_alpha': 0.07261459695501371, 'reg_lambda': 0.8834743560240725, 'feature_fraction': 0.5361519020265366, 'bagging_fraction': 0.9120030047714073, 'bagging_freq': 10}

Score = 55967.70375930444 with parameters: {'boosting_type': 'goss', 'num_leaves': 4898, 'learning_rate': 0.09237499846487345, 'subsample_for_bin': 32620066, 'min_data_in_leaf': 71317820, 'reg_alpha': 0.9818297737748625, 'reg_lambda': 0.11638265354331834, 'feature_fraction': 0.4230342728468828, 'bagging_fraction': 1.0, 'bagging_freq': 64}
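A constant predictor would explain output like this: if every trained model degenerates to predicting a single fixed value, the MAE on the same validation split is identical no matter which parameters were drawn. A minimal numpy sketch with synthetic prices (not the Iowa data; the constant is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(0)
y_valid = rng.normal(180000, 50000, size=300)  # synthetic "SalePrice" targets
constant = 163000.0                            # a root-only model's single output

# The MAE of a constant predictor depends only on the constant and y_valid,
# so every sampled parameter set would report the exact same score.
preds = np.full_like(y_valid, constant)
mae = float(np.mean(np.abs(y_valid - preds)))
print(mae)
```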


1 Answer

Stack Overflow user

Answer accepted

Posted on 2020-06-18 14:56:55

I would point out that the min_data_in_leaf values in all of the sampled configurations seem extremely high. I suspect the model isn't learning anything and is simply returning the mean of the response variable from a tree with only a root node.
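That hypothesis is easy to reproduce with a stand-in: scikit-learn's DecisionTreeRegressor applies the same kind of minimum-samples-per-leaf constraint, and setting it above the dataset size leaves no admissible split, so the tree is a single root node predicting the training mean (a sketch of the mechanism, not LightGBM itself):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 10 + rng.normal(size=200)

# min_samples_leaf far larger than the dataset: no split is admissible,
# so the fitted tree is a root-only stump predicting the training mean.
stump = DecisionTreeRegressor(min_samples_leaf=10**6).fit(X, y)
preds = stump.predict(X)

print(np.unique(preds))  # a single value: the mean of y
```

The same collapse happens in LightGBM when min_data_in_leaf is drawn from randrange(100000000) on a ~1,460-row training set, which is why every trial scores identically.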

Votes: 1
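Building on that answer, one fix is to rescale the sampled ranges to the size of the data (~1,460 training rows in the Iowa set). The ranges below are illustrative guesses, not tuned values; the key change is keeping min_data_in_leaf far below the row count:

```python
from random import choice, randrange, uniform

# Illustrative search ranges rescaled to a ~1,460-row training set.
params = {
    "boosting_type": choice(["gbdt", "dart", "goss"]),
    "num_leaves": randrange(8, 256),
    "learning_rate": uniform(0.01, 0.3),
    "min_data_in_leaf": randrange(1, 50),  # far below the dataset size
    "reg_alpha": uniform(0, 1),
    "reg_lambda": uniform(0, 1),
    "feature_fraction": uniform(0.1, 1),
    "bagging_fraction": uniform(0.1, 1),
    "bagging_freq": randrange(1, 10),
}
print(params)
```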
Original page content provided by Stack Overflow (translation originally supplied by Tencent Cloud's Xiaowei engine).
Original link:

https://stackoverflow.com/questions/62436635
