Q: XGBoost does not generate as many trees as specified by num_round

Stack Overflow user

Asked 2017-08-16 15:27:28

Answers: 1  Views: 435  Followers: 0  Votes: 2

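This is not a bug; it is something I need to understand. When I call getModelDump on the Booster object, I do not get as many trees as the "num_round" parameter specifies. My understanding was that if "num_round" is 100, XGBoost builds 100 trees sequentially, and that calling getModelDump would show all of them. I am sure there is a logical reason behind this, or my understanding is wrong. Can you explain what is happening?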

import org.apache.spark.ml.Pipeline
import ml.dmlc.xgboost4j.scala.spark.{XGBoostClassificationModel, XGBoostEstimator}

val paramMap = List(
  "eta" -> 0.1, "max_depth" -> 7, "objective" -> "binary:logistic", "num_round" -> 100,
  "eval_metric" -> "auc", "nworkers" -> 8).toMap
val xgboostEstimator = new XGBoostEstimator(paramMap)
// trainModel (defined elsewhere) is a set of standard Spark stages such as StringIndexer, OneHotEncoder and VectorAssembler
val pipelineXGBoost = new Pipeline().setStages(Array(trainModel, xgboostEstimator))
// train is the training DataFrame (defined elsewhere)
val cvModel = pipelineXGBoost.fit(train)
// The call below prints only 2 trees instead of 100, although num_round is 100!
println(cvModel.stages(1).asInstanceOf[XGBoostClassificationModel].booster.getModelDump()(0))
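The expectation in the question (one tree per boosting round) can be checked outside Spark. The sketch below is an illustration, not part of the original post: it trains a booster locally with the xgboost4j Scala API on a tiny made-up dataset and counts the strings returned by `getModelDump()`. The object name `TreeCountCheck`, the toy data, and leaving out the Spark-specific `nworkers` setting are assumptions, and the `DMatrix` constructor and `XGBoost.train` signatures may differ slightly between xgboost4j versions.

```scala
import ml.dmlc.xgboost4j.scala.{Booster, DMatrix, XGBoost}

object TreeCountCheck {
  // Trains a booster locally (no Spark) on a made-up 8-row, 2-column dataset.
  def trainToy(numRound: Int): Booster = {
    val nrow = 8
    val ncol = 2
    // Row-major feature values: each pair is one row (f0, f1)
    val features: Array[Float] = Array(
      1f, 0f, 2f, 1f, 3f, 0f, 4f, 1f,
      5f, 0f, 6f, 1f, 7f, 0f, 8f, 1f)
    val labels: Array[Float] = Array(0f, 0f, 0f, 0f, 1f, 1f, 1f, 1f)

    // Float.NaN marks missing values (none occur in this toy data)
    val dtrain = new DMatrix(features, nrow, ncol, Float.NaN)
    dtrain.setLabel(labels)

    // Same tree settings as the question, minus the Spark-only "nworkers"
    val params: Map[String, Any] = Map(
      "eta" -> 0.1,
      "max_depth" -> 7,
      "objective" -> "binary:logistic")

    XGBoost.train(dtrain, params, numRound)
  }

  def main(args: Array[String]): Unit = {
    val dump: Array[String] = trainToy(100).getModelDump()
    // getModelDump() returns one string per tree
    println(s"trees in dump: ${dump.length}")
    // Printing only dump(0), as in the question, shows just the first tree
    println(dump(0))
  }
}
```

With `binary:logistic` and the default `num_parallel_tree` of 1, each boosting round adds exactly one tree, so the dump length is expected to match the number of rounds; multiclass objectives add one tree per class per round.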

GitHub link to the issue: https://github.com/dmlc/xgboost/issues/2610

Versions used, with Scala 2.11:

libraryDependencies ++= Seq(
  "ml.dmlc" % "xgboost4j" % "0.7",
  "ml.dmlc" % "xgboost4j-spark" % "0.7",
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql" % "2.2.0",
  "org.apache.spark" %% "spark-graphx" % "2.2.0",
  "org.apache.spark" %% "spark-mllib" % "2.2.0"
)

apache-spark

xgboost

1 Answer

Stack Overflow user

Accepted answer

Posted 2017-08-17 03:43:07

I was not reading indices 0..num_round of the getModelDump result, only index 0. Each index corresponds to a different tree.

Answered in https://github.com/dmlc/xgboost/issues/2610
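Following the answer, every element of the `getModelDump()` array is a separate tree, so all of them have to be read rather than just `(0)`. A minimal pure-Scala sketch, assuming only that the dump is an `Array[String]` with one tree per element; the object `TreeDump`, the helper `formatAllTrees`, the `booster[i]:` header, and the hand-written sample dump are illustrative choices, not xgboost4j API.

```scala
object TreeDump {
  // Joins every tree of a getModelDump() result into one printable string.
  // Element i of the array is the tree added in boosting round i
  // (for binary:logistic with one tree per round).
  def formatAllTrees(dump: Array[String]): String =
    dump.zipWithIndex
      .map { case (tree, i) => s"booster[$i]:\n$tree" }
      .mkString("\n")

  def main(args: Array[String]): Unit = {
    // Hand-written stand-in for a real two-tree dump in XGBoost's text format
    val sampleDump = Array(
      "0:[f0<4.5] yes=1,no=2,missing=1\n\t1:leaf=-0.1\n\t2:leaf=0.1",
      "0:leaf=0.05")
    println(s"number of trees: ${sampleDump.length}")
    println(formatAllTrees(sampleDump))
  }
}
```

Applied to the question's pipeline, it would be called as `println(TreeDump.formatAllTrees(cvModel.stages(1).asInstanceOf[XGBoostClassificationModel].booster.getModelDump()))`; with `binary:logistic`, the array length is expected to equal `num_round`.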

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/45707479
