
Q: Python sklearn RandomForestClassifier non-reproducible results

Stack Overflow user

Asked 2017-11-22 11:46:19

Answers: 1 · Views: 10K · Followers: 0 · Votes: 6

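I've been using sklearn's random forest to compare several models. I noticed that the random forest gives different results even with the same seed. I tried two approaches: calling random.seed(1234), and passing the random forest's built-in random_state=1234. In both cases I get non-reproducible results. What am I missing?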

# 1
random.seed(1234)
RandomForestClassifier(max_depth=5, max_features=5, criterion='gini', min_samples_leaf = 10)
# or 2
RandomForestClassifier(max_depth=5, max_features=5, criterion='gini', min_samples_leaf = 10, random_state=1234)

Any ideas? Thanks!!

Edit: adding a more complete version of the code

clf = RandomForestClassifier(max_depth=60, max_features=60, \
                        criterion='entropy', \
                        min_samples_leaf = 3, random_state=seed)
# As described, I tried setting random_state in several ways; the results still differ
clf = clf.fit(X_train, y_train)

predicted = clf.predict(X_test)
predicted_prob = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = metrics.roc_curve(np.array(y_test), predicted_prob)
auc = metrics.auc(fpr,tpr)
print (auc)

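Edit: It's been a long time, but I think using RandomState might solve this problem. I haven't tried it myself, but if you're reading this, it's worth a shot. Also, it is generally preferable to use RandomState rather than random.seed().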

machine-learning

random

random-forest

reproducible-research

python

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-11-22 15:10:29

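First, make sure you have the latest versions of the required modules (e.g. scipy, numpy). In the code in this answer, random.seed(1234) seeds the numpy generator, because "from numpy import *" makes the name random refer to numpy.random. Python's built-in random module is a separate generator, and seeding it has no effect on scikit-learn.

When you use the random_state parameter of RandomForestClassifier, there are several options: int, RandomState instance, or None.

From the docs:

If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.

Here is one way to use the same generator in both cases. I use the same (numpy) generator in both cases and I get reproducible results (the same results in both cases).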
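To make the three options concrete, here is a minimal sketch (not from the original answer) that fits a small forest with an int seed and with a fresh RandomState instance. The helper name fit_proba, the dataset size, and n_estimators=10 are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)


def fit_proba(random_state):
    """Fit a small forest (hypothetical helper) and return class probabilities on X."""
    clf = RandomForestClassifier(n_estimators=10, max_depth=2,
                                 random_state=random_state)
    return clf.fit(X, y).predict_proba(X)


# int: the same seed builds the same forest on every fit.
same_int = np.array_equal(fit_proba(1234), fit_proba(1234))

# RandomState instance: two fresh instances with the same seed also agree.
# Reusing ONE instance for both fits would advance its state in between.
same_rs = np.array_equal(fit_proba(np.random.RandomState(1234)),
                         fit_proba(np.random.RandomState(1234)))

# None (the default) falls back to the global np.random state, so the result
# depends on whatever consumed random numbers before the fit.
print(same_int, same_rs)
```

Passing an int is usually the simplest choice when all you need is the same forest on every run.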

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from numpy import *

X, y = make_classification(n_samples=1000, n_features=4,
                       n_informative=2, n_redundant=0,
                       random_state=0, shuffle=False)

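# With "from numpy import *", random is numpy.random: this seeds numpy's global generator.
random.seed(1234)
clf = RandomForestClassifier(max_depth=2)
clf.fit(X, y)

# random.seed() returns None, so random_state is None here; the call still
# re-seeds the global generator, which clf2 then draws from.
clf2 = RandomForestClassifier(max_depth=2, random_state = random.seed(1234))
clf2.fit(X, y)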

Check whether the results are the same:

all(clf.predict(X) == clf2.predict(X))
#True
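Why this works is easy to miss, so here is a hedged sketch of the mechanics. It assumes explicit imports instead of "from numpy import *", and the alias py_random and the helper fit_proba are illustrative names. It shows that seed() returns None, that re-seeding numpy's global generator before each fit reproduces the forest, and that seeding Python's standard-library generator does not touch np.random.

```python
import random as py_random  # Python's standard-library generator (alias is illustrative)

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# seed() only has a side effect; its return value is None.
seed_return = np.random.seed(1234)


def fit_proba():
    # random_state is left at its default (None), so the forest draws
    # from the global np.random state.
    clf = RandomForestClassifier(n_estimators=10, max_depth=2)
    return clf.fit(X, y).predict_proba(X)


# Re-seeding numpy's global generator before each fit reproduces the forest.
np.random.seed(1234)
a = fit_proba()
np.random.seed(1234)
b = fit_proba()
numpy_seed_reproduces = np.array_equal(a, b)

# Seeding the standard-library generator does not reset np.random:
# the second draw simply continues numpy's sequence.
np.random.seed(1234)
first = np.random.rand()
py_random.seed(1234)
second = np.random.rand()
stdlib_seed_resets_numpy = (first == second)

print(seed_return, numpy_seed_reproduces, stdlib_seed_resets_numpy)
```

So if random in the question's first snippet was the standard-library module, that seed call would not have influenced the forest at all.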

Check after running the same code 5 times:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from numpy import *

for i in range(5):

    X, y = make_classification(n_samples=1000, n_features=4,
                       n_informative=2, n_redundant=0,
                       random_state=0, shuffle=False)

    random.seed(1234)
    clf = RandomForestClassifier(max_depth=2)
    clf.fit(X, y)

    clf2 = RandomForestClassifier(max_depth=2, random_state = random.seed(1234))
    clf2.fit(X, y)

    print(all(clf.predict(X) == clf2.predict(X)))

Result:

True
True
True
True
True
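For a pipeline shaped like the one in the question (fit, predict_proba, AUC), a minimal sketch of a fully seeded run might look like the following. The question does not show how X_train and X_test were produced, so the train_test_split step, the synthetic data, the SEED value, and the run_once helper are all assumptions made for illustration. One thing worth checking in such a pipeline is that every random step, not just the classifier, receives a seed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

SEED = 1234  # illustrative value

# Synthetic stand-in for the asker's data, which is not shown in the question.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=4, random_state=0)


def run_once(seed):
    """Split, fit and score with every random step seeded (hypothetical helper)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    clf = RandomForestClassifier(n_estimators=20, max_depth=5,
                                 min_samples_leaf=3, criterion='entropy',
                                 random_state=seed)
    clf.fit(X_train, y_train)
    predicted_prob = clf.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, predicted_prob)


auc_first = run_once(SEED)
auc_second = run_once(SEED)
print(auc_first == auc_second)
```

If the two values differ in a real project, an unseeded step outside the classifier (a split, a shuffle, a sampler) is a reasonable first place to look.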

Votes: 10

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/47433920
