I know that random forest models can be used for both classification and regression.
Are there more specific criteria for deciding when a random forest model will outperform ordinary regression (linear, lasso, etc.) at estimating values, or logistic regression at classification?
Posted on 2019-06-30 13:20:11
Adding a few extra general points to the previous answer:
Hope this helps!
Posted on 2020-02-26 15:59:17
Consider the following points:
1) The random forest algorithm can be used for both classification and regression tasks.
2) It typically provides very high accuracy.
3) A random forest classifier can handle missing values while maintaining accuracy for a large proportion of the data.
4) Adding more trees usually does not cause the model to overfit.
5) It can handle large data sets with high dimensionality.
Ultimately, the choice of model is up to you. You certainly want the algorithm's predictive power to be fairly high (over 90%). Sometimes other algorithms beat the random forest algorithm, but I have found that random forest is often quite good. Usually I start with random forest, and if I see decent performance, I am done; I would say that is the case at least 80% of the time. If you do not get good results from the random forest algorithm, test some of the others.
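The first point above, that the same random forest family covers both tasks, can be sketched as follows. This is a minimal example on synthetic scikit-learn data (the data sets and parameters are illustrative, not from the original answer):

```python
# Sketch: the same random forest family handles both classification
# (RandomForestClassifier) and regression (RandomForestRegressor).
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Classification task on synthetic data
Xc, yc = make_classification(n_samples=500, n_features=20, random_state=7)
clf = RandomForestClassifier(n_estimators=100, random_state=7)
print("classification accuracy: %.3f" % cross_val_score(clf, Xc, yc, cv=5).mean())

# Regression task on synthetic data (scored by R^2, the default for regressors)
Xr, yr = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=7)
reg = RandomForestRegressor(n_estimators=100, random_state=7)
print("regression R^2: %.3f" % cross_val_score(reg, Xr, yr, cv=5).mean())
```

The same cross-validation call works for both estimators; only the scoring default changes (accuracy vs. R^2).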
The following script gives a good comparison of several different algorithms.
import pandas
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
# load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
# prepare configuration for cross validation test harness
seed = 7
# prepare models
models = []
models.append(('LR', LogisticRegression(solver='liblinear')))  # liblinear converges reliably on this small dataset
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))
# evaluate each model in turn
results = []
names = []
scoring = 'accuracy'
for name, model in models:
    # shuffle=True is required when random_state is set in recent scikit-learn
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
    cv_results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)
# boxplot algorithm comparison
fig = plt.figure()
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
plt.boxplot(results)
ax.set_xticklabels(names)
plt.show()
Posted on 2019-06-29 23:15:13
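Notably, the comparison harness above omits random forest itself. It could be added with the same cross-validation setup; the sketch below uses a synthetic stand-in for the Pima data so it runs without the download (the data and parameters are illustrative assumptions):

```python
# Sketch: scoring a random forest with the same 10-fold setup as the
# comparison harness, on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Stand-in for the 8-feature Pima data loaded in the harness above
X, Y = make_classification(n_samples=500, n_features=8, random_state=7)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
cv_results = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=7),
                             X, Y, cv=kfold, scoring='accuracy')
print("RF: %f (%f)" % (cv_results.mean(), cv_results.std()))
```

In the harness, the equivalent change is appending `('RF', RandomForestClassifier())` to the `models` list.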
This is just a general answer, but in case it helps:
Intuitively, I see decision trees (including random forests) as the "Swiss Army knife" of supervised learning: efficient, versatile, and easy to use.
https://datascience.stackexchange.com/questions/54751