
Q: cforest (party) with unbalanced classes

Stack Overflow user

Asked 2014-10-16 07:14:47

1 answer, 618 views, 0 followers, 4 votes

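I want to measure feature importance with the cforest function from the party library.

My output variable has 2000 samples in class 0 and 100 samples in class 1.

I think a good way to avoid bias due to class imbalance is to train each tree in the forest on a subsample in which the number of class 1 elements equals the number of class 0 elements.

Is there a way to do this? I am thinking of an option like n_samples = c(20, 20).

Edit: code example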

    > iris.cf <- cforest(Species ~ ., data = iris,
    +                    control = cforest_unbiased(mtry = 2)) # <--- Here I would like to train the forest using a balanced subsample of the data
    > varimp(object = iris.cf)
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
     0.048981818  0.002254545  0.305818182  0.271163636 

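Edit: Maybe my question is not clear enough. A random forest is a set of decision trees. In general, each decision tree is built using only a random subsample of the data. I would like the subsamples used to contain the same number of elements from class 1 and class 0.

Edit: The option I am looking for is definitely available in the randomForest package: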

sampsize    
Size(s) of sample to draw. For classification, if sampsize is a vector of the length the number of strata, then sampling is stratified by strata, and the elements of sampsize indicate the numbers to be drawn from the strata.

I need the same thing for the party package. Is there a way to get it?

random-forest

party

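1 Answer

Stack Overflow user

Answered 2014-10-16 08:07:28

I will assume you know what you want to accomplish, but don't know enough R to do it.

I'm not sure whether the function offers data balancing as an argument, but you can do it manually. Below is code I quickly threw together. A more elegant solution may exist.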

library(party)  # provides cforest(), cforest_unbiased() and varimp()

# work on a copy, just in case
myData <- iris
# replicate everything *10* times. Replicate is just a "loop 10 times";
# the importance vectors come back as the columns of a matrix.
replicate(10,
    {
        # split the dataset by class into a list of data frames
        splitList <- split(myData, myData$Species)
        # sample *20* random rows from each data frame in the list;
        # sample(nrow(dat), 20) draws 20 distinct row indices from all rows
        sampledList <- lapply(splitList, function(dat) { dat[sample(nrow(dat), 20), ] })
        # combine the sampled rows into one data.frame
        sampledData <- do.call(rbind, sampledList)

        # your code below
        res.cf <- cforest(Species ~ ., data = sampledData,
                          control = cforest_unbiased(mtry = 2)
                          )
        varimp(object = res.cf)
    }
)
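For the 2000-versus-100 case described in the question, the same idea can be wrapped in a small helper. The following is only a sketch in base R: balanced_subsample, the toy data frame and its columns y, x1 and x2 are all made up for illustration and are not part of party or randomForest.

```r
# Hypothetical helper (not part of any package): draw the same number of
# rows from every class, without replacement.
balanced_subsample <- function(data, class_col, n_per_class = NULL) {
  # drop unused factor levels so empty classes do not force a size of 0
  classes <- droplevels(factor(data[[class_col]]))
  # default to the size of the rarest class, so every class has enough rows
  if (is.null(n_per_class)) n_per_class <- min(table(classes))
  # row indices grouped by class
  idx_by_class <- split(seq_len(nrow(data)), classes)
  keep <- unlist(lapply(idx_by_class, function(idx) {
    # sample positions, not values, to avoid sample()'s single-number pitfall
    idx[sample.int(length(idx), n_per_class)]
  }))
  data[keep, , drop = FALSE]
}

# Made-up data mimicking the imbalance from the question
set.seed(1)
toy <- data.frame(
  y  = factor(rep(c(0, 1), times = c(2000, 100))),
  x1 = rnorm(2100),
  x2 = rnorm(2100)
)

sub <- balanced_subsample(toy, "y")
table(sub$y)
```

Each such subsample could then be passed as the data argument to cforest inside the replicate() loop, and the importance vectors averaged over the runs. Note that this fits several forests, each on one balanced subsample, which is not the same as balancing the subsample of every tree inside a single forest.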

Hope you can take it from here.

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/26393675
