Q: Repeated random sampling from an unbalanced sample, and kurtosis

Stack Overflow user

Asked 2021-01-29 07:40:46

Answers: 1, Views: 27, Followers: 0, Votes: 1

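I have an unbalanced dataset in which people from liberal and conservative backgrounds gave a rating (1-7) on a question. I want to see how polarizing the question is.

The sample is heavily skewed toward liberals (70% of the sample). How can I use R to resample repeatedly so as to create balanced (50-50) samples and compute the kurtosis?

For example, I have 50 conservatives in total. How do I repeatedly draw a random sample of 50 from the 150 liberals?

Here is an example data frame: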

  political_ort   rating  
    liberal         1 
    liberal         6 
    conservative    5   
    conservative    3   
    liberal         7  
    liberal         3 
    liberal         1

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-01-29 10:15:33

What you are describing is called "undersampling". Here is one way to do it using tidyverse functions:

# Load library
library(tidyverse)

# Create some 'test' (fake) data
sample_df <- tibble(id_number = 1:100,
                    political_ort = c(rep("liberal", 70),
                                      rep("conservative", 30)),
                    ratings = sample(1:7, size = 100, replace = TRUE))

# Take the fake data
undersampled_df <- sample_df %>% 
# Group the data by category (liberal / conservative) to treat them separately
  group_by(political_ort) %>% 
# And randomly sample 30 rows from each category (liberal / conservative)
  sample_n(size = 30, replace = FALSE) %>%
# Because there are only 30 conservatives in total they are all included
# Finally, ungroup the data so it goes back to a 'vanilla' dataframe/tibble
  ungroup()
# You can see the id_numbers aren't in order anymore indicating the sampling was random
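
The snippet above draws a single balanced sample, while the question asks for repeated draws and a kurtosis value. A minimal sketch of one way that could look is below. It assumes the same kind of fake data, uses dplyr's slice_sample() in place of sample_n(), and defines a hypothetical helper, kurtosis_moment(), for the plain moment-based kurtosis (fourth central moment divided by the squared variance, roughly 3 for a normal distribution). The seed and the number of repetitions are arbitrary choices, not something from the original answer.

```r
library(dplyr)

set.seed(123)  # assumption: fixed seed so the sketch is reproducible

# Same kind of fake data as above: 70 liberals, 30 conservatives
sample_df <- tibble(id_number = 1:100,
                    political_ort = c(rep("liberal", 70),
                                      rep("conservative", 30)),
                    ratings = sample(1:7, size = 100, replace = TRUE))

# Hypothetical helper: moment-based kurtosis, m4 / m2^2
kurtosis_moment <- function(x) {
  centered <- x - mean(x)
  mean(centered^4) / mean(centered^2)^2
}

# Size of the smaller group (30 in this fake data)
n_min <- min(table(sample_df$political_ort))

# Draw one balanced sample and return the kurtosis of its ratings
one_draw <- function() {
  balanced <- sample_df %>%
    group_by(political_ort) %>%
    slice_sample(n = n_min) %>%  # without replacement by default
    ungroup()
  kurtosis_moment(balanced$ratings)
}

n_reps <- 500  # assumption: number of repetitions, chosen arbitrarily
kurtosis_draws <- replicate(n_reps, one_draw())

# Summarise the kurtosis across the repeated balanced samples
mean(kurtosis_draws)
quantile(kurtosis_draws, c(0.025, 0.975))
```

Because the smaller group is always included in full, only the liberal half of each balanced sample changes between draws. The spread of kurtosis_draws therefore reflects the variation from subsampling the majority group, not the sampling uncertainty of the whole study.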

There is also the ROSE package, which has a function ("ovun.sample") that can do this for you: https://www.rdocumentation.org/packages/ROSE/versions/0.0-3/topics/ovun.sample

Votes: 2

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/65946805
