Here is my data frame:
data<-
ID Group Modules
1 Male Physics
1 Male Chemistry
2 Female Biology
2 Female Physics
2 Female Chemistry
3 Male Physics
3 Male Biology
3 Male Chemistry
4 Male Physics
4 Male Biology
4 Male Chemistry
5 Male Physics
5 Male Biology
5 Male Chemistry
6 Male Physics
6 Male Biology
6 Male Chemistry
7 Female Physics
7 Female Biology
8 Female Chemistry
8 Male Physics
8 Male Biology
9 Male Chemistry
9 Male Physics
10 Male Biology
10 Male Chemistry
10 Male Physics
11 Male Biology
11 Male Chemistry
11 Male Physics
12 Female Biology
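12 Female Chemistry
In the data above there are more males (n=9) than females (n=3). I want to randomly select 3 males without replacement, so that I end up with 3 males and 3 females. I also want to keep the repeated IDs, so the result would be: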
newdata<-
ID Group Modules
1 Male Physics
1 Male Chemistry
2 Female Biology
2 Female Physics
2 Female Chemistry
3 Male Physics
3 Male Biology
3 Male Chemistry
7 Female Physics
7 Female Biology
7 Female Chemistry
12 Female Physics
12 Female Biology
6 Male Physics
6 Male Biology
6 Male Chemistry
Here is my code:
samples_per_group<-6
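newdata<-data%>% group_by(Group)%>%slice(sample(n(),min(samples_per_group, n())))%>%ungroup()
When I run it, it selects a sample of size 6 (3 per group), but it only takes one row from each participant instead of returning all of that participant's rows. Basically, I want to select 3 IDs in each group, regardless of how many times each ID is repeated. Any help is welcome. Thanks.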
Posted on 2021-05-11 04:31:46
If you want to sample IDs, you need to pull out the IDs and sample those:
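library(dplyr)

# One row per participant, so that IDs (not rows) are sampled
ids = data %>%
  distinct(ID, Group)
# Number of participants in the smallest group
smallest_group = min(count(ids, Group)$n)
# Sample that many IDs per group, then join back to get all of their rows
newdata = ids %>%
  group_by(Group) %>%
  sample_n(size = smallest_group) %>%
  ungroup() %>%
  left_join(data, by = c("ID", "Group"))
Something like the above should work. Getting a single number across groups inside a dplyr chain is awkward. It can be done, but I think it is clearer to break the pipe and extract the number. We sample 3 (or however many) IDs per group, then join back to the main data to get all the rows that belong to those IDs.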
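To check the idea end to end, here is a minimal self-contained sketch. It assumes a small made-up data frame (`toy`, not the data from the question), and the names `ids`, `k` and `balanced` are only illustrative. `set.seed()` is there just to make the random draw repeatable, and `slice_sample()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: 4 male and 2 female participants, several rows each
toy <- data.frame(
  ID      = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6),
  Group   = c("Male", "Male", "Female", "Female", "Female", "Male", "Male",
              "Male", "Male", "Female", "Female", "Female", "Male"),
  Modules = c("Physics", "Chemistry", "Biology", "Physics", "Chemistry",
              "Physics", "Biology", "Physics", "Chemistry", "Biology",
              "Physics", "Chemistry", "Physics")
)

set.seed(1)  # only so the random draw is repeatable

# One row per participant
ids <- distinct(toy, ID, Group)

# Number of participants in the smallest group
k <- min(count(ids, Group)$n)

# Draw k IDs per group without replacement, then keep every row of those IDs
balanced <- ids %>%
  group_by(Group) %>%
  slice_sample(n = k) %>%
  ungroup() %>%
  inner_join(toy, by = c("ID", "Group"))

# Participants per group after balancing
balanced %>% distinct(ID, Group) %>% count(Group)
```

Both groups should end up with the same number of participants, and every selected participant keeps all of their rows because the sampling happens on the de-duplicated `ids` table rather than on the raw rows.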
https://stackoverflow.com/questions/67475258