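I want to run t-tests by gender within groups. I have two grouping variables (group_1 and group_2) and several outcome variables (var1 and var2 here, although my real dataset has many more).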
#Packages
library(dplyr)
library(tidyr) # gather() is exported by tidyr, not reshape2
library(rstatix)
##Dataset
group_1 <-c(rep("Group X", 40), rep("Group Y", 40),
rep("Group Z", 60), rep("Group Y", 20),
rep("Group Z", 50), rep("Group Y", 10))
group_2 <- c(rep("A", 100), rep("B", 20), rep("C", 50), rep("A", 20), rep("B", 30))
var1 <- rnorm(n=220, mean=0, sd=1)
var2 <- rnorm(n = 220, mean = 1, sd=1.3)
gender <- c(rep("M", 30), rep("F", 30), rep("M", 40) , rep("F", 50), rep("M", 20),
rep("F", 20), rep("M", 30))
data <- as.data.frame(cbind(group_1, group_2, var1, var2, gender))
##Groupings
table(data$group_1, data$group_2, data$gender)
#Long format
g_long <- gather(data, variable, value, var1:var2)
g_long$value <- as.numeric(g_long$value)
#T-tests for each variable within groups
g_test <- g_long %>%
  group_by(variable, group_1, group_2) %>%
  t_test(value ~ gender, p.adjust.method = "holm", paired = FALSE)

The code above gives the following error:
Error: Problem with `mutate()` input `data`.
x not enough 'y' observations
i Input `data` is `map(.data$data, .f, ...)`.

The code only works with a single grouping variable, or if I manually remove the problematic data:
#this works
g_test <- g_long %>%
  group_by(variable, group_1) %>%
  t_test(value ~ gender, p.adjust.method = "holm", paired = FALSE)
#Manually remove category where I cannot calculate gender diff - this works
g_long1 <- g_long[!(g_long$group_1 == "Group Y" & g_long$group_2 == "B"), ]
g_test <- g_long1 %>%
  group_by(variable, group_1, group_2) %>%
  t_test(value ~ gender, p.adjust.method = "holm", paired = FALSE)

There are no females in the Group Y / B combination, so the code works once I remove that combination manually. I tried something like the following to detect and remove such categories automatically, but it did not help: when a category has no males or no females at all, there is nothing for the filter to remove.
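g_long <- g_long %>%
  group_by(group_1, group_2, variable, gender) %>%
  filter(n() >= 5)

How can I automatically remove the categories where the t-test cannot be run? In my dataset each grouping variable has more than three categories, so picking them out by hand would be difficult.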
Posted on 2021-08-06 17:52:33
We can use nest_by and build a list column with transmute, using a logical condition that checks the number of distinct (n_distinct) 'gender' values in each group.
library(dplyr)
library(rstatix)
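g_long %>%
  nest_by(variable, group_1, group_2) %>%
  transmute(out = list(if (n_distinct(data$gender) > 1) {
    # both genders are present in this cell: run the t-test
    data %>% t_test(value ~ gender, p.adjust.method = "holm", paired = FALSE)
  } else {
    NA # only one gender in this cell, so no test is possible
  })) %>%
  ungroup()
-output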
# A tibble: 14 x 4
   variable group_1 group_2 out
   <chr>    <chr>   <chr>   <list>
 1 var1     Group X A       <rstatix_test [1 × 8]>
 2 var1     Group Y A       <rstatix_test [1 × 8]>
 3 var1     Group Y B       <lgl [1]>
 4 var1     Group Y C       <rstatix_test [1 × 8]>
 5 var1     Group Z A       <rstatix_test [1 × 8]>
 6 var1     Group Z B       <rstatix_test [1 × 8]>
 7 var1     Group Z C       <rstatix_test [1 × 8]>
 8 var2     Group X A       <rstatix_test [1 × 8]>
 9 var2     Group Y A       <rstatix_test [1 × 8]>
10 var2     Group Y B       <lgl [1]>
11 var2     Group Y C       <rstatix_test [1 × 8]>
12 var2     Group Z A       <rstatix_test [1 × 8]>
13 var2     Group Z B       <rstatix_test [1 × 8]>
14 var2     Group Z C       <rstatix_test [1 × 8]>

To extract the list elements, use unnest:
library(tidyr)
g_long %>%
  nest_by(variable, group_1, group_2) %>%
  transmute(out = list(if(n_distinct(data$gender) > 1) data %>%
    t_test(value ~ gender, p.adjust.method = "holm",
           paired=FALSE) else NA)) %>%
  ungroup %>%
  unnest(out)
# A tibble: 14 x 12
   variable group_1 group_2 .y.   group1 group2    n1    n2 statistic    df     p out
   <chr>    <chr>   <chr>   <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <lgl>
 1 var1     Group X A       value F      M         10    30    -0.350  30.7 0.729 NA
 2 var1     Group Y A       value F      M         20    20   -0.0286  37.7 0.977 NA
 3 var1     Group Y B       <NA>  <NA>   <NA>      NA    NA        NA    NA    NA NA
 4 var1     Group Y C       value F      M         10    10     0.221  17.0 0.828 NA
 5 var1     Group Z A       value F      M         20    20   -0.0811  38.0 0.936 NA
 6 var1     Group Z B       value F      M         20    20     -1.03  34.7 0.309 NA
 7 var1     Group Z C       value F      M         20    10     -1.17  20.3 0.256 NA
 8 var2     Group X A       value F      M         10    30    -0.601  13.0 0.558 NA
 9 var2     Group Y A       value F      M         20    20    -0.824  36.8 0.415 NA
10 var2     Group Y B       <NA>  <NA>   <NA>      NA    NA        NA    NA    NA NA
11 var2     Group Y C       value F      M         10    10  -0.00521  17.6 0.996 NA
12 var2     Group Z A       value F      M         20    20    -0.956  38.0 0.345 NA
13 var2     Group Z B       value F      M         20    20     0.593  31.2 0.557 NA
14 var2     Group Z C       value F      M         20    10     -1.57  17.0 0.136 NA

As for the error in the OP's post, it comes from the number of unique 'gender' values in the cell group_1 = 'Group Y', group_2 = 'B': that cell contains only one gender, so t_test has no second group to compare against.
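If the untestable cells should be dropped rather than kept as NA rows, one possible alternative is to filter on the full grouping (without gender) before calling t_test. The following is only a minimal sketch under assumptions: the toy data, the seed, the name `testable`, and the extra requirement of at least 2 observations per gender are illustrative and not part of the original post.

```r
library(dplyr)
library(tidyr)
library(rstatix)

set.seed(1) # assumption: fixed seed only so the sketch is reproducible

# Hypothetical toy data in long format: the cell Group Y / B contains males only
g_long <- tibble(
  group_1 = rep(c("Group X", "Group Y"), times = c(20, 10)),
  group_2 = rep(c("A", "B"), times = c(20, 10)),
  gender  = c(rep(c("M", "F"), each = 10), rep("M", 10)),
  var1    = rnorm(30),
  var2    = rnorm(30, mean = 1, sd = 1.3)
) %>%
  pivot_longer(var1:var2, names_to = "variable", values_to = "value")

# Keep only the cells in which both genders occur,
# each with at least 2 observations (assumed minimum for a variance estimate)
testable <- g_long %>%
  group_by(variable, group_1, group_2) %>%
  filter(n_distinct(gender) == 2, min(table(gender)) >= 2) %>%
  ungroup()

# The grouped t-test now runs without the "not enough 'y' observations" error
g_test <- testable %>%
  group_by(variable, group_1, group_2) %>%
  t_test(value ~ gender, paired = FALSE)

g_test
```

The key difference from the filter attempted in the question is that gender is left out of group_by, so each cell is judged as a whole and a cell with a missing gender is removed entirely.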
https://stackoverflow.com/questions/68685664