我正在试图找到一种优雅的方法来进行t检验,比较6组数据的平均值,最好使用dplyr/tidyverse。我的数据看起来是这样的:
分组变量数值变量
A 5.6
A 2.3
A 4.8
B 7.3
B 6.9
B 5.8
C 1.4
C 6.4
我知道我可以这样做:
df_a <- df %>% filter(grouping_variable == 'A')
df_b <- df %>% filter(grouping_variable == 'B')
a_b <- t.test(df_a,df_b)$p.value然后对每个变量组合重复这一步。只有6个分组变量,所以上面的问题不是不可能的,但是必须有一种更简单的方法来实现:
df %>% group_by(grouping_variable)%>%
t.test(of each on each)也许是和整洁有关的东西?
我的最终结果是得到一个tibble,如下所示
B、C、D、E、F
A .34 .4 .235 ...
B .03 .34 .454...
发布于 2019-01-30 05:19:17
这可以使用purrr中的cross和map函数干净利落地完成。
示例数据:
df <- tibble(group_var = rep(c("A", "B", "C"), times = 5),
num_var = rnorm(15))
df
# A tibble: 15 x 2
group_var num_var
<chr> <dbl>
1 A 1.66
2 B -0.694
3 C -0.680
4 A 1.96
5 B -0.380
6 C -0.941
7 A 1.02
8 B 0.0476
9 C 0.770
10 A 1.41
11 B 0.137
12 C -0.816
13 A -0.478
14 B 0.374
15 C -0.619 使用cross创建包含所有变量组合的数据帧:
test_results <- cross_df(list(var1 = c("A", "B", "C"), var2 = c("A", "B", "C")))添加包含测试结果的列:
test_results <- test_results %>%
mutate(ttest = map2_dbl(var1, var2,
~ t.test(df %>% filter(group_var == .x) %>% .$num_var,
df %>% filter(group_var == .y) %>% .$num_var)$p.value))
test_results %>%
spread(var2, ttest)
var1 A B C
<chr> <dbl> <dbl> <dbl>
1 A 1 0.0436 0.0197
2 B 0.0436 1 0.367
3 C 0.0197 0.367 1 如果将t.test包装在一个函数中,这会更容易阅读:
ttester <- function(v1, v2) {
t <- t.test(df %>% filter(group_var == v1) %>% .$num_var,
df %>% filter(group_var == v2) %>% .$num_var)
t$p.value
}
cross_df(list(var1 = c("A", "B", "C"), var2 = c("A", "B", "C"))) %>%
mutate(ttest = map2_dbl(var1, var2, ~ttester(.x, .y))) %>%
spread(var2, ttest)
var1 A B C
<chr> <dbl> <dbl> <dbl>
1 A 1 0.0436 0.0197
2 B 0.0436 1 0.367
3 C 0.0197 0.367 1 发布于 2019-01-30 03:48:25
检查此解决方案:
library(tidyverse)
library(magrittr)
df %$%
crossing(
gr1 = grouping_variable %>% unique(),
gr2 = grouping_variable %>% unique()
) %>%
filter(gr1 != gr2) %>%
left_join(
df %>%
group_by(grouping_variable) %>%
nest() %>%
rename_all(~c('gr1', 'data1'))
) %>%
left_join(
df %>%
group_by(grouping_variable) %>%
nest() %>%
rename_all(~c('gr2', 'data2'))
) %>%
mutate(p_val = map2_dbl(
data1, data2,
~t.test(
.x$numerical_variable,
.y$numerical_variable
)$p.value
)
)发布于 2019-01-30 04:00:55
首先是一些数据:
df <-
data_frame(
Group = rep(LETTERS[1:8], each = 10)
, Value = rnorm(80)
)从这里,我拉出了独特的组级别:
my_groups <-
sort(unique(df$Group))然后,我喜欢使用lapply遍历感兴趣的指标。基本上,对于每一对组,我都运行t测试,并将感兴趣的指标(组均值、差异、p值)记录为data_frame,然后将行绑定在一起。请注意,我使用了magrittr中的%$%运算符作为从t.test结果中获取指标的快捷方式。
t_tests_out <-
lapply(my_groups, function(group_a){
lapply(my_groups, function(group_b){
# Skip case where a and b are the same
if(group_a == group_b){
return(NULL)
}
df %>%
filter(Group %in% c(group_a, group_b)) %>%
mutate(temp_group = ifelse(Group == group_a, "A", "B")) %>%
t.test(Value ~ temp_group, data = .) %$%
data_frame(
group_a = group_a
, group_b = group_b
, mean_a = estimate[1]
, mean_b = estimate[2]
, diff = mean_a - mean_b
, pval = p.value
)
}) %>%
bind_rows()
}) %>%
bind_rows()这看起来像这样:
# A tibble: 56 x 6
group_a group_b mean_a mean_b diff pval
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 A B -0.275 0.0851 -0.360 0.384
2 A C -0.275 -0.651 0.376 0.406
3 A D -0.275 -0.440 0.165 0.737
4 A E -0.275 0.336 -0.611 0.245
5 A F -0.275 -0.277 0.00233 0.996
6 A G -0.275 -0.115 -0.160 0.754
7 A H -0.275 -0.406 0.131 0.821
8 B A 0.0851 -0.275 0.360 0.384
9 B C 0.0851 -0.651 0.736 0.0748
10 B D 0.0851 -0.440 0.525 0.245
# ... with 46 more rows虽然长格式对于某些事情非常有用,比如绘制结果:
t_tests_out %>%
ggplot(aes(x = group_a
, y = group_b
, fill = pval)) +
geom_tile(col = "white") +
scale_fill_distiller(palette = "YlOrRd"
, limits = c(0,1)) +
theme_minimal()

您还可以将结果展开以生成您要查找的表:
t_tests_out %>%
select(group_a, group_b, pval) %>%
spread(group_b, pval)返回
# A tibble: 8 x 9
group_a A B C D E F G H
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A NA 0.384 0.406 0.737 0.245 0.996 0.754 0.821
2 B 0.384 NA 0.0748 0.245 0.595 0.439 0.668 0.371
3 C 0.406 0.0748 NA 0.659 0.0632 0.456 0.291 0.668
4 D 0.737 0.245 0.659 NA 0.163 0.762 0.547 0.955
5 E 0.245 0.595 0.0632 0.163 NA 0.280 0.425 0.243
6 F 0.996 0.439 0.456 0.762 0.280 NA 0.770 0.835
7 G 0.754 0.668 0.291 0.547 0.425 0.770 NA 0.640
8 H 0.821 0.371 0.668 0.955 0.243 0.835 0.640 NA https://stackoverflow.com/questions/54428143
复制相似问题