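I have a dataset whose columns follow a consistent naming pattern, and I want to take the row-wise mean of the columns within each group. For example: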
library(dplyr)
library(purrr)
library(glue)
df <- tibble(`1_x_blind` = 1:3,
             `1_y_blind` = 7:9,
             `2_x_blind` = 4:6,
             `2_y_blind` = 5:7)
df %>%
  mutate(`1_overall_test` = rowMeans(select(., matches(glue("^1_.*_blind$")))))
#> # A tibble: 3 x 5
#>   `1_x_blind` `1_y_blind` `2_x_blind` `2_y_blind` `1_overall_test`
#>         <int>       <int>       <int>       <int>            <dbl>
#> 1           1           7           4           5                4
#> 2           2           8           5           6                5
#> 3           3           9           6           7                6
This works well. The natural next step for me is to scale it up so that I can handle the whole series of columns, for example:
df %>%
  mutate(overall_blind = map(1:2, ~rowMeans(select(., matches(glue("^{.x}_.*_blind$"))))))
#> Error: Problem with `mutate()` input `overall_blind`.
#> x no applicable method for 'select' applied to an object of class "c('integer', 'numeric')"
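#> ℹ Input `overall_blind` is `map(1:2, ~rowMeans(select(., matches(glue("^{.x}_.*_blind$")))))`.
I think the problem here is that select is confused about what `.` refers to. Is it possible to map over a series of column names in this way? Ideally, I would like the new column names to follow the {.x}_overall pattern, as in the example above.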
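The error message is consistent with how purrr's `~` shorthand works: inside the lambda, `.` (like `.x`) is bound to the current element of `1:2`, not to the piped data frame, so `select()` receives an integer. The following is only a minimal sketch of that behaviour; `group_row_means` is a hypothetical helper name introduced for illustration, and the data frame is passed to it explicitly so that nothing depends on `.`.

```r
library(dplyr)
library(purrr)

df <- tibble(`1_x_blind` = 1:3, `1_y_blind` = 7:9,
             `2_x_blind` = 4:6, `2_y_blind` = 5:7)

# Inside a `~` lambda, `.` and `.x` both refer to the current element
# of the vector being mapped over, so this reports the class of 1 and of 2
map(1:2, ~ class(.))

# Hypothetical helper (not from the original post): the data frame is an
# explicit argument, so `.` never has to mean two different things
group_row_means <- function(data, prefix) {
  rowMeans(select(data, matches(paste0("^", prefix, "_.*_blind$"))))
}

# One vector of row means per group prefix
overall <- map(1:2, ~ group_row_means(df, .x))
```

Here `overall` is a plain list with one numeric vector per prefix; turning it into named columns is a separate step.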
Posted on 2020-11-08 17:34:17
Update: here is a cleaner way that does not need rename or bind_cols.
map_dfc(1:2,
        function(x) df %>%
          select(matches(glue("^{x}_.*_blind$"))) %>%
          mutate("{x}_overall_blind" := rowMeans(.))
)
# A tibble: 3 x 6
  `1_x_blind` `1_y_blind` `1_overall_blind` `2_x_blind` `2_y_blind` `2_overall_blind`
        <int>       <int>             <dbl>       <int>       <int>             <dbl>
1           1           7                 4           4           5               4.5
2           2           8                 5           5           6               5.5
3           3           9                 6           6           7               6.5
Previous answer:
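Here is a map approach.
The challenge is to mutate two new columns, each based on a different group of existing columns. The simplest way is to compute them in their own map_dfc() call and then bind the result to the existing df.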
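library(stringr)  # needed for str_replace()
df %>%
  bind_cols(
    # map_dfc() returns unnamed columns, which get the default names ...1 and ...2
    map_dfc(1:2, ~rowMeans(df %>% select(matches(glue("^{.x}_.*_blind$"))))) %>%
      # strip the leading dots and append the suffix: ...1 becomes 1_overall_blind
      rename_with(~paste0(str_replace(., "\\...", ""), "_overall_blind"))
  )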
# A tibble: 3 x 6
  `1_x_blind` `1_y_blind` `2_x_blind` `2_y_blind` `1_overall_blind` `2_overall_blind`
        <int>       <int>       <int>       <int>             <dbl>             <dbl>
1           1           7           4           5                 4               4.5
2           2           8           5           6                 5               5.5
3           3           9           6           7                 6               6.5
Here is an approach that uses pivoting to get the row-wise group means, which avoids the regex and the mutate/map operations:
library(tidyr)  # needed for pivot_longer(), separate() and pivot_wider()
df %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row) %>%
  # keep only the leading group number of each column name
  separate(name, c("grp"), sep = "_", extra = "drop") %>%
  group_by(row, grp) %>%
  summarise(overall_blind = mean(value)) %>%
  ungroup() %>%
  pivot_wider(id_cols = row, names_from = grp, values_from = overall_blind,
              names_glue = "{grp}_{.value}") %>%
  bind_cols(df)
# A tibble: 3 x 6
  `1_overall_blind` `2_overall_blind` `1_x_blind` `1_y_blind` `2_x_blind` `2_y_blind`
              <dbl>             <dbl>       <int>       <int>       <int>       <int>
1                 4               4.5           1           7           4           5
2                 5               5.5           2           8           5           6
3                 6               6.5           3           9           6           7
Posted on 2020-11-08 21:36:56
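We can use split.default to split the data into a list of data frames based on the column-name pattern, then take the rowMeans of each piece and bind the result to the original data.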
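To make the split step concrete on its own, here is a small base R sketch of what split.default does when it is given a data frame and one grouping key per column. It assumes the same toy data as the question; the `sub()` call that extracts the leading number is my own stand-in for illustration, not the method used in this answer.

```r
df <- data.frame(`1_x_blind` = 1:3, `1_y_blind` = 7:9,
                 `2_x_blind` = 4:6, `2_y_blind` = 5:7,
                 check.names = FALSE)

# One grouping key per column: the text before the first underscore
# (an assumption made for this sketch)
grp <- sub("_.*$", "", names(df))

# split.default treats the data frame as a list of columns, so each piece
# is a data frame holding the columns of one group
pieces <- split.default(df, grp)
names(pieces)          # "1" "2"
names(pieces[["1"]])   # "1_x_blind" "1_y_blind"

# Row means of each piece, one column per group
means <- sapply(pieces, rowMeans)
```

The pipeline that follows performs the same split, with readr::parse_number supplying the grouping key.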
library(dplyr)
library(purrr)
library(stringr)
df %>%
  split.default(readr::parse_number(names(.))) %>%
  map_dfc(rowMeans) %>%
  set_names(str_c(names(.), "_overall_blind")) %>%
  bind_cols(df, .)
# A tibble: 3 x 6
#   `1_x_blind` `1_y_blind` `2_x_blind` `2_y_blind` `1_overall_blind` `2_overall_blind`
#         <int>       <int>       <int>       <int>             <dbl>             <dbl>
# 1           1           7           4           5                 4               4.5
# 2           2           8           5           6                 5               5.5
# 3           3           9           6           7                 6               6.5
https://stackoverflow.com/questions/64740527