I have a dataframe (example below) containing responses to a questionnaire collected over multiple days.
> df %>%
mutate (Sigma_Bucket_Q1 = if_else(Sigma_Q1 >= Median_Sigma_Q1,
"Above Median Volatility", "Below Median Volatility"))
# A tibble: 19 x 12
UserId Days_From_First_Use Q1 Q2 Q3 Sigma_Q1 Sigma_Q2 Sigma_Q3 Median_Sigma_Q1 Median_Sigma_Q2 Median_Sigma_Q3 Sigma_Bucket_Q1
<fct> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 A 0 3 2 1 1.10 0.837 0.548 1.45 1.59 1.53 Below Median Volatility
2 A 1 1 0 0 1.10 0.837 0.548 1.45 1.59 1.53 Below Median Volatility
3 A 2 1 1 0 1.10 0.837 0.548 1.45 1.59 1.53 Below Median Volatility
4 A 3 0 2 0 1.10 0.837 0.548 1.45 1.59 1.53 Below Median Volatility
5 A 4 1 1 1 1.10 0.837 0.548 1.45 1.59 1.53 Below Median Volatility
6 B 0 4 8 2 1.26 2.5 2.06 1.45 1.59 1.53 Below Median Volatility
7 B 2 2 2 1 1.26 2.5 2.06 1.45 1.59 1.53 Below Median Volatility
8 B 4 5 6 5 1.26 2.5 2.06 1.45 1.59 1.53 Below Median Volatility
9 B 5 4 5 5 1.26 2.5 2.06 1.45 1.59 1.53 Below Median Volatility
10 C 0 5 7 2 1.64 1.87 1 1.45 1.59 1.53 Above Median Volatility
11 C 1 2 2 2 1.64 1.87 1 1.45 1.59 1.53 Above Median Volatility
12 C 2 5 5 4 1.64 1.87 1 1.45 1.59 1.53 Above Median Volatility
13 C 3 6 5 3 1.64 1.87 1 1.45 1.59 1.53 Above Median Volatility
14 C 4 6 6 4 1.64 1.87 1 1.45 1.59 1.53 Above Median Volatility
15 D 0 5 3 5 2.35 1.30 2.30 1.45 1.59 1.53 Above Median Volatility
16 D 1 5 3 4 2.35 1.30 2.30 1.45 1.59 1.53 Above Median Volatility
17 D 2 4 2 6 2.35 1.30 2.30 1.45 1.59 1.53 Above Median Volatility
18 D 3 0 0 1 2.35 1.30 2.30 1.45 1.59 1.53 Above Median Volatility
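19 D 4 1 1 1 2.35 1.30 2.30 1.45 1.59 1.53 Above Median Volatility
Columns Q1, Q2 and Q3 hold the responses, and Sigma_Q1, Sigma_Q2 and Sigma_Q3 hold the standard deviation of each subject's time series of responses to each question. Median_Sigma_Q1, Median_Sigma_Q2 and Median_Sigma_Q3 hold the median of those standard deviations across subjects for Q1, Q2 and Q3. For each question, I want to classify each subject as an above-median or below-median volatility subject according to whether Sigma_Q1 >= Median_Sigma_Q1, and so on. The expression I used to generate Sigma_Bucket_Q1 works fine; it is shown just above the tibble.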
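To make the example reproducible, here is a minimal sketch that rebuilds the raw responses from the printed tibble and derives the Sigma and Median columns as described above. It assumes that Sigma_Qn is the sample standard deviation computed by R's sd() and that the median is taken over one value per subject; under those assumptions it reproduces the printed values (for example Median_Sigma_Q1 of about 1.45).

```r
library(dplyr)

# Raw responses transcribed from the tibble printed above
df <- tibble(
  UserId = factor(rep(c("A", "B", "C", "D"), times = c(5, 4, 5, 5))),
  Days_From_First_Use = c(0:4, 0L, 2L, 4L, 5L, 0:4, 0:4),
  Q1 = c(3, 1, 1, 0, 1,  4, 2, 5, 4,  5, 2, 5, 6, 6,  5, 5, 4, 0, 1),
  Q2 = c(2, 0, 1, 2, 1,  8, 2, 6, 5,  7, 2, 5, 5, 6,  3, 3, 2, 0, 1),
  Q3 = c(1, 0, 0, 0, 1,  2, 1, 5, 5,  2, 2, 4, 3, 4,  5, 4, 6, 1, 1)
)

df <- df %>%
  # Sigma_Qn: sample SD of each subject's answers to question n over time
  group_by(UserId) %>%
  mutate(across(c(Q1, Q2, Q3), sd, .names = "Sigma_{.col}")) %>%
  ungroup() %>%
  # Median_Sigma_Qn: median of the per-subject SDs, using one row per subject
  mutate(across(starts_with("Sigma_"),
                ~ median(.x[!duplicated(UserId)]),
                .names = "Median_{.col}"))
```

Taking one row per subject before the median matters: a plain median over all 19 rows would weight subjects by how many days they answered.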
However, when I try to generalize this to generate all the Sigma_Bucket columns at once (my actual problem has 21 of them), I run into a problem. I tried:
df %>%
mutate (across(all_of(paste0("Sigma_Bucket_", c("Q1", "Q2", "Q3")) = if_else(paste0("Sigma_", {.col}) >= paste0("Median_Sigma_", {.col}),
"Above Median Volatility", "Below Median Volatility")))
I get a cryptic error message and can't tell what needs fixing:
> df %>%
+ mutate (across(all_of(paste0("Sigma_Bucket_", c("Q1", "Q2", "Q3")) = if_else(paste0("Sigma_", {.col}) >= paste0("Median_Sigma_", {.col}),
Error: unexpected '=' in:
"df %>%
mutate (across(all_of(paste0("Sigma_Bucket_", c("Q1", "Q2", "Q3")) ="
> "Above Median Volatility", "Below Median Volatility")))
Error: unexpected ',' in " "Above Median Volatility","
How can I modify my statement to create all 3 columns (21 in my actual problem) without writing one line per question?
Browsing various answers on StackOverflow suggests that mutate_if might be the basis of a solution, but I can't work out how to use it in this particular setting.
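For reference, mutate_if() and the other scoped variants were superseded by across() in dplyr 1.0.0. A minimal sketch of the across() route, assuming dplyr >= 1.0.0, an ungrouped df, and that every Sigma_Qn column has a Median_Sigma_Qn partner: cur_column() returns the name of the column across() is currently processing, which is enough to look up the matching median column. The small df below simply copies one row per subject from the printed tibble.

```r
library(dplyr)

# One row per subject, values copied from the printed tibble
df <- tibble(
  UserId = factor(c("A", "B", "C", "D")),
  Sigma_Q1 = c(1.10, 1.26, 1.64, 2.35),
  Sigma_Q2 = c(0.837, 2.5, 1.87, 1.30),
  Median_Sigma_Q1 = 1.45,
  Median_Sigma_Q2 = 1.59
)

bucketed <- df %>%
  mutate(across(
    starts_with("Sigma_Q"),
    # cur_column() is e.g. "Sigma_Q1"; look up "Median_Sigma_Q1" in the
    # (ungrouped) df so both vectors have the same length
    ~ if_else(.x >= df[[paste0("Median_", cur_column())]],
              "Above Median Volatility", "Below Median Volatility"),
    .names = "{.col}_Bucket"
  )) %>%
  # Rename Sigma_Q1_Bucket -> Sigma_Bucket_Q1 to match the naming in the question
  rename_with(~ sub("^Sigma_(Q\\d+)_Bucket$", "Sigma_Bucket_\\1", .x),
              ends_with("_Bucket"))

bucketed %>% select(UserId, starts_with("Sigma_Bucket_"))
```

Looking the partner column up in df directly keeps the sketch short but ties it to that object name; the .names/rename_with() pair exists only to reproduce the Sigma_Bucket_Qn naming.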
Many thanks for any help.
Thomas Philips
Posted on 2020-12-21 13:04:11
Here is a solution using purrr's map2_df():
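library(dplyr)
library(purrr)
library(stringr)
# Pair each Sigma_Qn column with its Median_Sigma_Qn column (same order in both selections)
map2_df(
df %>% select(starts_with("Sigma_Q")),
df %>% select(starts_with("Median_Sigma_Q")),
~if_else(.x >= .y, "Above Median Volatility", "Below Median Volatility")) %>%
rename_with(~str_replace(.x, "Sigma", "Sigma_Bucket"))  # Sigma_Q1 -> Sigma_Bucket_Q1
Output: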
# A tibble: 19 x 3
Sigma_Bucket_Q1 Sigma_Bucket_Q2 Sigma_Bucket_Q3
<chr> <chr> <chr>
1 Below Median Volatility Below Median Volatility Below Median Volatility
2 Below Median Volatility Below Median Volatility Below Median Volatility
3 Below Median Volatility Below Median Volatility Below Median Volatility
4 Below Median Volatility Below Median Volatility Below Median Volatility
5 Below Median Volatility Below Median Volatility Below Median Volatility
6 Below Median Volatility Above Median Volatility Above Median Volatility
7 Below Median Volatility Above Median Volatility Above Median Volatility
8 Below Median Volatility Above Median Volatility Above Median Volatility
9 Below Median Volatility Above Median Volatility Above Median Volatility
10 Above Median Volatility Above Median Volatility Below Median Volatility
11 Above Median Volatility Above Median Volatility Below Median Volatility
12 Above Median Volatility Above Median Volatility Below Median Volatility
13 Above Median Volatility Above Median Volatility Below Median Volatility
14 Above Median Volatility Above Median Volatility Below Median Volatility
15 Above Median Volatility Below Median Volatility Above Median Volatility
16 Above Median Volatility Below Median Volatility Above Median Volatility
17 Above Median Volatility Below Median Volatility Above Median Volatility
18 Above Median Volatility Below Median Volatility Above Median Volatility
19 Above Median Volatility Below Median Volatility Above Median Volatility
Posted on 2020-12-21 11:36:47
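across() passes column values, not column names, to the function it applies (the current column's name is only available through cur_column()). You can try this vectorized base R approach instead, which needs no loops: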
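df <- as.data.frame(df)  # base R data.frame assignment rules (tibbles are stricter about value sizes)
col1 <- grep('^Sigma_Q\\d+$', names(df), value = TRUE)  # \\d+ so that Q10 to Q21 also match
col2 <- grep('^Median_Sigma_Q\\d+$', names(df), value = TRUE)
# logical + 1 is 1 (FALSE) or 2 (TRUE), which selects the label; col1 and col2 must be in the same order
df[paste0(col1, '_Bucket')] <- c("Below Median Volatility", "Above Median Volatility")[(df[col1] >= df[col2]) + 1]
https://stackoverflow.com/questions/65387183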
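Two details of the base R answer can be checked in isolation with a small standalone sketch (the inputs here are made up for illustration): adding 1 to a logical vector turns FALSE/TRUE into 1/2, which indexes the two-element label vector, and a pattern ending in \\d$ matches a single trailing digit only, so with 21 questions it would skip Sigma_Q10 to Sigma_Q21, whereas \\d+$ matches any number of digits.

```r
labels <- c("Below Median Volatility", "Above Median Volatility")

# FALSE + 1 is 1 and TRUE + 1 is 2, so each comparison picks a label
above <- c(FALSE, TRUE, TRUE)
labels[above + 1]
# "Below Median Volatility" "Above Median Volatility" "Above Median Volatility"

# "\\d$" matches exactly one trailing digit; "\\d+$" matches one or more
nms <- c("Sigma_Q1", "Sigma_Q10", "Sigma_Q21", "Median_Sigma_Q1")
grep("^Sigma_Q\\d$", nms, value = TRUE)   # "Sigma_Q1"
grep("^Sigma_Q\\d+$", nms, value = TRUE)  # "Sigma_Q1" "Sigma_Q10" "Sigma_Q21"
```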