R dataframe: using across / all_of / mutate_if to create multiple new columns from existing columns

Stack Overflow user
Asked 2020-12-21 11:21:44
Answers: 2 · Views: 37 · Followers: 0 · Votes: 0

I have a dataframe (example below) that holds responses to a questionnaire collected over several days.

        > df %>% 
            mutate (Sigma_Bucket_Q1  = if_else(Sigma_Q1 >= Median_Sigma_Q1, 
                    "Above Median Volatility", "Below Median Volatility"))
    # A tibble: 19 x 12
       UserId Days_From_First_Use    Q1    Q2    Q3 Sigma_Q1 Sigma_Q2 Sigma_Q3 Median_Sigma_Q1 Median_Sigma_Q2 Median_Sigma_Q3 Sigma_Bucket_Q1        
       <fct>                <int> <int> <int> <int>    <dbl>    <dbl>    <dbl>           <dbl>           <dbl>           <dbl> <chr>                  
     1 A                        0     3     2     1     1.10    0.837    0.548            1.45            1.59            1.53 Below Median Volatility
     2 A                        1     1     0     0     1.10    0.837    0.548            1.45            1.59            1.53 Below Median Volatility
     3 A                        2     1     1     0     1.10    0.837    0.548            1.45            1.59            1.53 Below Median Volatility
     4 A                        3     0     2     0     1.10    0.837    0.548            1.45            1.59            1.53 Below Median Volatility
     5 A                        4     1     1     1     1.10    0.837    0.548            1.45            1.59            1.53 Below Median Volatility
     6 B                        0     4     8     2     1.26    2.5      2.06             1.45            1.59            1.53 Below Median Volatility
     7 B                        2     2     2     1     1.26    2.5      2.06             1.45            1.59            1.53 Below Median Volatility
     8 B                        4     5     6     5     1.26    2.5      2.06             1.45            1.59            1.53 Below Median Volatility
     9 B                        5     4     5     5     1.26    2.5      2.06             1.45            1.59            1.53 Below Median Volatility
    10 C                        0     5     7     2     1.64    1.87     1                1.45            1.59            1.53 Above Median Volatility
    11 C                        1     2     2     2     1.64    1.87     1                1.45            1.59            1.53 Above Median Volatility
    12 C                        2     5     5     4     1.64    1.87     1                1.45            1.59            1.53 Above Median Volatility
    13 C                        3     6     5     3     1.64    1.87     1                1.45            1.59            1.53 Above Median Volatility
    14 C                        4     6     6     4     1.64    1.87     1                1.45            1.59            1.53 Above Median Volatility
    15 D                        0     5     3     5     2.35    1.30     2.30             1.45            1.59            1.53 Above Median Volatility
    16 D                        1     5     3     4     2.35    1.30     2.30             1.45            1.59            1.53 Above Median Volatility
    17 D                        2     4     2     6     2.35    1.30     2.30             1.45            1.59            1.53 Above Median Volatility
    18 D                        3     0     0     1     2.35    1.30     2.30             1.45            1.59            1.53 Above Median Volatility
    19 D                        4     1     1     1     2.35    1.30     2.30             1.45            1.59            1.53 Above Median Volatility

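Q1, Q2 and Q3 hold the responses, while the columns Sigma_Q1, Sigma_Q2 and Sigma_Q3 hold the time-series standard deviation of each subject's responses to each question. Median_Sigma_Q1, Median_Sigma_Q2 and Median_Sigma_Q3 hold the median, across subjects, of those standard deviations for Q1, Q2 and Q3. I want to classify each subject as an above-median or below-median volatility subject depending on whether Sigma_Q1 > Median_Sigma_Q1, and so on. The expression I used to generate Sigma_Bucket_Q1 works fine; it is visible just above the tibble.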
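The asker's code for the Sigma_ and Median_Sigma_ columns is not shown, so the following is only a sketch of one way they could have been built. It assumes a sample standard deviation per UserId and a median taken over the per-subject values (one value per subject, not per row). That reading reproduces the Q1 figures in the table; the names `responses`, `per_user` and `df_q1` are hypothetical.

```r
library(dplyr)

# Q1 responses copied from the example table (19 rows, 4 subjects)
responses <- tibble(
  UserId = rep(c("A", "B", "C", "D"), times = c(5, 4, 5, 5)),
  Q1     = c(3, 1, 1, 0, 1,  4, 2, 5, 4,  5, 2, 5, 6, 6,  5, 5, 4, 0, 1)
)

# One standard deviation per subject
per_user <- responses %>%
  group_by(UserId) %>%
  summarise(Sigma_Q1 = sd(Q1), .groups = "drop")

# Median over the four per-subject values, then joined back to every row
df_q1 <- responses %>%
  left_join(per_user, by = "UserId") %>%
  mutate(Median_Sigma_Q1 = median(per_user$Sigma_Q1))

round(per_user$Sigma_Q1, 2)
round(unique(df_q1$Median_Sigma_Q1), 2)
```

Taking the median over rows instead would weight subjects by their number of responses and give a different value, which is why the sketch takes it over `per_user`.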

However, when I try to generalise this to generate all the Sigma_Bucket columns at once (my real problem has 21 of them), I run into a problem. I tried:

        df %>% 
  mutate (across(all_of(paste0("Sigma_Bucket_", c("Q1", "Q2", "Q3")) = if_else(paste0("Sigma_", {.col}) >= paste0("Median_Sigma_",  {.col}), 
          "Above Median Volatility", "Below Median Volatility")))

I get a cryptic error message and cannot work out what needs to be fixed:

> df %>% 
+   mutate (across(all_of(paste0("Sigma_Bucket_", c("Q1", "Q2", "Q3")) = if_else(paste0("Sigma_", {.col}) >= paste0("Median_Sigma_",  {.col}), 
Error: unexpected '=' in:
"df %>% 
  mutate (across(all_of(paste0("Sigma_Bucket_", c("Q1", "Q2", "Q3")) ="
>           "Above Median Volatility", "Below Median Volatility")))
Error: unexpected ',' in "          "Above Median Volatility","
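The first error is raised by the R parser, before dplyr runs at all: an argument name on the left of `=` has to be a plain symbol or string, so a call such as `paste0(...)` cannot appear there. A minimal base R sketch of the same situation, using a hypothetical helper `parses()` and a placeholder function name `f`:

```r
# Returns TRUE if the text is syntactically valid R, FALSE otherwise.
# parse() only checks syntax; nothing is evaluated.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

bad  <- 'f(paste0("Sigma_Bucket_", "Q1") = 1)'  # a call used as an argument name
good <- 'f(Sigma_Bucket_Q1 = 1)'                # a plain symbol as the name

parses(bad)
parses(good)
```

The second error in the transcript is a knock-on effect: once the first line fails, the console tries to parse the remaining line on its own.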

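How can I modify my statement so that it handles all 3 columns (21 in the real problem) without writing one line per question?

Browsing various answers on StackOverflow suggests that mutate_if could be the basis of a solution, but I can't see how to use it in this particular setting.

Many thanks for your help

Thomas Philips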


2 Answers

Stack Overflow user

Accepted answer

Posted 2020-12-21 13:04:11

Here is a solution using map

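library(dplyr)    # select(), starts_with(), if_else(), rename_with()
library(purrr)    # map2_df()
library(stringr)  # str_replace()

# Compare each Sigma_Q* column with its Median_Sigma_Q* counterpart, pair by pair
map2_df(
    df %>% select(starts_with("Sigma_Q")), 
    df %>% select(starts_with("Median_Sigma_Q")),
    ~if_else(.x >= .y, "Above Median Volatility", "Below Median Volatility")) %>%
  rename_with(~str_replace(.x, "Sigma", "Sigma_Bucket"))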

Output:

# A tibble: 19 x 3
   Sigma_Bucket_Q1         Sigma_Bucket_Q2         Sigma_Bucket_Q3        
   <chr>                   <chr>                   <chr>                  
 1 Below Median Volatility Below Median Volatility Below Median Volatility
 2 Below Median Volatility Below Median Volatility Below Median Volatility
 3 Below Median Volatility Below Median Volatility Below Median Volatility
 4 Below Median Volatility Below Median Volatility Below Median Volatility
 5 Below Median Volatility Below Median Volatility Below Median Volatility
 6 Below Median Volatility Above Median Volatility Above Median Volatility
 7 Below Median Volatility Above Median Volatility Above Median Volatility
 8 Below Median Volatility Above Median Volatility Above Median Volatility
 9 Below Median Volatility Above Median Volatility Above Median Volatility
10 Above Median Volatility Above Median Volatility Below Median Volatility
11 Above Median Volatility Above Median Volatility Below Median Volatility
12 Above Median Volatility Above Median Volatility Below Median Volatility
13 Above Median Volatility Above Median Volatility Below Median Volatility
14 Above Median Volatility Above Median Volatility Below Median Volatility
15 Above Median Volatility Below Median Volatility Above Median Volatility
16 Above Median Volatility Below Median Volatility Above Median Volatility
17 Above Median Volatility Below Median Volatility Above Median Volatility
18 Above Median Volatility Below Median Volatility Above Median Volatility
19 Above Median Volatility Below Median Volatility Above Median Volatility
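The output above contains only the three bucket columns. If they are wanted alongside the original data, one option is to bind them back on with `bind_cols()`. The sketch below uses a cut-down two-row, two-question version of the data; `buckets` and `result` are hypothetical names.

```r
library(dplyr)
library(purrr)
library(stringr)

# Two rows and two questions taken from the example (users A and C)
df <- tibble(
  UserId          = c("A", "C"),
  Sigma_Q1        = c(1.10, 1.64),
  Sigma_Q2        = c(0.837, 1.87),
  Median_Sigma_Q1 = 1.45,
  Median_Sigma_Q2 = 1.59
)

# Same pairwise comparison as in the answer
buckets <- map2_df(
  df %>% select(starts_with("Sigma_Q")),
  df %>% select(starts_with("Median_Sigma_Q")),
  ~ if_else(.x >= .y, "Above Median Volatility", "Below Median Volatility")
) %>%
  rename_with(~ str_replace(.x, "Sigma", "Sigma_Bucket"))

# Attach the new columns to the original rows (row order is unchanged)
result <- bind_cols(df, buckets)
names(result)
```

This relies on the Sigma_Q and Median_Sigma_Q columns appearing in the same order, since `map2_df()` pairs them by position.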
Votes: 1

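Stack Overflow user

Posted 2020-12-21 11:36:47

The function passed to across() receives only the column values, not the column names. You could try this vectorised base R approach, which needs no loops.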

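# \\d+ matches one or more digits, so Q10 to Q21 are picked up as well
col1 <- grep('^Sigma_Q\\d+$', names(df), value = TRUE)
col2 <- grep('^Median_Sigma_Q\\d+$', names(df), value = TRUE)

# (TRUE/FALSE) + 1 indexes the label vector; new columns are named Sigma_Q1_Bucket, etc.
df[paste0(col1, '_Bucket')] <- c("Below Median Volatility", "Above Median Volatility")[(df[col1] >= df[col2]) + 1]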
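For readers who prefer to stay with across(): dplyr 1.0.0 and later provide cur_column(), which returns the name of the column currently being processed, so the matching median column can be looked up by name. This is a sketch rather than either answerer's method; it uses a cut-down two-row data set and the hypothetical name `out`.

```r
library(dplyr)

# Two rows and two questions taken from the example (users A and C)
df <- tibble(
  Sigma_Q1        = c(1.10, 1.64),
  Sigma_Q2        = c(0.837, 1.87),
  Median_Sigma_Q1 = 1.45,
  Median_Sigma_Q2 = 1.59
)

out <- df %>%
  mutate(across(
    matches("^Sigma_Q\\d+$"),   # Sigma_Q1, Sigma_Q2, ... but not Median_Sigma_*
    # Look up the partner column by name, e.g. "Median_Sigma_Q1"
    ~ if_else(.x >= .data[[paste0("Median_", cur_column())]],
              "Above Median Volatility", "Below Median Volatility"),
    # Name the results Sigma_Bucket_Q1, Sigma_Bucket_Q2, ...
    .names = "{sub('Sigma_', 'Sigma_Bucket_', .col)}"
  ))

names(out)
```

Because the partner column is found by name rather than by position, this version does not depend on the two sets of columns being in the same order.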
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/65387183
