
Q: Programming with dplyr: unquote-splicing causes an overscope error with complete() and nesting()

Stack Overflow user

Asked 2017-11-09 21:22:05

Answers: 1, Views: 303, Followers: 0, Votes: 3

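I'm starting to delve into the wonderful world of programming with dplyr. I'm trying to write a function that accepts a data.frame, a target column, and any number of grouping columns (using bare names for all columns). The function then bins the data by the target column and counts the number of entries in each bin. I want to keep a separate set of bin counts for every combination of grouping variables present in the original data.frame, so I'm using the complete() and nesting() functions to do this. Here is an example of what I'm trying to do and the error I run into: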

library(dplyr)
library(tidyr)

#Prepare test data
set.seed(42)
test_data =
    data.frame(Gene_ID = rep(paste0("Gene.", 1:10), times=4),
               Comparison = rep(c("WT_vs_Mut1", "WT_vs_Mut2"), each=10, times=2),
               Test_method = rep(c("T-test", "MannWhitney"), each=20),
               P_value = runif(40))

#Perform operation manually
test_data %>% 
    #Start by binning the data according to p-value
    mutate(Probability.bin = cut(P_value,
                                 breaks = c(-Inf, seq(0.1, 1, by=0.1), Inf),
                                 labels = c(seq(0.0, 1.0, by=0.1)),
                                 right = FALSE)) %>% 
    #Now summarize the results by bin.
    count(Comparison, Test_method, Probability.bin) %>% 
    #Fill in any missing bins with 0 counts
    complete(nesting(Comparison, Test_method), Probability.bin,
             fill=list(n = 0))
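
To make the binning step easier to follow, here is a minimal sketch of what the cut() call does on its own. The breaks and labels are copied from the pipeline above; the vector example_p is made up for illustration. With right = FALSE each interval is closed on the left, so each label is the lower edge of its bin (the first bin, which starts at -Inf, is labelled 0).

```r
# The same breaks and labels as in the pipeline above
breaks <- c(-Inf, seq(0.1, 1, by = 0.1), Inf)
labels <- seq(0.0, 1.0, by = 0.1)

# Hypothetical p-values, chosen to sit well inside their bins
example_p <- c(0.05, 0.25, 0.95, 1)

# right = FALSE makes the intervals [lower, upper)
bins <- cut(example_p, breaks = breaks, labels = labels, right = FALSE)
as.character(bins)
# "0" "0.2" "0.9" "1"
```

Values that fall exactly on a break (other than 1) may land in either neighbouring bin because of floating-point rounding in seq(), which is why the example avoids them.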

#Create function that accepts bare column names
bin_by_p_value <- function(df,
                           pvalue_col, #Bare name of p-value column
                           ...) {      #Bare names of grouping columns

    #"Quote" column names so they are ready for use below
    pvalue_col_name <- enquo(pvalue_col)
    group_by_cols <- quos(...)

    #Perform the operation
    df %>% 
        #Start by binning the data according to p-value
        mutate(Probability.bin = cut(UQ(pvalue_col_name),
                                     breaks = c(-Inf, seq(0.1, 1, by=0.1), Inf),
                                     labels = c(seq(0.0, 1.0, by=0.1)),
                                     right = FALSE)) %>% 
        #Now summarize the results by bin.
        count(UQS(group_by_cols), Probability.bin) %>% 
        #Fill in any missing bins with 0 counts
        complete(nesting(UQS(group_by_cols)), Probability.bin,
                 # complete(nesting(UQS(group_by_cols)), Probability.bin,
                 fill=list(n = 0))
}

#Use function to perform operation
test_data %>% 
    bin_by_p_value(P_value, Comparison, Test_method)

When I perform the operation manually, everything works. When I use the function, it fails with the following error:

Error in overscope_eval_next(overscope, expr) : object 'Comparison' not found

I have narrowed the problem down to the following snippet in the function:

complete(nesting(UQS(group_by_cols)), Probability.bin...

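If I remove the call to nesting(), the code runs without error. However, I want to keep the behaviour where only the combinations of grouping variables that exist in the original data are used, and those are then crossed with every possible bin so that I can fill in any missing bins. Based on the error name and where it fails, my guess is that this is a scoping/environment issue: inside nesting() I probably need a different environment for the grouping variables, because it is wrapped in the call to complete(). However, I'm still new enough to programming with dplyr that I don't know how to do this.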
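
As a small illustration of the behaviour being described, the sketch below compares complete() with and without nesting() on a made-up data frame (toy and its column names are hypothetical, not from the question). Without nesting(), every combination of the grouping columns is generated; with it, only the pairs that occur in the data are crossed with the bins.

```r
library(tidyr)

# Hypothetical toy data: two observed (group_a, group_b) pairs
toy <- data.frame(
  group_a = c("x", "y"),
  group_b = c("p", "q"),
  bin     = c("lo", "hi"),
  n       = c(1, 2)
)

# Every combination of group_a, group_b and bin: 2 * 2 * 2 rows
all_combos <- complete(toy, group_a, group_b, bin, fill = list(n = 0))

# Only the observed (group_a, group_b) pairs, crossed with bin: 2 * 2 rows
observed_only <- complete(toy, nesting(group_a, group_b), bin,
                          fill = list(n = 0))

nrow(all_combos)
nrow(observed_only)
```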

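I tried to work around this by uniting the grouping columns into a single column and passing that united column to complete(). This lets me perform the complete() operation the way I want while avoiding nesting(). However, I run into trouble when I want to separate the result back into the original grouping columns, because I don't know how to convert the list of quosures into a character vector (which the "into" argument of separate() requires). Here is a snippet to illustrate what I mean: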

        #Fill in any missing bins with 0 counts
        unite(Merged_grouping_cols, UQS(group_by_cols), sep="*") %>% 
        complete(Merged_grouping_cols, Probability.bin,
                 fill=list(n = 0)) %>%
        separate(Merged_grouping_cols, into=c("What goes here?"), sep="\\*")
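
One possible way to get the character vector asked about here is sketched below. It assumes a reasonably recent rlang that provides as_label() (older releases offered quo_name() for the same purpose), and the helper name cols_to_names is made up for this example.

```r
library(rlang)

# Hypothetical helper: turn bare column names passed through ... into a
# character vector by labelling each captured quosure
cols_to_names <- function(...) {
  vapply(enquos(...), as_label, character(1), USE.NAMES = FALSE)
}

cols_to_names(Comparison, Test_method)
# "Comparison" "Test_method"
```

Inside the function from the question, the same vapply() call applied to group_by_cols should yield a vector that can be passed as into = to separate().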

Relevant version information: R version 3.4.2 (2017-09-28), tidyr_0.7.2, dplyr_0.7.4

I would appreciate any workaround, but I would also like to understand what I'm doing that rubs complete() and nesting() the wrong way.

dplyr

rlang

1 Answer

Stack Overflow user

Answered 2021-06-02 04:11:32

Use curly-curly {{ }} for pvalue_col.
Pass the dots (...) directly to count.
Use ensyms with !!! inside nesting.

bin_by_p_value <- function(df,
                           pvalue_col, #Bare name of p-value column
                           ...) {      #Bare names of grouping columns
  
  #Perform the operation
  df %>% 
    #Start by binning the data according to p-value
    mutate(Probability.bin = cut({{pvalue_col}},
                                 breaks = c(-Inf, seq(0.1, 1, by=0.1), Inf),
                                 labels = c(seq(0.0, 1.0, by=0.1)),
                                 right = FALSE)) %>% 
    #Now summarize the results by bin.
    count(..., Probability.bin) %>% 
    #Fill in any missing bins with 0 counts
    complete(nesting(!!!ensyms(...)), Probability.bin,   fill=list(n = 0))
}

test_data %>% bin_by_p_value(P_value, Comparison, Test_method)

# A tibble: 44 x 4
#   Comparison Test_method Probability.bin     n
#   <chr>      <chr>       <fct>           <dbl>
# 1 WT_vs_Mut1 MannWhitney 0                   1
# 2 WT_vs_Mut1 MannWhitney 0.1                 1
# 3 WT_vs_Mut1 MannWhitney 0.2                 0
# 4 WT_vs_Mut1 MannWhitney 0.3                 1
# 5 WT_vs_Mut1 MannWhitney 0.4                 1
# 6 WT_vs_Mut1 MannWhitney 0.5                 1
# 7 WT_vs_Mut1 MannWhitney 0.6                 0
# 8 WT_vs_Mut1 MannWhitney 0.7                 0
# 9 WT_vs_Mut1 MannWhitney 0.8                 1
#10 WT_vs_Mut1 MannWhitney 0.9                 4
# … with 34 more rows

If the output of the manual call is stored in res, the function's output can be checked against it:

identical(res, test_data %>% bin_by_p_value(P_value, Comparison, Test_method))
#[1] TRUE
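
To see what the nesting(!!!ensyms(...)) line hands to nesting(), the following sketch builds the call without evaluating it. The helper capture_names is hypothetical. This only illustrates the splicing step and is not an explanation, confirmed by the answer, of why the original UQS() version failed.

```r
library(rlang)

# Hypothetical helper: capture bare names passed through ... as symbols
capture_names <- function(...) ensyms(...)

syms <- capture_names(Comparison, Test_method)

# Build, without evaluating, the call that nesting() would receive;
# !!! splices the list of symbols in as separate arguments
spliced <- expr(nesting(!!!syms))
print(spliced)
# nesting(Comparison, Test_method)
```

Note that ensyms() accepts only bare names or strings, so a grouping expression such as a function call would raise an error with this approach.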

Votes: 1

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/47211743
