I have a table that looks like this:

modelsummary <- data.frame(
  term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3",
           "(Intercept)", "month1", "var1", "var2", "var3"),
  mod_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)

I want to count, for each model, the number of variables other than the intercept, month and RateDiff terms. The output I want is:

modelsummary <- data.frame(
  term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3",
           "(Intercept)", "month1", "var1", "var2", "var3"),
  mod_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  variables = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
)

I tried to get the flag with the following command:

modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1,
  function(x) sum(!(x %in% c(grep("month", x), "RateDiff")), na.rm = T))

But the grep("month", x) part does not work.

modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1,
  function(x) sum(!(x %in% c("month", "RateDiff")), na.rm = T))

This one runs, but month terms that carry a suffix (month1, month2) are not caught.

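I would like to use something equivalent to ILIKE from SQL on the intercept, month and RateDiff terms, because I don't want the match to be case-sensitive and I want to allow prefixes and suffixes on the variable names. How can I do this?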
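
Posted on 2019-06-08 06:02:52

Here is one way using dplyr:

library(dplyr)

modelsummary %>%
  mutate(
    # lower-case the terms so the match is case-insensitive, drop the excluded ones
    variables = term[!grepl(pattern = "intercept|month|ratediff", tolower(term))] %>%
      n_distinct()
  )
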
          term mod_id variables
1  (Intercept)      1         3
2       month1      1         3
3       month2      1         3
4     RateDiff      1         3
5         var1      1         3
6         var2      1         3
7         var3      1         3
8  (Intercept)      2         3
9       month1      2         3
10        var1      2         3
11        var2      2         3
12        var3      2         3

Or using dplyr and stringr:

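library(dplyr)
library(stringr)

modelsummary %>%
  mutate(
    # negate = TRUE keeps only the terms that do NOT match the pattern
    variables = str_subset(tolower(term), "intercept|month|ratediff", negate = TRUE) %>%
      n_distinct()
  )

As written, both versions count distinct terms across the whole table. Add group_by(mod_id) before mutate if you want to count the variables per mod_id.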
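
The grouped variant mentioned above can be sketched as follows. This is only an illustration built on the question's data. It assumes dplyr is installed, and it swaps tolower() for grepl()'s ignore.case = TRUE argument, which is one way to get ILIKE-style case-insensitive matching. The names exclude and result are made up for this example.

```r
library(dplyr)

# Data from the question
modelsummary <- data.frame(
  term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3",
           "(Intercept)", "month1", "var1", "var2", "var3"),
  mod_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# Terms to exclude (hypothetical helper name). The pattern is unanchored, so it
# matches whatever prefix or suffix surrounds it; ignore.case = TRUE makes the
# match case-insensitive.
exclude <- "intercept|month|ratediff"

result <- modelsummary %>%
  group_by(mod_id) %>%   # count within each model rather than across the table
  mutate(variables = n_distinct(term[!grepl(exclude, term, ignore.case = TRUE)])) %>%
  ungroup()

print(result)
```

On this particular data each model contains var1, var2 and var3, so the grouped and ungrouped counts coincide. They would differ as soon as the models contain different sets of variables.
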

In base R:

modelsummary$variables <- with(modelsummary,
  # nested calls instead of %>%, so no extra package is needed
  length(unique(term[!grepl(pattern = "intercept|month|ratediff", tolower(term))]))
)

https://stackoverflow.com/questions/56501531