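Continuing my quest to try and do everything I can do in base R within the tidyverse. I want to split a string variable in a dataset, extract one element of the resulting vector, and insert it into a second variable in the same dataset.
I can do this very easily in base R: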
df <- data.frame(specCond = paste0("cond_",c("cancer", "anxiety", "gastro"), "_", rep(letters[1:3], times = 3)), stringsAsFactors = F)
df$genCond <- sapply(df$specCond, function (i) strsplit(i, "_")[[1]][2])
df
# output
# specCond genCond
# 1 cond_cancer_a cancer
# 2 cond_anxiety_b anxiety
# 3 cond_gastro_c gastro
# 4 cond_cancer_a cancer
# 5 cond_anxiety_b anxiety
# 6 cond_gastro_c gastro
# 7 cond_cancer_a cancer
# 8 cond_anxiety_b anxiety
# 9 cond_gastro_c gastro
However, when I try to do something similar with dplyr's mutate(), it does not work: every row gets the value from the first row.
library(dplyr)
df2 <- data.frame(specCond = paste0("cond_",c("cancer", "anxiety", "gastro"), "_", rep(letters[1:3], times = 3)), stringsAsFactors = F) %>%
mutate(genCond = strsplit(specCond, "_")[[1]][2])
df2
# specCond genCond
# 1 cond_cancer_a cancer
# 2 cond_anxiety_b cancer
# 3 cond_gastro_c cancer
# 4 cond_cancer_a cancer
# 5 cond_anxiety_b cancer
# 6 cond_gastro_c cancer
# 7 cond_cancer_a cancer
# 8 cond_anxiety_b cancer
# 9 cond_gastro_c cancer
Thanks for any help.
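The repeated "cancer" can be reproduced in base R alone. The sketch below is only illustrative: the three-element vector specCond stands in for df$specCond, and the names parts, first_only and second_each are made up for this example.

```r
# Stand-in for the df$specCond column (illustrative values only)
specCond <- c("cond_cancer_a", "cond_anxiety_b", "cond_gastro_c")

# strsplit() is vectorised: it returns a list with one character vector per input string
parts <- strsplit(specCond, "_")
length(parts)

# [[1]] keeps only the pieces of the FIRST string, so [2] is a single value;
# inside mutate() that single value is recycled down the whole column
first_only <- parts[[1]][2]
first_only

# Taking the second piece of every list element gives one value per row instead
second_each <- vapply(parts, function(p) p[2], character(1))
second_each
```

Inside mutate(), the vapply() line should behave the same way as the sapply() call in the base R version, because it still loops over every string rather than only the first.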
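Posted on 2019-08-21 02:11:57
Because sapply is a loop, you need another loop inside mutate to go over each specCond, split it, and select the second element. Without one, strsplit(specCond, "_")[[1]][2] only looks at the pieces of the first row, and that single value is recycled for every row. You can use purrr::map_chr: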
library(dplyr)
df %>%
mutate(genCond = purrr::map_chr(specCond, ~strsplit(., "_")[[1]][2]))
# specCond genCond
#1 cond_cancer_a cancer
#2 cond_anxiety_b anxiety
#3 cond_gastro_c gastro
#4 cond_cancer_a cancer
#5 cond_anxiety_b anxiety
#6 cond_gastro_c gastro
#7 cond_cancer_a cancer
#8 cond_anxiety_b anxiety
#9 cond_gastro_c gastro
Or add rowwise, which performs the operation separately for each row (though this may be slower):
df %>%
rowwise() %>%
mutate(genCond = strsplit(specCond, "_")[[1]][2])
Another approach is to use tidyr::extract to capture the word between the underscores:
tidyr::extract(df, specCond, "genCond", regex = ".*_(.*)_.*", remove = FALSE)
Posted on 2019-08-21 02:13:40
The following works for me (using sub instead of strsplit):
df %>%
mutate(genCond = sub("^cond_([a-z]*)_[a-c]{1}$", "\\1", specCond))
https://stackoverflow.com/questions/57583561
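The sub() pattern in this answer is tailored to the exact data in the question: a literal cond_ prefix and a one-letter suffix from a to c. As a sketch only, assuming the goal is simply "the text between the first and second underscore", a looser pattern could look like the one below. The vector x, the value cond_asthma_z, and the names strict and loose are made up for illustration.

```r
# Hypothetical input; "cond_asthma_z" is not in the original data
x <- c("cond_cancer_a", "cond_anxiety_b", "cond_asthma_z")

# Pattern from the answer: requires the "cond_" prefix and a suffix in a-c
strict <- sub("^cond_([a-z]*)_[a-c]{1}$", "\\1", x)

# Looser sketch: capture whatever sits between the first two underscores
loose <- sub("^[^_]*_([^_]*)_.*$", "\\1", x)

strict  # the third value does not match, so sub() returns it unchanged
loose
```

When a string does not match, sub() returns it unmodified rather than NA, so a strict pattern can silently leave unexpected values in the new column.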