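I have a data frame like this:
df = data.frame('name' = c('California parks', 'bear lake', 'beautiful tree house', 'banana plant'), 'extract' = c('parks', 'bear', 'tree', 'plant'))
How can I remove the strings in the 'extract' column from the 'name' column to get the following result?
name_new = California, lake, beautiful house, banana
I suspect this requires a combination of str_extract and lapply, but I can't quite work it out.
Thanks!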
Posted on 2020-08-26 05:45:18
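str_remove and str_replace are vectorized over both string and pattern. So if we have two columns, we can simply pass the columns 'name' and 'extract' as string and pattern to remove the substring from the 'name' column element by element. Once those substrings are removed, spaces may be left before or after the position they occupied; these can be removed or replaced using str_replace together with trimws (which removes leading/trailing whitespace).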
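library(dplyr)
library(stringr)
df %>%
  # remove each row's 'extract' value from that row's 'name'
  mutate(name_new = str_remove(name, extract),
         # trim the ends, then collapse runs of 2+ whitespace characters to one space
         name_new = str_replace_all(trimws(name_new), "\\s{2,}", " "))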
#                  name extract        name_new
#1     California parks   parks      California
#2            bear lake    bear            lake
#3 beautiful tree house    tree beautiful house
#4         banana plant   plant          banana
Posted on 2020-08-26 05:56:04
A base R option using gsub + Vectorize:
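# \\s* (instead of \\s) lets the pattern match when the word starts or ends 'name'; trimws drops the leftover edge space
within(df, name_new <- trimws(Vectorize(gsub)(paste0("\\s*", extract, "\\s*"), " ", name)))
which gives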
                  name extract        name_new
1     California parks   parks      California
2            bear lake    bear            lake
3 beautiful tree house    tree beautiful house
4         banana plant   plant          banana
https://stackoverflow.com/questions/63587536
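Both answers above follow the same steps: pair each 'name' with its own 'extract', delete the match, then tidy the whitespace. Both also treat the 'extract' value as a regular expression. The following is a minimal base R sketch of those steps, assuming the 'extract' values should be matched as literal text (the question does not say so). The helper name remove_literal is hypothetical and does not come from either answer.

```r
# Sample data from the question
df <- data.frame(
  name    = c("California parks", "bear lake", "beautiful tree house", "banana plant"),
  extract = c("parks", "bear", "tree", "plant"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answers): remove the first
# literal occurrence of `pattern` from `x`, then tidy the whitespace.
remove_literal <- function(x, pattern) {
  out <- sub(pattern, "", x, fixed = TRUE)  # fixed = TRUE: no regex interpretation
  out <- gsub("\\s+", " ", out)             # collapse runs of whitespace to one space
  trimws(out)                               # drop leading/trailing whitespace
}

# mapply pairs name[i] with extract[i], one row at a time
df$name_new <- mapply(remove_literal, df$name, df$extract, USE.NAMES = FALSE)
df$name_new
# "California" "lake" "beautiful house" "banana"
```

With fixed = TRUE, an 'extract' value containing characters such as '.' or '(' matches only that exact text, whereas the regex-based calls would interpret them as metacharacters. If regex matching is what you want, drop fixed = TRUE.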