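I want to capture every mention of "pension" (case-insensitive, including pensions and pensioners) while excluding unrelated words such as "suspension". I also want to exclude "pensions" when it is preceded by "department of work and", but I cannot get the whole expression right. So far I have: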
sentences <- c("department of work and pensions", "and pensioners", "pensioners", "Pensions", "suspension")
try <- grepl("(?<!department of work and )^pension*", ignore.case = T, perl = T, sentences)
try
Any suggestions?
Posted on 2022-04-08 16:26:50
We can use two calls: one that finds the word, and one that rules out the department phrase:
grepl("\\bpension\\S+", sentences, ignore.case = TRUE) &
  !grepl("department of work .*\\bpension\\S+", sentences, ignore.case = TRUE)
Posted on 2022-04-08 17:41:15
grep('(?<!department of work and )\\bpension', sentences,
value = TRUE, ignore.case = TRUE, perl = TRUE)
[1] "and pensioners" "pensioners" "Pensions"
Posted on 2022-04-08 18:27:58
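You can use a single pattern that skips the excluded phrase and matches pension only at a word boundary (replace each literal space with \s+ if the words may be separated by arbitrary whitespace):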
sentences <- c("department of work and pensions", "and pensioners", "pensioners", "Pensions", "suspension")
grepl("\\bdepartment of work and \\w+(*SKIP)(*F)|\\bpension", ignore.case = T, perl = T, sentences)
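## => [1] FALSE TRUE TRUE TRUE FALSE
Details:
\bdepartment of work and \w+ - a word boundary (\b), then department of work and, a space, and one or more word characters
(*SKIP)(*F) - discards all the text matched so far and starts the next match search from the position where the failure occurred
| - or
\bpension - a word boundary (\b) followed by the substring pension.
https://stackoverflow.com/questions/71800310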
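To see which text the (*SKIP)(*F) alternation actually keeps, the matches can be extracted rather than only tested. The following is a minimal sketch, not part of the original answers: the names pat, hits and keep, the sixth test sentence, the trailing \w* and the use of \s+ in place of literal spaces are all assumptions made for illustration.

```r
sentences <- c("department of work and pensions", "and pensioners",
               "pensioners", "Pensions", "suspension",
               # Hypothetical extra case: excluded phrase followed by a valid mention.
               "department  of work and pensions for pensioners")

# Assumption: \\s+ replaces the literal spaces so runs of whitespace are tolerated,
# and \\w* is appended so the whole word (not just "pension") is extracted.
pat <- "\\bdepartment\\s+of\\s+work\\s+and\\s+\\w+(*SKIP)(*F)|\\bpension\\w*"

# gregexpr() locates every match in each string; regmatches() pulls out the text.
hits <- regmatches(sentences,
                   gregexpr(pat, sentences, ignore.case = TRUE, perl = TRUE))

# TRUE where at least one "pension" word survives the exclusion.
keep <- lengths(hits) > 0

print(hits)
print(keep)
```

In the sixth sentence the department phrase is consumed and discarded, and the search resumes after it, so only the later "pensioners" is returned. A plain lookbehind, by contrast, requires a fixed-length phrase, which is why it cannot tolerate variable whitespace.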