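How can I extract the words/sentence next to a specific word? Example:
"On June 28, Jane went to the cinema and ate popcorn"
I would like to select "Jane" and -2, meaning:
"June 28, Jane went to"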
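Posted on 2019-09-17 18:58:20
We can write a function to help. That also makes the approach more flexible, since the target and the window size become arguments.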
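library(tidyverse)
txt <- "On June 28, Jane went to the cinema and ate popcorn"
grab_text <- function(text, target, before, after) {
  # Split the sentence into words once, on runs of whitespace
  words <- str_split(text, "\\s+")[[1]]
  # Position of the first word that matches the target (target is a regex)
  pos <- which(str_detect(words, target))[1]
  # Clamp the window so it never runs past either end of the sentence
  first <- max(pos - before, 1)
  last <- min(pos + after, length(words))
  # Collapse the selected words back into a single string
  paste(words[first:last], collapse = " ")
}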
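grab_text(text = txt, target = "Jane", before = 2, after = 2)
#> [1] "June 28, Jane went to"
First we split the sentence into words, then find the position of the target, then take the number of words before and after it given by the function arguments, and finally collapse those words back into a single string.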
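The four steps described above can be traced one at a time. This is only an illustrative sketch: the intermediate names `words`, `pos`, `window` and `result` are made up here, and it assumes the target occurs exactly once and that the window stays inside the sentence.

```r
library(stringr)

txt <- "On June 28, Jane went to the cinema and ate popcorn"

# 1. Split the sentence into words on whitespace
words <- str_split(txt, "\\s")[[1]]

# 2. Locate the target ("Jane" is the 4th word)
pos <- which(grepl("Jane", words))

# 3. Take two words before and two words after the target
window <- words[(pos - 2):(pos + 2)]

# 4. Collapse the words back into one string
result <- paste(window, collapse = " ")
result
#> [1] "June 28, Jane went to"
```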
Posted on 2019-09-17 19:26:13
I have a shorter version, using str_extract from stringr:
library(stringr)
txt <- "On June 28, Jane went to the cinema and ate popcorn"
str_extract(txt,"([^\\s]+\\s+){2}Jane(\\s+[^\\s]+){2}")
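[1] "June 28, Jane went to"
The function str_extract extracts a pattern from a string. In the regex, \\s stands for whitespace and [^\\s] is its negation, i.e. any character that is not whitespace. The whole pattern is therefore Jane preceded by two words, each followed by whitespace, and followed by two more words, each preceded by whitespace.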
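If the window size needs to vary, the same pattern could be assembled from arguments. The following is a minimal sketch, not part of stringr: `extract_window` is a hypothetical helper name, `\\S` is used as shorthand for `[^\\s]`, and the target is assumed to be safe to use as a regex.

```r
library(stringr)

# Hypothetical helper: builds the pattern for a given window size and
# returns the first match in each input string (NA if there is none).
extract_window <- function(text, target, before = 2, after = 2) {
  pattern <- paste0(
    "(\\S+\\s+){", before, "}",  # `before` words, each followed by whitespace
    target,                      # the target, interpreted as a regex
    "(\\s+\\S+){", after, "}"    # `after` words, each preceded by whitespace
  )
  str_extract(text, pattern)
}

txt <- "On June 28, Jane went to the cinema and ate popcorn"
extract_window(txt, "Jane", before = 2, after = 2)
#> [1] "June 28, Jane went to"
extract_window(txt, "Jane", before = 1, after = 3)
#> [1] "28, Jane went to the"
```

Because the quantifiers are exact, a string with fewer words than requested on either side of the target yields NA rather than a shorter window.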
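The advantage of the str_extract approach is that it is already vectorized: given a vector of texts, it returns the first match in each element. To get every match in each element, use str_extract_all, which returns a list.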
s <- c("On June 28, Jane went to the cinema and ate popcorn.
The next day, Jane hiked on a trail.",
"an indeed Jane loved it a lot")
str_extract_all(s,"([^\\s]+\\s+){2}Jane(\\s+[^\\s]+){2}")
[[1]]
[1] "June 28, Jane went to" "next day, Jane hiked on"
[[2]]
[1] "an indeed Jane loved it"
Posted on 2019-09-17 18:57:00
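This should work, taking five words on each side of Jane (change the {5} quantifiers to set the window size; the result is NA when fewer words are available on either side):
stringr::str_extract(text, "(?:[^\\s]+\\s){5}Jane(?:\\s[^\\s]+){5}")
https://stackoverflow.com/questions/57980257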