Question: Extract 5 words before and after a specific word

Stack Overflow user

Asked 2019-09-17 18:42:07

Answers: 3, Views: 1.9K, Followers: 0, Votes: 3

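How can I extract the words or sentence fragment next to a specific word? Example:

"On June 28, Jane went to the cinema and ate popcorn"

I would like to pick "Jane" and -2 (that is, two words on each side of it), meaning:

"June 28, Jane went to"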

3 Answers

Stack Overflow user

Accepted answer

Answered 2019-09-17 18:58:20

We can write a function to help. That should make the approach more flexible.

library(tidyverse)

txt <- "On June 28, Jane went to the cinema and ate popcorn"

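grab_text <- function(text, target, before, after){
  # Split the sentence into words once, on runs of whitespace
  words <- str_split(text, "\\s+")[[1]]
  # Position of the first word that matches the target
  pos <- which(str_detect(words, target))[1]
  if (is.na(pos)) return(NA_character_)
  # Clamp the window so it never runs past either end of the sentence
  start <- max(1, pos - before)
  end <- min(length(words), pos + after)

  paste(words[start:end], collapse = " ")
}

grab_text(text = txt, target = "Jane", before = 2, after = 2)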
#> [1] "June 28, Jane went to"

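First we split the sentence into words, then find the position of the target, then take the requested number of words before and after it (the numbers passed to the function), and finally collapse the words back into a single string.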
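The same split, locate, slice, and collapse steps can be sketched so that they return one window for every occurrence of the target rather than a single one. This is only an illustration under a few assumptions: `grab_all` is a hypothetical name that does not appear in the answer, the target is treated as a fixed string rather than a regex, and base R is used so the sketch runs without extra packages.

```r
# Hypothetical helper (not from the answer): one window per occurrence of `target`.
grab_all <- function(text, target, before = 2, after = 2) {
  # Split the sentence into words on runs of whitespace
  words <- strsplit(text, "\\s+")[[1]]
  # Indices of every word that contains the target (fixed string, not a regex)
  hits <- which(grepl(target, words, fixed = TRUE))
  # Build one window per hit, clamped to the bounds of the sentence
  vapply(hits, function(i) {
    start <- max(1, i - before)
    end <- min(length(words), i + after)
    paste(words[start:end], collapse = " ")
  }, character(1))
}

txt <- "On June 28, Jane went to the cinema and ate popcorn"
# Asking for 5 words before "Jane" still works although only 3 exist
grab_all(txt, "Jane", before = 5, after = 2)
```

When the target does not occur at all, the function returns an empty character vector instead of raising an error.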

Votes: 3

Stack Overflow user

Answered 2019-09-17 19:26:13

I have a shorter version that uses str_extract from stringr:

library(stringr)
txt <- "On June 28, Jane went to the cinema and ate popcorn"
str_extract(txt,"([^\\s]+\\s+){2}Jane(\\s+[^\\s]+){2}")

[1] "June 28, Jane went to"

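The function str_extract extracts a pattern from a string. The regex \\s stands for whitespace, and [^\\s] is its negation, so it matches any character that is not whitespace. The whole pattern is therefore Jane, preceded by two repetitions of "a run of non-whitespace followed by whitespace" and followed by two repetitions of "whitespace followed by a run of non-whitespace".

This approach has the advantage of being vectorized already: str_extract works on a whole character vector and returns the first match for each element. To get every match in each element, use str_extract_all, which returns a list with one character vector per input string.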

s <- c("On June 28, Jane went to the cinema and ate popcorn. 
          The next day, Jane hiked on a trail.",
       "an indeed Jane loved it a lot")

str_extract_all(s,"([^\\s]+\\s+){2}Jane(\\s+[^\\s]+){2}")

[[1]]
[1] "June 28, Jane went to"   "next day, Jane hiked on"

[[2]]
[1] "an indeed Jane loved it"
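The pattern in this answer hard-codes two words on each side. A minimal sketch of building it for any window size follows. It assumes the target contains no regex metacharacters, `window_pattern` is a hypothetical helper name, \\S is used as shorthand for [^\\s], and base R's regexpr and regmatches with perl = TRUE stand in for str_extract so that the sketch runs without extra packages.

```r
# Hypothetical helper (not from the answer): window regex for `n` words per side.
window_pattern <- function(target, n = 2) {
  # (\S+\s+){n} : n words, each followed by whitespace
  # (\s+\S+){n} : n words, each preceded by whitespace
  sprintf("(\\S+\\s+){%d}%s(\\s+\\S+){%d}", n, target, n)
}

txt <- "On June 28, Jane went to the cinema and ate popcorn"
pat <- window_pattern("Jane", 2)

# Base R counterpart of str_extract(); perl = TRUE enables \s and \S
regmatches(txt, regexpr(pat, txt, perl = TRUE))
```

The string returned by `window_pattern` can also be passed unchanged to str_extract or str_extract_all.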

Votes: 3

Stack Overflow user

Answered 2019-09-17 18:57:00

This should work:

stringr::str_extract(text, "(?:[^\\s]+\\s){5}Jane(?:\\s[^\\s]+){5}")
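Note that the strict {5} quantifier needs exactly five words on each side, so it finds nothing in the question's example sentence, where only three words precede "Jane". One possible adjustment, sketched here under assumptions, is to use {0,5} so that the window shrinks near the ends of the text. `extract_window` is a hypothetical name, the target is assumed to contain no regex metacharacters, and base R is used instead of stringr so the sketch runs without extra packages.

```r
# Hypothetical helper (not from the answer): up to `n` words on each side of `target`.
extract_window <- function(x, target, n = 5) {
  # {0,n} instead of {n} lets the match succeed near the ends of the text
  pat <- sprintf("(?:\\S+\\s+){0,%d}%s(?:\\s+\\S+){0,%d}", n, target, n)
  m <- regexpr(pat, x, perl = TRUE)
  # Keep the result aligned with `x`; NA where the target is absent
  out <- rep(NA_character_, length(x))
  hit <- which(as.integer(m) != -1L)
  out[hit] <- regmatches(x, m)
  out
}

txt <- "On June 28, Jane went to the cinema and ate popcorn"
# Returns the three available words before "Jane" and five words after it
extract_window(txt, "Jane", n = 5)
```

Because the result has the same length as the input, it can be assigned directly to a data frame column.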

Votes: -1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/57980257
