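I'm trying to find a way to return the first three words of a string in R. I tried the word function from stringr, but it only returns the first three words when the sentence has at least three words. For example: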
sentences <- c("Jane saw a cat", "Jane sat down", "Jane sat", "Jane")
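word(sentences, 1, 3)
returns "Jane saw a", "Jane sat down", NA, NA.
I want it to return the first three words even when the sentence has only one or two words. The output I'm looking for is:
"Jane saw a", "Jane sat down", "Jane sat", "Jane"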
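Answered 2021-09-16 23:52:25
1) stringr. Count the number of words in each component of the input and use that count or 3, whichever is smaller, as the number of words to return.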
library(stringr)
word(sentences, end = pmin(str_count(sentences, "\\w+"), 3))
## [1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
2) stringr. This solution appends some dummy words to the end, takes the first 3 words, and then trims off any dummies that remain.
sentences %>%
str_c("@ @ @") %>%
word(end = 3) %>%
str_replace(" *@.*", "")
## [1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
3a) Base R. The same idea as (1) can be translated to base R as follows:
Word <- function(x, end) do.call("paste", read.table(text = x, fill = TRUE)[1:end])
unname(Vectorize(Word)(sentences, end = pmin(lengths(strsplit(sentences, " ")), 3)))
## [1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
3b) Base R. The same idea as (2) can be translated to base R like this. Word is from (3a).
sentences |>
paste("@ @ @") |>
Word(end = 3) |>
sub(pattern = " *@.*", replacement = "")
## [1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
Update
(1) has been simplified, and the old (1) is now (2). (3a) and (3b) are now their base R counterparts.
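For reuse, the base R idea can also be wrapped in a small function. The following is only a minimal sketch, assuming words are separated by whitespace; first_words and its n argument are hypothetical names that do not appear in the answer.

```r
# Hypothetical helper (not from the answer): keep the first n
# whitespace-separated words of each element of x.
first_words <- function(x, n = 3) {
  # Trim the ends, then split on runs of whitespace.
  parts <- strsplit(trimws(x), "\\s+")
  # Keep at most n words per element and join them with single spaces.
  vapply(parts, function(w) paste(head(w, n), collapse = " "), character(1))
}

sentences <- c("Jane saw a cat", "Jane sat down", "Jane sat", "Jane")
first_words(sentences)
## [1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
```

Because head() simply returns everything when fewer than n words are available, short sentences need no special handling in this sketch.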
Answered 2021-09-16 23:12:53
We can split each sentence on spaces and paste the first three elements back together:
sapply(strsplit(sentences, " "), \(x) paste(head(x, 3), collapse=" "))
[1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
Or use a regular expression:
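# \\s* rather than \\s+, so that the last word does not need trailing whitespace
trimws(sub("^((\\w+\\s*){1,3}).*", "\\1", sentences))
[1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
If we want to use word, it may need a coalesce: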
library(stringr)
library(purrr)
library(dplyr)
map(3:1, word, string = sentences, start = 1) %>%
exec(coalesce, !!!.)
[1] "Jane saw a" "Jane sat down" "Jane sat" "Jane"
https://stackoverflow.com/questions/69216111