Question: Count words in sentences in a data frame

Stack Overflow user

Asked on 2020-07-13 13:11:18

Answers: 3 | Views: 162 | Followers: 0 | Votes: 0

I have a dataset that looks something like this:

sentences <- c("sample text in sentence 1", "sample text in sentence 2")
id <- c(1,2) 

df <- data.frame(sentences, id)

I want a number that tells me how often certain bigrams occur. So let's say I have:

trigger_bg_1 <- "sample text"

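I expect the output 2 (because "sample text" occurs twice, once in each of the two sentences). I know how to do this kind of count for a single word: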

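# Illustrative single-word trigger (the original compared against the counter itself)
trigger_word <- "sample"
trigger_word_sentence <- 0

for (i in seq_len(nrow(df))) {
  # as.character() guards against the column being a factor
  words <- strsplit(as.character(df$sentences[i]), " ")

  # use a separate loop variable so the outer index i is not overwritten
  for (w in unlist(words)) {
    if (w == trigger_word) {
      trigger_word_sentence <- trigger_word_sentence + 1
    }
  }
}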

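But I can't find a way to make this work for a bigram. Any ideas on how to modify the code so that it works?

Since I have a long list of trigger words to test, I need to compute this rather than count by hand.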

3 Answers

Stack Overflow user

Accepted answer

Posted on 2020-07-13 13:40:28

If you want to count the sentences that match, you can use grep:

length(grep(trigger_bg_1, sentences, fixed = TRUE))
#[1] 2
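Since the question mentions a long list of triggers, the same grep call can be applied to each trigger in turn. A minimal sketch, assuming the triggers are held in a character vector; the names `triggers` and `sentence_counts`, and every trigger other than "sample text", are hypothetical and not from the original post:

```r
sentences <- c("sample text in sentence 1", "sample text in sentence 2")

# Hypothetical list of triggers; only "sample text" comes from the question.
triggers <- c("sample text", "in sentence", "not present")

# For each trigger, count the sentences that contain it as a literal string.
sentence_counts <- vapply(
  triggers,
  function(trigger) length(grep(trigger, sentences, fixed = TRUE)),
  integer(1)
)

print(sentence_counts)
```

Because `triggers` is a character vector, vapply names each count after its trigger, so a single result can be looked up with `sentence_counts["sample text"]`.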

If you want to count the number of times trigger_bg_1 is found, you can use gregexpr:

sum(unlist(lapply(gregexpr(trigger_bg_1, sentences, fixed = TRUE)
 , function(x) sum(x>0))))
#[1] 2
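On the example data both approaches give 2, so the difference only shows when one sentence contains the trigger more than once. A small sketch with made-up sentences (`sentences_rep` is hypothetical data, not from the question):

```r
# Hypothetical data: the first sentence contains the trigger twice.
sentences_rep <- c("sample text and more sample text", "no match here")
trigger_bg_1 <- "sample text"

# Number of sentences containing the trigger at least once.
n_sentences <- length(grep(trigger_bg_1, sentences_rep, fixed = TRUE))

# Total number of occurrences; gregexpr returns -1 for "no match",
# so only positive start positions are counted.
matches <- gregexpr(trigger_bg_1, sentences_rep, fixed = TRUE)
n_occurrences <- sum(vapply(matches, function(x) sum(x > 0), integer(1)))

print(c(sentences = n_sentences, occurrences = n_occurrences))
```

Here the grep-based count is 1 (one matching sentence) while the gregexpr-based count is 2 (two occurrences in total).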

Votes: 1

Stack Overflow user

Posted on 2020-07-13 13:34:52

You can sum a grepl:

sum(grepl(trigger_bg_1, df$sentences))
[1] 2
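Note that without fixed = TRUE, grepl treats the trigger as a regular expression. That makes no difference for "sample text", but it can for triggers containing characters such as "." or "(". A sketch with hypothetical data (`sentences_dot` and `trigger_dot` are made up for illustration):

```r
# Hypothetical sentences chosen so that "." matters.
sentences_dot <- c("version 1.5 released", "version 105 released")
trigger_dot <- "1.5"

# As a regular expression, "." matches any character, so "105" also matches.
regex_count <- sum(grepl(trigger_dot, sentences_dot))

# As a literal string, only "1.5" matches.
fixed_count <- sum(grepl(trigger_dot, sentences_dot, fixed = TRUE))

print(c(regex = regex_count, fixed = fixed_count))
```

If the triggers are meant to be matched literally, passing fixed = TRUE is probably the safer default.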

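Votes: 0

Stack Overflow user

Posted on 2020-07-13 15:03:16

If you are really interested in bigrams in general, and not just in a fixed set of word combinations, the quanteda package offers a fuller and more systematic way forward:

Data: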

sentences <- c("sample text in sentence 1", "sample text in sentence 2")
id <- c(1,2) 
df <- data.frame(sentences, id)

Solution:

library(quanteda)
# strip sentences down to words (removing punctuation):
words <- tokens(sentences, remove_punct = TRUE)
# make bigrams, tabulate them and sort them in decreasing order:
bigrams <- sort(
  table(unlist(as.character(tokens_ngrams(words, n = 2, concatenator = " ")))),
  decreasing = TRUE
)

Result:

bigrams
in sentence sample text     text in  sentence 1  sentence 2 
          2           2           2           1           1

If you want to check the frequency count of one specific bigram:

bigrams["in sentence"]
in sentence 
          2
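For comparison, the same kind of bigram table can be approximated in base R without quanteda. This is only a rough sketch: it splits on single spaces and does none of the punctuation handling that tokens() provides, and `make_bigrams`, `word_list` and `bigram_counts` are hypothetical names introduced here.

```r
sentences <- c("sample text in sentence 1", "sample text in sentence 2")

# Split on single spaces; unlike quanteda's tokens(), this does no
# punctuation handling.
word_list <- strsplit(sentences, " ", fixed = TRUE)

# Pair each word with its successor within the same sentence.
make_bigrams <- function(words) {
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# Tabulate all bigrams across sentences.
bigram_counts <- table(unlist(lapply(word_list, make_bigrams)))

print(bigram_counts["sample text"])
```

Building bigrams per sentence avoids creating a spurious bigram from the last word of one sentence and the first word of the next.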

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/62876572
