Q: R text mining - handling plurals

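Stack Overflow user

Asked 2016-01-22 10:33:03

2 answers · 3.2K views · 0 followers · 1 vote

I'm learning text mining in R and have had good success so far, but I'm stuck on how to handle plurals. That is, I want "nation" and "nations" to be counted as the same word and, ideally, "dictionary" and "dictionaries" to be counted as the same word.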

x <- '"nation" and "nations" to be counted as the same word and ideally "dictionary" and "dictionaries" to be counted as the same word.'

text-mining

2 Answers

Stack Overflow user

Answered 2016-01-22 10:49:45

One possible solution. Here I use the pacman package to make the solution self-contained:

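# Install pacman if it is missing, then load it
if (!require("pacman")) install.packages("pacman"); library(pacman)
p_load_gh('hrbrmstr/pluralize')  # pluralize is installed from GitHub
p_load(quanteda)

x <- '"nation" and "nations" to be counted as the same word and ideally "dictionary" and "dictionaries"'
# tokenize() was quanteda's tokenizer when this was written; newer releases use tokens()
singularize(unlist(tokenize(x)))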

##  [1] "\""         "nation"     "\""         "and"        "\""         "nation"     "\""        
##  [8] "to"         "be"         "counted"    "a"          "the"        "same"       "word"      
## [15] "and"        "ideally"    "\""         "dictionary" "\""         "and"        "\""        
## [22] "dictionary" "\""
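
To get from singularized tokens to the counts the question asks for, the tokens can be tallied with `table()`. The following is a minimal base-R sketch rather than the pluralize approach itself: `naive_singularize()` is a hypothetical helper with two deliberately crude suffix rules, used only so the example runs without extra packages. It skips short words, which avoids the "as" to "a" conversion visible in the output above, but it will still get many English words wrong.

```r
# Hypothetical helper: a deliberately naive rule-based singularizer.
# It is NOT the pluralize or SemNetCleaner algorithm.
naive_singularize <- function(words) {
  words <- sub("ies$", "y", words)               # dictionaries -> dictionary
  words <- sub("^(.{3,}[^s])s$", "\\1", words)   # nations -> nation; skips short words like "as"
  words
}

x <- '"nation" and "nations" to be counted as the same word and ideally "dictionary" and "dictionaries" to be counted as the same word.'

# Lower-case the text and keep runs of letters only, which drops quotes and punctuation
lower  <- tolower(x)
tokens <- regmatches(lower, gregexpr("[a-z]+", lower))[[1]]

# Count tokens after collapsing plurals onto their singular form
counts <- table(naive_singularize(tokens))
counts[c("nation", "dictionary")]   # both counts are 2
```

For real text, swapping `naive_singularize()` for one of the package functions discussed in the answers keeps the same tokenise, singularize, tabulate shape.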

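8 votes

Stack Overflow user

Answered 2021-04-06 15:45:49

The SemNetCleaner package has a singularize function. It is slower than the pluralize package, but I have found that it handles nouns better. For example, "Mars" is not converted to "Mar".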
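
A minimal sketch of how that comparison might be tried, assuming SemNetCleaner is installed and that its `singularize()` takes a single word as its first argument. The word list is made up for illustration, and no output is shown because it has not been verified here.

```r
# Illustrative word list (an assumption, not taken from the answer)
words <- c("nations", "dictionaries", "Mars")

# Run only if the package is available, so the script does not fail without it
if (requireNamespace("SemNetCleaner", quietly = TRUE)) {
  # Call the function one word at a time
  result <- sapply(words, SemNetCleaner::singularize)
  print(result)
}
```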

0 votes

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/34938023
