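I would like to create a term-document matrix (TDM) for specific phrases (combinations of two or more words) rather than single words. For example, the phrases could be "climate change", "global worming", "land use", and so on. All the examples I have seen use single words.
tabela = DocumentTermMatrix(textolimpo,
                            list(dictionary = c("climate change", "global worming", "land use")))
I would be very grateful if someone could help me.
Cheers,
Rafael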
Answered on 2016-04-07 04:50:45
I recommend quanteda:
library(quanteda)
textolimpo <- c("This climate change concerns me. This climate changes", "Wormed: global worming increased")
# Form bigrams, then count those that match each dictionary key (regex match)
(dfm <- dfm(textolimpo,
            ngrams = 2L,
            dictionary = list(climate = "climate_change",
                              warm = "global_worming"),
            valuetype = "regex"))
# 2 x 2 sparse Matrix of class "dfmSparse"
#        features
# docs    climate warm
#   text1       2    0
#   text2       0    1
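To make explicit what the dictionary lookup above is counting, here is a minimal package-free sketch in base R. It is not how quanteda works internally: `count_phrases` is a hypothetical helper written for illustration, and it does plain substring matching on the lower-cased text, which is why "climate changes" also counts as a hit for "climate change", much like the regex match above.

```r
# Hypothetical helper (not part of quanteda or tm): counts how often each
# phrase occurs as a literal substring in each document.
count_phrases <- function(texts, phrases) {
  texts <- tolower(texts)
  counts <- sapply(phrases, function(p) {
    # gregexpr() returns match positions per document, or -1 when none
    hits <- gregexpr(p, texts, fixed = TRUE)
    sapply(hits, function(h) sum(h > 0))
  })
  # Force a documents x phrases matrix even with a single document
  counts <- matrix(counts, nrow = length(texts), ncol = length(phrases))
  dimnames(counts) <- list(docs = paste0("text", seq_along(texts)),
                           features = names(phrases))
  counts
}

textolimpo <- c("This climate change concerns me. This climate changes",
                "Wormed: global worming increased")
# Named vector: names become the column labels, values are the phrases
phrases <- c(climate = "climate change", warm = "global worming")

m <- count_phrases(textolimpo, phrases)
print(m)
```

For these two texts the result has the same counts as the 2 x 2 matrix above: 2 and 0 for text1, 0 and 1 for text2. Substring matching is cruder than token-based matching (it would also match inside longer words), so this is only a sanity check.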
# With thesaurus, non-matching bigrams are kept and matching ones are
# replaced by the upper-cased key
(dfm <- dfm(textolimpo,
            ngrams = 2L,
            thesaurus = list(climate = "climate_change",
                             warm = "global_worming"),
            valuetype = "regex"))
# 2 x 8 sparse Matrix of class "dfmSparse"
#         this_climate change_concerns concerns_me me_this wormed_global worming_increased CLIMATE WARM
# text1              2               1           1       1             0                 0       2    0
# text2              0               0           0       0             1                 1       0    1
https://stackoverflow.com/questions/36461869
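The calls above reflect the quanteda API at the time of the answer. As far as I know, later quanteda releases no longer accept `ngrams`, `dictionary` or `thesaurus` as arguments to `dfm()`, and the same idea is expressed by tokenizing first and then looking up a dictionary of multi-word patterns. The following is a sketch under that assumption. The variable names `toks`, `dict` and `phrase_dfm` are illustrative, and the glob pattern "change*" is my choice to mimic the regex behaviour of the original (so that "climate changes" is also counted).

```r
library(quanteda)

textolimpo <- c("This climate change concerns me. This climate changes",
                "Wormed: global worming increased")

# Tokenize, dropping punctuation so that "me." becomes "me"
toks <- tokens(textolimpo, remove_punct = TRUE)

# Multi-word dictionary values are matched as token sequences;
# "change*" is a glob pattern that also covers "changes"
dict <- dictionary(list(climate = "climate change*",
                        warm = "global worming"))

# Keep only dictionary matches, then build the document-feature matrix
phrase_dfm <- dfm(tokens_lookup(toks, dictionary = dict))
print(phrase_dfm)

# Thesaurus-like variant: keep unmatched tokens alongside the keys
mixed_dfm <- dfm(tokens_lookup(toks, dictionary = dict, exclusive = FALSE))
```

Because argument names and defaults have shifted between quanteda versions, it is worth checking `?tokens_lookup` for the installed version before relying on this.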