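Using tm, I'm comparing a DocumentTermMatrix against a dictionary list to count totals:
totals <- inspect(DocumentTermMatrix(x, list(dictionary = d)))
This works fine for single words, but I also want to include two-word terms, and I can't figure out how to do that.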
I tried RWeka:
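library(tm)
library(RWeka)  # provides NGramTokenizer() and Weka_control()
TrigramTokenizer <- function(x) NGramTokenizer(x,
    Weka_control(min = 3, max = 3))
tdm <- TermDocumentMatrix(v.corpus,
    control = list(tokenize = TrigramTokenizer))
But I get the following error message: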
Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
'i, j, v' different lengths
In addition: Warning messages:
1: In parallel::mclapply(x, termFreq, control) :
all scheduled cores encountered errors in user code
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
3: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
NAs introduced by coercion
Can you help me with this error message?
Thanks!!
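A minimal sketch of one possible approach, under stated assumptions. The warnings point at parallel::mclapply, and a commonly suggested workaround for older tm versions (which ran termFreq() through mclapply, where RWeka's Java-backed tokenizer can fail in forked workers) is to call options(mc.cores = 1) before building the matrix. The sketch below goes further and avoids RWeka entirely: it builds single words and two-word phrases with words() and ngrams() from the NLP package (attached by tm), similar to the bigram example in the tm FAQ, and passes a dictionary that mixes both. The documents, the dictionary contents, and the tokenizer name UniBigramTokenizer are hypothetical. VCorpus is used because, in newer tm versions, Corpus() on a VectorSource returns a SimpleCorpus, which does not accept custom tokenizer functions; as.matrix() is used instead of inspect() because recent tm versions' inspect() prints only a sample of the matrix.

```r
library(tm)  # also attaches NLP, which provides words() and ngrams()

# Workaround often suggested for older tm versions that ran termFreq()
# via parallel::mclapply; harmless on newer versions.
options(mc.cores = 1)

# Hypothetical documents and dictionary, for illustration only.
docs <- c("the quick brown fox", "the quick red fox jumps")
corpus <- VCorpus(VectorSource(docs))  # VCorpus honours custom tokenizers
d <- c("fox", "quick brown", "red fox")

# Hypothetical tokenizer: returns all 1-grams and 2-grams of a document,
# each n-gram joined into a single space-separated string.
UniBigramTokenizer <- function(x) {
  w <- words(x)
  grams <- lapply(1:2, function(n) {
    vapply(ngrams(w, n), paste, character(1), collapse = " ")
  })
  unlist(grams, use.names = FALSE)
}

# Count only the dictionary terms, whether they are one or two words long.
dtm <- DocumentTermMatrix(corpus,
                          control = list(tokenize = UniBigramTokenizer,
                                         dictionary = d))
totals <- as.matrix(dtm)
print(totals)
```

The same tokenizer idea extends to trigrams by changing 1:2 to 1:3; the dictionary entries just need to match the space-separated form the tokenizer produces (lower-cased, since tolower is applied by default).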
Posted on 2014-03-26 14:32:08
https://stackoverflow.com/questions/20577040