vocab
wordIDx V1
1 archive
2 name
3 atheism
4 resources
5 alt
wordIDx newsgroup_ID docIdx word/doc totalwords/doc totalwords/newsgroup wordID/newsgroup P(W_j)
1 1 196 3 1240 47821 2 0.028130269
1 1 47 2 1220 47821 2 0.028130269
2 12 4437 1 702 47490 8 0.8
3 12 4434 1 673 47490 8 0.035051912
5 12 4398 1 53 47490 8 0.4
3 12 4564 11 1539 47490 8 0.035051912
For each wordIDx in the wordIDx column, I need to compute the following formula. For example, for wordIDx = 1, my value should be
max(log(0.02813027)+sum(log(2/47821),log(2/47821)))
= -23.73506
I currently have the following code:
classifier_3$ans<- max(log(classifier_3$`P(W_j)`)+ (sum(log(classifier_3$`wordID/newsgroup`/classifier_3$`totalwords/newsgroup`))))
How can I loop so that it considers every wordIDx from vocab and computes the example above, as I highlighted?
Posted on 2019-02-22 14:02:04
Something like this, but you really need to clean up your column names.
vocab <- read.table(text = "wordIDx V1
1 archive
2 name
3 atheism
4 resources
5 alt", header = TRUE, stringsAsFactors = FALSE)
classifier_3 <- read.table(text = "wordIDx newsgroup_ID docIdx word/doc totalwords/doc totalwords/newsgroup wordID/newsgroup P(W_j)
1 1 196 3 1240 47821 2 0.028130269
1 1 47 2 1220 47821 2 0.028130269
2 12 4437 1 702 47490 8 0.8
3 12 4434 1 673 47490 8 0.035051912
5 12 4398 1 53 47490 8 0.4
3 12 4564 11 1539 47490 8 0.035051912", header = TRUE, stringsAsFactors = FALSE)
classifier_3 <- classifier_3[!duplicated(classifier_3$wordIDx), ]
classifier_3 <- merge(vocab, classifier_3, by = c("wordIDx"))
classifier_3$ans<- pmax(log(classifier_3$`P.W_j.`)+
(log(classifier_3$`wordID.newsgroup`/classifier_3$`totalwords.newsgroup`) +
# isn't that times 2?
log(classifier_3$`wordID.newsgroup`/classifier_3$`totalwords.newsgroup`)),
log(classifier_3$`wordID.newsgroup`/classifier_3$`totalwords.newsgroup`))
https://stackoverflow.com/questions/54820629
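The code above keeps one row per wordIDx and adds the log-ratio twice, which matches the wordIDx = 1 example but hardcodes the number of rows. As a rough alternative sketch in base R, the sum can instead be taken over however many rows each wordIDx has, with the maximum then taken across newsgroups. The renamed columns (`word_count_ng`, `total_words_ng`, `prior`, `word`) are made up here for readability and are not from the original post, and reading `max` as "maximum over newsgroups" is an assumption about what the question intends.

```r
# Sample data from the question, with hypothetical cleaned-up column names.
vocab <- data.frame(
  wordIDx = 1:5,
  word = c("archive", "name", "atheism", "resources", "alt"),
  stringsAsFactors = FALSE
)

classifier_3 <- data.frame(
  wordIDx        = c(1, 1, 2, 3, 5, 3),
  newsgroup_ID   = c(1, 1, 12, 12, 12, 12),
  word_count_ng  = c(2, 2, 8, 8, 8, 8),                         # wordID/newsgroup
  total_words_ng = c(47821, 47821, 47490, 47490, 47490, 47490), # totalwords/newsgroup
  prior          = c(0.028130269, 0.028130269, 0.8,
                     0.035051912, 0.4, 0.035051912)             # P(W_j)
)

# Per-row log-ratio term.
classifier_3$loglik <- log(classifier_3$word_count_ng / classifier_3$total_words_ng)

# Sum the log-ratio terms within each (wordIDx, newsgroup_ID) group.
# prior is constant within a group, so including it in the grouping
# just carries it through to the result.
per_group <- aggregate(loglik ~ wordIDx + newsgroup_ID + prior,
                       data = classifier_3, FUN = sum)
per_group$score <- log(per_group$prior) + per_group$loglik

# Assumption: max() in the question means the best score across newsgroups.
best <- aggregate(score ~ wordIDx, data = per_group, FUN = max)

# Left join so every vocab entry appears; words with no rows get NA.
result <- merge(vocab, best, by = "wordIDx", all.x = TRUE)
print(result)
```

For wordIDx = 1 this should come out at roughly the -23.73506 given in the question. wordIDx = 4 has no rows in `classifier_3`, so its score is left as `NA` rather than guessed.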