Using LDA with topicmodels (R), how can I see which topics different documents belong to while keeping the document titles?

Stack Overflow user

Asked 2018-01-30 14:14:36

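I appreciate Ben's answer here: LDA with topicmodels, how can I see which topics different documents belong to?

My question is: how can I keep the document titles in the last step? For example:

Manually create three .txt documents as separate text files and store them in the ~/Desktop/nature_corpus directory.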

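First document title: nature.txt

First document text: noun: the natural world, Mother Nature, Mother Earth, the environment; wildlife, countryside; the universe, the cosmos.

Second document title: conservation.txt

Second document text: tropical forest conservation noun: preservation, protection, safeguarding, safekeeping; care, guardianship, husbandry, supervision; upkeep, maintenance, repair, restoration; ecology, environmentalism.

Third document title: bird.txt

Third document text: feeding birds noun: fowl; chick, fledgling, nestling; informal feathered friend, birdie; budgie; (birds) technical avifauna.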

#install.packages("tm")
#install.packages("topicmodels")
library(tm)
# Create DTM
# The file path below is a macOS path; replace [home folder name] with your own.
corpus_nature_1 <- Corpus(DirSource("/Users/[home folder name]/Desktop/nature_corpus"),readerControl=list(reader=readPlain,language="en US")) 
corpus_nature_2 <- tm_map(corpus_nature_1,removeNumbers)
corpus_nature_3 <- tm_map(corpus_nature_2,content_transformer(tolower))
mystopwords <- c(stopwords(),"noun", "verb")
corpus_nature_4 <- tm_map(corpus_nature_3,removeWords, mystopwords)
corpus_nature_5 <- tm_map(corpus_nature_4,removePunctuation)
corpus_nature_6 <- tm_map(corpus_nature_5,stripWhitespace)
dtm_nature_1 <- DocumentTermMatrix(corpus_nature_6)

inspect(dtm_nature_1)
<<DocumentTermMatrix (documents: 3, terms: 42)>>
Non-/sparse entries: 42/84
Sparsity           : 67%
Maximal term length: 16
Weighting          : term frequency (tf)
Sample             :
                  Terms
Docs               avifauna birdie birds budgie chick feathered feeding fledgling fowl mother
  bird.txt                1      1     2      1     1         1       1         1    1      0
  conservation.txt        0      0     0      0     0         0       0         0    0      0
  nature.txt              0      0     0      0     0         0       0         0    0      2

The topic model, run with topicmodels:

# Run topic model 2 topics
library(topicmodels)
topicmodels_LDA_nature_2 <- LDA(dtm_nature_1,2,method="Gibbs",control=list(seed=1),model=NULL)
terms(topicmodels_LDA_nature_2,3)
     Topic 1  Topic 2   
[1,] "birds"  "avifauna"
[2,] "mother" "birdie"  
[3,] "chick"  "budgie"

How can I keep the document titles here (they are visible in the inspect(dtm_nature_1) output)?

# Create CSV Matrix 2 topics
matrix_nature_2 <- as.data.frame(topicmodels_LDA_nature_2@gamma)
names(matrix_nature_2) <- c(1:2)
write.csv(matrix_nature_2,"matrix_nature_2.csv")

# Rows in this table are documents, columns are topics.
    1           2
1   0.46875     0.53125
2   0.52238806  0.47761194
3   0.555555556 0.444444444

Thanks.


1 Answer

Stack Overflow user

Answered 2018-02-02 01:27:43

I found this workaround, but I would appreciate a better solution if there is one. After running all the code above, run the following:

wordMatrix = as.data.frame( t(as.matrix(dtm_nature_1)) )
write.csv(wordMatrix,"dtm_nature_1.csv")

Then take the CSV file produced by this code (from above):

matrix_nature_2 <- as.data.frame(topicmodels_LDA_nature_2@gamma)
names(matrix_nature_2) <- c(1:2)
write.csv(matrix_nature_2,"matrix_nature_2.csv")

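Import it into Excel, then import dtm_nature_1.csv into a second worksheet of the same Excel file. Then copy the list of document titles (the column headers) from dtm_nature_1.csv and use Paste Special to transpose them into a blank column of the matrix_nature_2.csv sheet.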
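The Excel step can probably be avoided by labeling the rows in R. The following is a minimal sketch, not a tested replacement for the workaround above. It assumes that the rows of the `gamma` slot follow the document order of the matrix passed to `LDA()`. The small `counts` matrix is a hypothetical stand-in for `dtm_nature_1` so that the example runs without the text files, and the names `counts`, `dtm`, `fit`, `doc_topics`, and the `topic_` column prefix are illustrative only.

```r
library(tm)           # as.DocumentTermMatrix(), weightTf, Docs()
library(topicmodels)  # LDA(), posterior(), topics()

# Hypothetical stand-in for dtm_nature_1: a tiny term-count matrix whose
# row names are the file names (the real DTM has 42 terms).
counts <- matrix(
  c(2L, 1L, 1L, 0L, 0L, 0L,
    0L, 0L, 0L, 2L, 1L, 0L,
    0L, 0L, 0L, 0L, 1L, 2L),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("bird.txt", "conservation.txt", "nature.txt"),
    c("birds", "chick", "fowl", "protection", "environment", "mother")
  )
)
dtm <- as.DocumentTermMatrix(counts, weighting = weightTf)

# Same call shape as in the question: 2 topics, Gibbs sampling, fixed seed.
fit <- LDA(dtm, k = 2, method = "Gibbs", control = list(seed = 1))

# gamma holds one row per document; attach the DTM's document names as
# row names so they are written to the CSV alongside the probabilities.
doc_topics <- as.data.frame(fit@gamma)
names(doc_topics) <- paste0("topic_", seq_len(ncol(doc_topics)))
rownames(doc_topics) <- Docs(dtm)

# Written to a temporary directory here; use any path you like.
write.csv(doc_topics, file.path(tempdir(), "matrix_nature_2.csv"))
print(doc_topics)

# Alternatives that already carry the document names:
print(posterior(fit)$topics)  # document-by-topic matrix with row names
print(topics(fit))            # most likely topic per document, named by document
```

With the objects from the question, the equivalent line would be `rownames(matrix_nature_2) <- Docs(dtm_nature_1)` placed before the `write.csv()` call.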


Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/48515004
