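I'm trying to learn how to use the lsa package in R. My real dataset is much larger than the example below, but I'm using this one for reproducibility (props to the person who posted this code on his site, it's a great resource).
I'm getting a strange error message that I can't seem to resolve:
Error in Ops.simple_triplet_matrix(m, 1) : Incompatible dimensions.
Here is the code I'm modifying: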
# load required libraries
library(tm)
library(ggplot2)
library(lsa)
library(SnowballC)
lsa <- function () {
  # 1. Prepare mock data
  text <- c("transporting food by cars will cause global warming. so we should go local.",
            "we should try to convince our parents to stop using cars because it will cause global warming.",
            "some food, such as mongo, requires a warm weather to grow. so they have to be transported to canada.",
            "a typical Electronic Circuit can be built with a battery, a bulb, and a switch.",
            "electricity flows from batteries to the bulb, just like water flows through a tube.",
            "batteries have chemical energe in it. then electrons flow through a bulb to light it up.",
            "birds can fly because they have feather and they are light.",
            "why some birds like pigeon can fly while some others like chicken cannot?",
            "feather is important for birds' fly. if feather on a bird's wings is removed, this bird cannot fly.")
  view <- factor(rep(c("view 1", "view 2", "view 3"), each = 3))
  df <- data.frame(text, view, stringsAsFactors = FALSE)
  # prepare corpus
  corpus <- Corpus(VectorSource(df$text))
  # corpus <- tm_map(corpus, tolower)
  # corpus <- tm_map(corpus, removePunctuation)
  # corpus <- tm_map(corpus, function(x) removeWords(x, stopwords("english")))
  # corpus <- tm_map(corpus, stemDocument, language = "english")
  corpus <- tm_map(corpus, PlainTextDocument)
  # 2. MDS with raw term-document matrix compute distance matrix
  td.mat <- TermDocumentMatrix(corpus)
  td.mat.lsa <- lw_logtf(td.mat) * gw_idf(td.mat) # weighting
  lsaSpace <- lsa(td.mat.lsa) # create LSA space
  dist.mat.lsa <- dist(t(as.textmatrix(lsaSpace))) # compute distance matrix
  return(dist.mat.lsa) # check distance matrix
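}
I can build the corpus without any problem and convert it into a term-document matrix. The error is triggered when I define td.mat.lsa.
The traceback is as follows: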
4 stop("Incompatible dimensions.")
3 Ops.simple_triplet_matrix(m, 1)
2 lw_logtf(td.mat) at lsa.R#31
1 lsa()
So, my main question is:
Thanks for any help you can offer here. This is my first post, so feedback on the quality of my question is welcome too!
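Posted on 2015-06-18 08:26:31
Figured it out!
I had wrapped the code in a function named 'lsa' and was also using the name 'lsa' inside the function body. So I got the incompatible dimensions error because, within this environment, lsa referred to a differently defined function than the one from the package.
Oops!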
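For anyone who hits the same kind of name clash, here is a minimal sketch of it using only base R, so it runs without tm or lsa. The wrapper names `sd` and `run_sd` are hypothetical stand-ins for my 'lsa' wrapper, and the error this toy version raises will not be the same message as the one in my question.

```r
# Hypothetical stand-in for the problem: a wrapper that reuses the name of the
# function it is supposed to call (here stats::sd instead of lsa::lsa).
sd <- function() {
  x <- c(1, 2, 3, 4)
  # R looks up `sd` starting from inside this function and finds the wrapper
  # first (it is defined ahead of the stats package on the search path),
  # so this line calls the wrapper again instead of stats::sd.
  sd(x)
}

# The call fails because the wrapper takes no arguments; capture the message.
clash <- tryCatch(sd(), error = function(e) conditionMessage(e))
print(clash)

# Fix 1: give the wrapper its own name.
# Fix 2: qualify the package function with `::` so the lookup is unambiguous.
run_sd <- function() {
  x <- c(1, 2, 3, 4)
  stats::sd(x)
}

rm(sd)  # drop the masking definition so plain `sd` means stats::sd again
print(run_sd())
```

Applied to the code in the question, that would mean renaming the wrapper (for example to a hypothetical `run_lsa`) or calling the package function explicitly as `lsa::lsa(td.mat.lsa)`.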
https://stackoverflow.com/questions/30736756