
Q: Converting from class "simple_triplet_matrix" to "matrix"

Stack Overflow user

Asked on 2014-02-28 05:32:30

Answers: 2 · Views: 4.1K · Followers: 0 · Votes: 6

I am trying to convert the following simple triplet matrix, created with TermDocumentMatrix() from the tm package,

A term-document matrix (317443 terms, 86960 documents)

Non-/sparse entries: 18472230/27586371050
Sparsity           : 100%
Maximal term length: 653 
Weighting          : term frequency (tf)

of class

[1] "TermDocumentMatrix"    "simple_triplet_matrix"

to a dense matrix.

However,

dense <- as.matrix(tdm)

produces the error

Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
In addition: Warning message:
In nr * nc : NAs produced by integer overflow

I can't really make sense of the error and the warning message. Trying to reproduce the error on a small dataset with

library(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)
as.matrix(tdm)

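does not produce the same problem. From this answer I saw that a similar problem was solved with the slam package (although that question was about a sum operation, not about conversion to a dense matrix). I browsed the slam documentation but could not find any specific function that converts an object of class simple_triplet_matrix to an object of class matrix.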


2 Answers

Stack Overflow user

Accepted answer

Posted on 2014-02-28 08:31:59

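You get an error because you hit the integer limit: nr * nc (terms times documents) is larger than the largest integer R can represent, which is to be expected given your huge number of documents. This reproduces the error: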

as.integer(.Machine$integer.max+1)
[1] NA
Warning message:
NAs introduced by coercion

The function vector (which takes an integer length as its argument) fails because its second argument is NA.
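To see why the product turns into NA at the question's scale, the arithmetic can be checked in base R using the dimensions from the question's printout. The names nr and nc mirror the slam code quoted in the error message; everything else here is illustrative:

```r
# Dimensions from the question's printout (317443 terms x 86960 documents),
# stored as integers, as the error message's nr * nc implies
nr <- 317443L
nc <- 86960L

# Integer multiplication exceeds .Machine$integer.max (2147483647):
# R returns NA and warns "NAs produced by integer overflow"
cells_int <- suppressWarnings(nr * nc)
is.na(cells_int)                          # TRUE

# The same product in double precision gives the true cell count
cells <- as.numeric(nr) * nc
format(cells, scientific = FALSE)         # "27604843280"
cells > .Machine$integer.max              # TRUE

# Cross-check: non-sparse + sparse entries from the printout add up to it
cells == 18472230 + 27586371050           # TRUE
```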

One solution is to redefine as.matrix.simple_triplet_matrix so that it does not call vector. For example:

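as.matrix.simple_triplet_matrix <- function(x, ...) {
  nr <- x$nrow
  nc <- x$ncol
  ## old line: y <- matrix(vector(typeof(x$v), nr * nc), nr, nc)
  ## there, the integer product nr * nc overflows to NA and vector() fails;
  ## here matrix() receives nr and nc separately, so that product is never formed
  y <- matrix(0, nr, nc)
  ## scatter the stored values: each row of cbind(i, j) addresses one cell
  y[cbind(x$i, x$j)] <- x$v
  dimnames(y) <- x$dimnames
  y
}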
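The key line in the redefined method is the assignment y[cbind(x$i, x$j)] <- x$v: indexing a matrix with a two-column matrix selects one cell per row, so all triplets are written in a single vectorized step. A minimal sketch on hypothetical toy triplets (not the question's data):

```r
# Hypothetical triplets: row indices, column indices and values
i <- c(1L, 3L, 2L)
j <- c(1L, 1L, 2L)
v <- c(4, 2, 9)

# Allocate a dense zero matrix, then fill cells (i[k], j[k]) with v[k]
y <- matrix(0, nrow = 3, ncol = 2)
y[cbind(i, j)] <- v
y
#      [,1] [,2]
# [1,]    4    0
# [2,]    0    9
# [3,]    2    0
```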

However, I am not sure it is a good idea to coerce such a sparse matrix (100% sparsity) into a dense one.
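A rough back-of-the-envelope estimate makes the concern concrete. This sketch uses only the figures from the question's printout and assumes 8-byte doubles for every dense cell (what matrix(0, nr, nc) allocates) and, for the triplet form, 4-byte integer i and j plus an 8-byte double v per stored entry; real object sizes will differ somewhat:

```r
# Figures taken from the question's TermDocumentMatrix printout
nnz   <- 18472230                    # non-sparse (stored) entries
cells <- as.numeric(317443) * 86960  # terms x documents, computed in double

# Share of zero cells; the printout rounds this to "100%"
sparsity_pct <- round(100 * (1 - nnz / cells), 2)  # 99.93

# Dense storage, assuming one 8-byte double per cell
dense_gib <- cells * 8 / 2^30                      # about 205.7 GiB

# Triplet storage, assuming 4 + 4 + 8 bytes per stored entry
triplet_mib <- nnz * (4 + 4 + 8) / 2^20            # about 281.9 MiB

c(sparsity_pct = sparsity_pct,
  dense_gib    = round(dense_gib, 1),
  triplet_mib  = round(triplet_mib, 1))
```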

EDIT

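One idea is to use sparseMatrix from the Matrix package. Here is an example in which I compare the objects produced by each coercion. Using sparseMatrix, you gain at least a factor of 10 in size (and I think you will gain much more for your very sparse matrix). Moreover, sparse matrices support addition and multiplication.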

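library(tm)
library(Matrix)
data("crude")
tdm <- TermDocumentMatrix(crude,
                          control = list(weighting = weightTfIdf,
                                         stopwords = TRUE))
## build a sparse Matrix from the triplets; passing dims and dimnames keeps
## empty trailing rows/columns and the term/document labels
sparse <- sparseMatrix(i = tdm$i, j = tdm$j, x = tdm$v,
                       dims = c(tdm$nrow, tdm$ncol),
                       dimnames = dimnames(tdm))
dense <- as.matrix(tdm)
## check sizes: how many times larger the dense matrix is
floor(as.numeric(object.size(dense) / object.size(sparse)))
## addition and multiplication are supported
sparse + sparse
sparse * sparse   # element-wise; use %*% for the matrix product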
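sparseMatrix() sizes its result as c(max(i), max(j)) unless dims is given, so a triplet matrix whose last rows or columns happen to be empty could silently shrink. A minimal sketch with hypothetical toy triplets (Matrix package only, no tm data) showing that effect, and the difference between element-wise * and the matrix product %*%:

```r
library(Matrix)

# Hypothetical triplets for a 4 x 3 matrix whose last row and column are empty
i <- c(1L, 2L, 3L)
j <- c(1L, 2L, 1L)
v <- c(1, 5, 2)

no_dims   <- sparseMatrix(i = i, j = j, x = v)                   # inferred size
with_dims <- sparseMatrix(i = i, j = j, x = v, dims = c(4L, 3L)) # explicit size
dim(no_dims)     # 3 2
dim(with_dims)   # 4 3

# `*` multiplies element by element (same shape, squared values here)
ew <- with_dims * with_dims
# `%*%` is the matrix product: (4 x 3) %*% (3 x 4) gives 4 x 4
mp <- with_dims %*% t(with_dims)
```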

Votes: 2

Stack Overflow user

Posted on 2014-12-13 01:51:40

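I just ran into a similar problem. I'm not sure my issue is the same, but I got a similar error message, NAs produced by integer overflow, when combining a sparse matrix with a dense matrix. I was able to fix it by converting the dense matrix to single precision with as.single. I think the "integer overflow" is caused by the sparseMatrix operations in the Matrix package, which somehow truncate double-precision values and leave remainder digits.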

Votes: 0

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/22087062
