Q: wordcloud package: getting "Error in strwidth(...): invalid 'cex' value"

Stack Overflow user

Asked 2013-12-03 13:58:18

Answers: 2 | Views: 17.6K | Followers: 0 | Votes: 5

I am using the tm and wordcloud packages in R 2.15.1. I am trying to create a word cloud; here is the code:

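# packages used below: twitteR (userTimeline), tm (corpus tools), wordcloud (comparison.cloud)
library(twitteR)
library(tm)
library(wordcloud)
maruti_tweets = userTimeline("Maruti_suzuki", n=1000,cainfo="cacert.pem")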
hyundai_tweets = userTimeline("HyundaiIndia", n=1000,cainfo="cacert.pem")
tata_tweets = userTimeline("TataMotor", n=1000,cainfo="cacert.pem")
toyota_tweets = userTimeline("Toyota_India", n=1000,cainfo="cacert.pem")
# get text
maruti_txt = sapply(maruti_tweets, function(x) x$getText())
hyundai_txt = sapply(hyundai_tweets, function(x) x$getText())
tata_txt = sapply(tata_tweets, function(x) x$getText())
toyota_txt = sapply(toyota_tweets, function(x) x$getText())
clean.text = function(x)
{
   # tolower
   x = tolower(x)
   # remove rt
   x = gsub("\\brt\\b", "", x)
   # remove at
   x = gsub("@\\w+", "", x)
   # remove punctuation
   x = gsub("[[:punct:]]", "", x)
   # remove numbers
   x = gsub("[[:digit:]]", "", x)
   # remove links http
   x = gsub("http\\w+", "", x)
   # remove tabs
   x = gsub("[ |\t]{2,}", " ", x)
   # remove blank spaces at the beginning
   x = gsub("^ ", "", x)
   # remove blank spaces at the end
   x = gsub(" $", "", x)
   return(x)
}
# clean texts
maruti_clean = clean.text(maruti_txt)
hyundai_clean = clean.text(hyundai_txt)
tata_clean = clean.text(tata_txt)
toyota_clean = clean.text(toyota_txt)
maruti = paste(maruti_clean, collapse=" ")
hyundai= paste(hyundai_clean, collapse=" ")
tata= paste(tata_clean, collapse=" ")
toyota= paste(toyota_clean, collapse=" ")
# put everything in a single vector
all = c(maruti, hyundai, tata, toyota)
# remove stop-words
all = removeWords(all,
c(stopwords("english"), "maruti", "tata", "hyundai", "toyota"))
# create corpus
corpus = Corpus(VectorSource(all))
# create term-document matrix
tdm = TermDocumentMatrix(corpus)
# convert as matrix
tdm = as.matrix(tdm)
# add column names
colnames(tdm) = c("MARUTI", "HYUNDAI", "TATA", "TOYOTA")
# comparison cloud
comparison.cloud(tdm, random.order=FALSE, colors = c("#00B2FF", "red", "#FF0099", "#6600CC"), max.words=500)

But I get the following error:

Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value
please help

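2 Answers

Stack Overflow user

Answered 2013-12-03 16:24:31

You have a typo in the TataMotors Twitter account name. It should be spelled "TataMotors", not "TataMotor". Because of the typo, one column of the term matrix is empty, and it is assigned NaN when cex is computed.

Fix the typo and the rest of the code should work fine. Good luck!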
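The mechanism described in this answer can be reproduced without Twitter access. The sketch below assumes, as the answer implies, that comparison.cloud scales each column of the term matrix by its column sum before deriving word sizes. The toy matrix, its terms, and its counts are invented for illustration.

```r
# Toy term-document matrix (hypothetical counts). The "TATA" column is all
# zeros, as it would be if no tweets were retrieved for that account.
tdm <- matrix(
  c(3, 1, 0, 2,
    0, 4, 0, 1,
    2, 0, 0, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("car", "launch", "price"),
                  c("MARUTI", "HYUNDAI", "TATA", "TOYOTA"))
)

# Assumed normalization step: scale every column so that it sums to 1.
# For an all-zero column this is 0 / 0, which yields NaN.
normalized <- sweep(tdm, 2, colSums(tdm), "/")

# Names of the columns that have no counts at all
bad_cols <- colnames(tdm)[colSums(tdm) == 0]
print(bad_cols)

# TRUE when the normalized matrix contains NaN values
print(anyNA(normalized))
```

Any size computed from a NaN entry is itself NaN, which is not a valid 'cex' value for strwidth().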

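Votes: 1

Stack Overflow user

Answered 2016-01-07 16:42:29

I ran into the empty-column problem in a different application, which threw the same error. In my case it happened because the removeSparseTerms command had been applied to the document-term matrix. Using str() helped me identify the error.

The input variable (slightly edited) has 289 columns: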

> str(corpus.dtm)
List of 6
$ i       : int [1:443] 3 4 6 8 10 12 15 18 19 21 ...
$ j       : int [1:443] 105 98 210 93 287 249 126 223 129 146 ...
$ v       : num [1:443] 1 1 1 1 1 1 1 1 1 1 ...
$ nrow    : int 1408
$ ncol    : int 289
$ dimnames:List of 2
..$ Docs : chr [1:1408] "character(0)" "character(0)" "character(0)" "character(0)" ...
..$ Terms: chr [1:289] "word1" "word2" "word3" "word4" ...
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency" "tf"

The command was:

removeSparseTerms(corpus.dtm,0.90)->corpus.dtm.frequent

and the result has 0 columns:

> str(corpus.dtm.frequent)
List of 6
$ i       : int(0) 
$ j       : int(0) 
$ v       : num(0) 
$ nrow    : int 1408
$ ncol    : int 0
$ dimnames:List of 2
..$ Docs : chr [1:1408] "character(0)" "character(0)" "character(0)" "character(0)" ...
..$ Terms: NULL
- attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
- attr(*, "weighting")= chr [1:2] "term frequency" "tf"

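Raising the sparse factor from 0.90 to 0.95 solved the problem. For terser documents, I set it as high as 0.999 in order to get a non-empty result after removing sparse terms.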
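To see why a higher sparse factor keeps more terms, here is a minimal base R sketch of the filtering rule. It is a dense approximation of the behavior documented for removeSparseTerms (keep only terms whose share of empty documents is below the threshold), not tm's actual implementation. The helper keep_terms and the toy matrix are hypothetical.

```r
# Toy document-term matrix: 20 documents (rows), 3 terms (columns).
dtm <- matrix(0, nrow = 20, ncol = 3,
              dimnames = list(NULL, c("word1", "word2", "word3")))
dtm[1, "word1"]   <- 1  # occurs in 1 of 20 documents, sparsity 0.95
dtm[1:2, "word2"] <- 1  # occurs in 2 of 20 documents, sparsity 0.90
dtm[1:5, "word3"] <- 1  # occurs in 5 of 20 documents, sparsity 0.75

# Hypothetical helper: keep the terms whose fraction of zero entries
# is strictly below `sparse`.
keep_terms <- function(m, sparse) {
  sparsity <- colMeans(m == 0)
  m[, sparsity < sparse, drop = FALSE]
}

n_strict  <- ncol(keep_terms(dtm, 0.70))   # every term is too sparse
n_medium  <- ncol(keep_terms(dtm, 0.92))   # word2 and word3 survive
n_lenient <- ncol(keep_terms(dtm, 0.999))  # all terms survive
print(c(n_strict, n_medium, n_lenient))
```

When most terms occur in only a handful of documents, as in the 1408 x 289 matrix above with just 443 non-zero entries, a low threshold can remove every column.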

When this error occurs, checking for empty columns is a good first step.
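One way to make that check routine is a small guard that runs before the plotting call. This is only a sketch: check_term_matrix is a hypothetical helper, not part of tm or wordcloud, and it assumes the matrix is small enough to convert with as.matrix().

```r
# Hypothetical guard: stop with a readable message when the term matrix has
# no rows or columns, or when any column contains no counts at all.
check_term_matrix <- function(m) {
  m <- as.matrix(m)
  if (nrow(m) == 0 || ncol(m) == 0) {
    stop("term matrix has no rows or no columns")
  }
  empty <- which(colSums(m) == 0)
  if (length(empty) > 0) {
    # Fall back to column positions when the matrix has no column names
    labels <- if (is.null(colnames(m))) empty else colnames(m)[empty]
    stop("empty columns: ", paste(labels, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage with toy data: the first matrix passes, the second has an empty column
good <- matrix(1, nrow = 2, ncol = 2, dimnames = list(NULL, c("A", "B")))
bad  <- cbind(good, C = 0)
check_term_matrix(good)
msg <- tryCatch(check_term_matrix(bad), error = function(e) conditionMessage(e))
print(msg)
```

Calling the guard right before comparison.cloud() turns the opaque 'cex' error into a message that names the offending columns.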

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's XiaoWei IT-domain engine.

Original link:

https://stackoverflow.com/questions/20343941
