Question: R: How do I add a numeric variable to a sparse matrix?

Stack Overflow user

Asked 2017-06-08 08:36:42

Answers: 1, Views: 172, Followers: 0, Votes: 1

Consider the following example:

library(text2vec)
library(glmnet)
library(dplyr)

dataframe <- tibble(id = c(1,2,3,4),
                    text = c("this is a test", "this is another",'hello','what???'),
                    value = c(200,400,120,300),
                    output = c('win', 'lose','win','lose'))

> dataframe
# A tibble: 4 × 4
     id            text value output
  <dbl>           <chr> <dbl>  <chr>
1     1  this is a test   200    win
2     2 this is another   400   lose
3     3           hello   120    win
4     4         what???   300   lose

Now I can use the excellent text2vec package to get a sparse matrix corresponding to the text column. To do that, I just follow the text2vec tutorial:

it_train = itoken(dataframe$text, 
                  ids = dataframe$id, 
                  progressbar = FALSE)

vocab = create_vocabulary(it_train)
vectorizer = vocab_vectorizer(vocab)
dtm_train = create_dtm(it_train, vectorizer)

> dtm_train
4 x 7 sparse Matrix of class "dgCMatrix"
  hello another what??? a is test this
1     .       .       . 1  1    1    1
2     .       1       . .  1    .    1
3     1       .       . .  .    .    .
4     .       .       1 .  .    .    .

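This sparse matrix, dtm_train, can be fed into an ML model. But my question is: how can I also use the value variable?

That is, as input predictors in glmnet or xgboost, I would like to use my sparse matrix (built from the text variable) together with the value variable, which contains useful information. How can I do that? Can the information somehow be added to the sparse matrix?

Thanks!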

machine-learning

r-caret

text-classification

text2vec

1 Answer

Stack Overflow user

Answered 2020-02-26 14:53:57

You can use hstack from scipy.sparse (this is Python with SciPy, not R):

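import numpy as np
from scipy.sparse import hstack

# Assumes dtm_train is a scipy.sparse matrix and dataframe is a pandas DataFrame.
# [:, None] reshapes the values into an (n, 1) column so they can be stacked horizontally.
dtm_train = hstack((dtm_train, np.array(dataframe['value'])[:, None]))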
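Since the question is about R, the same column-wise stacking can be sketched with the Matrix package, which is the package that defines the dgCMatrix class shown in the question. This is only a minimal sketch under a few assumptions: dtm_train is rebuilt by hand here as a stand-in so the example runs without text2vec, and the names value_col and dtm_with_value are illustrative rather than taken from the original post.

```r
library(Matrix)

# Stand-in for the question's dtm_train (same 4 x 7 shape and entries),
# built by hand so the sketch runs without text2vec.
dtm_train <- sparseMatrix(
  i = c(3, 2, 4, 1, 1, 2, 1, 1, 2),
  j = c(1, 2, 3, 4, 5, 5, 6, 7, 7),
  x = rep(1, 9),
  dims = c(4, 7),
  dimnames = list(as.character(1:4),
                  c("hello", "another", "what???", "a", "is", "test", "this"))
)
value <- c(200, 400, 120, 300)

# Wrap the numeric variable in a one-column sparse matrix,
# then bind it to the document-term matrix column-wise.
value_col <- Matrix(value, ncol = 1, sparse = TRUE,
                    dimnames = list(NULL, "value"))
dtm_with_value <- cbind(dtm_train, value_col)

dim(dtm_with_value)
```

The combined matrix can then be passed as the predictor matrix in place of dtm_train. Note that value lives on a very different scale from the 0/1 term counts; glmnet standardizes predictors by default (standardize = TRUE), but other learners may need the column rescaled explicitly.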

Keep in mind that you must apply the same operation to your hold-out data!

Votes: 0
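One way to keep the training and hold-out matrices consistent is to route both through the same small helper. The sketch below is written in R to match the question; add_numeric_column is a hypothetical helper (it is not part of text2vec or Matrix), and the two tiny matrices are toy stand-ins for document-term matrices that were built with the same vectorizer.

```r
library(Matrix)

# Hypothetical helper (not part of text2vec or Matrix): append one numeric
# column to a sparse document-term matrix.
add_numeric_column <- function(dtm, x, name = "value") {
  stopifnot(length(x) == nrow(dtm))
  extra <- sparseMatrix(i = seq_along(x), j = rep(1, length(x)), x = x,
                        dims = c(length(x), 1),
                        dimnames = list(NULL, name))
  cbind(dtm, extra)
}

# Toy stand-ins for train and hold-out matrices sharing the same vocabulary.
vocab_terms <- c("hello", "test", "this")
dtm_train <- sparseMatrix(i = c(1, 2, 2), j = c(1, 2, 3), x = c(1, 1, 1),
                          dims = c(2, 3), dimnames = list(NULL, vocab_terms))
dtm_test  <- sparseMatrix(i = c(1, 2), j = c(3, 1), x = c(1, 1),
                          dims = c(2, 3), dimnames = list(NULL, vocab_terms))

x_train <- add_numeric_column(dtm_train, c(200, 400))
x_test  <- add_numeric_column(dtm_test, c(150, 310))

# Both matrices must end up with identical columns in identical order.
identical(colnames(x_train), colnames(x_test))
```

Using one function for both sets makes it harder to forget the extra column, or to append it in a different position, when scoring new data.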

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/44424849
