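I would like to know whether the text2vec package can be used for multi-label classification, in the same way as Python's BinaryRelevance from skmultilearn.problem_transform. I am currently following the pipeline documented at http://text2vec.org/vectorization.html.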
Posted on 2018-10-30 06:34:15
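You can use text2vec to create the document-term matrix (DTM); the steps are described at http://text2vec.org/vectorization.html. Once your DTMs are ready, you can use them for multi-label classification. For the classification step, xgboost is a good choice; it is discussed at https://rpubs.com/mharris/multiclass_xgboost.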
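As a minimal sketch of the DTM step, following the pipeline on the vectorization page, the matrices could be built roughly like this. The toy documents and the names `train_docs` and `test_docs` are assumptions for illustration only; they are not from the original post.

```r
library(text2vec)

# Hypothetical toy corpus; replace with your own documents.
train_docs <- c("the cat sat on the mat",
                "dogs chase cats",
                "stocks fell sharply today")
test_docs <- c("the dog sat", "stocks rose today")

# Tokenize the training documents and build the vocabulary from them only.
it_train <- itoken(train_docs, preprocessor = tolower,
                   tokenizer = word_tokenizer, progressbar = FALSE)
vocab <- create_vocabulary(it_train)
vectorizer <- vocab_vectorizer(vocab)

# Sparse document-term matrix for the training set.
dtm_train <- create_dtm(it_train, vectorizer)

# The test set reuses the training vectorizer, so both matrices
# share the same columns (terms unseen in training are dropped).
it_test <- itoken(test_docs, preprocessor = tolower,
                  tokenizer = word_tokenizer, progressbar = FALSE)
dtm_test <- create_dtm(it_test, vectorizer)
```

Building the vocabulary from the training documents only keeps information from the test set out of the features.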
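library(xgboost)
# dtm_train is the training matrix obtained by text2vec
# dtm_test is the testing matrix obtained by text2vec
# label_train holds the class of each row of dtm_train (one class per document)
label_train <- factor(label_train)
nclass <- nlevels(label_train) # how many classes you have
# xgboost expects integer class labels in 0 .. nclass - 1, not factors
label_int <- as.integer(label_train) - 1L
param <- list(
  "objective" = "multi:softmax", # multiclass classification, predicts one class per row
  "num_class" = nclass, # number of classes
  "eval_metric" = "mlogloss", # evaluation metric
  "nthread" = 8, # number of threads to be used
  "max_depth" = 16, # maximum depth of a tree
  "eta" = 0.3, # step size shrinkage
  "gamma" = 0, # minimum loss reduction required to split
  "subsample" = 0.7, # fraction of rows sampled for each tree
  "colsample_bytree" = 1, # fraction of columns sampled for each tree
  "min_child_weight" = 12 # minimum sum of instance weight in a child
)
# text2vec returns sparse matrices, which xgb.DMatrix accepts directly,
# so there is no need to densify them with as.matrix()
dtrain <- xgb.DMatrix(data = dtm_train, label = label_int)
bst <- xgb.train(params = param, data = dtrain, nrounds = 200)
# Make predictions on the testing data (0-based class indices).
pred <- predict(bst, xgb.DMatrix(dtm_test))
# Map the indices back to the original class names.
pred_class <- levels(label_train)[pred + 1]

Hope this helps. Let me know if you need further explanation.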
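Note that `multi:softmax` assigns exactly one class to each document. If a document can carry several labels at once, a binary-relevance setup in the spirit of skmultilearn's BinaryRelevance can be approximated by training one binary model per label. The following is only a sketch under assumptions: the helper names `train_binary_relevance` and `predict_binary_relevance`, the 0/1 label matrix `Y_train`, and the random toy data are all hypothetical and not part of text2vec or xgboost.

```r
library(xgboost)
library(Matrix)

# Hypothetical toy data: 40 documents x 5 terms (stand-in for a text2vec DTM).
set.seed(1)
X <- matrix(as.numeric(rbinom(40 * 5, 1, 0.5)), nrow = 40)
dtm_train <- Matrix(X, sparse = TRUE)
# Hypothetical 0/1 label matrix: one column per label, several may be 1 per row.
Y_train <- cbind(label_a = X[, 1], label_b = X[, 2])

# Binary relevance: fit one independent binary classifier per label column.
train_binary_relevance <- function(dtm, Y, nrounds = 20) {
  models <- lapply(seq_len(ncol(Y)), function(j) {
    dtrain <- xgb.DMatrix(data = dtm, label = Y[, j])
    xgb.train(params = list(objective = "binary:logistic",
                            max_depth = 3, eta = 0.3, nthread = 1),
              data = dtrain, nrounds = nrounds)
  })
  names(models) <- colnames(Y)
  models
}

# Predict every label separately and threshold the probabilities.
predict_binary_relevance <- function(models, dtm, threshold = 0.5) {
  probs <- do.call(cbind, lapply(models, function(m) {
    predict(m, xgb.DMatrix(dtm))
  }))
  (probs >= threshold) * 1L
}

models <- train_binary_relevance(dtm_train, Y_train)
pred <- predict_binary_relevance(models, dtm_train)
```

Each column of `pred` is the 0/1 decision for one label, so a row may contain several 1s. The 0.5 threshold is an arbitrary default and would normally be tuned per label.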
https://stackoverflow.com/questions/52426597