
How to get sentiment scores in quanteda (and keep track of the sentiment words)?

Stack Overflow user
Asked on 2020-05-27 15:35:14

Consider this simple example:

library(tibble)
library(quanteda)

tibble(mytext = c('this is a good movie',
                  'oh man this is really bad',
                  'quanteda is great!'))

# A tibble: 3 x 1
  mytext                   
  <chr>                    
1 this is a good movie     
2 oh man this is really bad
3 quanteda is great!   

I would like to do some basic sentiment analysis, but with a twist. Here is my dictionary, stored in a regular tibble:

mydictionary <- tibble(sentiment = c('positive', 'positive','negative'),
                       word = c('good', 'great', 'bad'))

# A tibble: 3 x 2
  sentiment word 
  <chr>     <chr>
1 positive  good 
2 positive  great
3 negative  bad  

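Basically, I would like to count how many positive and negative words are detected in each sentence, and also keep track of the words that matched. In other words, the output should look like: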

                          mytext nb.pos nb.neg   pos.words
1 this is a good and great movie      2      0 good, great
2      oh man this is really bad      0      1         bad
3             quanteda is great!      1      0       great

How can I do that in quanteda? Is this possible? Thanks!

1 Answer

Stack Overflow user

Accepted answer

Posted on 2020-05-27 16:10:23

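Stay tuned for quanteda v. 2.1, in which we will have greatly expanded, dedicated functions for sentiment analysis. In the meantime, see below. Note that I made some adjustments, since the text in your expected output differs from your input text, and your pos.words column lists all sentiment words rather than only the positive ones. Below, I compute both the positive matches and all sentiment matches.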

# note the amended input text
mytext <- c(
  "this is a good and great movie",
  "oh man this is really bad",
  "quanteda is great!"
)

mydictionary <- tibble::tibble(
  sentiment = c("positive", "positive", "negative"),
  word = c("good", "great", "bad")
)

library("quanteda", warn.conflicts = FALSE)
## Package version: 2.0.9000
## Parallel computing: 2 of 8 threads used.
## See https://quanteda.io for tutorials and examples.

# make the dictionary into a quanteda dictionary
qdict <- as.dictionary(mydictionary)
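
As a side note, an equivalent dictionary can presumably also be built from a named list rather than by converting the tibble. The following is only a minimal sketch, assuming quanteda's `dictionary()` constructor; `wordlist` and `qdict2` are hypothetical names used for illustration, and the word table is recreated as a plain data.frame so that the snippet stands alone.

```r
library("quanteda", warn.conflicts = FALSE)

# same content as mydictionary above, as a plain data.frame
mydictionary <- data.frame(
  sentiment = c("positive", "positive", "negative"),
  word = c("good", "great", "bad"),
  stringsAsFactors = FALSE
)

# split() groups the words by sentiment into a named list, i.e.
# list(negative = "bad", positive = c("good", "great"))
wordlist <- split(mydictionary$word, mydictionary$sentiment)

# dictionary() takes a named list of character vectors, one element per key
qdict2 <- dictionary(wordlist)

# the keys of the dictionary
names(qdict2)

# indexing by key name keeps only that key, as in qdict["positive"]
names(qdict2["positive"])
```

Whether to use `as.dictionary()` or `dictionary()` is mostly a matter of what shape the word list already has: the former suits a two-column table like the one in the question, the latter a named list.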

Now we can use the lookup functions to get to your final data.frame.

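# get the sentiment scores: replace each token by its dictionary key,
# then count the keys per document
# (%>% is the magrittr pipe; run library(magrittr) first if it is not available)
toks <- tokens(mytext)
df <- toks %>%
  tokens_lookup(dictionary = qdict) %>%
  dfm() %>%
  convert(to = "data.frame")
# columns 2:3 hold the dictionary keys, here in the order negative, positive
names(df)[2:3] <- c("nb.neg", "nb.pos")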

# get the matches for pos and all words
poswords <- tokens_keep(toks, qdict["positive"])
allwords <- tokens_keep(toks, qdict)

data.frame(
  mytext = mytext,
  df[, 2:3],
  pos.words = sapply(poswords, paste, collapse = ", "),
  all.words = sapply(allwords, paste, collapse = ", "),
  row.names = NULL
)
##                           mytext nb.neg nb.pos   pos.words   all.words
## 1 this is a good and great movie      0      2 good, great good, great
## 2      oh man this is really bad      1      0                     bad
## 3             quanteda is great!      0      1       great       great
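
To make explicit what the counts and matched-word columns represent, the same table can be reproduced in base R. This is only a rough cross-check under stated assumptions, not how quanteda works internally: the tokenizer below is a crude lowercase-and-split on non-letters, and `pos`, `neg`, `words`, `keep`, and `check` are hypothetical names introduced for this sketch.

```r
mytext <- c(
  "this is a good and great movie",
  "oh man this is really bad",
  "quanteda is great!"
)
pos <- c("good", "great")
neg <- "bad"

# crude tokenizer (an assumption, not quanteda's): lowercase, split on non-letters
words <- strsplit(tolower(mytext), "[^a-z]+")

# keep only the tokens that appear in a word list, preserving their order
keep <- function(w, vocab) w[w %in% vocab]

check <- data.frame(
  mytext = mytext,
  # number of tokens per text that are in each word list
  nb.neg = sapply(words, function(w) sum(w %in% neg)),
  nb.pos = sapply(words, function(w) sum(w %in% pos)),
  # the matched tokens themselves, collapsed into one string per text
  pos.words = sapply(words, function(w) paste(keep(w, pos), collapse = ", ")),
  all.words = sapply(words, function(w) paste(keep(w, c(pos, neg)), collapse = ", ")),
  stringsAsFactors = FALSE
)
check
```

For real text the quanteda version is preferable, since its tokenizer and dictionary matching handle punctuation, case, and wildcard patterns that this sketch ignores.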
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/62046988
