Q: How do I get sentiment scores in quanteda (and keep the sentiment words)?

Stack Overflow user

Asked 2020-05-27 15:35:14

1 answer · 114 views · 0 followers · 1 vote

Consider this simple example:

library(tibble)
library(quanteda)

tibble(mytext = c('this is a good movie',
                  'oh man this is really bad',
                  'quanteda is great!'))

# A tibble: 3 x 1
  mytext                   
  <chr>                    
1 this is a good movie     
2 oh man this is really bad
3 quanteda is great!

I would like to do some basic sentiment analysis, but with a twist. Here is my dictionary, stored in a regular tibble:

mydictionary <- tibble(sentiment = c('positive', 'positive','negative'),
                       word = c('good', 'great', 'bad'))

# A tibble: 3 x 2
  sentiment word 
  <chr>     <chr>
1 positive  good 
2 positive  great
3 negative  bad

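Basically, I would like to count how many positive and negative words are detected in each sentence, while also keeping track of the words that matched. In other words, the output should look like this: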

                          mytext nb.pos nb.neg   pos.words
1 this is a good and great movie      2      0 good, great
2      oh man this is really bad      0      1         bad
3             quanteda is great!      1      0       great

How can I do this in quanteda? Is it possible? Thanks!

quanteda

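1 Answer

Stack Overflow user

Accepted answer

Answered 2020-05-27 16:10:23

Stay tuned for quanteda v. 2.1, in which we will have greatly expanded the dedicated functions for sentiment analysis. In the meantime, see below. Note that I made some adjustments, because the text in your expected output differs from your input text, and because your pos.words column contains all of the sentiment words, not just the positive ones. Below, I compute both the positive matches and all sentiment matches.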

# note the amended input text
mytext <- c(
  "this is a good and great movie",
  "oh man this is really bad",
  "quanteda is great!"
)

mydictionary <- tibble::tibble(
  sentiment = c("positive", "positive", "negative"),
  word = c("good", "great", "bad")
)

library("quanteda", warn.conflicts = FALSE)
## Package version: 2.0.9000
## Parallel computing: 2 of 8 threads used.
## See https://quanteda.io for tutorials and examples.

# make the dictionary into a quanteda dictionary
qdict <- as.dictionary(mydictionary)
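To make the conversion step more concrete, here is a minimal sketch that builds a dictionary with the same two keys by hand. It assumes that grouping the words by their sentiment label is all the conversion needs to achieve for this example; `word_list` and `qdict_alt` are illustrative names that do not appear in the answer.

```r
library("quanteda", warn.conflicts = FALSE)

# group the words by their sentiment label using base R
word_list <- split(
  c("good", "great", "bad"),
  c("positive", "positive", "negative")
)

# build a quanteda dictionary directly from the named list
qdict_alt <- dictionary(word_list)

# inspect the keys and the words stored under each key
names(qdict_alt)
print(qdict_alt)
```

Either form can be passed to tokens_lookup() or tokens_keep() in the same way.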

Now we can use the lookup functions to get the final data.frame.

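# load magrittr explicitly so that %>% is available regardless of quanteda version
library("magrittr")

# get the sentiment scores: map tokens to dictionary keys, then count keys per document
toks <- tokens(mytext)
df <- toks %>%
  tokens_lookup(dictionary = qdict) %>%
  dfm() %>%
  convert(to = "data.frame")
# column 1 is the document id; columns 2:3 hold the key counts,
# which come out as negative, positive in this run
names(df)[2:3] <- c("nb.neg", "nb.pos")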

# get the matches for pos and all words
poswords <- tokens_keep(toks, qdict["positive"])
allwords <- tokens_keep(toks, qdict)

data.frame(
  mytext = mytext,
  df[, 2:3],
  pos.words = sapply(poswords, paste, collapse = ", "),
  all.words = sapply(allwords, paste, collapse = ", "),
  row.names = NULL
)
##                           mytext nb.neg nb.pos   pos.words   all.words
## 1 this is a good and great movie      0      2 good, great good, great
## 2      oh man this is really bad      1      0                     bad
## 3             quanteda is great!      0      1       great       great
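The same pattern extends to the negative matches that the question also asks about. The following is a sketch rather than part of the original answer: it selects the count columns by key name instead of by position, and the `neg.words` and `net` columns are hypothetical additions for illustration.

```r
library("quanteda", warn.conflicts = FALSE)

mytext <- c(
  "this is a good and great movie",
  "oh man this is really bad",
  "quanteda is great!"
)
qdict <- dictionary(list(positive = c("good", "great"), negative = "bad"))
toks <- tokens(mytext)

# count dictionary-key matches per document; one column per key
counts <- convert(dfm(tokens_lookup(toks, dictionary = qdict)), to = "data.frame")

# matched negative words, mirroring poswords in the answer
negwords <- tokens_keep(toks, qdict["negative"])

result <- data.frame(
  mytext = mytext,
  nb.pos = counts$positive,                 # selected by name, not position
  nb.neg = counts$negative,
  net = counts$positive - counts$negative,  # hypothetical net-score column
  neg.words = sapply(negwords, paste, collapse = ", "),
  row.names = NULL
)
print(result)
```

Selecting columns by key name avoids depending on the order in which the features happen to appear in the dfm.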

Votes: 2

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/62046988
