
Counting error + combining bing sentiment score variables in tidytext?

Stack Overflow user
Asked 2022-02-01 23:42:00
Answers 1 · Views 63 · Followers 0 · Votes 0

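I am running sentiment analysis on a large body of text. I am using the bing lexicon in tidytext to get a simple binary pos/neg classification, but I want to calculate the ratio of positive words to all sentiment-scored words (positive plus negative) in each document. I am rusty with dplyr workflows, but I want to count the words coded as "positive" and divide that by the total number of words that received a sentiment classification.

I tried the following approach, using example code and stand-in data: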

library(tidyverse)
library(tidytext)

#Creating a fake tidytext corpus
df_tidytext <- data.frame(
  doc_id = c("Iraq_Report_2001", "Iraq_Report_2002"),
  text = c("xxxx", "xxxx") #Placeholder for text
)

#Creating a fake set of scored words with bing sentiments 
#for each doc in corpus
df_sentiment_bing <- data.frame(
  doc_id = c((rep("Iraq_Report_2001", each = 3)), 
             rep("Iraq_Report_2002", each = 3)),
  word = c("improve", "democratic", "violence",
           "sectarian", "conflict", "insurgency"),
  bing_sentiment = c("positive", "positive", "negative",
                "negative", "negative", "negative") #Stand-ins for sentiment classification
)

#Summarizing count of positive and negative words
# (number of positive words out of total scored words in each doc)
df_sentiment_scored <- df_tidytext %>%
  left_join(df_sentiment_bing) %>%
  group_by(doc_id) %>%
  count(bing_sentiment) %>%
  pivot_wider(names_from = bing_sentiment, values_from = n) %>%
  summarise(bing_score = count(positive)/(count(negative) + count(positive)))

But I get the following error:

"Error: Problem with `summarise()` input `bing_score`.
x no applicable method for 'count' applied to an object of class "c('integer', 'numeric')"
ℹ Input `bing_score` is `count(positive)/(count(negative) + count(positive))`.
ℹ The error occurred in group 1: doc_id = "Iraq_Report_2001".

I would like to understand what I am doing wrong in the summarising step of this workflow.


1 Answer

Stack Overflow user

Accepted answer

Answered 2022-02-02 00:38:52

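After pivot_wider(), the positive and negative columns are already numeric counts, so I don't see the point of calling count() on them. That is also the cause of your error: count() expects a data frame, not an integer vector.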
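To make the point concrete, here is a minimal sketch. The tibble `wide` is a hypothetical stand-in for what `pivot_wider()` produces from the question's data (it is not part of the original post). It suggests that `dplyr::count()` tallies rows of a data frame, while a numeric column is summed with `sum()`.

```r
library(dplyr)

# Hypothetical stand-in for the pivoted table: one row per document
wide <- tibble(
  doc_id   = c("Iraq_Report_2001", "Iraq_Report_2002"),
  positive = c(2L, NA),
  negative = c(1L, 3L)
)

# count() works on a data frame: it tallies rows per group
rows_per_doc <- wide %>% count(doc_id)

# Calling count() on a bare integer vector fails, as in the question
res <- try(count(wide$positive), silent = TRUE)
failed <- inherits(res, "try-error")

# sum() operates on the numeric vector itself
total_negative <- sum(wide$negative)
```

The `NA` in `positive` is why a step that replaces missing counts with 0 is needed before dividing.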

One solution could be:

#Summarizing count of positive and negative words
# (number of positive words out of total scored words in each doc)
 df_tidytext %>%
  left_join(df_sentiment_bing) %>%
  group_by(doc_id) %>%
  dplyr::count(bing_sentiment) %>%
  pivot_wider(names_from = bing_sentiment, values_from = n) %>%
  replace(is.na(.), 0) %>%
  summarise(bing_score = sum(positive)/(sum(negative) + sum(positive)))

The result you should get is:

Joining, by = "doc_id"
# A tibble: 2 × 2
  doc_id           bing_score
  <fct>                 <dbl>
1 Iraq_Report_2001      0.667
2 Iraq_Report_2002      0    
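As an alternative sketch (not part of the accepted answer), the same ratio can be computed without pivoting, by taking the mean of a logical comparison within each document. This assumes every row of `df_sentiment_bing` is a scored word. Because it skips the join to `df_tidytext`, documents with no scored words would not appear in the result. The name `scores` is introduced here only for illustration.

```r
library(dplyr)

# Stand-in scored words, copied from the question
df_sentiment_bing <- data.frame(
  doc_id = c(rep("Iraq_Report_2001", each = 3),
             rep("Iraq_Report_2002", each = 3)),
  word = c("improve", "democratic", "violence",
           "sectarian", "conflict", "insurgency"),
  bing_sentiment = c("positive", "positive", "negative",
                     "negative", "negative", "negative")
)

# The mean of a logical vector is the share of TRUE values,
# i.e. positive words / all scored words in each document
scores <- df_sentiment_bing %>%
  group_by(doc_id) %>%
  summarise(bing_score = mean(bing_sentiment == "positive"))

scores
```

For the stand-in data this gives 2/3 for Iraq_Report_2001 and 0 for Iraq_Report_2002, matching the output above.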
Votes 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/70949018
