Question: Sentiment analysis with the Lexicoder dictionary

Stack Overflow user

Asked 2020-06-30 15:08:57

1 answer, 636 views, 0 followers, 0 votes

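I am trying to do sentiment analysis in quanteda, and I have run into a problem I cannot solve when using the 2015 Lexicoder Sentiment Dictionary (data_dictionary_LSD2015). The dictionary has four keys: negative, positive, neg_positive (a positive word preceded by a negation, used to convey negative sentiment) and neg_negative (a negative word preceded by a negation, used to convey positive sentiment).

When I apply the dictionary, I cannot get the last two categories to activate.

This is the script I am using.

The LexisNexisTools package converts the file into a quanteda corpus. When I first tried this, I did not get any hits for neg_positive or neg_negative, so I added an example sentence, "This aggressive policy will not win friends" (taken from the reference on the quanteda page), to the first line of the document; it contains one neg_positive bigram ('will not'). This is registered in the first dfm and can be seen in the toks_dict tokens list. However, there are more instances of exactly the same bigram ('will not') in the corpus that are not counted. In addition, there are other neg_positive and neg_negative phrases in the corpus that are not registered at all.

I do not know how to fix this. Strangely, in the third object, dfm_dict, the initial 'will not' is not registered as neg_positive at all. The overall counts for the negative and positive categories do not change, so this is not a case of the missing values being counted elsewhere. I would really like to know what I am doing wrong. Any help is much appreciated!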

rm(list=ls())

library(quanteda)
library(quanteda.corpora)
library(readtext)
library(LexisNexisTools)
library(tidyverse)
library(RColorBrewer)

# Use the LexisNexisTools package to read the file and build the corpus
LNToutput <- lnt_read("word_labour.docx")

corp <- lnt_convert(LNToutput, to = "quanteda")

dfm <- dfm(corp, dictionary = data_dictionary_LSD2015)
dfm

toks_dict <- tokens_lookup(tokens(corp), dictionary = data_dictionary_LSD2015, exclusive = FALSE)
toks_dict

dfm_dict <- dfm(toks_dict, dictionary = data_dictionary_LSD2015, exclusive = FALSE)
dfm_dict

labour.DOCX?dl=0

This is the link to the Word document containing the raw text that makes up the corpus.

quanteda

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-07-02 15:09:47

Works fine for me. By running kwic() on the compound dictionary keys, you can see where the matches occur.

library("quanteda", warn.conflicts = FALSE)
## Package version: 2.1.0
## Parallel computing: 2 of 8 threads used.
## See https://quanteda.io for tutorials and examples.

corp <- readtext::readtext("https://www.dropbox.com/s/qdwetdn8bt9fdrd/word_labour.docx?dl=1") %>%
  corpus()

toks <- tokens(corp)

kwic(toks, pattern = data_dictionary_LSD2015["neg_positive"])
##                                                                                
##        [word_labour.docx, 82:83] Body This aggressive policy will |  not win  |
##    [word_labour.docx, 8468:8469]                manifesto as" as" | not worth |
##    [word_labour.docx, 9681:9682]       more high street services. | Not clear |
##    [word_labour.docx, 9778:9779]     will get one-to-one tuition. | Not clear |
##    [word_labour.docx, 9841:9842]      children free school meals. | Not clear |
##  [word_labour.docx, 10338:10339]      western Balkans and Turkey. | Not clear |
##  [word_labour.docx, 13463:13464]              in January. What is | not clear |
##                                   
##  friends. Ed Miliband has         
##  the paper it is written          
##  - Labour has criticised the      
##  - then shadow education secretary
##  - Labour appeared to back        
##  - this is not a                  
##  is if it allows a
kwic(toks, pattern = data_dictionary_LSD2015["neg_negative"])
##                                                                   
##  [word_labour.docx, 10772:10773] over again. It is | not unusual |
##                         
##  for voters to trust the
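
The same kind of check can be tried on a toy text before running it on the full corpus. The sketch below is only illustrative: the document name d1 and the second sentence are made up for this example, and it assumes an installed quanteda version that still ships data_dictionary_LSD2015. The phrases "not win" and "not unusual" are used because they appear as matches in the kwic() output above.

```r
library("quanteda")

# Toy input: the example sentence from the question plus a made-up
# second sentence containing a double negation ("d1" is a hypothetical name)
txt <- c(d1 = "This aggressive policy will not win friends. It is not unusual.")
toks <- tokens(txt)

# Inspect where each compound key matches in the raw tokens
kw_np <- kwic(toks, pattern = data_dictionary_LSD2015["neg_positive"])
kw_nn <- kwic(toks, pattern = data_dictionary_LSD2015["neg_negative"])

print(kw_np)
print(kw_nn)

# Number of matches found for each compound key
nrow(kw_np)
nrow(kw_nn)
```

If a phrase you expect to see is missing here, comparing it against the tokens in toks is a reasonable first step, since the match is made on the tokenized text.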

The dfm reflects these kwic() matches:

tokens_lookup(toks, dictionary = data_dictionary_LSD2015) %>%
  dfm()
## Document-feature matrix of: 1 document, 4 features (0.0% sparse).
##                   features
## docs               negative positive neg_positive neg_negative
##   word_labour.docx      512      687            7            1
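
A minimal sketch of the same pipeline on a toy text, assuming a quanteda version that still includes data_dictionary_LSD2015. The document name d1 and the second sentence are invented for illustration. The dictionary is applied once, to the raw tokens, and dfm() is then called without a dictionary argument, as in the code above.

```r
library("quanteda")

# Toy input ("d1" and the second sentence are hypothetical)
txt <- c(d1 = "This aggressive policy will not win friends. It is not unusual.")
toks <- tokens(txt)

# Apply the dictionary once, on the raw tokens, so that multi-word
# entries such as "not win" are matched against the original text
toks_dict <- tokens_lookup(toks, dictionary = data_dictionary_LSD2015)

# Build the dfm from the looked-up tokens; no second dictionary pass
dfmat <- dfm(toks_dict)
print(dfmat)

# Pull out the counts for the two compound keys
counts <- as.matrix(dfmat)
counts["d1", c("neg_positive", "neg_negative")]
```

Here toks_dict already contains the dictionary keys, so passing the dictionary again when building the dfm is not needed.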

P.S. I used the readtext package to skip all the other steps you performed, which are not necessary for this question.

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/62660582
