I am using VADER in NLTK to find the sentiment of each line in a file. I have two questions:
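1. I need to add some words to vader_lexicon.txt, whose entries look like this: assaults -2.5 0.92195 [-1, -3, -3, -3, -4, -3, -1, -2, -2, -3]
   What do -2.5, 0.92195 and [-1, -3, -3, -3, -4, -3, -1, -2, -2, -3] stand for?
   How should I encode a new word in this format? Say I have to add something like '100%' or 'A1'.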
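2. I can also see positive-word and negative-word .txt files in the nltk_data\corpora\opinion_lexicon folder. How are these used? Can I add my own words to these .txt files as well?

Posted on 2018-07-29 18:06:23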
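I think VADER only uses the word and the first value (the -2.5 in your example) when classifying text. If you want to add new words, simply create a dictionary of words and their sentiment values, which you can add with the update function: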
from nltk.sentiment.vader import SentimentIntensityAnalyzer
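Analyzer = SentimentIntensityAnalyzer()
# your_dictionary maps each word to a single float score; this value is only a placeholder
your_dictionary = {'a1': 1.5}
Analyzer.lexicon.update(your_dictionary)

You can manually assign sentiment values to words based on how strongly you perceive their sentiment or, if that is impractical, give every word in each of the two categories one broad value (e.g. -1.5 for negative words and 1.5 for positive words).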
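For the first question: the VADER project's README describes each line of vader_lexicon.txt as four tab-separated fields: the token, its mean sentiment rating, the standard deviation of the ratings, and the ten raw ratings given by human raters on a -4 to +4 scale, and notes that only the first two fields are used. The stdlib-only sketch below (no NLTK needed) parses the `assaults` line from the question, checks that -2.5 and 0.92195 are the mean and population standard deviation of the list, and keeps only token -> mean. The `make_entry` helper and the ratings given to 'A1' and '100%' are hypothetical. Keys are lowercased because NLTK's VADER lowercases each token before the lexicon lookup; recent NLTK versions may also strip leading and trailing punctuation from tokens, so it is worth testing whether a key like '100%' is ever matched on your version.

```python
import statistics

# One vader_lexicon.txt entry: token, mean, standard deviation, raw ratings (tab-separated)
LINE = "assaults\t-2.5\t0.92195\t[-1, -3, -3, -3, -4, -3, -1, -2, -2, -3]"


def parse_vader_line(line):
    """Split a lexicon line into (token, mean, std_dev, raw_ratings)."""
    token, mean, std, ratings = line.strip().split("\t")
    # The last field is a list literal of integer ratings on a -4..+4 scale
    raw = [int(x) for x in ratings.strip("[]").split(",")]
    return token, float(mean), float(std), raw


token, mean, std, raw = parse_vader_line(LINE)

# The 2nd and 3rd fields are the mean and population standard deviation of the ratings
computed_mean = statistics.mean(raw)
computed_std = statistics.pstdev(raw)
print(token, computed_mean, round(computed_std, 5))  # assaults -2.5 0.92195

# Only token -> mean is needed for Analyzer.lexicon.update()
lex_entry = {token: mean}


def make_entry(word, ratings):
    """Hypothetical helper: average your own -4..+4 ratings into a (key, value) pair."""
    # Lowercase the key, since NLTK's VADER lowercases tokens before the lookup
    return word.lower(), statistics.mean(ratings)


# Hypothetical ratings, for illustration only
new_words = dict(make_entry(w, r) for w, r in [("A1", [2, 2, 3]), ("100%", [1, 2, 2])])
print(new_words)
```

The resulting `new_words` dictionary is the kind of object you would pass to `Analyzer.lexicon.update()`.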
You can check whether your update has been included by using this script (which is not mine):
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Requires the 'vader_lexicon' and Punkt tokenizer data (see nltk.download)
Analyzer = SentimentIntensityAnalyzer()
sentence = 'enter your text to test'
tokenized_sentence = nltk.word_tokenize(sentence)

pos_word_list = []
neu_word_list = []
neg_word_list = []

# Score each token on its own and bucket it by its compound score
for word in tokenized_sentence:
    compound = Analyzer.polarity_scores(word)['compound']
    if compound >= 0.1:
        pos_word_list.append(word)
    elif compound <= -0.1:
        neg_word_list.append(word)
    else:
        neu_word_list.append(word)

print('Positive:', pos_word_list)
print('Neutral:', neu_word_list)
print('Negative:', neg_word_list)

# Score the whole sentence
score = Analyzer.polarity_scores(sentence)
print('\nScores:', score)

Before updating VADER:
sentence = 'stocks were volatile on Tuesday due to the recent calamities in the Chinese market'
Positive: []
Neutral: ['stocks', 'were', 'volatile', 'on', 'Tuesday', 'due', 'to', 'the', 'recent', 'calamities', 'in', 'the', 'Chinese', 'market']
Negative: []
Scores: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

After updating VADER with a finance-based lexicon:
Analyzer.lexicon.update(Financial_Lexicon)
sentence = 'stocks were volatile on Tuesday due to the recent calamities in the Chinese market'
Positive: []
Neutral: ['stocks', 'were', 'on', 'Tuesday', 'due', 'to', 'the', 'recent', 'in', 'the', 'Chinese', 'market']
Negative: ['volatile', 'calamities']
Scores: {'neg': 0.294, 'neu': 0.706, 'pos': 0.0, 'compound': -0.6124}

https://stackoverflow.com/questions/51514208
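Why 'volatile' and 'calamities' move into the Negative list once they are in the lexicon: NLTK's VADER squashes a summed valence into [-1, 1] with normalize(score, alpha=15) = score / sqrt(score² + 15). Assuming a single lowercase word with no punctuation and no negation or booster word around it, the per-word compound in the checking script reduces to that normalization of the word's lexicon value, and a word missing from the lexicon scores 0 (Neutral). The sketch below re-implements only this single-word case with the standard library; `toy_lexicon` and its values are hypothetical stand-ins, since Financial_Lexicon is not shown.

```python
import math

ALPHA = 15  # normalization constant NLTK's VADER uses by default


def normalize(score, alpha=ALPHA):
    """Squash a raw valence sum into [-1, 1], as VADER does for 'compound'."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def bucket(word, lexicon, threshold=0.1):
    """Single-word case of the checking script: classify by compound score."""
    # Lexicon keys are lowercase; unknown words get a valence of 0
    compound = normalize(lexicon.get(word.lower(), 0.0))
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Hypothetical finance-style valences (the real Financial_Lexicon is not shown)
toy_lexicon = {"volatile": -1.5, "calamities": -2.0}
words = "stocks were volatile on Tuesday due to the recent calamities".split()

before = [w for w in words if bucket(w, {}) == "Negative"]
after = [w for w in words if bucket(w, toy_lexicon) == "Negative"]
print(before)  # []
print(after)   # ['volatile', 'calamities']

# Smallest |valence| whose single-word compound reaches 0.1: v^2 = 0.01 * alpha / 0.99
print(round(math.sqrt(0.01 * ALPHA / 0.99), 3))  # 0.389
```

Under these assumptions, any word you add with a value of magnitude above roughly 0.389 will leave the Neutral list in the per-word check, so even the broad ±1.5 values suggested earlier are comfortably past the threshold.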