I just ran VADER sentiment analysis on my dataset:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk import tokenize

sid = SentimentIntensityAnalyzer()
for sentence in filtered_lines2:
    print(sentence)
    ss = sid.polarity_scores(sentence)
    for k in sorted(ss):
        print('{0}: {1}, '.format(k, ss[k]), )
        print()

Here is a sample of my results:
Are these guests on Samsung and Google event mostly Chinese Wow Theyre
boring
Google Samsung
('compound: 0.3612, ',)
()
('neg: 0.12, ',)
()
('neu: 0.681, ',)
()
('pos: 0.199, ',)
()
Adobe lose 135bn to piracy Report
('compound: -0.4019, ',)
()
('neg: 0.31, ',)
()
('neu: 0.69, ',)
()
('pos: 0.0, ',)
()
Samsung Galaxy Nexus announced
('compound: 0.0, ',)
()
('neg: 0.0, ',)
()
('neu: 1.0, ',)
()
('pos: 0.0, ',)
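()

I want to know how many times "compound" is equal to, greater than, or less than zero.
I know this is probably very easy, but I am very new to Python and to coding in general. I have tried many different ways to build what I need, but I could not find a solution.
(If the "sample of results" is formatted incorrectly, please edit my question, because I don't know how to write it.)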
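Answered 2016-09-29 12:00:44
This is by no means the most Pythonic way to do it, but I think it will be the easiest to understand if you don't have much experience with Python. Essentially, you create a dictionary whose values start at 0 and increment the matching value in each case.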
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk import tokenize

sid = SentimentIntensityAnalyzer()
# One counter per outcome, all starting at 0
res = {"greater": 0, "less": 0, "equal": 0}
for sentence in filtered_lines2:
    ss = sid.polarity_scores(sentence)
    # Increment the counter that matches the sign of the compound score
    if ss["compound"] == 0.0:
        res["equal"] += 1
    elif ss["compound"] > 0.0:
        res["greater"] += 1
    else:
        res["less"] += 1
print(res)

Answered 2016-09-29 11:58:38
You can use a simple counter for each class:

positive, negative, neutral = 0, 0, 0

Then, inside the sentence loop, test the compound value and increment the corresponding counter:
...
if ss['compound'] > 0:
    positive += 1
elif ss['compound'] == 0:
    neutral += 1
elif ...

and so on.
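Written out in full, the counter approach might look like the sketch below. To keep it runnable without NLTK, the list `compound_scores` is a hypothetical stand-in holding the three compound values from the question's sample output; in real use each value would come from `sid.polarity_scores(sentence)['compound']`.

```python
# Hypothetical stand-in for the per-sentence compound scores; the three
# values are copied from the sample output shown in the question.
compound_scores = [0.3612, -0.4019, 0.0]

positive, negative, neutral = 0, 0, 0
for compound in compound_scores:
    if compound > 0:
        positive += 1
    elif compound == 0:
        neutral += 1
    else:
        # Anything that is neither positive nor zero is negative
        negative += 1

print(positive, negative, neutral)
```

A plain `else` is enough for the last branch, since a score that is neither greater than nor equal to zero can only be negative.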
Answered 2016-09-29 12:13:55
I would probably define a function that returns the type of inequality represented by a document's score:
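def inequality_type(val):
    # Classify a compound score by how it compares with zero
    if val == 0.0:
        return "equal"
    elif val > 0.0:
        return "greater"
    return "less"

Then use it on the compound score of every sentence to increment the count for the corresponding inequality type.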
from collections import defaultdict

def count_sentiments(sentences):
    # Create a dictionary with values defaulted to 0
    counts = defaultdict(int)
    # Create a polarity score for each sentence
    # (sid is the SentimentIntensityAnalyzer instance created earlier)
    for score in map(sid.polarity_scores, sentences):
        # Increment the dictionary entry for that inequality type
        counts[inequality_type(score["compound"])] += 1
    return counts

You can then call this on your filtered lines.
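As a rough illustration of such a call, here is a self-contained sketch. It makes two assumptions that are not in the answer: `count_sentiments` takes the scoring function as an extra `score_fn` parameter, and `SAMPLE_SCORES` is a hypothetical lookup table standing in for `sid.polarity_scores`, filled with the compound values from the question's sample output so the code runs without NLTK.

```python
from collections import defaultdict

# Hypothetical stand-in for sid.polarity_scores: maps each sample sentence
# to a score dict holding the compound value reported in the question.
SAMPLE_SCORES = {
    "Are these guests on Samsung and Google event mostly Chinese": {"compound": 0.3612},
    "Adobe lose 135bn to piracy Report": {"compound": -0.4019},
    "Samsung Galaxy Nexus announced": {"compound": 0.0},
}

def inequality_type(val):
    # Classify a compound score by how it compares with zero
    if val == 0.0:
        return "equal"
    elif val > 0.0:
        return "greater"
    return "less"

def count_sentiments(sentences, score_fn):
    # score_fn is an assumed extra parameter; with NLTK it would be
    # sid.polarity_scores
    counts = defaultdict(int)
    for score in map(score_fn, sentences):
        counts[inequality_type(score["compound"])] += 1
    return counts

counts = count_sentiments(list(SAMPLE_SCORES), SAMPLE_SCORES.get)
print(dict(counts))
```

With the real analyzer, the equivalent call would pass `filtered_lines2` as the sentences and `sid.polarity_scores` as the scoring function.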
However, you can avoid writing the counting loop yourself by just using collections.Counter.
from collections import Counter

def count_sentiments(sentences):
    # Count the inequality type for each score in the sentences' polarity scores
    return Counter(inequality_type(score["compound"]) for score in map(sid.polarity_scores, sentences))

https://stackoverflow.com/questions/39767603