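I have some text files containing movie reviews, and I need to determine whether each review is good or bad. I tried the following code, but it doesn't work: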
import nltk
with open("c:/users/user/desktop/datascience/moviesr/movies-1-32.txt", 'r') as m11:
    mov_rev = m11.read()
mov_review1=nltk.word_tokenize(mov_rev)
bon="crap aweful horrible terrible bad bland trite sucks unpleasant boring dull moronic dreadful disgusting distasteful flawed ordinary slow senseless unoriginal weak wacky uninteresting unpretentious "
bag_of_negative_words=nltk.word_tokenize(bon)
bop="Absorbing Big-Budget Brilliant Brutal Charismatic Charming Clever Comical Dazzling Dramatic Enjoyable Entertaining Excellent Exciting Expensive Fascinating Fast-Moving First-Rate Funny Highly-Charged Hilarious Imaginative Insightful Inspirational Intriguing Juvenile Lasting Legendary Pleasant Powerful Ripping Riveting Romantic Sad Satirical Sensitive Sentimental Surprising Suspenseful Tender Thought Provoking Tragic Uplifting Uproarious"
bop.lower()
bag_of_positive_words=nltk.word_tokenize(bop)
vec=[]
for i in bag_of_negative_words:
    if i in mov_review1:
        vec.append(1)
    else:
        for w in bag_of_positive_words:
            if w in moview_review1:
                vec.append(5)
So, I want to check whether the review contains positive or negative words. If it contains a negative word, the value 1 is appended to the vector vec; otherwise, the value 5 is appended. But the output I get is an empty vector.
Please help. Also, please suggest other ways of solving this problem.
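Posted on 2014-12-02 20:07:14
Try searching the official "bad words" database published by Google, available at this link: Google's official list of bad words. Also, here is a link for good words: Not the official list of good words.
For the code, I would do it like this: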
import difflib

# Read the review and split it into whitespace-separated words
with open('dir_to_your_text', 'r') as f:
    textArray = f.read().split()

# Bad words should be listed like this for the split function to work:
# "*** ****** **** ****" (the stars stand in for the censored words :P)
with open('dir_to_your_bad_word_file') as f:
    badArray = f.read().split()
with open('dir_to_your_good_word_file') as f:
    goodArray = f.read().split()

# Sum the difflib similarity of every good/bad word against every word of the text
goodMatchingCounter = 0.0
badMatchingCounter = 0.0
for goodWord in goodArray:
    for word in textArray:
        goodMatchingCounter += difflib.SequenceMatcher(None, goodWord, word).ratio()
for badWord in badArray:
    for word in textArray:
        badMatchingCounter += difflib.SequenceMatcher(None, badWord, word).ratio()

# Turn the sums into an average similarity, in percent
goodMatchingCounter *= 100.0 / (len(goodArray) * len(textArray))
badMatchingCounter *= 100.0 / (len(badArray) * len(textArray))
print('Show the good measurement of the text in %: ' + str(goodMatchingCounter))
print('Show the bad measurement of the text in %: ' + str(badMatchingCounter))
print('Show the hotness of the text: ' + str(len(textArray) * goodMatchingCounter))
The code will be slow but accurate :) I haven't run or tested it, so please do that for me and post the corrected code :) because I'd like to test it too :)
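To get a feel for what the `ratio()` call in this answer measures, here is a minimal sketch on a few word pairs. The pairs and the `similarity` helper are illustrative assumptions, not taken from any word list. `difflib.SequenceMatcher.ratio()` returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings, so words that merely share a letter or two still score above zero, which is worth keeping in mind when averaging over every pair.

```python
import difflib


def similarity(a, b):
    """Return the difflib similarity of two strings, in the range [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Hypothetical word pairs, chosen only for illustration
pairs = [
    ("awful", "awful"),     # identical words
    ("awful", "aweful"),    # misspelled variant
    ("awful", "charming"),  # unrelated words that share one letter
]

for a, b in pairs:
    print(a, b, round(similarity(a, b), 3))
```

Identical words score 1.0, the misspelled variant scores close to it, and the unrelated pair scores low but not zero.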
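Posted on 2014-12-07 16:17:20
The link below contains a list of words rated for positive and negative sentiment polarity on a scale from -5 to 5. Just compute a score based on word matches, and you get an overall score for the movie review.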
AFINN
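A minimal sketch of scoring by word match, assuming a lexicon that maps each word to an integer valence between -5 and 5. The tiny `toy_lexicon` below and its values are made up for illustration and are not the actual AFINN entries; `score_review` and the regex tokenizer are likewise hypothetical helpers. In practice the dictionary would be loaded from the AFINN file instead.

```python
import re

# Toy lexicon with made-up valences; these are NOT the real AFINN values
toy_lexicon = {
    "brilliant": 4,
    "funny": 3,
    "boring": -3,
    "terrible": -4,
}


def score_review(text, lexicon):
    """Sum the valence of every lexicon word that occurs in the text."""
    # Lowercase first so that "Brilliant" matches "brilliant"
    tokens = re.findall(r"[a-z']+", text.lower())
    # Words missing from the lexicon contribute 0
    return sum(lexicon.get(token, 0) for token in tokens)


review = "Brilliant cast and a funny script, but the ending was boring."
score = score_review(review, toy_lexicon)
label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
print(label, score)
```

A positive total suggests a favourable review and a negative total an unfavourable one. Plain word matching like this ignores negation, so "not funny" still counts as positive.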
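Posted on 2014-12-01 09:58:00
Try this:
vec = []
# Append 1 for every negative word that occurs in the review
for word in bag_of_negative_words:
    if word in mov_review1:
        vec.append(1)
# Append 5 for every positive word that occurs in the review
# (mov_review1 is the tokenized review; the question misspells it as moview_review1)
for word in bag_of_positive_words:
    if word in mov_review1:
        vec.append(5)
https://datascience.stackexchange.com/questions/2568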