Question: Program design concept

Stack Overflow user

Asked 2013-03-04 13:15:53

Answers: 3 · Views: 142 · Followers: 0 · Votes: 1

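I want to write a program that classifies junk mail using a point system.

The mail contains a few sentences.

I want the program to give points to every word in the text that my program classifies as a "junk word". I have assigned different points to different words, so each junk word is worth a certain number of points.

My pseudocode:

Read the text from the file
Look for "junk words"

- for each junk word that comes up, add the points the word is worth.

If the total points of the junk words exceed 10, print "SPAM", followed by the list of words in the file that are classified as junk words, together with their points.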

Example (text file):

Hello!  
Do you have trouble sleeping? 
Do you need to rest?
Then dont hesitate call us for the absolute solution- without charge!

So when the program runs and analyzes the text above, the output should look like this:

SPAM 14p
trouble 6p
charge 3p 
solution 5p

So I was planning to write it like this:

class junk(object):
    fil = open("filnamne.txt","r")
    junkwords = {"trouble":"6p","solution":"3p","virus":"4p"}
    words = junkwords

    if words in fil:
        print("SPAM")
    else:
        print("The file doesn't contain any junk")

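So my question now is: how do I give points to each word from my list that appears in the file?

And how do I add the points together, so that if total_points is > 10 the program prints "SPAM",

followed by the list of the "junk words" found in the file and the total points for each word?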

data-structures

python-3.x

string-matching

3 Answers

Stack Overflow user

Accepted answer

Answered 2013-03-04 13:31:14

Here is a quick script that should get you close:

MAXPOINTS = 10
JUNKWORDS = {"trouble": 6, "solution": 5, "charge": 3, "virus": 7}

foundwords = {}  # junk word -> number of occurrences
points = 0       # running total score

with open("filnamne.txt", "r") as fil:
    for word in fil.read().split():
        if word in JUNKWORDS:
            if word not in foundwords:
                foundwords[word] = 0
            points += JUNKWORDS[word]
            foundwords[word] += 1

if points > MAXPOINTS:
    print("SPAM")
    for word in foundwords:
        # points for this word = occurrences * points per occurrence
        print(word, foundwords[word] * JUNKWORDS[word])
else:
    print("The file doesn't contain any junk")

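You will probably want to call .lower() on the words and keep all dictionary keys lowercase. You may also want to strip all non-alphanumeric characters, since a token such as "charge!" will not match the key "charge" as written.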
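The normalization suggested above could look roughly like this. It is a minimal sketch rather than part of the original script: the helper name `normalize` and the regular expression are assumptions, and it treats everything other than ASCII letters and digits as removable.

```python
import re

JUNKWORDS = {"trouble": 6, "solution": 5, "charge": 3, "virus": 7}

def normalize(word):
    """Lowercase a token and drop every non-alphanumeric character (hypothetical helper)."""
    return re.sub(r"[^a-z0-9]", "", word.lower())

# Last line of the example mail from the question.
text = "Then dont hesitate call us for the absolute solution- without charge!"
tokens = [normalize(w) for w in text.split()]
hits = [w for w in tokens if w in JUNKWORDS]
print(hits)  # ['solution', 'charge']
```

Without the normalization step, neither "solution-" nor "charge!" would be found in the dictionary.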

Votes: 0

Stack Overflow user

Answered 2013-03-04 13:38:06

Here is another approach:

from collections import Counter

word_points = {'trouble': 6, 'solution': 5, 'charge': 3, 'virus': 7}

words = []

with open('ham.txt') as f:
    for line in f:
        if line.strip():  # weed out empty lines
            for word in line.split():
                words.append(word)

count_of_words = Counter(words)

# points per junk word = points for the word * number of occurrences
total_points = {}
for word in word_points:
    if word in count_of_words:
        total_points[word] = word_points[word] * count_of_words[word]

total = sum(total_points.values())  # sum the points, not the dictionary keys
if total > 10:
    print('SPAM {}'.format(total))
    for item in total_points.items():
        print('Word: {} Points: {}'.format(*item))

There is room for optimization, but this should give you an idea of the general logic. Counter is available in Python 2.7 and later.
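One possible tightening of the logic above, shown only as a sketch: count just the junk words and derive the total from those counts. The function name `score_text`, the `threshold` parameter, and the use of an in-memory string instead of `ham.txt` are assumptions made to keep the example self-contained.

```python
from collections import Counter

word_points = {'trouble': 6, 'solution': 5, 'charge': 3, 'virus': 7}

def score_text(text, threshold=10):
    """Return (is_spam, total, per_word_points) for the text (hypothetical helper)."""
    # Count only the tokens that are junk words; split() breaks on any whitespace.
    counts = Counter(w for w in text.split() if w in word_points)
    per_word = {w: word_points[w] * n for w, n in counts.items()}
    total = sum(per_word.values())
    return total > threshold, total, per_word

is_spam, total, per_word = score_text("trouble trouble charge now")
print(is_spam, total, per_word)  # True 15 {'trouble': 12, 'charge': 3}
```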

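Votes: 0

Stack Overflow user

Answered 2013-03-04 13:46:57

I assume each word is worth a different number of points, so I used a dictionary.

You need to find how many times each junk word occurs in the file.

You should store the points for each word as an integer, not as '6p' or '4p'.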
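To illustrate the point about integers: strings such as '6p' cannot be summed directly, so they would have to be converted first. A minimal sketch, assuming every value ends in a "p" suffix as in the question's dictionary:

```python
junkwords = {"trouble": "6p", "solution": "3p", "virus": "4p"}  # as in the question

# Strip the trailing "p" and convert what remains to an integer.
word_points = {word: int(value.rstrip("p")) for word, value in junkwords.items()}

print(word_points)                # {'trouble': 6, 'solution': 3, 'virus': 4}
print(sum(word_points.values()))  # 13
```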

So, try this:

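def find_junk(filename):
    word_points = {"trouble": 6, "solution": 3, "charge": 2, "virus": 4}
    word_count = {word: 0 for word in word_points}
    count = 0   # total points collected so far
    found = []  # junk words seen, in order of first appearance
    with open(filename) as f:
        for line in f:
            line = line.lower()
            for word in word_points:
                c = line.count(word)  # substring count, so "charge" also matches "charged"
                if c > 0:
                    count += c * word_points[word]
                    if word not in found:  # list each word once, even if it is on several lines
                        found.append(word)
                    word_count[word] += c
    if count >= 10:
        print('  SPAM' * 4)
        for word in found:
            # columns: word, points per occurrence, number of occurrences
            print('%10s%3s%3s' % (word, word_points[word], word_count[word]))
    else:
        print("Not spam")

find_junk('spam.txt')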
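Note that `line.count(word)` counts substrings, so "charge" is also counted inside "recharged". If whole-word matching is wanted instead, one option is a regular expression with word boundaries. This is a sketch of an alternative, not the method used above, and `count_whole_words` is a hypothetical helper name.

```python
import re

def count_whole_words(line, word):
    """Count occurrences of `word` in `line` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, line, flags=re.IGNORECASE))

line = "No charge! We recharged it free of CHARGE."
print(line.lower().count("charge"))       # 3 (substring count, includes "recharged")
print(count_whole_words(line, "charge"))  # 2 (whole words only)
```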

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/15202457
