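I have a txt file. I have written code that finds the unique words and the number of times each word occurs in the file. Now I need to figure out how to print the lines those words appear on. How can I do this?
Here is a sample output:
Analyze what file: itsy_bitsy_spider.txt
Concordance for file itsy_bitsy_spider.txt
itsy: Total count: 2
Line:1: ITSY Bitsy spider climbed up the water spout
Line:4: ITSY Bitsy spider climbed up the spout again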
#this function will get just the unique words without the stop words.
def openFiles(openFile):
    for i in openFile:
        i = i.strip()
        linelist.append(i)
        b = i.lower()
        thislist = b.split()
        for a in thislist:
            if a in stopwords:
                continue
            else:
                wordlist.append(a)
    #print wordlist

#this dictionary is used to count the number of times each stop
countdict = {}
def countWords(this_list):
    for word in this_list:
        depunct = word.strip(punctuation)
        if depunct in countdict:
            countdict[depunct] += 1
        else:
            countdict[depunct] = 1

Posted on 2011-11-01 04:01:16
from collections import defaultdict

target = 'itsy'
word_summary = defaultdict(list)
with open('itsy.txt', 'r') as f:
    lines = f.readlines()

for idx, line in enumerate(lines):
    words = [w.strip().lower() for w in line.split()]
    for word in words:
        word_summary[word].append(idx)

unique_words = len(word_summary.keys())
target_occurence = len(word_summary[target])
# sort so the line numbers print in ascending order (a set has no order)
line_nums = sorted(set(word_summary[target]))

print("There are %s unique words." % unique_words)
print("There are %s occurrences of '%s'" % (target_occurence, target))
print("'%s' is found on lines %s" % (target, ', '.join([str(i+1) for i in line_nums])))

Posted on 2011-11-01 02:48:55
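If you parse the input text file line by line, you can maintain a second dictionary that maps each word to a list of lines. For every word in a line, you add an entry. It could look something like the following. Bear in mind that I am not very familiar with Python, so I may have missed some syntactic shortcuts.
For example: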
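from string import punctuation

countdict = {}
linedict = {}
# text_file is assumed to be an already-open file object
for line in text_file:
    # split the line into words; iterating over the string itself would yield characters
    for word in line.split():
        depunct = word.strip(punctuation)
        if depunct in countdict:
            countdict[depunct] += 1
        else:
            countdict[depunct] = 1
        # add entry for word in the line dict if not there already
        if depunct not in linedict:
            linedict[depunct] = []
        # now add the word -> line entry
        linedict[depunct].append(line)

One modification you may want to make is to avoid adding duplicates to the list when a word occurs twice in the same line.
The code above assumes you only want to read the text file once.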
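For what it's worth, here is a rough sketch of that duplicate-avoiding variant, written for Python 3. It stores line numbers instead of line text and only records a line number if it differs from the last one stored for that word. The function name build_concordance, the stopwords parameter, and the sample lines are made up for illustration; they are not from the question's actual file.

```python
from collections import defaultdict
from string import punctuation


def build_concordance(lines, stopwords=()):
    """Return (counts, line_numbers) for the words found in lines.

    Hypothetical helper: counts maps word -> total occurrences,
    line_numbers maps word -> list of 1-based line numbers, no repeats.
    """
    counts = defaultdict(int)
    line_numbers = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for raw in line.lower().split():
            word = raw.strip(punctuation)
            if not word or word in stopwords:
                continue
            counts[word] += 1
            # Lines are visited in order, so a repeat within the same line
            # can only match the most recently recorded line number.
            if not line_numbers[word] or line_numbers[word][-1] != number:
                line_numbers[word].append(number)
    return counts, line_numbers


# Made-up sample text, used only to exercise the function.
sample = [
    "The itsy bitsy spider climbed up the water spout.",
    "Down came the rain",
    "and washed the spider out.",
    "The itsy bitsy spider climbed up the spout again.",
]
counts, line_numbers = build_concordance(sample, stopwords={"the", "and"})

for word in sorted(counts):
    print("%s : Total Count: %d" % (word, counts[word]))
    for number in line_numbers[word]:
        print("    Line:%d: %s" % (number, sample[number - 1]))
```

Keeping line numbers rather than the line text means the original lines have to stay around in a list (here, sample) so they can be looked up when printing. That seemed like a fair trade for a small file, but it is a design choice, not the only way to do it.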
Posted on 2011-11-01 03:32:10
words = {}
with open("test.txt", "r") as openFile:
    for line in openFile:
        for word in line.strip().lower().split():
            # one record per word: total count plus the set of lines containing it
            wordDict = words.setdefault(word, { 'count': 0, 'line': set() })
            wordDict['count'] += 1
            wordDict['line'].add(line)
print(words)
https://stackoverflow.com/questions/7961859