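I am trying to use a piece of Python to extract tagged words from a CSV file. However, I keep running into the following encoding problem. I have looked at similar questions, but my Python skills are very basic. Can anyone tell me what I should change in my code?
The error I get is:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-8: ordinal not in range(128)
Here is my code: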
from bs4 import BeautifulSoup
import csv

input_name = "example.csv"  # File names for input and output
output_name = "entities.csv"

def incrementEntity(entity_string, dictionary):
    try:
        dictionary[entity_string] += 1
    except KeyError:
        dictionary[entity_string] = 1

def outputResults(dictionary, entity_type, f):
    for i in sorted(dictionary, key=dictionary.get, reverse=True):
        print i, '\t', entity_type, '\t', dictionary[i]
        f.writerow([i, entity_type, dictionary[i]])

try:
    f = open(input_name, 'r')
    soup = BeautifulSoup(f)
    f.close()
except IOError, message:
    print message
    raise ValueError("Input file could not be opened")

locations = {}
people = {}
orgs = {}

for i in soup.find_all():
    entity_name = i.get_text()
    entity_type = i.name
    if (entity_type == 'person'):
        incrementEntity(entity_name, people)
    elif (entity_type == 'organization'):
        incrementEntity(entity_name, orgs)
    elif (entity_type == 'location'):
        incrementEntity(entity_name, locations)
    else:
        continue

output_file = open(output_name, 'w')
f = csv.writer(output_file)
print "Entity\t\tType\t\tCount"
print "------\t\t----\t\t-----"
f.writerow(["Entity", "Type", "Count"])
outputResults(people, 'person', f)
outputResults(orgs, 'organization', f)
outputResults(locations, 'location', f)
output_file.close()

Answered 2015-06-03 06:28:23
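I'm not sure why the input file being parsed by BeautifulSoup is a CSV file. I assume that is a typo.
Python 2's csv library does not support Unicode, so it fails when it tries to write non-ASCII characters.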
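To illustrate where the error comes from, here is a minimal sketch of what happens when non-ASCII text is pushed through the ASCII codec, which is effectively what the Python 2 csv writer does with a unicode value. The helper name try_encode and the sample name are made up for this illustration and are not part of the original code.

```python
def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        # Raised when a character falls outside the codec's range,
        # e.g. anything above code point 127 for "ascii".
        return None


name = u"Zo\u00eb"  # hypothetical entity name containing one non-ASCII character

print(try_encode(name, "ascii"))  # fails, so the helper returns None
print(try_encode(name, "utf-8"))  # succeeds and returns the UTF-8 bytes
```

Encoding to UTF-8 explicitly, rather than letting the ASCII default apply, is what avoids the exception.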
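You have three options:
1. Encode the values as UTF-8 before writing them. In outputResults: f.writerow([i.encode("utf-8"), entity_type.encode("utf-8"), dictionary[i]])
2. Use a different CSV library that supports Unicode.
3. Create a UnicodeWriter as shown here: https://docs.python.org/2/library/csv.html#examples. This lets you write Unicode transparently and does the encoding for you.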
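For readers who are able to run Python 3 (an assumption; the code in the question is Python 2), the csv module works with text directly, so the encoding can be set once when the file is opened. The following is only a sketch under that assumption, and write_entity_counts is a hypothetical name rather than anything from the original post.

```python
import csv


def write_entity_counts(path, counts, entity_type):
    """Write (entity, type, count) rows as UTF-8, most frequent entity first."""
    # encoding="utf-8" makes the file object do the encoding;
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Entity", "Type", "Count"])
        for name in sorted(counts, key=counts.get, reverse=True):
            writer.writerow([name, entity_type, counts[name]])
```

A call such as write_entity_counts("entities.csv", locations, "location") would then accept non-ASCII names without any explicit .encode() calls.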
https://stackoverflow.com/questions/30579031