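I'm a complete Python newbie.
I'm trying to determine the metaphone codes for a list of names. These codes will later be compared to find names that may sound similar.
The jellyfish module fits my needs, and I can get the metaphone codes when I create the list myself, like this: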
import jellyfish
names = ['alexander','algoma','angel','antler']
for name in names:
    print(name, "metaphone value =", jellyfish.metaphone(name))
##OUTPUT:
alexander metaphone value = ALKSNTR
algoma metaphone value = ALKM
angel metaphone value = ANJL
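antler metaphone value = ANTLR

However, I need to get the metaphone codes for a list of about 3,000 names. I created a .csv file containing the required column headers and the existing list of names. It looks like this: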
RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta
742,A F JOHNSON,A,,
1240,ABBEY,ABBEY,,
2133,ACES,ACES,,
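362,ADAMS,ADAMS,,

So, ideally, for each row, FirstWordMeta should hold the metaphone code of the word in the FirstWord column, and StMeta should hold the metaphone code of the text in the ST_NAME column. I would like the output .csv to look like this: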
RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta
742,A F JOHNSON,A,A,A F JNSN
1240,ABBEY,ABBEY,AB,AB
2133,ACES,ACES,SS,SS
362,ADAMS,ADAMS,ATMS,ATMS

I tried the csv module, but I don't understand how to reference specific columns when calling jellyfish.metaphone().
Answered on 2019-08-06 05:33:49
You can use the pandas module:
import pandas as pd
import jellyfish
data = pd.read_csv("test.csv") # Your filename here
# Compute the metaphone code for every row of each source column.
# Assigning whole columns avoids chained indexing such as data["col"][i] = ...
data["FirstWordMeta"] = data["FirstWord"].apply(jellyfish.metaphone)
data["StMeta"] = data["ST_NAME"].apply(jellyfish.metaphone)
# Save to csv without writing the DataFrame index as an extra column
data.to_csv("result.csv", index=False)

Answered on 2019-08-06 05:36:27
You can try this:
import csv
import jellyfish
with open('input.csv', newline='') as inputfile:
    reader = csv.reader(inputfile)
    headers = next(reader)
    inputdata = list(reader)
with open('output.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(headers)
    for row in inputdata:
        # Keep RID, ST_NAME and FirstWord, then append the two codes
        outputrow = row[:3] + [
            jellyfish.metaphone(row[2]),  # FirstWordMeta
            jellyfish.metaphone(row[1])   # StMeta
        ]
        writer.writerow(outputrow)

https://stackoverflow.com/questions/57366144
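
Since the question asks how to reference specific columns, here is a rough sketch of a variant that addresses columns by header name with csv.DictReader and csv.DictWriter instead of by position. The helper name add_code_columns and its parameters are made up for illustration and do not come from either answer. The encoder is passed in as an argument, so the demo runs with str.lower as a stand-in; with jellyfish installed you would pass jellyfish.metaphone instead.

```python
import csv
import io


def add_code_columns(src, dst, encode, mapping):
    """Copy CSV rows from src to dst, filling each target column in
    `mapping` with encode(value of its source column).

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(src)
    # Reuse the input header so the column order is preserved.
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for source_col, target_col in mapping.items():
            row[target_col] = encode(row[source_col])
        writer.writerow(row)


# Small in-memory demo; str.lower is only a stand-in encoder.
# With jellyfish installed, pass jellyfish.metaphone instead.
sample = io.StringIO(
    "RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta\n"
    "362,ADAMS,ADAMS,,\n"
)
result = io.StringIO()
add_code_columns(
    sample, result, str.lower,
    {"FirstWord": "FirstWordMeta", "ST_NAME": "StMeta"},
)
print(result.getvalue())
```

For real files, the same function should work with handles opened as open('input.csv', newline='') and open('output.csv', 'w', newline=''). Looking columns up by name means the code keeps working if the column order in the .csv changes.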