My script looks like this:
import csv
with open('lees.csv','rU') as naver:
    reader = csv.DictReader(naver)
    for alist in reader:
        name = alist["naam"]
        polisnumber = alist["polisnr"]
        riskadr = alist["risico adr"]
        insurencecode = alist["branchecode"]
        relationnumber = alist["rel"]
        header = alist["aanhef"]
        tav = alist["tav"]
        thelist = [name,riskadr,polisnumber,
                   relationnumber,insurencecode,header,tav]

The output of this script is:
['Cautus B.V.', 'plein 92', '1129008', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa']
['Cautus B.V.', 'Wei 9-11', '1019123', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa']
['Cautus B.V.', 'plein 92', '1129008', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa']
['Cautus B.V.', 'Wei 9-11', '1019123', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa']
['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB', 'Geachte heer Tigch', 'De heer I. Tigch']
['De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch']
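['Slever ', 'klopt 42', 'KD2220115', '17', 'AVB', 'Geachte heer Slever', 'De heer T.Slever']

As you can see, I build a list from a .csv file.
My problem is that I need a script that filters out the duplicates in riskadr (wei 9-11 / plein 92 / tiellaan 42) and adds the insurencecode (AVB/BEDR/DAS, etc.) of the second, duplicate riskadr to the first entry in a new list, alongside that entry's other fields.
So right now we have two entries with the same riskadr, like this: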
['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB', 'Geachte heer Tigch', 'De heer I. Tigch']
['De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch']

But I want a script that turns these two entries into one and adds the insurance type to the first entry (AVB/DAS), like this:
['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB','DAS', 'Geachte heer Tigch', 'De heer I. Tigch']

Answered 2012-10-17 18:05:29
You should be able to achieve this with itertools.groupby:
from itertools import groupby
# define input
l = [['Cautus B.V.', 'plein 92', '1129008', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
['Cautus B.V.', 'Wei 9-11', '1019123', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
['Cautus B.V.', 'plein 92', '1129008', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
['Cautus B.V.', 'Wei 9-11', '1019123', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB', 'Geachte heer Tigch', 'De heer I. Tigch'],
['De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'],
['Slever ', 'klopt 42', 'KD2220115', '17', 'AVB', 'Geachte heer Slever', 'De heer T.Slever']]
# remove clutter
l_clean = [(x[1], x[4]) for x in l]
# sort (groupby requires input to be sorted)
l_sorted = sorted(l_clean)
# group by first column
l_final = [(k, zip(*v)[1]) for k,v in groupby(l_sorted, key=lambda x:x[0])]
# print output
for k,v in l_final:
    print k, list(v)

Output:
Wei 9-11 ['AVB', 'BEDR']
klopt 42 ['AVB']
plein 92 ['AVB', 'BEDR']
tiellaan 42 ['AVB', 'DAS']

Note that you will need to adapt the key function used for sorting and grouping if your input has a different shape than l_clean.
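For the full rows from the question, that adaptation might look like the following Python 3 sketch. It is an illustration, not the answerer's code: the names `rows`, `merge_key`, and `merged` are made up here, and it assumes the insurance code always sits at index 4 and that two rows should be merged only when every other field matches.

```python
from itertools import groupby

# Sample rows copied from the question's output.
rows = [
    ['Cautus B.V.', 'plein 92', '1129008', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
    ['Cautus B.V.', 'Wei 9-11', '1019123', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
    ['Cautus B.V.', 'plein 92', '1129008', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
    ['Cautus B.V.', 'Wei 9-11', '1019123', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
    ['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB', 'Geachte heer Tigch', 'De heer I. Tigch'],
    ['De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'],
    ['Slever ', 'klopt 42', 'KD2220115', '17', 'AVB', 'Geachte heer Slever', 'De heer T.Slever'],
]

def merge_key(row):
    # Hypothetical helper: every field except the insurance code (index 4).
    return tuple(row[:4] + row[5:])

merged = []
# groupby only groups adjacent items, so sort by the same key first.
# sorted() is stable, so codes keep their original relative order.
for key, group in groupby(sorted(rows, key=merge_key), key=merge_key):
    codes = [row[4] for row in group]
    # Rebuild the entry with all collected codes in place of the single code.
    merged.append(list(key[:4]) + codes + list(key[4:]))

for entry in merged:
    print(entry)
```

With this input the two 'tiellaan 42' rows collapse into a single entry that carries both 'AVB' and 'DAS' between the relation number and the salutation, which is the shape the question asks for. Note that sorting reorders the entries, so the result does not follow the CSV order.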
Answered 2012-10-17 17:47:59
>>> a = [
... ('De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'),
... ('De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'),
... ]
>>>
>>> set(a)
set([('De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch')])
>>>

Store them as tuples instead of lists and add them to a set... if that is all you need.
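One caveat worth spelling out: a set only collapses rows that are identical in every field. The short Python 3 sketch below (the variable names `rows` and `unique_rows` are illustrative) shows that rows differing only in the insurance code stay separate, so this approach removes exact duplicates but does not merge codes.

```python
rows = [
    ['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB', 'Geachte heer Tigch', 'De heer I. Tigch'],
    ['De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'],
    ['De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'],
]

# Lists are unhashable, so each row is converted to a tuple first.
unique_rows = set(tuple(row) for row in rows)

# The two identical 'DAS' rows collapse into one; the 'AVB' row remains.
print(len(unique_rows))  # 2
```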
Answered 2012-10-17 18:02:45
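You probably need something along these lines. Keep an in-memory array (ultimatelist) and check whether it already holds an entry similar to thelist. If one is found, append the insurance code to it: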
def search(item, array):
    for i in range(len(array)):
        # match if the first four and the last two elements are identical
        if array[i][:4] == item[:4] and array[i][-2:] == item[-2:]:
            return i
    return -1

index = search(thelist, ultimatelist)
if index >= 0:
    # existing entry: add the new code after the codes already stored
    ultimatelist[index].insert(-2, thelist[4])
else:
    # first time this entry is seen
    ultimatelist.append(thelist)

https://stackoverflow.com/questions/12931277