I have a file like this:
732772 scaffold-3 G G A
732772 scaffold-2 G G A
742825 scaffold-3 A A G
776546 scaffold-3 G A G
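776546 scaffold-6 G A G

I want to use column 2 as my key and produce output in which each key appears only once, followed by the values that belong to it.
In other words, if a name in column 2 occurs more than once, it should be printed only once, so the output should be: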
scaffold-3
732772 G G A
742825 A A G
776546 G A G
scaffold-2
732772 G G A
scaffold-6
776546 G A G

I wrote something like this:

res = open('00test','r')
out = open('00testresult','w')
d = {}
for line in res:
    # skip comment lines
    if not line.startswith('#'):
        line = line.strip().split()
        pos = line[0]
        name = line[1]
        call = line[2]
        father = line[3]
        mother = line[4]
        # create the list the first time a name is seen
        if not (name in d):
            d[name] = []
        d[name].append({'pos':pos,'call':call,'father':father,'mother':mother})

but I don't know how to output it in the way I described above.
Any help would be great.
Edit:
This is the fully working code that solved the problem:

res = open('00test','r')
out = open('00testresult','w')
d = {}
for line in res:
    # skip comment lines
    if not line.startswith('#'):
        line = line.strip().split()
        pos = line[0]
        name = line[1]
        call = line[2]
        father = line[3]
        mother = line[4]
        # create the list the first time a name is seen
        if not (name in d):
            d[name] = []
        d[name].append({'pos':pos,'call':call,'father':father,'mother':mother})
# write each name once, followed by its tab-separated entries
for k,v in d.items():
    out.write(str(k)+'\n')
    for i in v:
        out.write(str(i['pos'])+'\t'+str(i['call'])+'\t'+str(i['father'])+'\t'+str(i['mother'])+'\n')
out.close()

Posted on 2013-08-14 14:17:53
Now that you have the dictionary, loop over its keys and write the entries to a file:

keys = ('pos', 'call', 'father', 'mother')
with open(outputfilename, 'w') as output:
    for name in d:
        # write the key once, then every entry stored under it
        output.write(name + '\n')
        for entry in d[name]:
            output.write(' '.join([entry[k] for k in keys]) + '\n')

You may want to use a collections.defaultdict() object for d instead of a regular dictionary:

from collections import defaultdict
d = defaultdict(list)

and remove the if not (name in d): d[name] = [] lines entirely.
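
Putting the two suggestions together, here is a minimal end-to-end sketch. The helper names group_by_name and write_groups are made up for illustration, the sample data is copied from the question, and the code assumes every data line has exactly five whitespace-separated fields. It stores each entry as a tuple rather than a dict, which is a simplification and not what the original code does. On Python 3.7+ dictionaries keep insertion order, so the names come out in the order they first appear in the file.

```python
from collections import defaultdict
import io


def group_by_name(lines):
    """Group (pos, call, father, mother) tuples by the name in column 2."""
    groups = defaultdict(list)
    for line in lines:
        # Skip comment lines and blank lines.
        if line.startswith('#') or not line.strip():
            continue
        # Assumption: exactly five whitespace-separated fields per line.
        pos, name, call, father, mother = line.split()
        groups[name].append((pos, call, father, mother))
    return groups


def write_groups(groups, output):
    """Write each name once, followed by its space-separated entries."""
    for name, entries in groups.items():
        output.write(name + '\n')
        for entry in entries:
            output.write(' '.join(entry) + '\n')


# Sample data from the question.
sample = """732772 scaffold-3 G G A
732772 scaffold-2 G G A
742825 scaffold-3 A A G
776546 scaffold-3 G A G
776546 scaffold-6 G A G
"""

buf = io.StringIO()
write_groups(group_by_name(sample.splitlines()), buf)
print(buf.getvalue(), end='')
```

To work on real files, pass an open input file to group_by_name and an open output file to write_groups instead of the in-memory sample and buffer.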
https://stackoverflow.com/questions/18234198