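I have another Python newbie question. I have a file like the one shown below, and I need to convert it into vector-style and fingerprint-style tables. The problem for me is how to combine the file so that I end up with a matrix where the rows are the cmps and the columns are the vals. If a val is missing for a cmp, the entry should be zero. The cmps have different sets of vals, and the overlap is not very large. Can you suggest the best way to go? A Python dictionary? Any ideas would help. Thanks!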
cmp1 0.277 val_1
cmp1 0.097 val_2
cmp1 0.795 val_3
cmp1 0.809 val_4
cmp1 0.127 val_5
cmp2 0.839 val_3
cmp2 0.909 val_4
cmp2 0.148 val_5
cmp2 0.938 val_6
cmp2 0.599 val_7
The result I am struggling to get:
Vector version
name val_1 val_2 val_3 val_4 val_5 val_6 val_7
cmp1 0.277 0.097 0.795 0.809 0.127 0 0
cmp2 0 0 0.839 0.909 0.148 0.938 0.599
Binary version
name val_1 val_2 val_3 val_4 val_5 val_6 val_7
cmp1 0 0 1 1 0 0 0
cmp2 0 0 1 1 0 1 1
Current code
import csv
fi = open("data.txt", "rb")
fo = open("data_out.txt", "wb")
reader = csv.reader(fi, delimiter='\t')
writer = csv.writer(fo, delimiter='\t')
# making unique lists
targets = set()
ligands = set()
for row in reader:
    ligands.add(row[0])
    targets.add(row[2])
data = []
for row in reader:
    if row[0] in ligands and row[2] in targets:
        pass  # TODO
    else:
        pass  # TODO

Posted on 2013-07-04 12:55:02
You can use collections.defaultdict here:
from collections import defaultdict
with open('abc') as f:
    dic = defaultdict(dict)
    for line in f:
        cmp, val, col = line.split()
        dic[cmp][col] = val
print dic
# defaultdict(<type 'dict'>,
# {'cmp1': {'val_5': '0.127', 'val_4': '0.809', 'val_1': '0.277', 'val_3': '0.795', 'val_2': '0.097'},
#  'cmp2': {'val_5': '0.148', 'val_4': '0.909', 'val_7': '0.599', 'val_6': '0.938', 'val_3': '0.839'}})
# get a sorted list of all val_i from the dic
vals = sorted(set(y for x in dic.itervalues() for y in x))
keys = sorted(dic)
print "name {}".format("\t".join(vals))
for key in keys:
    print "{} {}".format(key, "\t".join(dic[key].get(v, '0') for v in vals))
Output:
name val_1 val_2 val_3 val_4 val_5 val_6 val_7
cmp1 0.277 0.097 0.795 0.809 0.127 0 0
cmp2 0 0 0.839 0.909 0.148 0.938 0.599
For the binary version, you can try:
print "name {}".format("\t".join(vals))
for key in keys:
    strs = "\t".join(str(int(round(float(dic[key][v])))) if v in dic[key] else '0' for v in vals)
    print "{} {}".format(key, strs)
Output:
name val_1 val_2 val_3 val_4 val_5 val_6 val_7
cmp1 0 0 1 1 0 0 0
cmp2 0 0 1 1 0 1 1
https://stackoverflow.com/questions/17470607
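The code in the answer is written for Python 2 (print statements, `itervalues`). Below is a minimal sketch of the same defaultdict idea for Python 3, producing the vector table. The helper names `read_triples` and `vector_rows` and the inline `sample` data are illustrative assumptions, not part of the original answer; the sketch also assumes every input line has exactly three whitespace-separated fields and skips anything else.

```python
from collections import defaultdict


def read_triples(lines):
    """Collect 'cmp value val_name' lines into {cmp: {val_name: value_string}}."""
    table = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # assumption: silently skip blank or malformed lines
        cmp_name, value, col = parts
        table[cmp_name][col] = value
    return table


def vector_rows(table):
    """Return a header row, then one row per cmp with '0' for missing vals."""
    # Union of all column names seen in any cmp, sorted as strings.
    cols = sorted({col for row in table.values() for col in row})
    rows = [["name"] + cols]
    for cmp_name in sorted(table):
        rows.append([cmp_name] + [table[cmp_name].get(c, "0") for c in cols])
    return rows


# Sample data copied from the question.
sample = """\
cmp1 0.277 val_1
cmp1 0.097 val_2
cmp1 0.795 val_3
cmp1 0.809 val_4
cmp1 0.127 val_5
cmp2 0.839 val_3
cmp2 0.909 val_4
cmp2 0.148 val_5
cmp2 0.938 val_6
cmp2 0.599 val_7
""".splitlines()

for row in vector_rows(read_triples(sample)):
    print("\t".join(row))
```

One caveat: the columns are sorted as plain strings, so a name like val_10 would sort before val_2. If the real data has more than nine vals, a numeric sort key would probably be needed.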
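The binary version in the answer relies on `round()`, which in effect puts the cutoff at 0.5. In Python 3, `round()` rounds exact halves to the nearest even integer (so `round(0.5)` gives 0), while Python 2 rounds halves away from zero, so the two can disagree on a value of exactly 0.5. The sketch below makes the cutoff explicit instead. The function name `binary_rows`, the `threshold` parameter, and the choice of `>=` are assumptions for illustration; the input is the nested dict built in the answer, written out here as a literal so the example is self-contained.

```python
def binary_rows(table, threshold=0.5):
    """Return a header row, then one 0/1 row per cmp.

    A cell is '1' when the val is present and its value is >= threshold,
    otherwise '0' (missing vals count as 0).
    """
    cols = sorted({col for row in table.values() for col in row})
    rows = [["name"] + cols]
    for cmp_name in sorted(table):
        bits = [
            "1" if float(table[cmp_name].get(c, 0)) >= threshold else "0"
            for c in cols
        ]
        rows.append([cmp_name] + bits)
    return rows


# Same nested structure as `dic` in the answer, with the question's data.
table = {
    "cmp1": {"val_1": "0.277", "val_2": "0.097", "val_3": "0.795",
             "val_4": "0.809", "val_5": "0.127"},
    "cmp2": {"val_3": "0.839", "val_4": "0.909", "val_5": "0.148",
             "val_6": "0.938", "val_7": "0.599"},
}

for row in binary_rows(table):
    print("\t".join(row))
```

With the default threshold of 0.5 this reproduces the binary table from the question; passing a different `threshold` changes the cutoff without touching the rest of the code.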