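I have a list of unique strings ("sample IDs"). I also have a table containing a subset of the strings from the first list, each associated with another string ("sample characteristics") in the next column, with a space as the delimiter. For example: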
# All Sample IDs
id-001
id-002
id-003
id-004
id-005
# Subset of Samples, with associated characteristics string
id-001 'batch-1, yellow'
id-003 'batch-1, yellow'
id-005 'batch-9, blue'
# Desired Output
id-001 'batch-1, yellow'
id-002 NA
id-003 'batch-1, yellow'
id-004 NA
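id-005 'batch-9, blue'
I am trying to combine these two lists into a table whose first column contains all the "sample IDs" and whose second column contains the corresponding "sample characteristics" string for each ID, or "NA" if the ID is not in the second list.
I have been using this code to compare the two ID lists and find out which sample IDs have a "sample characteristics" string available: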
    with open('FILE1.txt', 'r') as file1:
        with open('FILE2.txt', 'r') as file2:
            same = set(file1).intersection(file2)

    with open('RESULT.txt', 'w') as file_out:
        for line in same:
            file_out.write(line)
I have not figured out how to get the "sample characteristics" for these IDs and combine them with the first list. I think using a dictionary should be the first step:
    with open('FILE1.txt', 'r') as file1, open('FILE2.txt', 'r') as file2:
        data1 = file1
        data2 = dict(file2)
I don't know how to proceed from here.
Answered 2017-07-15 03:54:33
I think what you're looking for is something like this:
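    import csv

    results = {}

    # Start with every ID from the first file mapped to None.
    with open('FILE1.txt') as file1:
        for id_num in file1:
            results[id_num.strip()] = None

    # The characteristics are wrapped in single quotes, so quotechar must be "'"
    # for the csv module to keep "batch-1, yellow" together as one field.
    with open('FILE2.txt', newline='') as file2:
        csv_reader = csv.reader(file2, delimiter=' ', quotechar="'")
        for row in csv_reader:
            id_num, characteristic = row
            results[id_num] = characteristic

    # Write every ID, substituting 'NA' where no characteristics were found.
    with open('RESULT.txt', 'w', newline='') as file_out:
        csv_writer = csv.writer(file_out, delimiter=' ', quotechar="'")
        for id_num, characteristic in results.items():
            if characteristic is None:
                characteristic = 'NA'
            row = [id_num, characteristic]
            csv_writer.writerow(row)
This basically builds a dictionary with all the IDs from the first file as its keys.
It then iterates over each line of the second file, updating the dictionary entry for each ID that appears there.
Finally, it writes the updated dictionary to the new file, one space-delimited row per ID.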
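The same three steps can also be sketched without the csv module, which sidesteps the quoting question entirely. This is only an illustration, not part of the original answer: the function name `merge_characteristics` and its `missing` parameter are made up here, and it assumes each line of the second file is an ID, one space, and then the characteristics string. It also relies on dictionaries preserving insertion order (guaranteed from Python 3.7) and, unlike the code above, ignores IDs that appear only in the second file.

```python
def merge_characteristics(all_ids, subset_lines, missing="NA"):
    """Pair every ID with its characteristics string, or `missing`.

    Both arguments are iterables of text lines, e.g. open file objects.
    """
    # Step 1: every non-blank ID starts out mapped to the placeholder.
    merged = {line.strip(): missing for line in all_ids if line.strip()}

    # Step 2: overwrite the placeholder for IDs found in the subset table.
    for line in subset_lines:
        line = line.strip()
        if not line:
            continue
        # Split at the first space only, so the rest of the line
        # (quotes, comma and inner spaces included) is kept verbatim.
        id_num, _, characteristic = line.partition(" ")
        if id_num in merged:
            merged[id_num] = characteristic.strip() or missing

    # Step 3: one "ID characteristics" line per ID, in the original order.
    return ["{} {}".format(id_num, value) for id_num, value in merged.items()]
```

With the files from the question, it could be called as `merge_characteristics(open('FILE1.txt'), open('FILE2.txt'))` and the returned lines joined with newlines before writing them to RESULT.txt.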
https://stackoverflow.com/questions/45110431