I'm trying to build a list of lists. I loop through a file and want to update the list whenever a sublist element is larger. I wrote the following code:

targets = open(file)
longest_UTR = []
for line in targets:
    chromosome, locus, mir, gene, transcript, UTR_length = line.strip("\n").split("\t")
    length_as_integer = int(UTR_length)
    if not any(x[:3] == [locus, mir, gene] for x in longest_UTR):
        longest_UTR.append([locus, mir, gene, transcript, length_as_integer])
    elif length_as_integer > [int(x[4]) for x in longest_UTR]:  ## x[4] = previous length_as_integer
        longest_UTR.append([locus, mir, gene, transcript, length_as_integer])
print(longest_UTR)

However, I get this error:

    elif len_as_int > (int(x[4]) for x in longest_UTR):
TypeError: '>' not supported between instances of 'int' and 'generator'

How can I convert x[4] to an integer so that I can compare it with length_as_integer?

Thanks

Posted on 2018-11-01 22:44:54
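If I'm not mistaken, try replacing the elif line with the following:

else:
    # The conditional expression must come before the `for` clause
    longest_UTR = [
        [locus, mir, gene, transcript, length_as_integer]
        if x[:3] == [locus, mir, gene] and length_as_integer > int(x[4])
        else x
        for x in longest_UTR
    ]

This iterates over every sublist, replaces the one that matches the condition, and leaves the others unchanged.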
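
The same update-if-longer idea can also be sketched with a dictionary keyed by (locus, mir, gene), so that each new length is compared with a single stored integer instead of a whole list or generator. This is only a minimal sketch under assumptions: the function name `longest_utr_per_key` and the sample rows are hypothetical and not from the question; only the six-column, tab-separated layout is taken from it.

```python
def longest_utr_per_key(lines):
    """Return {(locus, mir, gene): [transcript, length]}, keeping the longest UTR per key."""
    longest = {}
    for line in lines:
        chromosome, locus, mir, gene, transcript, utr_length = line.rstrip("\n").split("\t")
        length = int(utr_length)
        key = (locus, mir, gene)
        # Compare one int with one int: the length already stored for this key.
        if key not in longest or length > longest[key][1]:
            longest[key] = [transcript, length]
    return longest


# Hypothetical sample rows in the same tab-separated layout as the input file.
sample = [
    "chr1\tlocusA\tmir-1\tgeneX\ttx1\t120\n",
    "chr1\tlocusA\tmir-1\tgeneX\ttx2\t300\n",
    "chr2\tlocusB\tmir-2\tgeneY\ttx3\t50\n",
]
result = longest_utr_per_key(sample)
print(result)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `sample`. On ties the first transcript seen is kept, since the comparison is a strict `>`.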

Posted on 2018-11-01 22:44:59
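There was some back-and-forth about your requirements, but my final understanding is this: you are iterating over a dataset. Each target in this dataset has a locus, mir, and gene, plus a UTR_length attribute. For each unique combination of locus, mir, and gene, you are trying to find all targets with the largest UTR_length?
Assuming you want to find the maxima in your dataset, there are two approaches.
1) You could simply load your input file into a pandas DataFrame, group by your locus, mir, and gene values, and return all rows with max(UTR_length). In terms of ease of implementation this is probably your best option. However, pandas isn't always the right tool: it brings a lot of overhead, especially if you want to Dockerize your project.
2) If you want to stick to basic Python, I'd suggest using sets and dictionaries: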

targets = open(file)
list_of_targets = []
for line in targets:
    chromosome, locus, mir, gene, transcript, UTR_length = line.strip("\n").split("\t")
    length_as_integer = int(UTR_length)
    # Store the integer length so that max() compares numbers, not strings
    list_of_targets.append((chromosome, locus, mir, gene, transcript, length_as_integer))

# Generate set of unique locus, mir, gene (lmg) combinations
set_of_locus_mir_gene = {(t[1], t[2], t[3]) for t in list_of_targets}

# Generate dictionary of maximum lengths for each distinct lmg combo
dict_of_max_lengths = {lmg: max(t[5] for t in list_of_targets
                                if (t[1], t[2], t[3]) == lmg)
                       for lmg in set_of_locus_mir_gene}

# Generate dictionary with lmg keys and all targets of that lmg with the max length
final_output = {lmg: [t for t in list_of_targets
                      if (t[1], t[2], t[3]) == lmg and t[5] == max_length]
                for lmg, max_length in dict_of_max_lengths.items()}

Posted on 2018-11-01 23:17:48
Since you want to replace the longest_UTR variable and keep good names for things, you can use a dictionary instead of a list:

targets = open(file)
longest_UTR = {}
for line in targets:
    chromosome, locus, mir, gene, transcript, UTR_length = line.strip("\n").split("\t")
    length_as_integer = int(UTR_length)
    # Your condition works for initializing the dictionary because of the default value.
    if length_as_integer > longest_UTR.get("Length", -1):
        longest_UTR["Chromosome"] = chromosome
        longest_UTR["Locus"] = locus
        longest_UTR["Mir"] = mir
        longest_UTR["Gene"] = gene
        longest_UTR["Transcript"] = transcript
        longest_UTR["Length"] = length_as_integer
print(longest_UTR)

Edit: here is also a version of the code that uses a list, in case you're interested in seeing the difference. Personally, I find the dictionary easier to read.

targets = open(file)
longest_UTR = [None, None, None, None, None, -1]
for line in targets:
    chromosome, locus, mir, gene, transcript, UTR_length = line.strip("\n").split("\t")
    length_as_integer = int(UTR_length)
    # Your condition works for initializing the list because of the default value.
    if length_as_integer > longest_UTR[5]:
        longest_UTR[0] = chromosome
        longest_UTR[1] = locus
        longest_UTR[2] = mir
        longest_UTR[3] = gene
        longest_UTR[4] = transcript
        longest_UTR[5] = length_as_integer
print(longest_UTR)

https://stackoverflow.com/questions/53103311