Consider the file testbam.txt:
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg001G.GRCh38DH.target.bam
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg002G.GRCh38DH.target.bam
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg014G.GRCh38DH.target.bam
and the file testbai.txt:
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg001G.GRCh38DH.target.bai
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg002G.GRCh38DH.target.bai
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg014G.GRCh38DH.target.bai
They always have the same length, and I wrote a function to find it:
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
n = file_len('/groups/cgsd/alexandre/python_code/src/testbai.txt')
print(n)
3
Then I created two lists by opening the files and doing some manipulation:
content = []
with open('/groups/cgsd/alexandre/python_code/src/testbam.txt') as bams:
    for line in bams:
        content.append(line.strip().split())
print(content)
content2 = []
with open('/groups/cgsd/alexandre/python_code/src/testbai.txt') as bais:
    for line in bais:
        content2.append(line.strip().split())
print(content2)
Now I have a JSON file named mutec.json, and I want to replace some parts of it with the items from the lists:
{
"Mutect2.gatk_docker": "broadinstitute/gatk:4.1.4.1",
"Mutect2.intervals": "/groups/cgsd/alexandre/gatk-workflows/src/interval_list/Basic_Core_xGen_MSI_TERT_HPV_EBV_hg38.interval_list",
"Mutect2.scatter_count": 30,
"Mutect2.m2_extra_args": "--downsampling-stride 20 --max-reads-per-alignment-start 6 --max-suspicious-reads-per-alignment-start 6",
"Mutect2.filter_funcotations": true,
"Mutect2.funco_reference_version": "hg38",
"Mutect2.run_funcotator": true,
"Mutect2.make_bamout": true,
"Mutect2.funco_data_sources_tar_gz": "/groups/cgsd/alexandre/gatk-workflows/mutect2/inputs/funcotator_dataSources.v1.6.20190124s.tar.gz",
"Mutect2.funco_transcript_selection_list": "/groups/cgsd/alexandre/gatk-workflows/mutect2/inputs/transcriptList.exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt",
"Mutect2.ref_fasta": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.fasta",
"Mutect2.ref_fai": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.fasta.fai",
"Mutect2.ref_dict": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.dict",
"Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",
"Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>",
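}
Note this section:
"Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",
"Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>",
<<<N_item_of_list_content>>> and <<<N_item_of_list_content2>>> should be replaced by their respective list items, and I want to write the result of each modification to a new file.
The final result would be 3 files: mutect1.json, with the first item from testbam.txt and the first item from testbai.txt; mutect2.json, with the second item from testbam.txt and the second item from testbai.txt; and a third file built by the same reasoning.
Note that the <<<N_item_of_list_content>>> and <<<N_item_of_list_content2>>> notation is not necessarily hard-coded in the file. I wrote it myself only to make clear what I want to replace.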
Posted on 2021-03-08 19:32:30
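First, even though it is unrelated to the question, some of your code is not really Pythonic:
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
uses a for loop over enumerate when you could simply do:
def file_len(fname):
    with open(fname) as f:
        return sum(1 for _ in f)
because f is an iterator over the lines of the file. Note that a file object does not support len(), so the lines have to be counted by iterating.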
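As a small self-contained illustration, the sketch below counts lines by iterating over the file object. The helper name `count_lines` and the throwaway files are hypothetical and only serve the demonstration. Unlike a loop that ends with `return i + 1`, this version also returns 0 for an empty file instead of failing on an unassigned `i`.

```python
import os
import tempfile


def count_lines(fname):
    """Count the lines of a text file by iterating over the file object."""
    with open(fname) as f:
        return sum(1 for _ in f)


# Demonstration on throwaway files (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    three = os.path.join(tmp, 'three.txt')
    empty = os.path.join(tmp, 'empty.txt')
    with open(three, 'w') as f:
        f.write('a.bai\nb.bai\nc.bai\n')
    open(empty, 'w').close()  # create an empty file
    n_three = count_lines(three)
    n_empty = count_lines(empty)

print(n_three, n_empty)
```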
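Now to your question. You want to replace some elements of a file with data found in two other files.
In your initial question, the strings to replace were enclosed in triple angle brackets.
I would use: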
import re

rx = re.compile(r'<<<.*?>>>')  # how to identify what is to be replaced

with open('.../testbam.txt') as bams, open('.../testbai.txt') as bais, \
        open('.../mutect.json') as src:
    # each step yields one (bam, bai) pair of replacement strings
    for i, reps in enumerate(zip(bams, bais), 1):
        reps = [r.strip() for r in reps]  # drop the trailing newlines
        src.seek(0)  # rewind the src file
        with open(f'mutect{i}.json', 'w') as fdout:  # open the output file
            rep_index = 0  # will first use the string from the first file
            for line in src:
                if rx.search(line):  # is there a string to replace on this line?
                    line = rx.sub(reps[rep_index], line)
                    rep_index = 1 - rep_index  # next time use the other string
                fdout.write(line)
In the comments, you suggested replacing the first line of each file, as it appears in the JSON, with each of the other lines. The code could become:
with open('.../testbam.txt') as bams, open('.../testbai.txt') as bais, \
        open('.../mutect.json') as src:
    # pairs of (bam, bai) paths with the trailing newlines removed
    it = ((bam.strip(), bai.strip()) for bam, bai in zip(bams, bais))
    to_find = next(it)  # the first pair is what we will have to find
    for i, reps in enumerate(it, 2):  # the remaining pairs, numbered from 2
        src.seek(0)  # rewind the src file
        with open(f'mutect{i}.json', 'w') as fdout:  # open the output file
            for line in src:
                line = line.replace(to_find[0], reps[0])  # just try to replace
                line = line.replace(to_find[1], reps[1])
                fdout.write(line)
https://stackoverflow.com/questions/66528299
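Since the placeholders are not necessarily hard-coded in the file, an alternative is to set the two keys through the `json` module instead of substituting text. The following is only a sketch under assumptions: the template must be strictly valid JSON (the trailing comma after the last entry in the sample from the question would have to be removed), and the function name `write_inputs`, the tiny template, and the `/data/...` paths are made up for the demonstration.

```python
import json
import os
import tempfile


def write_inputs(template_path, bam_paths, bai_paths, out_dir):
    """Write one mutect<i>.json per (bam, bai) pair and return the file paths."""
    with open(template_path) as src:
        template = json.load(src)  # requires strictly valid JSON
    written = []
    for i, (bam, bai) in enumerate(zip(bam_paths, bai_paths), 1):
        inputs = dict(template)  # a shallow copy is enough for flat key/value inputs
        inputs['Mutect2.tumor_reads'] = bam
        inputs['Mutect2.tumor_reads_index'] = bai
        out_path = os.path.join(out_dir, f'mutect{i}.json')
        with open(out_path, 'w') as out:
            json.dump(inputs, out, indent=2)
        written.append(out_path)
    return written


# Demonstration with a tiny hypothetical template and made-up paths.
with tempfile.TemporaryDirectory() as tmp:
    template_path = os.path.join(tmp, 'mutect.json')
    with open(template_path, 'w') as f:
        json.dump({'Mutect2.scatter_count': 30,
                   'Mutect2.tumor_reads': '',
                   'Mutect2.tumor_reads_index': ''}, f)
    files = write_inputs(template_path,
                         ['/data/s1.bam', '/data/s2.bam'],
                         ['/data/s1.bai', '/data/s2.bai'],
                         tmp)
    with open(files[1]) as f:
        second = json.load(f)
```

Setting the keys by name avoids depending on the order in which placeholders appear in the file, at the cost of rewriting the whole file with `json.dump` formatting.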