
Q: Creating as many files as there are items in two equal-length lists in Python

Stack Overflow user

Asked 2021-03-08 18:41:22

1 answer · 27 views · 0 followers · 0 votes

Consider the file testbam.txt

/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg001G.GRCh38DH.target.bam
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg002G.GRCh38DH.target.bam
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg014G.GRCh38DH.target.bam

and the file testbai.txt

/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg001G.GRCh38DH.target.bai
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg002G.GRCh38DH.target.bai
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg014G.GRCh38DH.target.bai

They always have the same length, and I wrote a function to find it:

def file_len(fname):
    with open(fname) as f:
        for i,l in enumerate(f):
            pass
        return i+1

n = file_len('/groups/cgsd/alexandre/python_code/src/testbai.txt')
print(n)
3

Then I created two lists by opening the files and processing each line:

content = []
with open('/groups/cgsd/alexandre/python_code/src/testbam.txt') as bams:
    for line in bams:
        content.append(line.strip().split())

print(content)

content2 = []
with open('/groups/cgsd/alexandre/python_code/src/testbai.txt') as bais:
    for line in bais:
        content2.append(line.strip().split())

print(content2)
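Note that line.strip().split() stores each path as a one-element list, so content[0] is a list rather than the path string itself. A minimal sketch that keeps plain strings and pairs the two lists by position instead; read_paths and paired_paths are hypothetical helper names, and skipping blank lines is an assumption about the input files:

```python
def read_paths(fname):
    """Return the lines of fname as plain strings, skipping blank lines."""
    with open(fname) as f:
        return [line.strip() for line in f if line.strip()]


def paired_paths(bam_list_file, bai_list_file):
    """Pair the i-th BAM path with the i-th BAI path.

    Raises ValueError if the two list files do not hold the same number of paths.
    """
    bams = read_paths(bam_list_file)
    bais = read_paths(bai_list_file)
    if len(bams) != len(bais):
        raise ValueError(f"{len(bams)} BAM paths but {len(bais)} BAI paths")
    return list(zip(bams, bais))
```

With the files shown above, paired_paths('testbam.txt', 'testbai.txt') would return three (bam, bai) tuples, which also makes a separate line count unnecessary. The pairing is purely positional; it does not check that the sample names match.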

Now I have a JSON file named mutec.json, and I want to replace some of its parts with the items from the lists:

{
    "Mutect2.gatk_docker": "broadinstitute/gatk:4.1.4.1",
    "Mutect2.intervals": "/groups/cgsd/alexandre/gatk-workflows/src/interval_list/Basic_Core_xGen_MSI_TERT_HPV_EBV_hg38.interval_list",
    "Mutect2.scatter_count": 30,
    "Mutect2.m2_extra_args": "--downsampling-stride 20 --max-reads-per-alignment-start 6 --max-suspicious-reads-per-alignment-start 6",
    "Mutect2.filter_funcotations": true,
    "Mutect2.funco_reference_version": "hg38",
    "Mutect2.run_funcotator": true,
    "Mutect2.make_bamout": true,
    "Mutect2.funco_data_sources_tar_gz": "/groups/cgsd/alexandre/gatk-workflows/mutect2/inputs/funcotator_dataSources.v1.6.20190124s.tar.gz",
    "Mutect2.funco_transcript_selection_list": "/groups/cgsd/alexandre/gatk-workflows/mutect2/inputs/transcriptList.exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt",

    "Mutect2.ref_fasta": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.fasta",
    "Mutect2.ref_fai": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.fasta.fai",
    "Mutect2.ref_dict": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.dict",

    "Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",
    "Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>"
}

Note this section:

   "Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",
   "Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>",

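<<<N_item_of_list_content>>> and <<<N_item_of_list_content2>>> should be replaced by the corresponding items of their respective lists, and I want to write the result of each substitution to a new file.

The end result would be 3 files: mutect1.json, with the first item of testbam.txt and the first item of testbai.txt; mutect2.json, with the second item of testbam.txt and the second item of testbai.txt; and a third file built the same way.

Note that the <<<N_item_of_list_content>>> and <<<N_item_of_list_content2>>> markers are not necessarily hard-coded in the file; I wrote them only to make clear what I want to replace.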
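Because the placeholders need not be present in the file, one alternative (not the approach taken in the answer) is to parse the template with the standard json module and overwrite the two keys directly, whatever they currently hold. A minimal sketch, assuming the template is valid JSON (json.load rejects, for example, a trailing comma after the last entry); write_mutect_inputs and out_pattern are hypothetical names:

```python
import json


def write_mutect_inputs(template_path, pairs, out_pattern="mutect{}.json"):
    """Write one JSON file per (bam, bai) pair, starting from a JSON template.

    The two keys are overwritten regardless of their current value, so the
    template does not need any <<<...>>> marker. Returns the written paths.
    """
    with open(template_path) as f:
        template = json.load(f)              # fails if the template is not valid JSON
    written = []
    for i, (bam, bai) in enumerate(pairs, 1):
        config = dict(template)              # shallow copy; the values here are flat scalars
        config["Mutect2.tumor_reads"] = bam
        config["Mutect2.tumor_reads_index"] = bai
        out_path = out_pattern.format(i)     # mutect1.json, mutect2.json, ...
        with open(out_path, "w") as out:
            json.dump(config, out, indent=4)
        written.append(out_path)
    return written
```

Usage, assuming the files sit in the working directory: build pairs = [(b.strip(), x.strip()) for b, x in zip(bams, bais)] from the two open list files, then call write_mutect_inputs('mutec.json', pairs). json.dump keeps the key order of the template but rewrites its layout, so blank lines between groups of keys are not preserved.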

Tags: python

1 answer

Stack Overflow user

Accepted answer

Answered 2021-03-08 19:32:30

First, even though it is unrelated to the question, some of your code is not really Pythonic:

def file_len(fname):
    with open(fname) as f:
        for i,l in enumerate(f):
            pass
        return i+1

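You run a for loop over enumerate when you could simply write:

def file_len(fname):
    with open(fname) as f:
        return sum(1 for _ in f)

Since f is an iterator over the lines of the file, counting the lines it yields is enough. (A file object has no len(), so len(f) would raise a TypeError.)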
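A small check of one practical difference, using a scratch file from tempfile: with an empty file the enumerate loop never runs, so i is never assigned and the original version raises UnboundLocalError, while counting with sum() returns 0. file_len_enumerate is simply the question's function renamed for the comparison.

```python
import os
import tempfile


def file_len_enumerate(fname):
    # The question's version: i is never assigned when the file is empty
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
        return i + 1


def file_len(fname):
    # Counts lines by consuming the file iterator; also works for empty files
    with open(fname) as f:
        return sum(1 for _ in f)


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    empty_path = tmp.name                # an empty scratch file

print(file_len(empty_path))              # 0
try:
    file_len_enumerate(empty_path)
except UnboundLocalError as exc:
    print("enumerate version failed:", type(exc).__name__)
os.remove(empty_path)
```

For non-empty files both functions return the same count, including a last line that has no trailing newline.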

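Now for your actual question. You want to replace some elements of one file with data found in two other files.

In your original question, the strings to replace were enclosed in triple angle brackets.

I would use: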

import re

rx = re.compile(r'<<<.*?>>>')        # how to identify what is to replace

with open('.../testbam.txt') as bams, open('.../testbai.txt') as bais, \
     open('.../mutect.json') as src:
    for i, reps in enumerate(zip(bams, bais), 1): # gets a pair of replacement strings at each step
        reps = [r.strip() for r in reps]          # drop the trailing newline read with each path
        src.seek(0)                  # rewind src file
        with open(f'mutect{i}.json', 'w') as fdout:  # open the output files
            rep_index = 0            # will first use rep string from first file
            for line in src:
                if rx.search(line):  # is there a string to replace on this line?
                    line = rx.sub(reps[rep_index], line)
                    rep_index = 1 - rep_index    # next time will use the other string
                fdout.write(line)
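The pattern uses the non-greedy .*? so that each placeholder on a line is matched on its own. This self-contained demonstration (the sample strings are made up) shows the difference from a greedy .*, and also shows passing a function to sub(), which inserts the replacement text literally. That only matters if a path ever contains backslashes, which the /groups/... paths in the question do not.

```python
import re

line = '"a": "<<<first>>>", "b": "<<<second>>>",\n'

greedy = re.compile(r'<<<.*>>>')
lazy = re.compile(r'<<<.*?>>>')

# Greedy .* runs to the LAST '>>>' on the line, swallowing both placeholders
print(greedy.findall(line))   # ['<<<first>>>", "b": "<<<second>>>']
# Non-greedy .*? stops at the FIRST '>>>', giving one match per placeholder
print(lazy.findall(line))     # ['<<<first>>>', '<<<second>>>']

# A function as the replacement inserts the text as-is; a plain string
# replacement would interpret backslash escapes such as \n or \1
path = r'C:\data\new\sample.bam'
print(lazy.sub(lambda m: path, '"tumor_reads": "<<<x>>>",'))
```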

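In a comment, you said that the template actually already contains the paths from the first line of each file, and that those should be replaced by the paths from the other lines. The code could then become: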

with open('.../testbam.txt') as bams, open('.../testbai.txt') as bais, \
     open('.../mutect.json') as src:
    it = (tuple(p.strip() for p in pair) for pair in zip(bams, bais))  # pairs without trailing newlines
    to_find = next(it)          # the first pair is what the template already contains
    for i, reps in enumerate(it, 2): # gets a pair of replacement strings at each step
        src.seek(0)                  # rewind src file
        with open(f'mutect{i}.json', 'w') as fdout:  # open the output files
            for line in src:
                line = line.replace(to_find[0], reps[0])    # just try to replace
                line = line.replace(to_find[1], reps[1])
                fdout.write(line)
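After generating the files, a quick sanity check is to parse each one back with the json module. For instance, if a path is inserted without strip(), its newline ends up inside the string value, which JSON forbids, and json.load reports it. A minimal sketch; invalid_json_files is a hypothetical helper name:

```python
import json


def invalid_json_files(paths):
    """Return (path, error message) for every file that json.load rejects."""
    bad = []
    for path in paths:
        try:
            with open(path) as f:
                json.load(f)
        except json.JSONDecodeError as exc:
            bad.append((path, str(exc)))
    return bad
```

It could be called as invalid_json_files(sorted(glob.glob('mutect*.json'))); an empty result means every generated file parsed. If the template itself is not valid JSON, every output will be reported.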

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/66528299
