首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >使用snakemake合并多个vcf文件

使用snakemake合并多个vcf文件
EN

Stack Overflow用户
提问于 2020-05-29 22:05:23
回答 1查看 288关注 0票数 0

我正在尝试使用snakemake按染色体合并几个vcf文件。我的文件是这样的,正如你所看到的,有不同的坐标。合并所有chr1A和所有chr1B的最佳方法是什么?

代码语言:javascript
复制
chr1A:0-2096.filtered.vcf
chr1A:2096-7896.filtered.vcf
chr1B:0-3456.filtered.vcf
chr1B:3456-8796.filtered.vcf

我的伪代码:

代码语言:javascript
复制
chromosomes=["chr1A","chr1B"]

rule all:
    input: 
        expand("{sample}.vcf", sample=chromosomes)

rule merge:
    input:
        I1="path/to/file/{sample}.xxx.filtered.vcf",
        I2="path/to/file/{sample}.xxx.filtered.vcf",
    output:
        outf ="{sample}.vcf"
    shell:
        """
        java -jar picard.jar GatherVcfs I={input.I1} I={input.I2} O={output.outf}
        """

编辑:

代码语言:javascript
复制
workdir: "/media/prova/Maxtor2/vcf2/merged/"

import subprocess

d = {"chr1A": ["chr1A:0-2096.flanking.view.filtered.vcf", "chr1A:2096-7896.flanking.view.filtered.vcf"],
     "chr1B": ["chr1B:0-3456.flanking.view.filtered.vcf", "chr1B:3456-8796.flanking.view.filtered.vcf"]}


rule all:
    input:
        expand("{sample}.vcf", sample=d)

def f(w):
    return d.get(w.chromosome, "")


rule merge:
    input:
        f
    output:
        outf ="{chromosome}.vcf"
    params:
        lambda w: "I=" + " I=".join(d[w.chromosome])
    shell:
        "java -jar /home/Documents/Tools/picard.jar GatherVcfs {params[0]} O={output.outf}"
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2020-05-29 22:51:05

我能重现你的窃听器。当约束通配符时,它起作用:

代码语言:javascript
复制
d = {"chr1A": ["chr1A:0-2096.flanking.view.filtered.vcf", "chr1A:2096-7896.flanking.view.filtered.vcf"],
     "chr1B": ["chr1B:0-3456.flanking.view.filtered.vcf", "chr1B:3456-8796.flanking.view.filtered.vcf"]}

chromosomes = list(d)

rule all:
    input:
        expand("{sample}.vcf", sample=chromosomes)

# these tell Snakemake exactly what values the wildcards may take
# we use "|" to create the regex chr1A|chr1B
wildcard_constraints:
    chromosome = "|".join(chromosomes)

rule merge:
    input:
        # a lambda is an unnamed function
        # the first argument is the wildcards
        # we merely use it to look up the appropriate files in the dict d
        lambda w: d[w.chromosome]
    output:
        outf = "{chromosome}.vcf"
    params:
        # here we create the string 
        # "I=chr1A:0-2096.flanking.view.filtered.vcf I=chr1A:2096-7896.flanking.view.filtered.vcf"
        # for use in our command
        lambda w: "I=" + " I=".join(d[w.chromosome])
    shell:
        "java -jar /home/Documents/Tools/picard.jar GatherVcfs {params[0]} O={output.outf}"

它也应该在没有约束的情况下工作;这看起来像是Snakemake中的一个bug。

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/62087804

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档