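Make a Snakemake rule take two differently named outputs as input
I am building a Snakemake pipeline in which I use Strelka to compare tumor and normal samples. In this case, I want to compare the first element of GERMLINE = ("PT1", "S6", "S1") with the first element of TUMOR = ("T5", "T7", "T20").
The pipeline works for the initial rules: folders, strelkaconfig and strelkarun. The problem is in the post-processing of the Strelka output, because I want to apply the same processing to both outputs (the SNVs VCF and the indels VCF).
However, I don't know how to make Snakemake understand that it should do the same thing for both without duplicating the rule. I tried the following: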
GERMLINE = ("PT1", "S6", "S1")
TUMOR = ("T5", "T7", "T20")
ANALYSIS = "OUTPUT_PATH"
TYPEVAR = ["snvs","indels"]
INDGATK = "ref"

rule all:
    input:
        [ANALYSIS + "/{}_vs_{}/Stelka/results/variants/somatic.snvs.vcf".format(sample_g, sample_t)
         for (sample_g, sample_t) in zip(GERMLINE, TUMOR)],
        [ANALYSIS + "/{}_vs_{}/Stelka/runWorkflow.py".format(sample_g, sample_t)
         for (sample_g, sample_t) in zip(GERMLINE, TUMOR)],
        [ANALYSIS + "/{}_vs_{}/Stelka/results/variants/somatic.{}_Filtered".format(sample_g, sample_t, typevar)
         for (sample_g, sample_t, typevar)
         in zip(GERMLINE*len(TUMOR), TUMOR*len(TUMOR), sorted(TYPEVAR*len(TUMOR)))]

# Make folders
rule folders:
    input:
        g = "{samples_g}.bam",
        t = "{samples_t}.bam"
    output:
        gen = "/{samples_g}_vs_{samples_t}",
        strelka = "/{samples_g}_vs_{samples_t}/Stelka/"
    run:
        '''mkdir {output.gen}
        mkdir {output.strelka}'''

# Strelka configuration
rule strelkaconfig:
    input:
        g = "{samples_g}.bam",
        t = "{samples_t}.bam",
        out_dir = ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/"
    output:
        wfs = ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/runWorkflow.py"
    params:
        ref = INDGATK
    shell:
        "python configureStrelkaSomaticWorkflow.py --normalBam {input.g} --tumorBam {input.t} --referenceFasta {params.ref} --runDir {input.out_dir} "

# Strelka run
rule strelkarun:
    input:
        wfs = ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/runWorkflow.py"
    output:
        outsnvs = ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/results/variants/somatic.snvs.vcf",
        outindels = ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/results/variants/somatic.indels.vcf"
    shell:
        "python {input.wfs}"

# POSTPROCESSING
rule vcfp:
    input: ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/results/variants/somatic.{typevar}.vcf"
    output: ANALYSIS + "/{samples_g}_vs_{samples_t}/Stelka/results/variants/somatic.{typevar}_Filtered.vcf"
    shell:
        "java -jar StrelkaVCFParser -v {input} "

But when I try to run it, I get this error:
MissingInputException in line 15 of pipe:
Missing input files for rule folders:
T7/Stelka/results/variants/somatic.indels_Filtered.bam

Posted on 2018-02-19 10:39:19
The pipeline seems to be inferring the wildcards differently from what you expect.
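To see how this can happen, here is a rough sketch in plain Python of the matching involved. It assumes that an unconstrained Snakemake wildcard behaves like the regex `.+` (which also matches `/`), and it uses a hypothetical absolute directory `/data` in place of the real ANALYSIS path, which the question does not show. It is an approximation of the matching, not Snakemake's actual code.

```python
import re

GERMLINE = ("PT1", "S6", "S1")
TUMOR = ("T5", "T7", "T20")

# Hypothetical requested target; "/data" stands in for the real ANALYSIS path.
target = "/data/S6_vs_T7/Stelka/results/variants/somatic.indels_Filtered"

# Output pattern of rule folders, "/{samples_g}_vs_{samples_t}", with each
# wildcard replaced by ".+" (assumed default, which also matches slashes).
unconstrained = re.compile(r"/(?P<samples_g>.+)_vs_(?P<samples_t>.+)$")
loose = unconstrained.match(target).groupdict()
print(loose)

# The rule's input "{samples_t}.bam" then becomes the file from the error.
print(loose["samples_t"] + ".bam")

# Same pattern, but each wildcard may only be one of the known sample names.
constrained = re.compile(
    r"/(?P<samples_g>{})_vs_(?P<samples_t>{})$".format(
        "|".join(GERMLINE), "|".join(TUMOR)
    )
)
strict = constrained.match(target)
print(strict)  # no match, so this pattern could no longer claim the target
```

With the loose pattern, samples_t swallows the rest of the path, which is consistent with the missing .bam file reported for rule folders.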
You can try using wildcard constraints, as follows:
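wildcard_constraints:
    samples_g = "|".join(GERMLINE),
    samples_t = "|".join(TUMOR)

This may not solve your problem, but the third input of the all rule does not look very clear to me. You can achieve the same thing with two nested expand calls (the inner one using zip), as follows: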
expand(expand(
    ANALYSIS + "/{sample_g}_vs_{sample_t}/Stelka/results/variants/somatic.{{typevar}}_Filtered",
    zip, sample_g=GERMLINE, sample_t=TUMOR), typevar=TYPEVAR)

Note the double braces around typevar, so that it is not expanded by the first (inner) expand.
You can test this in a python3 interpreter after running from snakemake.io import expand.
I personally find this version easier to understand.
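If Snakemake is not at hand, the same two-pass idea can be sketched with plain str.format. This is only an illustration of what the nested expand is meant to produce, not Snakemake's implementation: the first pass pairs the samples with zip and turns {{typevar}} into {typevar}, and the second pass fills in each variant type. The last lines compare the result with the zip expression from the question's all rule.

```python
GERMLINE = ("PT1", "S6", "S1")
TUMOR = ("T5", "T7", "T20")
TYPEVAR = ["snvs", "indels"]
ANALYSIS = "OUTPUT_PATH"

template = (
    ANALYSIS
    + "/{sample_g}_vs_{sample_t}/Stelka/results/variants/somatic.{{typevar}}_Filtered"
)

# First pass: pair germline and tumor samples; "{{typevar}}" survives as "{typevar}".
paired = [template.format(sample_g=g, sample_t=t) for g, t in zip(GERMLINE, TUMOR)]

# Second pass: combine every paired pattern with every variant type.
targets = [p.format(typevar=tv) for p in paired for tv in TYPEVAR]

# The third input of the question's all rule, for comparison.
original = [
    ANALYSIS + "/{}_vs_{}/Stelka/results/variants/somatic.{}_Filtered".format(g, t, tv)
    for (g, t, tv) in zip(
        GERMLINE * len(TUMOR), TUMOR * len(TUMOR), sorted(TYPEVAR * len(TUMOR))
    )
]

for path in targets:
    print(path)
print(sorted(original) == sorted(targets))
```

Both approaches list the same six files here, in a different order.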
https://stackoverflow.com/questions/48784360