NameError in Snakemake dry-run mode

Stack Overflow user
Asked on 2019-04-08 23:02:45
Answers: 2, Views: 594, Followers: 0, Votes: 0

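I am new to Snakemake and am trying to develop some pipelines. I ran into a problem with wildcards while trying to automate my bioinformatics analysis as much as possible. The trouble started once the pipeline became more complex (shown below). It looks as if Snakemake is not resolving the wildcards correctly. During a dry run of the Snakefile, the wildcard values look correct for some rules. However, the same wildcards cause an error in a different step (rule) of the pipeline, and I cannot figure out why. Below are the Snakefile and the dry-run output.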

num=["327905-LR-41624_normal","327907-LR-41624_tumor"]
num_normal=["327905-LR-41624"]
num_tumor=["327907-LR-41624"]

path="/path/to/Snakemake/"
genome="/path/to/references_genome/Mus_musculus.GRCm38.dna_rm.toplevel.fa"

rule all:
    input:
        expand("/path/to/Snakemake/AS-{num_tum}_tumor_no_dupl_sort_RG_LB.bam",num_tum=num_tumor),
        expand("/path/to/Snakemake/AS-{num_norm}_normal_no_dupl_sort_RG_LB.bam",num_norm=num_normal)
ruleorder: samtools_sort > remove_duplicates >  samtools_index #>     add_readgroup_tumor > add_readgroup_normal

rule trim_galore:
    input:
        r1="/path/to/Snakemake/AS-{num}_R1.fastq",
        r2="/path/to/Snakemake/AS-{num}_R2.fastq"
    output:
        "/path/to/Snakemake/AS-{num }_R1_val_1.fq",
        "/path/to/Snakemake/AS-{num }_R2_val_2.fq"
    shell:
        "module load trim-galore/0.5.0 ; module load pypy/2.7-6.0.0 ; trim_galore  --output_dir /path/to/Snakemake/  --paired {input.r1} {input.r2}  "  

rule bwa_mem:
    input:
        R1="/path/to/Snakemake/AS-{num}_R1_val_1.fq",
        R2="/path/to/Snakemake/AS-{num}_R2_val_2.fq"
    output:
        "/path/to/Snakemake/AS-{num}.bam"
    shell:
        "module load samtools/default ; module load bwa/0.7.8 ; bwa mem  {genome}  {input.R1} {input.R2} | samtools view -h -b  > {output} "

rule samtools_sort:
    input:
        "/path/to/Snakemake/AS-{num}.bam"
    output:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    shell:
        "module load samtools/default ; samtools sort -n  -O BAM {input} > {output} "

rule remove_duplicates:
    input:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    output:
        outbam="/path/to/Snakemake/AS-{num}_no_dupl_sort.bam",
        metrics="/path/to/Snakemake/AS-{num}_dupl_metrics.txt"
    shell:
        "module load gatk/4.0.9.0 ; gatk MarkDuplicates -I {input}  -O {output.outbam} -M {output.metrics}  --REMOVE_DUPLICATES=true "

rule samtools_index:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam.bai"
    shell:
        "module load samtools/default ; samtools index  {input} "

rule add_readgroup_normal:
    input:
        "/path/to/Snakemake/AS-{num_normal}_normal_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_normal}_normal_no_dupl_sort_RG_LB.bam"
    shell:
        "module load gatk/4.0.9.0 ;  gatk AddOrReplaceReadGroups   -PL Illumina -LB  { num_normal }   -PU  { num_normal }  -SM  NORMAL  -I  { input }    -O  {output} "

rule add_readgroup_tumor:
    input:
        "/path/to/Snakemake/AS-{num_tumor}_tumor_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_tumor}_tumor_no_dupl_sort_RG_LB.bam"
    shell:
        "module load gatk/4.0.9.0 ;  gatk AddOrReplaceReadGroups   -PL Illumina -LB  { num_tumor }   -PU  { num_tumor }  -SM  TUMOR     -I  { input }    -O  {output} "

When I test the Snakefile with the following command: .local/bin/snakemake -s Snakefile_pipeline --dryrun

I get the following output:

Building DAG of jobs...


Job counts:
    count jobs
    1   add_readgroup_normal
    1   add_readgroup_tumor
    1   all
    2   bwa_mem
    2   remove_duplicates
    2   samtools_sort
    2   trim_galore
    11

[Mon Apr  8 16:14:27 2019]
rule trim_galore:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1.fastq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2.fastq
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq
    jobid: 9
    wildcards: num=327907-LR-41624_tumor


[Mon Apr  8 16:14:27 2019]
rule trim_galore:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_R1.fastq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2.fastq
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq
    jobid: 10
    wildcards: num=327905-LR-41624_normal


[Mon Apr  8 16:14:27 2019]
rule bwa_mem:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq
    output: /path/to/Snakemake/AS-327905-LR-41624_normal.bam
    jobid: 8
    wildcards: num=327905-LR-41624_normal


[Mon Apr  8 16:14:27 2019]
rule bwa_mem:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor.bam
    jobid: 7
    wildcards: num=327907-LR-41624_tumor


[Mon Apr  8 16:14:27 2019]
rule samtools_sort:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor.bam
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor_sort.bam
    jobid: 5
    wildcards: num=327907-LR-41624_tumor


[Mon Apr  8 16:14:27 2019]
rule samtools_sort:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal.bam
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_sort.bam
    jobid: 6
    wildcards: num=327905-LR-41624_normal


[Mon Apr  8 16:14:27 2019]
rule remove_duplicates:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor_sort.bam
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor_no_dupl_sort.bam, /path/to/Snakemake/AS-327907-LR-41624_tumor_dupl_metrics.txt
    jobid: 3
    wildcards: num=327907-LR-41624_tumor


[Mon Apr  8 16:14:27 2019]
rule remove_duplicates:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_sort.bam
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam, /path/to/Snakemake/AS-327905-LR-41624_normal_dupl_metrics.txt
    jobid: 4
    wildcards: num=327905-LR-41624_normal


[Mon Apr  8 16:14:27 2019]
rule add_readgroup_normal:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort_RG_LB.bam
    jobid: 2
    wildcards: num_normal=327905-LR-41624

RuleException in line 93 of /home/l136n/Snakefile_mapping_snv_call_pipeline2:
NameError: The name ' num_normal ' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}

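I have googled this error but found little that helped. I have also double-checked the pipeline for inconsistencies. The output I expect is specified in rule "all". The rules "add_readgroup_normal" and "add_readgroup_tumor" should each take a different subset of the input files produced by the preceding steps, which run on all files. I wonder whether the problem arises from splitting the files into two subsets. Again, I am new to Snakemake, so I may have missed something silly! Any help would be much appreciated, because I am completely stuck! Thank you in advance!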

num=["327905-LR-41624_normal","327907-LR-41624_tumor"]
normal=["327905-LR-41624_normal"]
num_tumor=["327907-LR-41624_tumor"]

path="/path/to/Snakemake/"
genome="/icgc/dkfzlsdf/analysis/B210/references_genome/Mus_musculus.GRCm38.dna_rm.toplevel.fa"

rule all:
    input:  
        "/path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam.bai",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_no_dupl_sort.bam.bai",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam"
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_RG.bam"


rule trim_galore:
    input:
        r1="/path/to/Snakemake/AS-{num}_R1.fastq",
        r2="/path/to/Snakemake/AS-{num}_R2.fastq"
    output:
        "/path/to/Snakemake/AS-{num }_R1_val_1.fq",
        "/path/to/Snakemake/AS-{num }_R2_val_2.fq"
    shell:
        "module load trim-galore/0.5.0 ; module load pypy/2.7-6.0.0 ; trim_galore  --output_dir /path/to/Snakemake/  --paired {input.r1} {input.r2}  "  

rule bwa_mem:
    input:
        R1="/path/to/Snakemake/AS-{num}_R1_val_1.fq",
        R2="/path/to/Snakemake/AS-{num}_R2_val_2.fq"
    output:
        "/path/to/Snakemake/AS-{num}.bam"
    shell:
        "module load samtools/default ; module load bwa/0.7.8 ; bwa mem  {genome}  {input.R1} {input.R2} | samtools view -h -b  > {output} "

rule samtools_sort:
    input:
        "/path/to/Snakemake/AS-{num}.bam"
    output:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    shell:
        "module load samtools/default ; samtools sort -n  -O BAM {input} > {output} "

rule remove_duplicates:
    input:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    output:
        outbam="/path/to/Snakemake/AS-{num}_no_dupl_sort.bam",
        metrics="/path/to/Snakemake/AS-{num}_dupl_metrics.txt"
    shell:
        "module load gatk/4.0.9.0 ; gatk MarkDuplicates -I {input}  -O {output.outbam} -M {output.metrics}  --REMOVE_DUPLICATES=true "

rule samtools_index:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam.bai"
    shell:
        "module load samtools/default ; samtools index  {input} "

rule add_readgroup_normal:
    input:
        "/path/to/Snakemake/AS-{normal}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{normal}_RG.bam"
    shell:
        "module load gatk/4.0.9.0 ;  gatk AddOrReplaceReadGroups   -PL Illumina -LB  { wildcards.normal }   -PU  { wildcards.normal }  -SM  NORMAL  -I  { input }    -O  {output} "

rule add_readgroup_tumor:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_,'.*tumor.*'}_RG.bam"
    shell:
        "module load gatk/4.0.9.0 ;  gatk AddOrReplaceReadGroups   -PL Illumina -LB  { wildcards.num }   -PU  { wildcards.num }  -SM  TUMOR     -I  { input }    -O  {output} "

Error:

Building DAG of jobs...
MissingInputException in line 37 of /home/l136n/Snakefile_mapping_snv_call_pipeline2b1:
Missing input files for rule trim_galore:
/path/to/Luca/Snakemake/AS-327905-LR-41624_normal_RG.bam/path/to/Luca/Snakemake/AS-327907-LR-41624_tumor_RG_R1.fastq
/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam/path/to/Luca/Snakemake/AS-327907-LR-41624_tumor_RG_R2.fastq
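
One possible reading of the log above: the two `_RG.bam` targets show up fused into a single path, which is what Python's implicit concatenation of adjacent string literals produces when a comma is missing between two entries (the second Snakefile has no comma after the `normal_RG.bam` entry in rule all). The sketch below is plain Python rather than Snakemake, and the variable names are chosen only for illustration.

```python
# Two adjacent string literals with no comma between them are joined
# into ONE string by the Python parser.
targets_missing_comma = [
    "/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam"
    "/path/to/Snakemake/AS-327907-LR-41624_tumor_RG.bam",
]

# With the comma in place, the list holds two separate targets.
targets_fixed = [
    "/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam",
    "/path/to/Snakemake/AS-327907-LR-41624_tumor_RG.bam",
]

print(len(targets_missing_comma))  # a single fused path
print(targets_missing_comma[0])
print(len(targets_fixed))
```

If this is what happened, adding the missing comma in rule all should make the two targets resolve separately.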

2 Answers

Stack Overflow user

Answered on 2019-04-09 01:49:48

In the shell directive, wildcards are accessed with the syntax {wildcards.var} rather than {var}. The rule add_readgroup_normal uses the latter.
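
The difference between the two spellings can be sketched with plain Python string formatting. This is only an analogy, not Snakemake's actual implementation: `SimpleNamespace` is a hypothetical stand-in for the per-job `wildcards` object, and `render` is a helper invented for this illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the wildcards object of one job.
wildcards = SimpleNamespace(num_normal="327905-LR-41624")


def render(template, **names):
    """Fill a brace template using only the names passed in."""
    return template.format(**names)


# Attribute access through the wildcards object resolves as expected.
cmd = render("gatk AddOrReplaceReadGroups -LB {wildcards.num_normal} -SM NORMAL",
             wildcards=wildcards)
print(cmd)

# A bare {num_normal} is looked up as a top-level name, which was never passed in.
try:
    render("-LB {num_normal}", wildcards=wildcards)
    missing = None
except KeyError as err:
    missing = err.args[0]
print("unknown name:", missing)
```

In a real Snakefile the corresponding change would be to write `{wildcards.num_normal}` inside the shell string of add_readgroup_normal.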

Votes: 1

Stack Overflow user

Answered on 2019-05-08 21:22:13

I thought I would provide the solution even though this post is a bit old now. The error is simply caused by the whitespace inside "{ wildcards.var }".
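
Plain Python `str.format` shows a comparable effect, which may help explain why the reported name in the NameError is quoted with surrounding spaces (' num_normal '). This is a minimal sketch using standard string formatting as an analogy; it does not call Snakemake, and the `values` dictionary is made up for the example.

```python
# Names available to the template (illustrative only).
values = {"output": "AS-327905-LR-41624_normal_RG.bam"}

# Without padding, the field name is exactly "output" and is found.
ok = "-O {output}".format(**values)
print(ok)

# With padding, the field name becomes " output " (spaces included),
# which does not match any key.
try:
    "-O { output }".format(**values)
    failed_name = None
except KeyError as err:
    failed_name = err.args[0]
print(repr(failed_name))
```

Removing the padding, for example `{wildcards.normal}`, `{input}` and `{output}`, avoids the lookup of a name that contains spaces.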

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/55576386
