文章/答案/技术大牛

发布

社区首页 >问答首页 >请解释一下为什么我在蛇形蛋糕上会有这个错误？我已经胡思乱想好几天了，请告诉我出了什么问题。

问请解释一下为什么我在蛇形蛋糕上会有这个错误？我已经胡思乱想好几天了，请告诉我出了什么问题。
EN

Stack Overflow用户

提问于 2022-07-03 15:19:00

回答 1查看 100关注 0票数 0

我用snakemake编写了这个管道，以处理我的fastq文件并获取原始计数，但由于某些原因，我在最后一条规则(特性计数)中不理解这个错误：

WildcardError在/mnt/c/Users/manso/Desktop/hel/pe.py的第175行:无法从输出文件中确定输入文件中的通配符：'sample‘

其他规则使用与featureCounts规则相同的输入，因此我不明白为什么它会返回该特定规则的错误。

我真的很感谢你的帮助。

这是我的蛇形：

(SAMPLE,FRR) = glob_wildcards("rawReads/{sample}_{frr}.fastq.gz")

rule all:
    input:
        #raw_FASTQC
        expand("rawQC/fastqc/{sample}_{frr}_fastqc.html", sample=SAMPLE, frr=FRR),
        expand("rawQC/fastqc/{sample}_{frr}_fastqc.zip", sample=SAMPLE, frr=FRR),
        #raw_MultiQC
        "rawQC/multiqc_report.html",
        #FASTP
        expand("trimmedReads/{sample}_1.fastq.gz", sample=SAMPLE),
        expand("trimmedReads/{sample}_2.fastq.gz", sample=SAMPLE),
        expand("trimmedReads/{sample}_fastp_report.html", sample=SAMPLE),        
        #trimmed_FASTQC
        expand("trimmedQC/fastqc/{sample}_{frr}_fastqc.html", sample=SAMPLE, frr=FRR),
        expand("trimmedQC/fastqc/{sample}_{frr}_fastqc.zip", sample=SAMPLE, frr=FRR),
        #trimmed_MultiQC
        "trimmedQC/multiqc_report.html",
        #get fa and gtf files
        "genome/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa",
        "genome/Homo_sapiens.GRCh38.106.gtf.gz",
        #HISAT2_index
        ["index."  + str(i) + ".ht2" for i in range(1,9)],
        #HISAT_align
        expand("aligned/{sample}.bam", sample=SAMPLE),
        #samtools
        expand("aligned/{sample}.sorted.bam", sample=SAMPLE),
        expand("samtools_stats/{sample}.stats.txt", sample=SAMPLE),
        expand("samtools_stats/{sample}.flagstat.txt", sample=SAMPLE),
        #rawCounts
        "raw_Counts"


rule raw_FASTQC:
    input:
        "rawReads/{sample}_{frr}.fastq.gz",
    output:
        html="rawQC/fastqc/{sample}_{frr}_fastqc.html",
        zip= "rawQC/fastqc/{sample}_{frr}_fastqc.zip", # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
    params: "--quiet"
    log:
        "logs/fastqc/{sample}_{frr}.log"
    threads: 16
    wrapper:
        "v1.7.0/bio/fastqc"


rule raw_MultiQC:
    input:
        expand("rawQC/fastqc/{sample}_{frr}_fastqc.zip", sample=SAMPLE, frr=FRR),
    params:
        path="rawQC/fastqc"
    output:
       "rawQC/multiqc_report.html"
    shell:
        "multiqc --force -n {output} {params.path}" 


rule FASTP:
    input:
         read1="rawReads/{sample}_1.fastq.gz",
         read2="rawReads/{sample}_2.fastq.gz",
    output:
        trimmed1="trimmedReads/{sample}_1.fastq.gz",
        trimmed2="trimmedReads/{sample}_2.fastq.gz",
        report_html= "trimmedReads/{sample}_fastp_report.html",
    threads: 16
    shell:
         " fastp --thread {threads} -i {input.read1} -I {input.read2} -o {output.trimmed1} -O {output.trimmed2} -h {output.report_html} "


rule trimmed_FASTQC:
    input:
        "trimmedReads/{sample}_{frr}.fastq.gz"
    output:
        html="trimmedQC/fastqc/{sample}_{frr}_fastqc.html", 
        zip="trimmedQC/fastqc/{sample}_{frr}_fastqc.zip",  # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
    params: "--quiet"
    log:
        "logs/fastqc/{sample}_{frr}.log"
    threads: 16
    wrapper:
        "v1.7.0/bio/fastqc"
        

rule trimmed_MultiQC:
    input:
        expand("trimmedQC/fastqc/{sample}_{frr}_fastqc.zip", sample=SAMPLE, frr=FRR),
    params:
        path="trimmedQC/fastqc"
    output:
       "trimmedQC/multiqc_report.html"
    shell:
        "multiqc --force -n {output} {params.path} "


#Get annotation GTF
rule get_genome_gtf:
    "Downloading Genome annotation file from Ensemble, Homo sapiens primary assembly (GRCh38)"
    output:
        gtf = "genome/Homo_sapiens.GRCh38.106.gtf.gz"
    shell:
        "cd genome"
        " && wget ftp://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz"
        " && gunzip -k Homo_sapiens.GRCh38.106.gtf.gz "


# Get genome fa
rule get_genome_fa:
    "Downloading Genome sequence, Homo sapiens primary assembly (GRCh38)"
    output:
        fa = "genome/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa"
    shell:
        "cd genome"
        " && wget ftp://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz"
        " && gunzip -k Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa "


rule HISAT2_index:
    input:
        fa = rules.get_genome_fa.output.fa
    output:
        ["index." + str(i) + ".ht2" for i in range(1,9)],
    message:
        "indexing genome"
    threads: 16
    shell:
        " hisat2-build -p {threads} {input.fa} index --quiet"


rule HISAT2_align:
    input:
        read1=rules.FASTP.output.trimmed1,
        read2=rules.FASTP.output.trimmed2,
        index=rules.HISAT2_index.output
    output:
        bam="aligned/{sample}.bam",
        metrics="logs/{sample}_HISATmetrics.txt"
    threads: 16
    shell:
        " hisat2 --threads {threads} -x index -1 {input.read1} -2 {input.read2} 2> {output.metrics}"
        " | samtools view -Sbh -o {output.bam} "


rule samtools_sort:
    input:
        aligned=rules.HISAT2_align.output.bam
        #"aligned/{sample}.bam"
    output:
        "aligned/{sample}.sorted.bam"
    threads: 8 
    shell:
       "samtools sort {input.aligned} -o {output}"


rule samtools_stats:
    input:
        "aligned/{sample}.sorted.bam",
    output:
        "samtools_stats/{sample}.stats.txt",
    shell:
       "samtools stats {input} > {output} "

rule samtools_flagstat:
    input:
        "aligned/{sample}.sorted.bam",
    output:
        "samtools_stats/{sample}.flagstat.txt",
    shell:
        "samtools flagstat {input} > {output} "
        

rule featureCounts:
    input:
        samples="aligned/{sample}.sorted.bam",   
        gtf=rules.get_genome_gtf.output.gtf
    output:
        "raw_Counts"
    threads:
        16
    shell:
        "featureCounts -T {threads} -a {input.gtf} -o {output} {input.samples}"
´´´

pipeline

wildcard

snakemake

rna-seq

python

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-07-03 15:33:24

Snakemake在输出中使用模式来推断要使用的输入。在最后一条规则中，输出是raw_Counts，它没有给出用于{sample}通配符的指示。将其更改为类似的内容可能对您的用例有效：

rule featureCounts:
    input:
        samples="aligned/{sample}.sorted.bam",   
        gtf=rules.get_genome_gtf.output.gtf
    output:
        "raw_Counts_{sample}.txt"

这需要将扩展版本添加到规则all中。

    # add this target to rule all
    expand("raw_Counts_{sample}.txt", sample=SAMPLE),

编辑:如果将此规则用作聚合，则在输入指令中，您将希望通过替换所有值来删除通配符搜索。

rule featureCounts:
    input:
        samples=expand("aligned/{sample}.sorted.bam", sample=SAMPLE),   
        gtf=rules.get_genome_gtf.output.gtf
    output:
        "raw_Counts"

编辑2:请注意，glob_wildcards并不返回每个通配符的唯一值，而是返回与每个全局文件关联的通配符。如果您想要唯一的值，那么实现这一目标的一个简单方法就是将SAMPLE转换为一个集合(特别是针对这个规则)。

rule featureCounts:
    input:
        samples=expand("aligned/{sample}.sorted.bam", sample=set(SAMPLE)),   
        gtf=rules.get_genome_gtf.output.gtf
    output:
        "raw_Counts"

票数 4

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/72847668

复制

相似问题

问请解释一下为什么我在蛇形蛋糕上会有这个错误？我已经胡思乱想好几天了，请告诉我出了什么问题。
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问请解释一下为什么我在蛇形蛋糕上会有这个错误？我已经胡思乱想好几天了，请告诉我出了什么问题。EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问请解释一下为什么我在蛇形蛋糕上会有这个错误？我已经胡思乱想好几天了，请告诉我出了什么问题。
EN