我正在尝试从一个示例文件创建规则的输入。示例文件包含一个列SampleID,该列应用作示例通配符。我想从数据框中的Path_Normal列和Path_Tumor per SampleID列中提取正常和肿瘤bams的路径。
为此,我尝试这样做:
import pandas as pd
input_table = "sampletable.tsv"
samples = pd.read_table(input_table).set_index("SampleID", drop=False)
rule all:
input:
expand("/directory/sm_mutect2_paired/vcf/{sample}.mt2.vcf", sample=samples.index)
rule Mutect2:
input:
tumor = samples[samples['SampleID']=="{sample}"]['Path_Tumor'],
normal = samples[samples['SampleID']=="{sample}"]['Path_Normal']
output:
"/directory/sm_mutect2_paired/vcf/{sample}.mt2.vcf"
conda:
"envs/gatk_mutect2_paired.yaml"
shell:
"gatk --java-options '-Xmx16G -XX:+UseParallelGC -XX:ParallelGCThreads=16' Mutect2 \
-R /directory/ref/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta \
{input.tumor} \
{input.normal} \
-L /directory/GATK_interval_files_Agilent/S07604514_hs_hg38/S07604514_Covered.bed \
-O {output} \
--af-of-alleles-not-in-resource 2.5e-06 \
--germline-resource /directory/GATK_gnomad/af-only-gnomad.hg38.vcf.gz \
-pon /home/zyto/unger/GATK_PoN/1000g_pon.hg38.vcf.gz"
...在进行试运行时,我没有收到错误消息,但是执行失败了,因为输入是空的,这变成了查看日志:
atk --java-options '-Xmx16G -XX:+UseParallelGC -XX:ParallelGCThreads=16' Mutect2 -R /directory/GATK_ref/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta -L /directory/GATK_interval_files_Agilent/S07604514_hs_hg38/S07604514_Covered.bed -O /directory/WES_Rezidiv_HNSCC_Clonality/sm_mutect2_paired/vcf/HL05_Rez_HL05_NG.mt2.vcf --af-of-alleles-not-in-resource 2.5e-06 --germline-resource /directory/GATK_gnomad/af-only-gnomad.hg38.vcf.gz -pon /directory/GATK_PoN/1000g_pon.hg38.vcf.gz这两个输入文件应该出现在"Mutect2“和"-R”之间。
所以看起来我在定义输入时做了一些错误的事情…
发布于 2021-03-11 03:14:00
当作业和通配符值已知时,您需要将该规则的输入文件的确定推迟到所谓的DAG阶段。这是通过input functions实现的。我强烈推荐使用official Snakemake tutorial,它深入地介绍了这个主题。
https://stackoverflow.com/questions/66568868
复制相似问题