我有一个fasta可能被使用的列表,但是其中一些也可能不被使用,所以如果需要的话,我希望使用snakemake索引fastq。
我把一个yaml文件写成这样
# config.yaml
reference_genome:
fa1: "path/to/genome"
fa2: "..."
fa3: "..."
...我写了这样的蛇形画
configfile: "config.yaml"
rule all:
input:
expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])
rule index:
input:
#reference_genomeFile
ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
output:
expand('{reference_genome}.{type}', reference_genome={reference_genome}, type=['amb', 'ann', 'pac'])
log:
'log/rule_index_{reference_genome}.log'
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"我希望snakemake能够监视索引文件(amb, ann, pac),但是这个脚本会引发以下错误:
name 'reference_genome' is not defined
File "/public/...", line ..., in <module>更新:基于@dariober的回答:如果我们运行以下config.yaml
reference_genome:
fa1: "genome_1.fa"
fa2: "genome_2.fa"
fa3: "genome_3.fa"我希望输出是
genome_1.fa.{amb, ann, pac}
genome_2.fa.{amb, ann, pac}
genome_3.fa.{amb, ann, pac}如果我们使用以下解决方法
rule all:
input:
expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])
rule index:
input:
#reference_genomeFile
ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
output:
expand('{{reference_genome}}.{type}', type=['amb', 'ann', 'pac'])
log:
'log/rule_index_{reference_genome}.log'
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"我们会得到
$ snakemake -s snakemake_test.smk --configfile config.yaml# for reference_name is fa1
[Fri Dec 2 17:56:29 2022]
rule index:
input: genome_1.fa
output: fa1.amb, fa1.ann, fa1.pac
log: log/rule_index_fa1.log
jobid: 1
wildcards: reference_genome=fa1
...那不是我的预期产出
输出是fa1.amb,fa1.ann,fa1.pac,但是我想要输出是genome_1.fa.amb,genome_1.fa.ann,genome_1.fa.pac
发布于 2022-12-02 08:50:47
我猜你想:
rule all:
input:
expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])
rule index:
input:
#reference_genomeFile
ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
output:
expand('{{reference_genome}}.{type}', type=['amb', 'ann', 'pac'])
log:
'log/rule_index_{reference_genome}.log'
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"snakemake -p -n -j 1 --configfile config.yaml也就是说:为每个基因组文件运行规则index,即在这里运行三次。每次运行index都会生成三个索引文件。注意使用双大括号{{reference_genome}}告诉expand这个通配符不需要展开。
示例config.yaml:
reference_genome:
fa1: "genome_1.fa"
fa2: "genome_2.fa"
fa3: "genome_3.fa"发布于 2022-12-02 13:26:11
在dariober's answer的基础上,从你的评论中判断,我认为这就是你想要的?
configfile: "config.yaml"
rule all:
input:
expand(
"{reference_genome}.{type}",
reference_genome=list(config["reference_genome"].values()),
type=["amb", "ann", "pac"],
),
rule index:
input:
#reference_genomeFile
ref_genome="{reference_genome}",
output:
expand("{{reference_genome}}.{type}", type=["amb", "ann", "pac"]),
log:
"log/rule_index_{reference_genome}.log",
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"用config.yaml
reference_genome:
fa1: "path/to/genome1"
fa2: "path/to/genome2"
fa3: "path/to/genome3"我修改了rule all以使用来自config.yaml的文件,而不是list ['fa1','fa2','fa3']。我还将lambda wildcard从rule index的输入中删除,因为这似乎是不必要的。
https://stackoverflow.com/questions/74652332
复制相似问题