
Question: Running a snakemake rule iteratively

Stack Overflow user

Asked on 2020-10-19 14:52:22

1 answer · 78 views · 0 followers · 0 votes

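So I thought I had finally got the hang of snakemake, but when I tried to run it on several different data files I realized it doesn't work the way I expected. Here is the Snakefile: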

import pandas as pd

configfile: "config.json"
experiments = pd.read_csv(config["experiments"], sep = '\t')
experiments['Name'] = [filename.split('/')[-1].split('.fa')[0] for filename in experiments['Files']]

rule all:
    input:
        expand("{output}/Preprocess/Trimmomatic/quality_trimmed_{name}{fr}.fq", output = config["output"],
            fr = (['_forward_paired', '_reverse_paired'] if experiments["Files"].str.contains(',').tolist() else ''),
            name = experiments['Name'])

rule preprocess:
    input:
        experiments["Files"].str.split(',')
    output:
        expand("{output}/Preprocess/Trimmomatic/quality_trimmed_{name}{fr}.fq", output = config["output"],
            fr = (['_forward_paired', '_reverse_paired'] if experiments["Files"].str.contains(',').tolist() else ''),
            name = experiments['Name'])
    threads:
        config["threads"]
    run:
        shell("python preprocess.py -i {reads} -t {threads} -o {output} -adaptdir MOSCA/Databases/illumina_adapters -rrnadbs MOSCA/Databases/rRNA_databases -d {data_type}",
            output = config["output"], data_type = experiments["Data type"].tolist(), reads = ",".join(input))

Here is the config file:

{
  "output": "test_snakemake",
  "threads": 14,
  "experiments": "experiments.tsv"
}

And here is the experiments file:

Files   Sample  Data type   Condition
path/to/mg_R1.fastq,path/to/mg_R2.fastq Sample  dna
path/to/a/0.01/mt_0.01a_R1.fastq,path/to/a/0.01/mt_0.01a_R2.fastq   Sample  rna c1
path/to/b/0.01/mt_0.01b_R1.fastq,path/to/b/0.01/mt_0.01b_R2.fastq   Sample  rna c1
path/to/c/0.01/mt_0.01c_R1.fastq,path/to/c/0.01/mt_0.01c_R2.fastq   Sample  rna c1
path/to/a/1/mt_1a_R1.fastq,path/to/a/1/mt_1a_R2.fastq   Sample  rna c2
path/to/b/1/mt_1b_R1.fastq,path/to/b/1/mt_1b_R2.fastq   Sample  rna c2
path/to/c/1/mt_1c_R1.fastq,path/to/c/1/mt_1c_R2.fastq   Sample  rna c2
path/to/a/100/mt_100a_R1.fastq,path/to/a/100/mt_100a_R2.fastq   Sample  rna c3
path/to/b/100/mt_100b_R1.fastq,path/to/b/100/mt_100b_R2.fastq   Sample  rna c3
path/to/c/100/mt_100c_R1.fastq,path/to/c/100/mt_100c_R2.fastq   Sample  rna c3
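To see what the Name column built by the Snakefile's list comprehension actually contains, here is a small pandas-only sketch on three rows of the file above. The rows are re-typed with explicit tab separators, which is an assumption since the tabs are lost on this page. Because the comprehension splits the whole Files string on "/", each name comes from the last path in the row, i.e. the R2 file.

```python
import io

import pandas as pd

# Three rows of experiments.tsv, re-typed with explicit tabs (assumed layout).
TSV = (
    "Files\tSample\tData type\tCondition\n"
    "path/to/mg_R1.fastq,path/to/mg_R2.fastq\tSample\tdna\t\n"
    "path/to/a/0.01/mt_0.01a_R1.fastq,path/to/a/0.01/mt_0.01a_R2.fastq\tSample\trna\tc1\n"
    "path/to/a/100/mt_100a_R1.fastq,path/to/a/100/mt_100a_R2.fastq\tSample\trna\tc3\n"
)

experiments = pd.read_csv(io.StringIO(TSV), sep="\t")

# Same expression as in the Snakefile: keep the text after the last '/',
# then drop everything from the first '.fa' onwards.
experiments["Name"] = [
    filename.split("/")[-1].split(".fa")[0] for filename in experiments["Files"]
]

print(experiments["Name"].tolist())
# ['mg_R2', 'mt_0.01a_R2', 'mt_100a_R2']
```

These names are what the {name} wildcard has to match later on.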

What I want is for the preprocess rule to treat each row separately. I assumed that was how shell would interpret the command, i.e. that it would run

    python preprocess.py -i path/to/mg_R1.fastq,path/to/mg_R2.fastq -t 14 -o test_snakemake -adaptdir MOSCA/Databases/illumina_adapters -rrnadbs MOSCA/Databases/rRNA_databases -d dna

Instead, it concatenates all the rows and runs the script on every sample at once:

    python preprocess.py -i path/to/mg_R1.fastq,path/to/mg_R2.fastq,path/to/a/0.01/mt_0.01a_R1.fastq,path/to/a/0.01/mt_0.01a_R2.fastq,path/to/b/0.01/mt_0.01b_R1.fastq,path/to/b/0.01/mt_0.01b_R2.fastq,... -t 14 -o test_snakemake -adaptdir MOSCA/Databases/illumina_adapters -rrnadbs MOSCA/Databases/rRNA_databases -d dna rna rna rna rna rna rna rna rna rna
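The difference between the two command lines can be reproduced without Snakemake. The sketch below is a plain-Python emulation (not Snakemake code) on two of the rows: per_row_commands builds one command per row, which is the desired behaviour, while single_command flattens every row's files and data types into one call, mirroring what ",".join(input) over the full expand() input and a list-valued data_type produce in the question's rule. Both helper names are hypothetical.

```python
from typing import List

import pandas as pd

experiments = pd.DataFrame(
    {
        "Files": [
            "path/to/mg_R1.fastq,path/to/mg_R2.fastq",
            "path/to/a/0.01/mt_0.01a_R1.fastq,path/to/a/0.01/mt_0.01a_R2.fastq",
        ],
        "Data type": ["dna", "rna"],
    }
)

# Fixed options copied from the question's command line.
COMMON = (
    "-t 14 -o test_snakemake -adaptdir MOSCA/Databases/illumina_adapters "
    "-rrnadbs MOSCA/Databases/rRNA_databases"
)


def per_row_commands(df: pd.DataFrame) -> List[str]:
    """One command per row: what the question is asking for."""
    return [
        f"python preprocess.py -i {files} {COMMON} -d {dtype}"
        for files, dtype in zip(df["Files"], df["Data type"])
    ]


def single_command(df: pd.DataFrame) -> str:
    """All rows flattened into one call: what the question's rule ends up running."""
    reads = ",".join(f for row in df["Files"].str.split(",") for f in row)
    data_types = " ".join(df["Data type"])
    return f"python preprocess.py -i {reads} {COMMON} -d {data_types}"


for cmd in per_row_commands(experiments):
    print(cmd)
print(single_command(experiments))
```

Getting the first behaviour means one Snakemake job per row, which is what the answer below addresses.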

How can I make snakemake handle each row separately?

Tags: config, sample, snakemake, python

1 answer

Stack Overflow user

Accepted answer

Answered on 2020-10-19 17:52:00

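This is a very common mistake. The thing to remember is that a rule should describe how to process a single sample. Snakemake takes the file paths you request, fills in the wildcards from them, and generates one concrete job per match. You've written a rule that takes every input and produces every output at once, whereas I'd guess preprocess.py expects one input/output per call.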
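The "fill in the wildcards" step can be illustrated with a simplified emulation in plain Python; this is not Snakemake's own code. Each {wildcard} in the output pattern becomes a named regex group that defaults to .+ (Snakemake's documented default for unconstrained wildcards), and a requested path is matched against it to recover the values for one job. The helper pattern_to_regex is hypothetical. One thing the emulation makes visible is that two adjacent wildcards such as {name}{fr} split ambiguously unless one is constrained, which Snakemake supports through wildcard_constraints.

```python
import re

OUTPUT_PATTERN = "{output}/Preprocess/Trimmomatic/quality_trimmed_{name}{fr}.fq"


def pattern_to_regex(pattern, constraints=None):
    """Compile a '{wildcard}' pattern into a regex with one named group per wildcard.

    Literal text is escaped; wildcards without a constraint get '.+'.
    """
    constraints = constraints or {}
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = []
    for i, part in enumerate(parts):
        if i % 2 == 0:  # even index: literal text
            regex.append(re.escape(part))
        else:  # odd index: wildcard name
            regex.append(f"(?P<{part}>{constraints.get(part, '.+')})")
    return re.compile("^" + "".join(regex) + "$")


target = "test_snakemake/Preprocess/Trimmomatic/quality_trimmed_mg_R2_forward_paired.fq"

# Unconstrained: the greedy {name} swallows almost everything.
loose = pattern_to_regex(OUTPUT_PATTERN).match(target).groupdict()
print(loose)
# {'output': 'test_snakemake', 'name': 'mg_R2_forward_paire', 'fr': 'd'}

# Constraining {fr} resolves the split as intended.
strict = pattern_to_regex(
    OUTPUT_PATTERN, {"fr": "_forward_paired|_reverse_paired"}
).match(target).groupdict()
print(strict)
# {'output': 'test_snakemake', 'name': 'mg_R2', 'fr': '_forward_paired'}
```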

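Instead, think about one file at a time. Take the output "{output}/Preprocess/Trimmomatic/quality_trimmed_{name}{fr}.fq": how is that file produced? You have to use the name as a key to look up the matching input files in experiments.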
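The lookup itself can be tried with pandas alone before wiring it into a rule. A minimal sketch, assuming the question's Name derivation; files_for_name is a hypothetical helper, and the explicit empty check is an addition so that an unknown name fails with a clear message instead of an IndexError from .iloc[0].

```python
import pandas as pd

experiments = pd.DataFrame(
    {
        "Files": [
            "path/to/mg_R1.fastq,path/to/mg_R2.fastq",
            "path/to/a/1/mt_1a_R1.fastq,path/to/a/1/mt_1a_R2.fastq",
        ]
    }
)
# Same Name derivation as the question's Snakefile: last path, cut at '.fa'.
experiments["Name"] = [f.split("/")[-1].split(".fa")[0] for f in experiments["Files"]]


def files_for_name(df: pd.DataFrame, name: str) -> list:
    """Return the read files listed on the row whose Name equals `name`."""
    matches = df.loc[df["Name"] == name, "Files"]
    if matches.empty:
        raise KeyError(f"no experiment named {name!r}")
    # First matching row, split into its individual read files.
    return matches.iloc[0].split(",")


print(files_for_name(experiments, "mt_1a_R2"))
# ['path/to/a/1/mt_1a_R1.fastq', 'path/to/a/1/mt_1a_R2.fastq']
```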

def preprocess_input(wildcards):
    # get the Files entry of the row(s) whose Name matches the {name} wildcard
    df = experiments.loc[experiments['Name'] == wildcards.name, 'Files']
    # take the first value (in case there are several) and split on commas
    return df.iloc[0].split(',')

rule preprocess:
    input:
        preprocess_input
    output:
        "{output}/Preprocess/Trimmomatic/quality_trimmed_{name}{fr}.fq"
    params:
        # preprocess.py takes the reads as a single comma-separated argument
        reads = lambda wildcards, input: ",".join(input)
    threads:
        config["threads"]
    shell:
        'python preprocess.py -i {params.reads} -t {threads} -o {config[output]} ...'

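This uses an input function to work out the correct input files from the requested output file. It isn't perfect, but it should get you heading in the right direction.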
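The input function only produces one job per row if rule all requests concrete file names row by row. In the question's expand() call, the condition experiments["Files"].str.contains(',').tolist() is a non-empty list and therefore always true, so it cannot switch between paired and single-end per row. A minimal sketch of building the target list in plain Python instead; all_targets is a hypothetical helper and the single-end row is a hypothetical example, since the question's file only has paired rows. The resulting list could be given to rule all's input: in place of the expand() call.

```python
import pandas as pd

TEMPLATE = "{output}/Preprocess/Trimmomatic/quality_trimmed_{name}{fr}.fq"
PAIRED_SUFFIXES = ["_forward_paired", "_reverse_paired"]


def all_targets(experiments: pd.DataFrame, output_dir: str) -> list:
    """Return two targets per paired-end row and one per single-end row."""
    targets = []
    for files, name in zip(experiments["Files"], experiments["Name"]):
        # A comma in Files means the row lists R1 and R2 (paired-end).
        suffixes = PAIRED_SUFFIXES if "," in files else [""]
        targets.extend(
            TEMPLATE.format(output=output_dir, name=name, fr=fr) for fr in suffixes
        )
    return targets


experiments = pd.DataFrame(
    {
        "Files": ["path/to/mg_R1.fastq,path/to/mg_R2.fastq", "path/to/single.fastq"],
        "Name": ["mg_R2", "single"],  # "single" is a hypothetical single-end row
    }
)

for path in all_targets(experiments, "test_snakemake"):
    print(path)
```

One caveat: with the default .+ wildcard pattern, an empty {fr} cannot match the answer's output pattern, so single-end rows would need a rule of their own.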

Votes: 1

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/64429848
