
Downloading fastq from Snakemake

Stack Overflow user
Asked on 2020-03-17 18:59:59
Answers: 2 | Views: 111 | Followers: 0 | Votes: 0

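I am desperately trying to get a rule to download my fastq files before the following rules are executed. I have tried many approaches, including the one suggested here: http://ivory.idyll.org/blog/tag/snakemake.html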

Here is a simplified version of my Snakefile:

######### functions

import pandas

def read_samplesTable(inputTable):
    data = pandas.read_csv(inputTable)
    # Verify column names
    if not {'run', 'organism', 'name', 'experiment_title', 'cell_line', 'rep', 'study_name', 'library_strategy', 'library_layout', 'study_title'}.issubset(data.columns.values):
            raise KeyError("The samples file must contain the following named columns: 'run', 'organism', 'name', 'experiment_title', 'cell_line', 'rep', 'study_name', 'library_strategy', 'library_layout', 'study_title'")
    return data

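def retrieveName(description):
    # Keep the part of each experiment title that precedes the first ":"
    result = []
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for items in description.items():
        result.append(items[1].split(":")[0])
    return result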

######### Variables

input_table = config["samples"]["summaryFile"]
samplesData = read_samplesTable(input_table)

index_single = samplesData['library_layout'] == 'SINGLE - '
samplesData_single = samplesData[index_single]

gsm_single = retrieveName(samplesData_single["experiment_title"])
outputName_single = samplesData_single['name'] + "_" + samplesData_single['run'] + "_" + gsm_single + "_" + samplesData_single['study_name'] + "_" + samplesData_single['cell_line'] + "_" + samplesData_single['rep']
single_samples = outputName_single.tolist()

names_srrID_single = samplesData_single['run']

############ Rule

rule all:
  input:
    expand("data/single/{singleEndName}.fastq.gz", singleEndName = single_samples)

rule download_fastq_single:
  output:
    singleFastq = "data/single/{singleEndName}.fastq.gz"
  params:
    outputdirectory = config["rawdata"]["fastqrootfolder"]
    ssridsingle = lambda wildcards: samplesData_single.loc[wildcards.names_srrID_single, "run"]    
  shell:
    "fastq-dump --accession {params.srridsingle} --defline-seq '@$sn[_$rn]/$ri' --defline-qual \'+\' --gzip --outdir {params.outputdirectory}"

The rules are in a separate file, 'download-fastq-snakefile', and I get the error SyntaxError in line 6 of download-fastq-snakefile

As I expected, the problem comes from ssridsingle = lambda wildcards: samplesData_single.loc[wildcards.names_srrID_single, "run"]

It would be great if you could help me!

Thanks


2 Answers

Stack Overflow user

Accepted answer

Posted on 2020-03-17 21:48:26

My guess is that the problem is simply a missing comma:

  params:
    outputdirectory = config["rawdata"]["fastqrootfolder"],  # <-- add this comma
    ssridsingle = lambda wildcards: samplesData_single.loc[wildcards.names_srrID_single, "run"]    
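
To see why the comma matters: multiple entries in a `params:` section behave much like the keyword arguments of a Python function call, so they need to be separated by commas. The sketch below is plain Python, not Snakemake itself. The helper `parses` and the stand-in call `params(...)` are hypothetical names used only for illustration. It checks whether each variant is syntactically valid.

```python
import ast


def parses(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Two keyword arguments with no comma between them (as in the question).
without_comma = (
    "params(\n"
    '    outputdirectory = "raw"\n'
    "    ssridsingle = lambda wildcards: wildcards.name\n"
    ")"
)

# The same call with the separating comma added.
with_comma = (
    "params(\n"
    '    outputdirectory = "raw",\n'
    "    ssridsingle = lambda wildcards: wildcards.name\n"
    ")"
)

print("without comma parses:", parses(without_comma))
print("with comma parses:", parses(with_comma))
```

Only the parsing step is exercised here; nothing is executed, so the names inside the strings do not need to exist.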
Votes: 2

Stack Overflow user

Posted on 2020-03-18 04:49:33

Yes, that was the first problem, thanks!

For the second problem, I had to change how the SRR ID is accessed, as follows:

rule download_fastq_single:
  output:
    singleFastq = "data/single/{singleEndName}.fastq.gz"
  params:
    outputdirectory = config["rawdata"]["fastqrootfolder"],
    ssridsingle = lambda wildcards: names_srrID_single    
  shell:
    "fastq-dump --accession {params.ssridsingle} --defline-seq '@$sn[_$rn]/$ri' --defline-qual \'+\' --gzip --outdir {params.outputdirectory}"
Votes: 0
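
One thing to watch in the rule above: the lambda does not use `wildcards`, so it appears to return the whole run column no matter which file is being requested. A possible alternative, sketched here in plain Python, is to build a lookup from each output name to its run ID and index it with the `singleEndName` wildcard. The sample names and SRR IDs below are made-up placeholders, `run_by_output_name` and `srr_for` are hypothetical helpers, and `SimpleNamespace` only imitates the wildcards object that Snakemake would pass in.

```python
from types import SimpleNamespace

# Placeholder stand-ins for single_samples and names_srrID_single
# from the Snakefile (made-up values, for illustration only).
single_samples = [
    "sampleA_SRR0000001_GSM1_studyX_cellY_rep1",
    "sampleB_SRR0000002_GSM2_studyX_cellY_rep2",
]
names_srrID_single = ["SRR0000001", "SRR0000002"]

# Assumes both sequences are in the same row order, one run per output name.
run_by_output_name = dict(zip(single_samples, names_srrID_single))


def srr_for(wildcards):
    """Return the run ID for the file named by wildcards.singleEndName."""
    return run_by_output_name[wildcards.singleEndName]


# Imitate the wildcards object for the second sample.
wc = SimpleNamespace(singleEndName=single_samples[1])
print(srr_for(wc))
```

Under these assumptions, the params entry in the rule could then read `ssridsingle = lambda wildcards: run_by_output_name[wildcards.singleEndName]`, so each job receives only its own accession.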
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/60720998
