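I am trying to download FASTQ files from an FTP server using snakemake, which I will then post-process. The file names are in data.tsv under the columns "read1" and "read2". When I try the following code, I get this error:
ValueError in line 17 ...
This IOFile is specified as a function and may not be used directly.
Line 17 refers to the shell directive. I searched around, and the lambda functions look correct to me; lambda functions are also accepted in params.
Here is my code: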
import pandas as pd

samples = pd.read_table("data.tsv").set_index("sample", drop=False)

rule all:
    input:
        lambda wildcards: samples.to_dict()["read1"][wildcards.sample].split('/')[-1],
        lambda wildcards: samples.to_dict()["read2"][wildcards.sample].split('/')[-1]

rule dl:
    output:
        temp(lambda wildcards: samples.to_dict()["read1"][wildcards.sample].split('/')[-1]),
        temp(lambda wildcards: samples.to_dict()["read2"][wildcards.sample].split('/')[-1])
    params:
        read1 = lambda wildcards: samples.to_dict()["read1"][wildcards.sample],
        read2 = lambda wildcards: samples.to_dict()["read2"][wildcards.sample]
    shell:
        "wget {params.read1}; wget {params.read2}"

Please help, I can't figure out what is going on.
Edit 1
In case it's useful, the following code using remote files works (as also suggested by euronion below):
import pandas as pd
from snakemake.remote.FTP import RemoteProvider as FTPRemoteProvider

FTP = FTPRemoteProvider()
samples = pd.read_table("data.tsv").set_index("sample", drop=False)

rule all:
    input:
        expand("results/{sample}.sam", sample = samples["sample"])

rule bwa:
    input:
        v = "data/ref.fna",
        read1 = lambda wildcards: FTP.remote(samples.loc[wildcards.sample, 'read1']),
        read2 = lambda wildcards: FTP.remote(samples.loc[wildcards.sample, 'read2'])
    output:
        "results/{sample}.sam"
    shell:
        "scripts/bwa-mem2-2.2.1_x64-linux/bwa-mem2 mem {input.v} {input.read1} {input.read2} > {output}"

Edit 2
The problem with my initial attempt was that snakemake does not allow lambda functions for output files. Hence, the following minimal example:
read1 = {'s1': 'test1/ERR7671976_1.fastq.gz'}
read2 = {'s1': 'test1/ERR7671976_2.fastq.gz'}

rule all:
    input:
        lambda wildcards: read1[wildcards.sample],
        lambda wildcards: read2[wildcards.sample]

rule test:
    output:
        lambda wildcards: read1[wildcards.sample],
        lambda wildcards: read2[wildcards.sample]
    params:
        r1 = lambda wildcards: read1[wildcards.sample],
        r2 = lambda wildcards: read2[wildcards.sample]
    shell:
        """
        touch {params.r1}
        touch {params.r2}
        """

fails with "SyntaxError: Only input files can be specified as functions", whereas the following (with user-defined output file names):
read1 = {'s1': 'test1/ERR7671976_1.fastq.gz'}
read2 = {'s1': 'test1/ERR7671976_2.fastq.gz'}

rule all:
    input:
        expand("{sample}_1.fastq.gz", sample=read1.keys()),
        expand("{sample}_2.fastq.gz", sample=read2.keys())

rule test:
    output:
        '{sample}_1.fastq.gz',
        '{sample}_2.fastq.gz'
    params:
        r1 = lambda wildcards: read1[wildcards.sample],
        r2 = lambda wildcards: read2[wildcards.sample]
    shell:
        """
        touch {params.r1}; mv {params.r1} {wildcards.sample}_1.fastq.gz
        touch {params.r2}; mv {params.r2} {wildcards.sample}_2.fastq.gz
        """

works fine.
Posted on 2022-11-17 05:58:28
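The working snippet in Edit 2 renames the files to `{sample}_1.fastq.gz`. If the original file names should be kept instead, one option is to precompute a lookup from local file name to remote path in plain Python, so the output can stay a string pattern. The following is a minimal sketch under that assumption; `build_url_lookup` and `url_by_name` are hypothetical names introduced here for illustration, and the dictionaries simply repeat the ones from Edit 2.

```python
from posixpath import basename

# Same shape as the read1/read2 dictionaries in Edit 2.
read1 = {"s1": "test1/ERR7671976_1.fastq.gz"}
read2 = {"s1": "test1/ERR7671976_2.fastq.gz"}


def build_url_lookup(*tables):
    """Map each local file name (the basename) to its remote path.

    Hypothetical helper: raises if two different remote paths would
    end up with the same local file name.
    """
    lookup = {}
    for table in tables:
        for remote_path in table.values():
            name = basename(remote_path)
            if name in lookup and lookup[name] != remote_path:
                raise ValueError(f"duplicate file name: {name}")
            lookup[name] = remote_path
    return lookup


url_by_name = build_url_lookup(read1, read2)
print(url_by_name)
```

In a Snakefile, a download rule could then declare a plain string pattern such as `"downloads/{fname}"` as its output and resolve the source in params with `url = lambda wildcards: url_by_name[wildcards.fname]`. The `downloads/` prefix is an assumption for illustration; it keeps the pattern from matching every file in the workflow.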
In case it's useful, the following code using remote files works (as also suggested by euronion):
import pandas as pd
from snakemake.remote.FTP import RemoteProvider as FTPRemoteProvider

FTP = FTPRemoteProvider()
samples = pd.read_table("data.tsv").set_index("sample", drop=False)

rule all:
    input:
        expand("results/{sample}.sam", sample = samples["sample"])

rule bwa:
    input:
        v = "data/ref.fna",
        read1 = lambda wildcards: FTP.remote(samples.loc[wildcards.sample, 'read1']),
        read2 = lambda wildcards: FTP.remote(samples.loc[wildcards.sample, 'read2'])
    output:
        "results/{sample}.sam"
    shell:
        "scripts/bwa-mem2-2.2.1_x64-linux/bwa-mem2 mem {input.v} {input.read1} {input.read2} > {output}"

The problem with my initial attempt was that snakemake does not allow lambda functions for output files (see Edit 2 above).
https://stackoverflow.com/questions/74447192
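To illustrate why output files have to be string patterns rather than functions: snakemake works backwards from a requested target file and matches it against each rule's output pattern to infer the wildcard values. A function of the wildcards cannot be matched this way, because the wildcards are not known until an output pattern has matched. The sketch below is a simplified illustration of that matching step, not snakemake's actual implementation; `pattern_to_regex` and `infer_wildcards` are hypothetical names, and the sketch assumes each wildcard appears only once and has no constraints.

```python
import re


def pattern_to_regex(pattern):
    """Turn an output pattern like '{sample}_1.fastq.gz' into a regex
    with one named group per wildcard (simplified, no constraints)."""
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))       # literal text
        else:
            pieces.append(f"(?P<{part}>.+)")     # wildcard
    return re.compile("".join(pieces))


def infer_wildcards(pattern, target):
    """Return the wildcard values that make pattern equal target, or None."""
    match = pattern_to_regex(pattern).fullmatch(target)
    return match.groupdict() if match else None


print(infer_wildcards("{sample}_1.fastq.gz", "s1_1.fastq.gz"))
print(infer_wildcards("{sample}_1.fastq.gz", "s1_2.fastq.gz"))
```

The first call recovers `sample = "s1"` from the requested file name, while the second finds no match, so a rule with that output pattern would not be considered for that target. Input and params functions only run after this step, once the wildcards are known.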