snakemake: How to use glob_wildcards on newly created files?

Asked by a Stack Overflow user on 2020-07-13 13:33:33

1 answer, 3.6K views, 0 followers, 2 votes

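The problem:

I have a large workflow that at some point creates an arbitrary number of files for each {sample}, for example test1.txt, test2.txt, and so on.

I then need to use these files for further processing. The input files of the next rule are {sample}/test1.txt, {sample}/test2.txt, etc., so test1, test2, etc. become wildcards.

The data structure is: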

---data
 ---sample1
   ---test1.txt
   ---test2.txt
   ---test3.txt
 ---sample2
   ---test1.txt
   ---test2.txt
Snakefile

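I am struggling with how to solve this in snakemake. I have looked into the function glob_wildcards, but I cannot figure out how to use it.

Intuitively, I would do something like this: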

samples = ['sample1', 'sample2']

rule append_hello:
  input:
    glob_wildcards('data/{sample}/{id}.txt')
  output:
    'data/{sample}/{id}_2.txt'
  shell:
    " echo {input} 'hello' >> {output} "

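I have two questions:

How do I handle this in Snakemake?
How would you construct a rule all to run it?

Any comments or pointers to further reading would be much appreciated.

Edit

I think this is related to wildcard constraints. When I run: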

import os

assemblies = []
for filename in glob_wildcards(os.path.join("data/{sample}", "{i}.txt")):
    assemblies.append(filename)
print(assemblies)

I get two lists whose entries correspond to each other by index:

[['sample1', 'sample1', 'sample1', 'sample2', 'sample2'], ['test1', 'test2', 'test3', 'test5', 'test4']]

Now I basically just need to tell snakemake to use the corresponding wildcard values.
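
If the files already exist before Snakemake starts, the two index-matched lists can be paired directly. The sketch below is plain Python and only illustrates the pairing; `paired_targets` is a hypothetical helper, not part of Snakemake. Inside a Snakefile, passing `zip` as the second argument to `expand` should give the same pairing. It does not help with files that are only created during the run.

```python
# Paired wildcard values, copied from the printed output above.
samples = ["sample1", "sample1", "sample1", "sample2", "sample2"]
ids = ["test1", "test2", "test3", "test5", "test4"]


def paired_targets(pattern, samples, ids):
    """Fill the pattern once per (sample, id) pair, not once per combination."""
    return [pattern.format(sample=s, id=i) for s, i in zip(samples, ids)]


targets = paired_targets("data/{sample}/{id}_2.txt", samples, ids)
for path in targets:
    print(path)
```

This yields five targets, one per existing file, instead of the ten that a full cross product of samples and ids would produce.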

python

snakemake

1 Answer

Answered by a Stack Overflow user on 2020-07-14 05:30:49 (accepted answer)

Your problem is that glob_wildcards is evaluated only once, before any rule is executed, so Snakemake cannot know how many files the rule will generate.
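
The timing issue can be reproduced without Snakemake. The following is a minimal plain-Python sketch, assuming a temporary directory stands in for data/{sample}; `list_txt` is a hypothetical helper. A listing taken before the files are written stays empty, no matter what is created afterwards.

```python
import glob
import os
import tempfile


def list_txt(directory):
    """Return the sorted base names of the .txt files currently in directory."""
    pattern = os.path.join(directory, "*.txt")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


with tempfile.TemporaryDirectory() as workdir:
    # Evaluated up front, like a glob at Snakefile parse time: nothing exists yet.
    before = list_txt(workdir)

    # Stand-in for the rule that creates an arbitrary number of files.
    for n in range(1, 4):
        with open(os.path.join(workdir, f"test{n}.txt"), "w") as handle:
            handle.write(f"{n}\n")

    # Evaluated again once the files exist.
    after = list_txt(workdir)

print(before)  # []
print(after)   # ['test1.txt', 'test2.txt', 'test3.txt']
```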

What you need is a checkpoint. This feature lets Snakemake stop at a certain point and re-evaluate the DAG.

samples = ["sample1", "sample2"]

rule all:
    input:
        expand("data/{sample}/processed.txt", sample=samples)


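checkpoint generate_arbitrary:
    output:
        directory("data/{sample}/arbitrary")
    run:
        # Toy stand-in for a step whose number of output files is not known in advance.
        if wildcards.sample == "sample1":
            n = 3
        else:
            n = 2

        shell("mkdir -p {output}")
        for id in range(1, n + 1):
            # The f-string leaves a literal {id} inside the quotes, which shell() then fills in.
            shell(f"echo '{{id}}' > data/{wildcards.sample}/arbitrary/{id}.txt")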



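def aggregate_input(wildcards):
    # Declare the dependency on the checkpoint; the lines below only run after it has finished.
    checkpoints.generate_arbitrary.get(sample=wildcards.sample)
    # The files exist now, so globbing sees every id that was generated.
    ids = glob_wildcards(f"data/{wildcards.sample}/arbitrary/{{id}}.txt").id
    return expand(f"data/{wildcards.sample}/arbitrary/{{id}}.txt", id=ids)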


rule append_hello:
    input:
        aggregate_input
    output:
        "data/{sample}/processed.txt"
    shell:
        "echo {input} 'hello' > {output}"

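To see what aggregate_input ends up returning once the checkpoint has finished, here is a rough plain-Python imitation. It is a sketch under stated assumptions, not Snakemake's implementation: `collect_ids` and `aggregate_paths` are hypothetical helpers, and a temporary directory stands in for data/sample2/arbitrary.

```python
import re
import tempfile
from pathlib import Path


def collect_ids(directory):
    """Return the sorted {id} values of files named <id>.txt in directory."""
    ids = []
    for path in Path(directory).iterdir():
        match = re.fullmatch(r"(?P<id>.+)\.txt", path.name)
        if match:
            ids.append(match.group("id"))
    return sorted(ids)


def aggregate_paths(directory):
    """Build one input path per discovered id."""
    return [str(Path(directory) / f"{i}.txt") for i in collect_ids(directory)]


with tempfile.TemporaryDirectory() as tmp:
    arbitrary = Path(tmp) / "data" / "sample2" / "arbitrary"
    arbitrary.mkdir(parents=True)
    # Stand-in for what the checkpoint writes for sample2.
    for n in (1, 2):
        (arbitrary / f"{n}.txt").write_text(f"{n}\n")
    found = collect_ids(arbitrary)
    paths = aggregate_paths(arbitrary)

print(found)  # ['1', '2']
```

The list of paths is what append_hello receives as {input}, so the rule sees however many files the checkpoint happened to create for that sample.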
5 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/62876986
