
Linked variables in snakemake

Stack Overflow user
Asked 2022-02-27 18:57:49
2 answers · 44 views · 0 followers · 0 votes

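Suppose I have a sample SAMPLE_A, split into two files SAMPLE_A_1 and SAMPLE_A_2 and associated with the barcodes AATT and TTAA, and a sample SAMPLE_B, associated with the barcodes CCGG, GGCC and GCGC and split into 4 files SAMPLE_B_1...SAMPLE_B_4.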

I can write getSampleNames() to get [SAMPLE_A, SAMPLE_A, SAMPLE_B, SAMPLE_B, SAMPLE_B, SAMPLE_B] and [1, 2, 1, 2, 3, 4], then zip them to get the {sample}_{id} combinations. I can then do the same for the barcodes: [SAMPLE_A, SAMPLE_A, SAMPLE_B, SAMPLE_B, SAMPLE_B] and [AATT, TTAA, CCGG, GGCC, GCGC].
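To illustrate why the zip mode matters, here is a small plain-Python sketch (it does not call Snakemake; f-string formatting stands in for `expand()`). Zipping pairs the two parallel lists element by element, whereas a default `expand()` forms every combination of its arguments, which would invent files such as SAMPLE_A_3 that do not exist. The duplicate sample names are removed before the product only to keep the output short.

```python
from itertools import product

# Parallel lists as described above: one entry per input file.
samples = ["SAMPLE_A", "SAMPLE_A", "SAMPLE_B", "SAMPLE_B", "SAMPLE_B", "SAMPLE_B"]
ids = [1, 2, 1, 2, 3, 4]
pattern = "{sample}_{id}"

# zip mode: pair the lists position by position.
zipped = [pattern.format(sample=s, id=i) for s, i in zip(samples, ids)]

# Default mode: every sample combined with every id (Cartesian product).
crossed = [pattern.format(sample=s, id=i)
           for s, i in product(sorted(set(samples)), sorted(set(ids)))]

print(zipped)
# ['SAMPLE_A_1', 'SAMPLE_A_2', 'SAMPLE_B_1', 'SAMPLE_B_2', 'SAMPLE_B_3', 'SAMPLE_B_4']
print(len(crossed))  # 8
```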

SAMPLES_ID, IDs = getSampleNames()
SAMPLES_BC, BCs = getBCs(set(SAMPLES_ID))

rule refine:
    input:
        '{sample}/demultiplex/{sample}_{id}.demultiplex.bam'
    output:
        bam = '{sample}/polyA_trimming/{sample}_{id}.fltnc.bam',
    shell:
        "isoseq3 refine {input} "


rule split:
    input:
        expand('{sample}/polyA_trimming/{sample}_{id}.fltnc.bam', zip, sample = SAMPLES_ID, id = IDs),
    output:
        expand("{sample}/cells/{barcode}_{sample}/fltnc.bam", zip, sample = SAMPLES_BC, barcode = BCs),
    shell:
        "python {params.script_dir}/split_cells_bam.py"


rule dedup_split:
    input:
        "{sample}/cells/{barcode}_{sample}/fltnc.bam"
    output:
        bam = "{sample}/cells/{barcode}_{sample}/dedup/dedup.bam",
    shell:
        "isoseq3 dedup {input} {output.bam} "

rule merge:
    input:
        expand("{sample}/cells/{barcode}_{sample}/dedup/dedup.bam",
            zip, sample = SAMPLES_BC, barcode = BCs),

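How can I keep the split rule from becoming a bottleneck in my pipeline? Right now it waits for the refine rule to finish on all samples, which is unnecessary: each sample should run independently, but I can't set that up because each sample has a different set of barcodes. Is there a way to write

expand("{sample}/cells/{barcode}_{sample}/fltnc.bam", zip, sample = SAMPLES_BC, barcode = BCs[SAMPLES_BC])

where {sample} from SAMPLES_BC is a key in a BCs dictionary? And the same for IDs? I know I could use a function, but I don't know how to propagate {barcode} through the rules.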
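In plain Python, the lookup being asked about is a dictionary indexed by a single sample name, not by the whole SAMPLES_BC list. A minimal sketch, assuming the per-sample dictionaries below (hypothetical literals built from the example in the question) and using list comprehensions in place of `expand()`:

```python
# Hypothetical per-sample lookup tables built from the example above.
IDs = {"SAMPLE_A": [1, 2], "SAMPLE_B": [1, 2, 3, 4]}
BCs = {"SAMPLE_A": ["AATT", "TTAA"], "SAMPLE_B": ["CCGG", "GGCC", "GCGC"]}

def split_inputs(sample):
    """Refined chunks that one sample's split step would need."""
    return [f"{sample}/polyA_trimming/{sample}_{i}.fltnc.bam" for i in IDs[sample]]

def split_outputs(sample):
    """Per-cell BAMs that one sample's split step would produce."""
    return [f"{sample}/cells/{bc}_{sample}/fltnc.bam" for bc in BCs[sample]]

print(split_outputs("SAMPLE_A"))
# ['SAMPLE_A/cells/AATT_SAMPLE_A/fltnc.bam', 'SAMPLE_A/cells/TTAA_SAMPLE_A/fltnc.bam']
```

Because each list depends on one sample only, a job built from it would not have to wait for the other samples.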


2 answers

Stack Overflow user

Accepted answer

Posted 2022-03-01 01:33:38

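I found out how to use dictionaries through functions, and that solved my problem!

The main drawback of this solution is that I have to create a dummy file as the output of the split rule, instead of checking that every "{sample}/cells/{barcode}_{sample}/fltnc.bam" file was created, so I'm still looking for something more elegant.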

IDs = getSampleNames()  # {SAMPLE_A: [1, 2], SAMPLE_B: [1, 2, 3, 4]}
SAMPLES = list(IDs.keys())
BCs = getBCs(SAMPLES)  # {SAMPLE_A: [AATT, TTAA], SAMPLE_B: [CCGG, GGCC, GCGC]}

# function linking IDs and SAMPLE: all refined chunks of one sample
def sample2ids(wildcards):
    return expand('{sample}/polyA_trimming/{sample}_{id}.fltnc.bam',
                  sample = wildcards.sample, id = IDs[wildcards.sample])

# function linking BCs and SAMPLE: all deduplicated cells of one sample
def sample2bc(wildcards):
    return expand('{sample}/cells/{barcode}_{sample}/dedup/dedup.bam',
                  sample = wildcards.sample, barcode = BCs[wildcards.sample])

rule refine:
    input:
        '{sample}/demultiplex/{sample}_{id}.demultiplex.bam'
    output:
        bam = '{sample}/polyA_trimming/{sample}_{id}.fltnc.bam',

rule split:
    input:
        sample2ids
    output:
        # cannot use a function here, so I create a dummy file to pipe;
        # it must contain {sample} so that sample2ids can read wildcards.sample
        '{sample}/dummy_file.txt'

rule dedup_split:
    input:
        '{sample}/dummy_file.txt'
    output:
        bam = "{sample}/cells/{barcode}_{sample}/dedup/dedup.bam",


rule merge:
    input:
        sample2bc
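To see what the two input functions hand back for a single job, here is a sketch that runs without Snakemake. Assumptions: `types.SimpleNamespace` stands in for the wildcards object Snakemake passes to an input function, and list comprehensions replace `expand()`; the dictionaries mirror the comments in the answer.

```python
from types import SimpleNamespace

IDs = {"SAMPLE_A": [1, 2], "SAMPLE_B": [1, 2, 3, 4]}
BCs = {"SAMPLE_A": ["AATT", "TTAA"], "SAMPLE_B": ["CCGG", "GGCC", "GCGC"]}

def sample2ids(wildcards):
    # Refined chunks belonging to wildcards.sample only
    s = wildcards.sample
    return [f"{s}/polyA_trimming/{s}_{i}.fltnc.bam" for i in IDs[s]]

def sample2bc(wildcards):
    # Deduplicated per-cell BAMs belonging to wildcards.sample only
    s = wildcards.sample
    return [f"{s}/cells/{bc}_{s}/dedup/dedup.bam" for bc in BCs[s]]

# Stand-in for the wildcards of a job whose output matched sample=SAMPLE_B
wc = SimpleNamespace(sample="SAMPLE_B")
print(len(sample2ids(wc)), len(sample2bc(wc)))  # 4 3
```

Since each call reads only one sample's entries, the split job for SAMPLE_A depends only on SAMPLE_A's refine jobs, which is what removes the bottleneck.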
Votes: 0

Stack Overflow user

Posted 2022-02-28 13:54:09

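Based on your comment, there are several routes you could take, including changing the data structure that holds the samples, barcodes and ids. For now, you can simply create one rule per sample: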

for sample in set(SAMPLES_ID):  # get uniq samples
    # get ids and barcodes for this sample
    ids = [tup[1] for tup in zip(SAMPLES_ID, IDs) if tup[0] == sample]
    bcs = [tup[1] for tup in zip(SAMPLES_BC, BCs) if tup[0] == sample]

    rule:
        name: f'{sample}_split'
        input:
            expand('{sample}/polyA_trimming/{sample}_{id}.fltnc.bam', 
                   sample = sample, id = ids),
        output:
            expand("{sample}/cells/{barcode}_{sample}/fltnc.bam", 
                   sample = sample, barcode = bcs),
        shell:
            "python {params.script_dir}/split_cells_bam.py"

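You don't need zip in the expand calls, because ids and bcs belong to a single sample. Overall I don't think this is the best approach, but it is the simplest change to your current workflow.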
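The loop above filters `zip(SAMPLES_ID, IDs)` and `zip(SAMPLES_BC, BCs)` once per sample. A sketch of the data-structure change mentioned at the start of this answer: group the parallel lists into dictionaries once, which yields the same per-sample `ids` and `bcs` (and the same shape the accepted answer uses). The helper name `group_by_sample` is hypothetical.

```python
from collections import defaultdict

def group_by_sample(samples, values):
    """Turn parallel lists into {sample: [values...]}, keeping input order."""
    grouped = defaultdict(list)
    for sample, value in zip(samples, values):
        grouped[sample].append(value)
    return dict(grouped)

SAMPLES_ID = ["SAMPLE_A", "SAMPLE_A", "SAMPLE_B", "SAMPLE_B", "SAMPLE_B", "SAMPLE_B"]
IDs = [1, 2, 1, 2, 3, 4]
SAMPLES_BC = ["SAMPLE_A", "SAMPLE_A", "SAMPLE_B", "SAMPLE_B", "SAMPLE_B"]
BCs = ["AATT", "TTAA", "CCGG", "GGCC", "GCGC"]

ids_by_sample = group_by_sample(SAMPLES_ID, IDs)
bcs_by_sample = group_by_sample(SAMPLES_BC, BCs)
print(ids_by_sample)  # {'SAMPLE_A': [1, 2], 'SAMPLE_B': [1, 2, 3, 4]}
print(bcs_by_sample)  # {'SAMPLE_A': ['AATT', 'TTAA'], 'SAMPLE_B': ['CCGG', 'GGCC', 'GCGC']}
```

Inside the loop, `ids = ids_by_sample[sample]` and `bcs = bcs_by_sample[sample]` would then replace the two list comprehensions.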

I just noticed the shell command: how are you passing the input/output to the script?
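One common way to answer that question is to have the script accept its files on the command line. The interface below is entirely hypothetical (the real split_cells_bam.py is not shown): an `--outdir` option plus one or more refined BAMs as positional arguments, parsed with the standard-library `argparse`.

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI for split_cells_bam.py: refined BAMs in, one output directory out.
    parser = argparse.ArgumentParser(
        description="Split one sample's BAMs by cell barcode (hypothetical interface)")
    parser.add_argument("--outdir", required=True, help="e.g. SAMPLE_A/cells")
    parser.add_argument("bams", nargs="+", help="refined .fltnc.bam files for one sample")
    return parser.parse_args(argv)

args = parse_args(["--outdir", "SAMPLE_A/cells",
                   "SAMPLE_A/polyA_trimming/SAMPLE_A_1.fltnc.bam",
                   "SAMPLE_A/polyA_trimming/SAMPLE_A_2.fltnc.bam"])
print(args.outdir, len(args.bams))  # SAMPLE_A/cells 2
```

In a Snakemake `shell:` string, `{input}` expands to the space-separated list of input files, so it could be passed directly as the positional arguments of such a script.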

Votes: 0
Original page content provided by Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/71287637
