首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >删除R中重叠的ATAC序列峰值

删除R中重叠的ATAC序列峰值
EN

Stack Overflow用户
提问于 2020-12-18 05:57:58
回答 1查看 89关注 0票数 2

我有一份ATAC序列窄峰的档案。与PEPATAC流水线类似,这里的算法是:首先,保留最重要的峰值,并删除与该显著峰值直接重叠的任何峰值。然后,此过程迭代到下一个最重要的峰值,依此类推,直到由于与更重要的峰值直接重叠而保留或移除了所有峰值。

以下是我的数据示例

代码语言:javascript
复制
data <- structure(list(V1 = c("chr3", "chrUn_KI270467v1", "chr8", "chrUn_KI270467v1", 
"chr21", "chr7"), V2 = c(93470281L, 1668L, 109333171L, 1668L, 
14382415L, 12686167L), V3 = c(93470873L, 3946L, 109335050L, 3946L, 
14384230L, 12688127L), V4 = c("Bcell-BM4983-150120-RepA.pe.q10.sort.rmdup_peak_60893", 
"Bcell-BM4983-150120-RepA.pe.q10.sort.rmdup_peak_97388b", "Bcell-BM4983-150120-RepA.pe.q10.sort.rmdup_peak_92241d", 
"Bcell-BM4983-150120-RepA.pe.q10.sort.rmdup_peak_97388c", "Bcell-BM4983-150120-RepA.pe.q10.sort.rmdup_peak_55158c", 
"Bcell-BM4983-150120-RepA.pe.q10.sort.rmdup_peak_83188b"), V5 = c(91371L, 
24480L, 19002L, 17131L, 17084L, 16639L), V6 = c(".", ".", ".", 
".", ".", "."), V7 = c(726.76721, 240.65007, 195.49055, 179.63454, 
179.23312, 175.41965), V8 = c(9144.74609, 2454.88721, 1906.9408, 
1719.66797, 1714.96497, 1670.38489), V9 = c(9137.11816, 2448.07471, 
1900.24915, 1713.15186, 1708.45618, 1663.93958), V10 = c(272L, 
666L, 1082L, 1445L, 898L, 525L)), row.names = c(88715L, 141209L, 
133771L, 141210L, 80584L, 120831L), class = "data.frame")

我根据它们的重要性对峰值进行了排序。到目前为止,我写了这段代码

代码语言:javascript
复制
 for ( i in 1:nrow(B1.1)){
      sub<-as_granges(B1.1[i , ], seqnames=1, start=2 , end=3)
      que<- as_granges(B1.1 , seqnames=V1 , start=V2 , end=V3)
      overlaps<-which(overlapsAny( que, sub))
      overlaps<-overlaps[overlaps!=i ]
      B1.1<-B1.1[ ! seq(from=1 , to=nrow(B1.1), by=1)%in%overlaps, ]
      overlaps<-c()
    }

显然,这段代码效率不是很高,而且需要花费大量时间。

非常感谢您的建议

EN

回答 1

Stack Overflow用户

发布于 2020-12-18 08:39:50

您为间隔定义了组,并对您的GRanges对象进行了重复数据删除,因为我不能用您的数据显示它,下面是一个根据p值排序的示例数据:

代码语言:javascript
复制
library(GenomicRanges)

gr = GRanges(seqnames=1,
IRanges(start=c(1,1,1,1,5,5),end=c(3,3,2,2,8,9)),
p=sort(runif(6,0,0.1)))

使用reduce定义组,首先创建一个包含所有重叠的宽间隔:

代码语言:javascript
复制
rgr = reduce(gr)

rgr

GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        1       1-3      *
  [2]        1       5-9      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

现在使用findOverlaps将各个范围分配给组

代码语言:javascript
复制
gr$group = subjectHits(findOverlaps(gr,rgr))

对此进行重复数据删除:

代码语言:javascript
复制
gr[!duplicated(gr$group)]

GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand |                  p     group
         <Rle> <IRanges>  <Rle> |          <numeric> <integer>
  [1]        1       1-3      * | 0.0276333122281358         1
  [2]        1       5-8      * | 0.0465503185754642         2
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

另一种方法是使用while循环:

代码语言:javascript
复制
findNonOvlp = function(g){

   idx = seq_along(g)
   keep = c()
   while(length(idx)>0){
      top_g = idx[1]
      rmv = idx[countOverlaps(g[idx],g[top_g])>0]
      idx = setdiff(idx,rmv)
      keep = c(keep,top_g)
   }
   g[keep]
 }   

测试:

代码语言:javascript
复制
findNonOvlp(gr)

GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |                   p
         <Rle> <IRanges>  <Rle> |           <numeric>
  [1]        1       1-3      * | 0.00038885809481144
  [2]        1       5-8      * |   0.058683192031458
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

再举一个例子:

代码语言:javascript
复制
r = GRanges(seqnames=c("chr1", "chr1", "chr1"), IRanges(start=c(100,140,180),end=c(150,200, 200) ) )

findNonOvlp(r)

GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1   100-150      *
  [2]     chr1   180-200      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/65348911

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档