
How to show all columns after running GenomicRanges::subsetByOverlaps

Stack Overflow user
Asked 2017-12-04 13:27:16
Answers: 1 · Views: 705 · Followers: 0 · Votes: 1

I have the following two GenomicRanges objects. The first, gr1, looks like this:

library(GenomicRanges)
set.seed(1)
gr1 <- GRanges(
        seqnames=Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
        ranges=IRanges(1:10, width=10:1, names=head(letters,10)),
        strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
        motif_score=seq(1, 0, length=10),
         motif_name=paste0("Motif_", toupper(sample(c(letters,letters))))[1:10]
      )
gr1
#> GRanges object with 10 ranges and 2 metadata columns:
#>     seqnames    ranges strand | motif_score  motif_name
#>        <Rle> <IRanges>  <Rle> |   <numeric> <character>
#>   a     chr1  [ 1, 10]      - |   1.0000000     Motif_N
#>   b     chr2  [ 2, 10]      + |   0.8888889     Motif_S
#>   c     chr2  [ 3, 10]      + |   0.7777778     Motif_C
#>   d     chr2  [ 4, 10]      * |   0.6666667     Motif_S
#>   e     chr1  [ 5, 10]      * |   0.5555556     Motif_J
#>   f     chr1  [ 6, 10]      + |   0.4444444     Motif_Q
#>   g     chr3  [ 7, 10]      + |   0.3333333     Motif_R
#>   h     chr3  [ 8, 10]      + |   0.2222222     Motif_D
#>   i     chr3  [ 9, 10]      - |   0.1111111     Motif_B
#>   j     chr3  [10, 10]      - |   0.0000000     Motif_C
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome; no seqlengths

And the second object, gr2:

gr2 <- GRanges(seqnames="chr2", 
               ranges=IRanges(4:3, 6),
               peak_name=c("peak_1", "peak_2"),
               strand="+", peak_score=5:4)
gr2
#> GRanges object with 2 ranges and 2 metadata columns:
#>       seqnames    ranges strand |   peak_name peak_score
#>          <Rle> <IRanges>  <Rle> | <character>  <integer>
#>   [1]     chr2    [4, 6]      + |      peak_1          5
#>   [2]     chr2    [3, 6]      + |      peak_2          4
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Then I use subsetByOverlaps to keep the ranges of gr1 that overlap a range in gr2:

 subsetByOverlaps(gr1, gr2)
#> GRanges object with 3 ranges and 2 metadata columns:
#>     seqnames    ranges strand | motif_score  motif_name
#>        <Rle> <IRanges>  <Rle> |   <numeric> <character>
#>   b     chr2   [2, 10]      + |   0.8888889     Motif_S
#>   c     chr2   [3, 10]      + |   0.7777778     Motif_C
#>   d     chr2   [4, 10]      * |   0.6666667     Motif_S
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome; no seqlengths

As you can see, the peak_name and peak_score columns from gr2 do not appear in the result. How can I get all of the columns to show up?


1 Answer

Stack Overflow user

Answered 2017-12-04 15:34:03

First, we find all overlaps between the features in gr1 (the query object) and the features in gr2 (the subject object).

# Find overlaps
m <- findOverlaps(gr1, gr2);
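
To make the structure of the result of findOverlaps easier to see, here is a minimal sketch that lists each overlapping pair by index and by name. The objects are compact stand-ins for gr1 and gr2 from the question (same seqnames, ranges and strands; motif_name is left out for brevity), and the `pairs` data frame and its column names are made up for this illustration.

```r
library(GenomicRanges)

# Compact stand-ins for gr1 and gr2 (assumption: same ranges and strands as in
# the question, with the motif_name column omitted).
gr1 <- GRanges(
    seqnames = c("chr1", "chr2", "chr2", "chr2", "chr1",
                 "chr1", "chr3", "chr3", "chr3", "chr3"),
    ranges = IRanges(1:10, width = 10:1, names = head(letters, 10)),
    strand = c("-", "+", "+", "*", "*", "+", "+", "+", "-", "-"),
    motif_score = seq(1, 0, length.out = 10))
gr2 <- GRanges(seqnames = "chr2", ranges = IRanges(4:3, 6), strand = "+",
               peak_name = c("peak_1", "peak_2"), peak_score = 5:4)

m <- findOverlaps(gr1, gr2)

# One row per overlapping (gr1, gr2) pair: the integer indices stored in the
# Hits object, plus the names they point to.
pairs <- data.frame(
    query_index   = queryHits(m),
    subject_index = subjectHits(m),
    query_name    = names(gr1)[queryHits(m)],
    peak_name     = gr2$peak_name[subjectHits(m)])
print(pairs)
```

Each of the three gr1 ranges on chr2 overlaps both peaks, so the Hits object holds six pairs. This is why indexing with queryHits(m) and subjectHits(m) yields six rows rather than the three returned by subsetByOverlaps.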

Then we store the matching features in gr1.matched and add the metadata columns from gr2.

# Features from gr1 with overlaps in gr2
# Note: The same feature from gr1 can overlap with multiple features from gr2
gr1.matched <- gr1[queryHits(m)];

# Add the metadata from gr2
mcols(gr1.matched) <- cbind.data.frame(
    mcols(gr1.matched),
    mcols(gr2[subjectHits(m)]));

gr1.matched;
#GRanges object with 6 ranges and 4 metadata columns:
#    seqnames    ranges strand | motif_score  motif_name   peak_name peak_score
#       <Rle> <IRanges>  <Rle> |   <numeric> <character> <character>  <integer>
#  b     chr2   [2, 10]      + |   0.8888889     Motif_S      peak_2          4
#  b     chr2   [2, 10]      + |   0.8888889     Motif_S      peak_1          5
#  c     chr2   [3, 10]      + |   0.7777778     Motif_C      peak_2          4
#  c     chr2   [3, 10]      + |   0.7777778     Motif_C      peak_1          5
#  d     chr2   [4, 10]      * |   0.6666667     Motif_S      peak_2          4
#  d     chr2   [4, 10]      * |   0.6666667     Motif_S      peak_1          5
#  -------
#  seqinfo: 3 sequences from an unspecified genome; no seqlengths
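
If one row per gr1 range is preferred over one row per overlapping pair, the peak names can be collapsed per query. The following is only a sketch of that variation, not part of the original answer: it rebuilds compact stand-ins for gr1 and gr2 (motif_name omitted), and the names `peaks_by_query`, `gr1.collapsed`, `peak_names` and `n_peaks` are hypothetical.

```r
library(GenomicRanges)

# Compact stand-ins for gr1 and gr2 (assumption: same ranges and strands as in
# the question, with the motif_name column omitted).
gr1 <- GRanges(
    seqnames = c("chr1", "chr2", "chr2", "chr2", "chr1",
                 "chr1", "chr3", "chr3", "chr3", "chr3"),
    ranges = IRanges(1:10, width = 10:1, names = head(letters, 10)),
    strand = c("-", "+", "+", "*", "*", "+", "+", "+", "-", "-"),
    motif_score = seq(1, 0, length.out = 10))
gr2 <- GRanges(seqnames = "chr2", ranges = IRanges(4:3, 6), strand = "+",
               peak_name = c("peak_1", "peak_2"), peak_score = 5:4)

m <- findOverlaps(gr1, gr2)

# Group the overlapping peak names by the index of the gr1 range they hit.
peaks_by_query <- split(gr2$peak_name[subjectHits(m)], queryHits(m))

# Keep each overlapping gr1 range once and attach the collapsed peak info.
idx <- as.integer(names(peaks_by_query))
gr1.collapsed <- gr1[idx]
gr1.collapsed$peak_names <- unname(vapply(
    peaks_by_query,
    function(x) paste(sort(x), collapse = ","),
    character(1)))
gr1.collapsed$n_peaks <- unname(lengths(peaks_by_query))

gr1.collapsed
```

This keeps the three ranges that subsetByOverlaps returns while still recording which peaks each one overlaps. The trade-off is that per-peak columns such as peak_score must be summarised (or collapsed the same way) rather than carried over directly.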
Votes: 2
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/47627003
