
Q: How to show all columns after running GenomicRanges::subsetByOverlaps

Stack Overflow user

Asked on 2017-12-04 13:27:16

Answers: 1, Views: 705, Followers: 0, Votes: 1

I have the following two GenomicRanges objects. The first one, gr1, looks like this:

library(GenomicRanges)
set.seed(1)
gr1 <- GRanges(
    seqnames=Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
    ranges=IRanges(1:10, width=10:1, names=head(letters,10)),
    strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
    motif_score=seq(1, 0, length=10),
    motif_name=paste0("Motif_", toupper(sample(c(letters,letters))))[1:10]
)
gr1
#> GRanges object with 10 ranges and 2 metadata columns:
#>     seqnames    ranges strand | motif_score  motif_name
#>        <Rle> <IRanges>  <Rle> |   <numeric> <character>
#>   a     chr1  [ 1, 10]      - |   1.0000000     Motif_N
#>   b     chr2  [ 2, 10]      + |   0.8888889     Motif_S
#>   c     chr2  [ 3, 10]      + |   0.7777778     Motif_C
#>   d     chr2  [ 4, 10]      * |   0.6666667     Motif_S
#>   e     chr1  [ 5, 10]      * |   0.5555556     Motif_J
#>   f     chr1  [ 6, 10]      + |   0.4444444     Motif_Q
#>   g     chr3  [ 7, 10]      + |   0.3333333     Motif_R
#>   h     chr3  [ 8, 10]      + |   0.2222222     Motif_D
#>   i     chr3  [ 9, 10]      - |   0.1111111     Motif_B
#>   j     chr3  [10, 10]      - |   0.0000000     Motif_C
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome; no seqlengths

and the second object, gr2:

gr2 <- GRanges(seqnames="chr2", 
               ranges=IRanges(4:3, 6),
               peak_name=c("peak_1", "peak_2"),
               strand="+", peak_score=5:4)
gr2
#> GRanges object with 2 ranges and 2 metadata columns:
#>       seqnames    ranges strand |   peak_name peak_score
#>          <Rle> <IRanges>  <Rle> | <character>  <integer>
#>   [1]     chr2    [4, 6]      + |      peak_1          5
#>   [2]     chr2    [3, 6]      + |      peak_2          4
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Then I use subsetByOverlaps to find the regions of gr1 that overlap gr2:

subsetByOverlaps(gr1, gr2)
#> GRanges object with 3 ranges and 2 metadata columns:
#>     seqnames    ranges strand | motif_score  motif_name
#>        <Rle> <IRanges>  <Rle> |   <numeric> <character>
#>   b     chr2   [2, 10]      + |   0.8888889     Motif_S
#>   c     chr2   [3, 10]      + |   0.7777778     Motif_C
#>   d     chr2   [4, 10]      * |   0.6666667     Motif_S
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome; no seqlengths

As you can see, the peak_name and peak_score columns do not appear in the result of the overlap. How can I show all of them?
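The columns disappear because subsetByOverlaps() only filters its first argument: it keeps the ranges of gr1 that overlap something in gr2, but it never records which gr2 ranges they hit, so the result can only carry gr1's metadata columns. The sketch below, assuming GenomicRanges is installed from Bioconductor (the names hit_any, filtered and sub are illustrative only), checks this by comparing subsetByOverlaps(gr1, gr2) with filtering gr1 by overlapsAny() directly.

```r
library(GenomicRanges)

# gr1 and gr2 as defined in the question
set.seed(1)
gr1 <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges = IRanges(1:10, width = 10:1, names = head(letters, 10)),
  strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  motif_score = seq(1, 0, length.out = 10),
  motif_name = paste0("Motif_", toupper(sample(c(letters, letters))))[1:10]
)
gr2 <- GRanges(
  seqnames = "chr2",
  ranges = IRanges(4:3, 6),
  strand = "+",
  peak_name = c("peak_1", "peak_2"),
  peak_score = 5:4
)

# One logical per gr1 range: does it overlap anything in gr2?
hit_any <- overlapsAny(gr1, gr2)

# Subsetting gr1 by that logical keeps only gr1's own metadata columns
filtered <- gr1[hit_any]
sub <- subsetByOverlaps(gr1, gr2)

names(sub)            # "b" "c" "d", same features as names(filtered)
colnames(mcols(sub))  # "motif_score" "motif_name": nothing from gr2
```

Because the output is just a subset of gr1, getting peak_name and peak_score requires keeping the pairing between gr1 and gr2 ranges, which is what findOverlaps() provides.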

bioinformatics

bioconductor

1 Answer

Stack Overflow user

Answered on 2017-12-04 15:34:03

First, we find all overlaps between the features in gr1 (the query object) and the features in gr2 (the subject object).

# Find overlaps
m <- findOverlaps(gr1, gr2)
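To see what the Hits object m contains, the sketch below (assuming GenomicRanges from Bioconductor; hit_table is an illustrative name) turns it into a small table of query/subject pairs. Each of b, c and d overlaps both peaks. d is included even though its strand is "*", because "*" is treated as compatible with either strand, so there are six pairs rather than three.

```r
library(GenomicRanges)

# gr1 and gr2 as defined in the question
set.seed(1)
gr1 <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges = IRanges(1:10, width = 10:1, names = head(letters, 10)),
  strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  motif_score = seq(1, 0, length.out = 10),
  motif_name = paste0("Motif_", toupper(sample(c(letters, letters))))[1:10]
)
gr2 <- GRanges(
  seqnames = "chr2",
  ranges = IRanges(4:3, 6),
  strand = "+",
  peak_name = c("peak_1", "peak_2"),
  peak_score = 5:4
)

m <- findOverlaps(gr1, gr2)

# queryHits() indexes into gr1, subjectHits() indexes into gr2;
# element i of each vector describes the same overlapping pair
hit_table <- data.frame(
  query_name = names(gr1)[queryHits(m)],
  peak_name = mcols(gr2)$peak_name[subjectHits(m)],
  stringsAsFactors = FALSE
)

length(m)                     # 6 overlapping pairs
table(hit_table$query_name)   # b, c and d each hit both peaks
```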

Then we store the matching features in gr1.matched and add the metadata columns from gr2.

# Features from gr1 with overlaps in gr2
# Note: the same feature from gr1 can overlap with multiple features from gr2
gr1.matched <- gr1[queryHits(m)]

# Add the metadata from gr2 (row i of each table belongs to the same hit)
mcols(gr1.matched) <- cbind.data.frame(
    mcols(gr1.matched),
    mcols(gr2[subjectHits(m)]))

gr1.matched
#GRanges object with 6 ranges and 4 metadata columns:
#    seqnames    ranges strand | motif_score  motif_name   peak_name peak_score
#       <Rle> <IRanges>  <Rle> |   <numeric> <character> <character>  <integer>
#  b     chr2   [2, 10]      + |   0.8888889     Motif_S      peak_2          4
#  b     chr2   [2, 10]      + |   0.8888889     Motif_S      peak_1          5
#  c     chr2   [3, 10]      + |   0.7777778     Motif_C      peak_2          4
#  c     chr2   [3, 10]      + |   0.7777778     Motif_C      peak_1          5
#  d     chr2   [4, 10]      * |   0.6666667     Motif_S      peak_2          4
#  d     chr2   [4, 10]      * |   0.6666667     Motif_S      peak_1          5
#  -------
#  seqinfo: 3 sequences from an unspecified genome; no seqlengths
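If a DataFrame is acceptable instead of a GRanges, IRanges/GenomicRanges also provide mergeByOverlaps(), which runs findOverlaps() and extracts the matching rows from both objects in one call, returning the query ranges, the subject ranges and both sets of metadata columns side by side. A minimal sketch, assuming a Bioconductor installation of GenomicRanges:

```r
library(GenomicRanges)

# gr1 and gr2 as defined in the question
set.seed(1)
gr1 <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges = IRanges(1:10, width = 10:1, names = head(letters, 10)),
  strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  motif_score = seq(1, 0, length.out = 10),
  motif_name = paste0("Motif_", toupper(sample(c(letters, letters))))[1:10]
)
gr2 <- GRanges(
  seqnames = "chr2",
  ranges = IRanges(4:3, 6),
  strand = "+",
  peak_name = c("peak_1", "peak_2"),
  peak_score = 5:4
)

# One row per overlapping (gr1, gr2) pair, with the metadata of both objects
merged <- mergeByOverlaps(gr1, gr2)
merged
```

As with gr1.matched above, a gr1 feature that overlaps several peaks appears once per peak, so the result has six rows here.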

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/47627003
