我有以下两个GenomicRanges对象。第一个gr1是这样的:
library(GenomicRanges)
set.seed(1)
gr1 <- GRanges(
seqnames=Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges=IRanges(1:10, width=10:1, names=head(letters,10)),
strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
motif_score=seq(1, 0, length=10),
motif_name=paste0("Motif_", toupper(sample(c(letters,letters))))[1:10]
)
gr1
#> GRanges object with 10 ranges and 2 metadata columns:
#> seqnames ranges strand | motif_score motif_name
#> <Rle> <IRanges> <Rle> | <numeric> <character>
#> a chr1 [ 1, 10] - | 1.0000000 Motif_N
#> b chr2 [ 2, 10] + | 0.8888889 Motif_S
#> c chr2 [ 3, 10] + | 0.7777778 Motif_C
#> d chr2 [ 4, 10] * | 0.6666667 Motif_S
#> e chr1 [ 5, 10] * | 0.5555556 Motif_J
#> f chr1 [ 6, 10] + | 0.4444444 Motif_Q
#> g chr3 [ 7, 10] + | 0.3333333 Motif_R
#> h chr3 [ 8, 10] + | 0.2222222 Motif_D
#> i chr3 [ 9, 10] - | 0.1111111 Motif_B
#> j chr3 [10, 10] - | 0.0000000 Motif_C
#> -------
#> seqinfo: 3 sequences from an unspecified genome; no seqlengths和第二个对象gr2
gr2 <- GRanges(seqnames="chr2",
ranges=IRanges(4:3, 6),
peak_name=c("peak_1", "peak_2"),
strand="+", peak_score=5:4)
gr2
#> GRanges object with 2 ranges and 2 metadata columns:
#> seqnames ranges strand | peak_name peak_score
#> <Rle> <IRanges> <Rle> | <character> <integer>
#> [1] chr2 [4, 6] + | peak_1 5
#> [2] chr2 [3, 6] + | peak_2 4
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths然后我使用subsetByOverlaps在gr1和gr2之间执行区域重叠
subsetByOverlaps(gr1, gr2)
#> GRanges object with 3 ranges and 2 metadata columns:
#> seqnames ranges strand | motif_score motif_name
#> <Rle> <IRanges> <Rle> | <numeric> <character>
#> b chr2 [2, 10] + | 0.8888889 Motif_S
#> c chr2 [3, 10] + | 0.7777778 Motif_C
#> d chr2 [4, 10] * | 0.6666667 Motif_S
#> -------
#> seqinfo: 3 sequences from an unspecified genome; no seqlengths正如您所看到的,peak_name和peak_score列在交集之后不会出现。我怎么才能把它们都展示出来呢?
发布于 2017-12-04 15:34:03
首先,我们找到gr1 (查询对象)中的特征与gr2 (主题对象)中的特征的所有重叠。
# Find overlaps
m <- findOverlaps(gr1, gr2);然后,我们将匹配的要素存储在gr1.matched中,并从gr2添加元数据。
# Features from gr1 with overlaps in gr2
# Note: The same feature from gr1 can overlap with mulitple features from gr2
gr1.matched <- gr1[queryHits(m)];
# Add the metadata from gr2
mcols(gr1.matched) <- cbind.data.frame(
mcols(gr1.matched),
mcols(gr2[subjectHits(m)]));
gr1.matched;
#GRanges object with 6 ranges and 4 metadata columns:
# seqnames ranges strand | motif_score motif_name peak_name peak_score
# <Rle> <IRanges> <Rle> | <numeric> <character> <character> <integer>
# b chr2 [2, 10] + | 0.8888889 Motif_S peak_2 4
# b chr2 [2, 10] + | 0.8888889 Motif_S peak_1 5
# c chr2 [3, 10] + | 0.7777778 Motif_C peak_2 4
# c chr2 [3, 10] + | 0.7777778 Motif_C peak_1 5
# d chr2 [4, 10] * | 0.6666667 Motif_S peak_2 4
# d chr2 [4, 10] * | 0.6666667 Motif_S peak_1 5
# -------
# seqinfo: 3 sequences from an unspecified genome; no seqlengthshttps://stackoverflow.com/questions/47627003
复制相似问题