I want to get the coordinates of the human genes in my list (given as HGNC gene identifiers) using GenomicFeatures and TxDb.Hsapiens.UCSC.hg19.knownGene.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb=(TxDb.Hsapiens.UCSC.hg19.knownGene)
my_genes = c("INO80","NASP","INO80D","SMARCA1")
select(txdb, keys = my_genes,
columns=c("TXCHROM","TXSTART","TXEND","TXSTRAND"),
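keytype="GENEID")
However, this does not work because the txdb does not use HGNC identifiers. How can I solve this? I could not find any suitable keytype that supports HGNC, and I am not sure how to match my HGNC IDs to the GENEID values in the txdb.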
Posted on 2018-09-10 07:25:44
Because the txdb is transcript-centric, it does not carry (HGNC) gene symbols, but it does have Entrez gene IDs.
First, we need to map the gene symbols to Entrez IDs.
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
myGeneSymbols <- select(org.Hs.eg.db,
keys = c("INO80","NASP","INO80D","SMARCA1"),
columns = c("SYMBOL","ENTREZID"),
keytype = "SYMBOL")
#    SYMBOL ENTREZID
# 1   INO80    54617
# 2    NASP     4678
# 3  INO80D    54891
# 4 SMARCA1     6594
Then we can subset the txdb:
myGeneSymbolsTx <- select(TxDb.Hsapiens.UCSC.hg19.knownGene,
keys = myGeneSymbols$ENTREZID,
columns = c("GENEID", "TXID", "TXCHROM", "TXSTART", "TXEND"),
keytype = "GENEID")
#    GENEID  TXID TXCHROM   TXSTART     TXEND
# 1   54617 55599   chr15  41267988  41280172
# 2   54617 55600   chr15  41271079  41408340
# 3   54617 55601   chr15  41271079  41408340
# 4    4678  1229    chr1  46049660  46079853
# 5    4678  1230    chr1  46049660  46081143
# 6    4678  1231    chr1  46049660  46084578
# 7    4678  1232    chr1  46049660  46084578
# 8    4678  1233    chr1  46049660  46084578
# 9    4678  1234    chr1  46067733  46075197
# 10   4678  1235    chr1  46077135  46084578
# 11  54891 12593    chr2 206858445 206950906
# 12   6594 77970    chrX 128580478 128657460
# 13   6594 77971    chrX 128580478 128657460
# 14   6594 77972    chrX 128580740 128657460
# 15   6594 77973    chrX 128580740 128657460
If needed, we can use merge to add the gene symbol to the table:
res <- merge(myGeneSymbols, myGeneSymbolsTx, by.x = "ENTREZID", by.y = "GENEID")
#    ENTREZID  SYMBOL  TXID TXCHROM   TXSTART     TXEND
# 1      4678    NASP  1229    chr1  46049660  46079853
# 2      4678    NASP  1230    chr1  46049660  46081143
# 3      4678    NASP  1231    chr1  46049660  46084578
# 4      4678    NASP  1232    chr1  46049660  46084578
# 5      4678    NASP  1233    chr1  46049660  46084578
# 6      4678    NASP  1234    chr1  46067733  46075197
# 7      4678    NASP  1235    chr1  46077135  46084578
# 8     54617   INO80 55599   chr15  41267988  41280172
# 9     54617   INO80 55600   chr15  41271079  41408340
# 10    54617   INO80 55601   chr15  41271079  41408340
# 11    54891  INO80D 12593    chr2 206858445 206950906
# 12     6594 SMARCA1 77970    chrX 128580478 128657460
# 13     6594 SMARCA1 77971    chrX 128580478 128657460
# 14     6594 SMARCA1 77972    chrX 128580740 128657460
# 15     6594 SMARCA1 77973    chrX 128580740 128657460
Posted on 2018-09-07 12:14:36
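I am not familiar with TxDb or the kinds of attributes it accepts or contains.
However, I can offer an alternative using the biomaRt package, which does accept HGNC symbols.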
library(biomaRt)
my_genes = c("INO80","NASP","INO80D","SMARCA1")
m <- useMart('ensembl', dataset='hsapiens_gene_ensembl') # create a mart object
df <- getBM(mart=m, attributes=c('hgnc_symbol', 'description', 'chromosome_name',
'start_position', 'end_position', 'strand',
'ensembl_gene_id'),
filters='hgnc_symbol', values=my_genes) # where df is a data.frame with all your requested info
It has a large number of attributes to choose from, which you can list with:
listAttributes(m) # our current dataset
For more information, see ??biomaRt.
Hope this helps.
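A note on the two approaches: the select() call on the txdb returns one row per transcript, while the biomaRt query returns one row per gene. If a single range per gene is wanted from the transcript-level table, one option is to take the smallest start and largest end for each gene. The following is only a minimal sketch in base R, not part of either answer: the data frame is retyped from a subset of the rows shown in the merged table above, and gene_ranges is a name introduced here for illustration.

```r
# Subset of rows retyped from the merged transcript table shown earlier
# (NASP transcripts 1229, 1234, 1235; INO80 55599, 55600; INO80D 12593).
res <- data.frame(
  SYMBOL  = c("NASP", "NASP", "NASP", "INO80", "INO80", "INO80D"),
  TXCHROM = c("chr1", "chr1", "chr1", "chr15", "chr15", "chr2"),
  TXSTART = c(46049660, 46067733, 46077135, 41267988, 41271079, 206858445),
  TXEND   = c(46079853, 46075197, 46084578, 41280172, 41408340, 206950906),
  stringsAsFactors = FALSE
)

# Collapse to one row per gene: smallest transcript start, largest transcript end.
starts <- aggregate(TXSTART ~ SYMBOL + TXCHROM, data = res, FUN = min)
ends   <- aggregate(TXEND ~ SYMBOL + TXCHROM, data = res, FUN = max)

# gene_ranges is a hypothetical name; it holds SYMBOL, TXCHROM, TXSTART, TXEND.
gene_ranges <- merge(starts, ends, by = c("SYMBOL", "TXCHROM"))
print(gene_ranges)
```

Grouping by chromosome as well as symbol keeps a gene annotated on more than one chromosome from being merged into one meaningless span. GenomicFeatures also provides a genes() extractor that returns one range per Entrez gene ID directly from the txdb, which may be the more direct route when the package is already loaded.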
https://stackoverflow.com/questions/52220415