我正在研究植物的基因组拓扑结构,以便了解核基因组的组织/比较时间。
我想知道一种快速的方法来计算感兴趣的特定基因组区域的基因密度,以便从区域的开始和结束绘制每个bin +/- 50Kb的每个comp的基因频率。
有没有人有办法解决这个问题?提前感谢
发布于 2020-05-04 21:03:20
如果你愿意,我已经在Biostar上发布了我所做的更多信息:
post in BioStar with figures exemple of the wanted output
所以对于基因密度,我指的是重叠在我的区域上的基因数量。但我必须用我的区域大小来归一化这个计数,因为它们的大小不一样。
这里有我的region dataframe对象:
> head(regionFileCustom)
chr startCust endCust id score strand
1 CMiso1.1chr01 -50000 3550000 Domain-1CMiso1.1chr01 1000 +
2 CMiso1.1chr01 14450000 15300000 Domain-17CMiso1.1chr01 1000 +
3 CMiso1.1chr01 15250000 15500000 Domain-18CMiso1.1chr01 1000 +
4 CMiso1.1chr01 15600000 16650000 Domain-19CMiso1.1chr01 1000 +
5 CMiso1.1chr01 16600000 17050000 Domain-20CMiso1.1chr01 1000 +
6 CMiso1.1chr01 26950000 27450000 Domain-34CMiso1.1chr01 1000 +然后是我的基因数据帧对象:
> head(features)
chr start end id score strand
1 CMiso1.1chr01 955 24594 CMiso1.1chr01g0058061 . +
2 CMiso1.1chr01 24595 25591 CMiso1.1chr01g0058071 . -
3 CMiso1.1chr01 25797 30580 CMiso1.1chr01g0058081 . +
4 CMiso1.1chr01 31006 35465 CMiso1.1chr01g0058091 . -
5 CMiso1.1chr01 41322 41398 CMiso1.1chr01g0058101 . -
6 CMiso1.1chr01 42694 46218 CMiso1.1chr01g0058111 . -因此,为了进行计数,我尝试了下面这个构建函数:
# function to compute the gene density per bin and the gene frequency (gene per bin per nbr of bin per genome region)
width = 201L
step=201L
n = 100
minoverlap=21L
maxgap=-1L
ignore.strand=TRUE # I don't know yet if I have to take in count the strand for this part of the analysis.
# For test windowBin.gr = windowBinVarName
computeFeatureFreq <- function(features, regionFile, outDFName){
# convert feature as GRange
features.gr = makeGRangesFromDataFrame(features)
# convert regionFile as Grange
regionFile.gr = makeGRangesFromDataFrame(regionFile)
# Counting gene density per genome region
# feature.density.windowBin = countOverlaps(windowBin.gr,features.gr,ignore.strand=ignore.strand, minoverlap=minoverlap, maxgap=maxgap)
# feature.density.windowBin = countOverlaps(features.gr, windowBin.gr,ignore.strand=ignore.strand, minoverlap=minoverlap, maxgap=maxgap)
feature.density.region = countOverlaps(features.gr, regionFile.gr)
# retreive coordonates for each windowBin
regionFile.df=as.data.frame(regionFile.gr)
feature.density.region.df = data.frame(chr=regionFile.df$seqnames, start=regionFile.df$start, end=regionFile.df$end, count=feature.density.region)
return(feature.density.region.df)
}如果有不清楚的地方,请告诉我,谢谢
https://stackoverflow.com/questions/61591523
复制相似问题