I have a data frame containing per-base genome coverage. Here is a much smaller example version:
> head(per_base_cov)
contig_id position coverage
1 contig_1 1 40
2 contig_1 2 33
3 contig_1 3 40
4 contig_1 4 32
5 contig_1 5 36
6 contig_1 6 30
7 contig_1 7 40
8 contig_1 8 38
9 contig_1 9 36
10 contig_1 10 40
11 contig_2 11 38
12 contig_2 12 39
13 contig_2 13 34
14 contig_2 14 39
15 contig_2 15 39
16 contig_2 16 32
17 contig_2 17 30
18 contig_2 18 37
19 contig_2 19 33
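20 contig_2 20 35

I want to calculate sliding-window means for each contig, using windows of 4 positions that overlap by 2 positions. I tried the following using dplyr and zoo: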
library(dplyr)
library(zoo)
per_base_cov %>%
  group_by(contig_id) %>%
  mutate(cov.win.mean = rollapply(coverage, 4, mean, by = 2))

But I get this error message:
Error: Problem with `mutate()` input `cov.win.mean`.
x Input `cov.win.mean` can't be recycled to size 10.
ℹ Input `cov.win.mean` is `rollapply(coverage, 4, mean, by = 2)`.
ℹ Input `cov.win.mean` must be size 10 or 1, not 4.
ℹ The error occurred in group 1: contig_id = "contig_1".

Does anyone know how I can solve this? I would like output that looks like this:
contig_id mean_coverage
1 contig_1 36.25
2 contig_1 34.50
3 contig_1 36.00
4 contig_1 38.50
5 contig_2 37.5
6 contig_2 36
7 contig_2 34.5
8 contig_2 33.75

Thank you very much in advance.
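The error message points at a length mismatch: `mutate()` needs one value per row in each group (10 here), while `rollapply()` with `by = 2` returns one value per window (4 here). A minimal sketch that reproduces the mismatch on the contig_1 coverage values from the example above; the vector names `short` and `padded` are only illustrative:

```r
library(zoo)

# Coverage values for contig_1, copied from the example data above
coverage <- c(40, 33, 40, 32, 36, 30, 40, 38, 36, 40)

# Without fill: one value per window, so fewer values than inputs
short <- rollapply(coverage, width = 4, FUN = mean, by = 2)
length(short)

# With fill = NA: padded back to the input length,
# with NA wherever no window mean is reported
padded <- rollapply(coverage, width = 4, FUN = mean, by = 2, fill = NA)
length(padded)
```

The padded version has the same length as the group, which is the size `mutate()` asks for in the error message.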
Posted on 2021-03-02 20:48:54
With Ronak's help, I managed to find a solution:
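# Pad the rolling means with NA so the result matches each group's length
win_means <- per_base_cov %>%
  group_by(contig_id) %>%
  mutate(cov.win.mean = rollapply(coverage, 4, mean, by = 2, fill = NA))
# Keep only the rows that have a window mean
win_means_complete <- win_means[complete.cases(win_means), ]
# Keep the contig_id, position, and cov.win.mean columns
win_means_final <- win_means_complete[, c(1, 2, 4)]
win_means_final <- as.data.frame(win_means_final)
head(win_means_final)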
contig_id position cov.win.mean
1 contig_1 2 36.25
2 contig_1 4 34.50
3 contig_1 6 36.00
4 contig_1 8 38.50
5 contig_2 12 37.50
6 contig_2 14 36.00

https://stackoverflow.com/questions/66438862
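As a possible alternative to padding with NA and then filtering, the window means could be computed with a verb that allows a different number of rows per group. The sketch below assumes dplyr 1.1.0 or later, where `reframe()` is available (in dplyr 1.0.x, `summarise()` accepted multi-row results in a similar way, as far as I know). The data frame is rebuilt from the example in the question so the snippet runs on its own, and the column name `mean_coverage` follows the desired output shown there:

```r
library(dplyr)  # reframe() assumes dplyr >= 1.1.0
library(zoo)

# Example data rebuilt from the question
per_base_cov <- data.frame(
  contig_id = rep(c("contig_1", "contig_2"), each = 10),
  position  = 1:20,
  coverage  = c(40, 33, 40, 32, 36, 30, 40, 38, 36, 40,
                38, 39, 34, 39, 39, 32, 30, 37, 33, 35)
)

# One row per window and contig: width 4, step 2, no NA padding needed
win_means <- per_base_cov %>%
  group_by(contig_id) %>%
  reframe(mean_coverage = rollapply(coverage, width = 4, FUN = mean, by = 2))

print(win_means)
```

This drops the `position` column, which matches the output requested in the question; the `fill = NA` approach above is the one to prefer if the window positions need to be kept.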