How to sum a column under multiple conditions in R (fast version)

Stack Overflow user

Asked 2020-10-08 15:02:02

1 answer · 51 views · 0 followers · 1 vote

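I have a problem in R: I am trying to write code for very large tables.

For each row in test_cds, I want to sum the coverage values ($cov) in the test_cov table over the positions that the row spans.

For example, for row 1 of test_cds: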

         seqid source type start end
1 NW_019942502 Gnomon  CDS     1   3

I take the positions from 1 to 3 inclusive, using:

> test_cov

        seqid  pos cov
1 NW_019942502   1  13
2 NW_019942502   2  16
3 NW_019942502   3  20

and compute sum(cov) for pos 1, 2, 3 = 13 + 16 + 20 = 49, so that the output is:

> test_cds

         seqid source type start end sum_coverage
1 NW_019942502 Gnomon  CDS     1   3           49

Warning: $seqid must be taken into account as well, because $pos runs from 1 upward for each seqid.

Here are my input tables:

> test_cov

        seqid  pos cov
1 NW_019942502   1  13
2 NW_019942502   2  16
3 NW_019942502   3  20
(...)
4 NW_019942502  13  16
5 NW_019942502  14  16
6 NW_019942502  15  18

> test_cds

         seqid source type start end
1 NW_019942502 Gnomon  CDS     1   3   
2 NW_019942502 Gnomon  CDS    13  15   
3 NW_019942502 Gnomon  CDS    17  27  
4 NW_019942503 Gnomon  CDS     1  12   
5 NW_019942503 Gnomon  CDS    67  87

And the expected output:

> test_cds

         seqid source type start end sum_coverage
1 NW_019942502 Gnomon  CDS     1   3           49
2 NW_019942502 Gnomon  CDS    13  15           50

To do this, I am looking for something like dplyr to replace a for() loop that would take too long:

for (i in 1:nrow(test_map)) {
  if (test_cov$seqid == test_cds$seqid & test_cov$pos >= test_cds$start & test_cov$pos <= test_cds$end) {
    test_cds$coverage <- sum(test_cov$cov)
  }
}

Thank you very much!

Chloé

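1 Answer

Stack Overflow user

Accepted answer

Answered 2020-10-08 17:07:54

Here is a solution with dplyr and tidyr. We use seq() from start to end, together with separate_rows(), to make the data longer and create a pos column that can be joined to test_cov. Once the tables are joined, the rest is a simple group_by() and summarise().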

library(dplyr)
library(tidyr)

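newtest_cds <- 
   test_cds %>% 
   rowwise() %>%                             # evaluate seq() one CDS at a time
   mutate(pos = paste(seq(start, 
                          end), 
                      collapse = ",")) %>%   # e.g. "1,2,3" for start = 1, end = 3
   tidyr::separate_rows(pos) %>%             # one row per position
   mutate(pos = as.integer(pos))             # convert the text back to a number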

newtest_cds
#> # A tibble: 50 x 6
#>    seqid        source type  start   end   pos
#>    <chr>        <chr>  <chr> <dbl> <dbl> <int>
#>  1 NW_019942502 Gnomon CDS       1     3     1
#>  2 NW_019942502 Gnomon CDS       1     3     2
#>  3 NW_019942502 Gnomon CDS       1     3     3
#>  4 NW_019942502 Gnomon CDS      13    15    13
#>  5 NW_019942502 Gnomon CDS      13    15    14
#>  6 NW_019942502 Gnomon CDS      13    15    15
#>  7 NW_019942502 Gnomon CDS      17    27    17
#>  8 NW_019942502 Gnomon CDS      17    27    18
#>  9 NW_019942502 Gnomon CDS      17    27    19
#> 10 NW_019942502 Gnomon CDS      17    27    20
#> # … with 40 more rows
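
The expansion step can also be done without pasting the positions into a string and splitting them again. The following is a minimal base R sketch, not part of the original answer; it assumes that start and end are whole numbers with start <= end, and the names `n_pos` and `expanded` are only illustrative.

```r
# Sample CDS table from the question, rebuilt so the block is self-contained.
test_cds <- data.frame(
  seqid  = c("NW_019942502", "NW_019942502", "NW_019942502",
             "NW_019942503", "NW_019942503"),
  source = "Gnomon",
  type   = "CDS",
  start  = c(1, 13, 17, 1, 67),
  end    = c(3, 15, 27, 12, 87),
  stringsAsFactors = FALSE
)

# Number of positions spanned by each CDS (assumes start <= end).
n_pos <- test_cds$end - test_cds$start + 1

# Repeat each CDS row once per spanned position ...
expanded <- test_cds[rep(seq_len(nrow(test_cds)), n_pos), ]

# ... and number the positions start, start + 1, ..., end within each CDS.
expanded$pos <- expanded$start + sequence(n_pos) - 1
rownames(expanded) <- NULL

nrow(expanded)  # 50, the same row count as newtest_cds
```

The result has the same shape as newtest_cds and can be joined to test_cov on seqid and pos in the same way.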

cds_cov <- right_join(newtest_cds, test_cov)
#> Joining, by = c("seqid", "pos")

cds_cov %>% 
   group_by(seqid, source, type, start, end) %>%
   summarise(sum_coverage = sum(cov))


#> # A tibble: 2 x 6
#> # Groups:   seqid, source, type, start [2]
#>   seqid        source type  start   end sum_coverage
#>   <chr>        <chr>  <chr> <dbl> <dbl>        <dbl>
#> 1 NW_019942502 Gnomon CDS       1     3           49
#> 2 NW_019942502 Gnomon CDS      13    15           50
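
Note that right_join() keeps every row of test_cov, so CDS rows with no coverage data (rows 3 to 5 of the sample) drop out of the result. As a hedged alternative that is not from the original answer, the loop in the question can be written as one vectorised lookup per CDS row with mapply(); `sum_one` is a hypothetical helper name. CDS rows without any matching position get 0 here, because sum() of an empty vector is 0.

```r
# Sample data from the question, rebuilt so the block is self-contained.
test_cds <- data.frame(
  seqid  = c("NW_019942502", "NW_019942502", "NW_019942502",
             "NW_019942503", "NW_019942503"),
  source = "Gnomon",
  type   = "CDS",
  start  = c(1, 13, 17, 1, 67),
  end    = c(3, 15, 27, 12, 87),
  stringsAsFactors = FALSE
)
test_cov <- data.frame(
  seqid = "NW_019942502",
  pos   = c(1, 2, 3, 13, 14, 15),
  cov   = c(13, 16, 20, 16, 16, 18),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total coverage for one CDS (same seqid, start <= pos <= end).
sum_one <- function(id, s, e) {
  hit <- test_cov$seqid == id & test_cov$pos >= s & test_cov$pos <= e
  sum(test_cov$cov[hit])
}

# Apply the helper to every CDS row.
test_cds$sum_coverage <- mapply(sum_one,
                                test_cds$seqid, test_cds$start, test_cds$end,
                                USE.NAMES = FALSE)

test_cds$sum_coverage  # 49 50 0 0 0
```

This keeps the logic easy to read, but it still scans test_cov once per CDS row, so it shows the intended computation rather than the fastest way to do it.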

Your sample data:

test_cds <- readr::read_table2("seqid source type start end
NW_019942502 Gnomon  CDS     1   3
NW_019942502 Gnomon  CDS    13  15
NW_019942502 Gnomon  CDS    17  27
NW_019942503 Gnomon  CDS     1  12
NW_019942503 Gnomon  CDS    67  87")

test_cov <- readr::read_table2("seqid  pos cov
NW_019942502   1  13
NW_019942502   2  16
NW_019942502   3  20
NW_019942502  13  16
NW_019942502  14  16
NW_019942502  15  18")
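
For very large tables, expanding every CDS into one row per position can become expensive. One possible alternative, which is not from the original answer, is to sort the coverage positions once per seqid, take cumulative sums, and read each interval sum as the difference of two cumulative values located with findInterval(). The sketch below uses base R only; `interval_cov_sum` is a hypothetical function name, positions are assumed to be whole numbers, and CDS rows without coverage data get 0.

```r
# Sample data from the question, rebuilt so the block is self-contained.
test_cds <- data.frame(
  seqid  = c("NW_019942502", "NW_019942502", "NW_019942502",
             "NW_019942503", "NW_019942503"),
  source = "Gnomon",
  type   = "CDS",
  start  = c(1, 13, 17, 1, 67),
  end    = c(3, 15, 27, 12, 87),
  stringsAsFactors = FALSE
)
test_cov <- data.frame(
  seqid = "NW_019942502",
  pos   = c(1, 2, 3, 13, 14, 15),
  cov   = c(13, 16, 20, 16, 16, 18),
  stringsAsFactors = FALSE
)

# Hypothetical helper: coverage sum per CDS interval via cumulative sums.
interval_cov_sum <- function(cds, cov) {
  out <- numeric(nrow(cds))                        # default 0 for CDS without data
  cov_by_id  <- split(cov[, c("pos", "cov")], cov$seqid)
  rows_by_id <- split(seq_len(nrow(cds)), cds$seqid)
  for (id in names(rows_by_id)) {
    sub <- cov_by_id[[id]]
    if (is.null(sub) || nrow(sub) == 0) next       # no coverage for this seqid
    sub  <- sub[order(sub$pos), ]
    cs   <- c(0, cumsum(sub$cov))                  # cs[k + 1] = sum of the first k values
    rows <- rows_by_id[[id]]
    hi <- findInterval(cds$end[rows], sub$pos)        # positions <= end
    lo <- findInterval(cds$start[rows] - 1, sub$pos)  # positions <  start
    out[rows] <- cs[hi + 1] - cs[lo + 1]
  }
  out
}

test_cds$sum_coverage <- interval_cov_sum(test_cds, test_cov)
test_cds$sum_coverage  # 49 50 0 0 0
```

Each seqid is sorted once and every CDS then costs two binary searches, so the work does not grow with the length of the intervals.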

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/64265360
