
Q: Using dplyr to calculate the incidence and frequency of occurrence for two groups

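Stack Overflow user

Asked 2020-10-28 20:05:34

1 answer, 222 views, 0 followers, 0 votes

I am learning dplyr and have looked for a solution in similar posts, but I have not found a question with this particular combination.

Here is an example data frame: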

set.seed(1)
df <- data.frame(sampleID = c(rep("sample1",2),
                              rep("sample2",3),
                              rep("sample3",4)),
                 species = c("clover","nettle",
                             "clover","nettle","vine",
                             "clover","clover","nettle","vine"),
                 type = c("vegetation","seed",
                          "vegetation","vegetation","vegetation",
                          "seed","vegetation","seed","vegetation"),
                 mass = sample(1:9))

> df
  sampleID species       type mass
1  sample1  clover vegetation    9
2  sample1  nettle       seed    4
3  sample2  clover vegetation    7
4  sample2  nettle vegetation    1
5  sample2    vine vegetation    2
6  sample3  clover       seed    6
7  sample3  clover vegetation    3
8  sample3  nettle       seed    8
9  sample3    vine vegetation    5

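I need to return a data frame that gives the percent mass for each unique species/type combination, and I also need the percent frequency with which each species/type combination occurs across the sampleIDs.

So in this example, the solution for the species/type combination vine/vegetation is percent mass = (5+2)/sum(mass), and the percent frequency is 2/3, because this combination does not occur in sample1.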
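The arithmetic for vine/vegetation can be checked by hand with a small base R sketch. It assumes the `mass` values exactly as printed in the data frame above (typed in directly, so the result does not depend on the random number generator); the names `is_vine_veg`, `perc_mass` and `perc_freq` are only illustrative.

```r
# Data copied from the printed data frame; mass is entered by hand
# so that the numbers match the printout regardless of the RNG.
df <- data.frame(
  sampleID = c(rep("sample1", 2), rep("sample2", 3), rep("sample3", 4)),
  species  = c("clover", "nettle",
               "clover", "nettle", "vine",
               "clover", "clover", "nettle", "vine"),
  type     = c("vegetation", "seed",
               "vegetation", "vegetation", "vegetation",
               "seed", "vegetation", "seed", "vegetation"),
  mass     = c(9, 4, 7, 1, 2, 6, 3, 8, 5)
)

# Rows belonging to the vine/vegetation combination
is_vine_veg <- df$species == "vine" & df$type == "vegetation"

# Percent mass: mass of the combination over the total mass, (2 + 5) / 45
perc_mass <- sum(df$mass[is_vine_veg]) / sum(df$mass)

# Percent frequency: samples containing the combination over all samples, 2 / 3
perc_freq <- length(unique(df$sampleID[is_vine_veg])) /
  length(unique(df$sampleID))
```

Both denominators are taken over the whole data frame, which is the property the desired dplyr solution has to preserve.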

To start with, I tried various combinations, for example:

library(dplyr)

df %>%
  group_by(species, type) %>%
  summarize(totmass = sum(mass)) %>%
  mutate(percmass = totmass / sum(totmass))

But this assigns 100% of the mass to vine/vegetation. I also don't know where to start on getting the percent frequency based on sampleID.

Tags: dplyr, tidyverse, percentage

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-10-28 20:40:40

I am not sure whether I understood you correctly, but maybe this is what you are looking for:

set.seed(1)
df <- data.frame(sampleID = c(rep("sample1",2),
                              rep("sample2",3),
                              rep("sample3",4)),
                 species = c("clover","nettle",
                             "clover","nettle","vine",
                             "clover","clover","nettle","vine"),
                 type = c("vegetation","seed",
                          "vegetation","vegetation","vegetation",
                          "seed","vegetation","seed","vegetation"),
                 mass = sample(1:9))

library(dplyr)

df %>%
  # Add total mass across all rows
  add_count(wt = mass, name = "sum_mass") %>%
  # Add total number of distinct samples
  mutate(nsamples = n_distinct(sampleID)) %>%
  # Keep sum_mass and nsamples by including them in group_by
  group_by(species, type, sum_mass, nsamples) %>%
  summarize(nsample = n_distinct(sampleID),
            totmass = sum(mass), .groups = "drop") %>%
  mutate(percmass = totmass / sum_mass,
         percfreq = nsample / nsamples)
#> # A tibble: 5 x 8
#>   species type       sum_mass nsamples nsample totmass percmass percfreq
#>   <chr>   <chr>         <int>    <int>   <int>   <int>    <dbl>    <dbl>
#> 1 clover  seed             45        3       1       6   0.133     0.333
#> 2 clover  vegetation       45        3       3      19   0.422     1    
#> 3 nettle  seed             45        3       2      12   0.267     0.667
#> 4 nettle  vegetation       45        3       1       1   0.0222    0.333
#> 5 vine    vegetation       45        3       2       7   0.156     0.667
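
As a side note on why the attempt in the question reports 100% for vine/vegetation: with two grouping variables, `summarize()` by default drops only the last grouping level, so the result is still grouped by `species` and `sum(totmass)` becomes a per-species total. The sketch below contrasts the two behaviours. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and uses the `mass` values as printed in the question; `per_species` and `overall` are illustrative names.

```r
library(dplyr)

# Data as printed in the question, with mass entered by hand
df <- data.frame(
  sampleID = c(rep("sample1", 2), rep("sample2", 3), rep("sample3", 4)),
  species  = c("clover", "nettle",
               "clover", "nettle", "vine",
               "clover", "clover", "nettle", "vine"),
  type     = c("vegetation", "seed",
               "vegetation", "vegetation", "vegetation",
               "seed", "vegetation", "seed", "vegetation"),
  mass     = c(9, 4, 7, 1, 2, 6, 3, 8, 5)
)

# Default behaviour made explicit: the result stays grouped by species,
# so each percmass is a share of that species' mass only.
per_species <- df %>%
  group_by(species, type) %>%
  summarize(totmass = sum(mass), .groups = "drop_last") %>%
  mutate(percmass = totmass / sum(totmass))

# Dropping all groups makes sum(totmass) the grand total,
# so each percmass is a share of the overall mass.
overall <- df %>%
  group_by(species, type) %>%
  summarize(totmass = sum(mass), .groups = "drop") %>%
  mutate(percmass = totmass / sum(totmass))
```

Because vine occurs with only one type, its per-species share is necessarily 1, whereas in `overall` the shares are taken against the grand total and add up to 1 across all five combinations.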

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/64580507
