
Q: Percentages of multiple columns with two factors

Stack Overflow user

Asked on 2020-07-18 23:25:06

Answers: 2, Views: 58, Followers: 0, Votes: 0

I'm trying to get the relative proportions of counts that fall into two separate categories. Here is a sample of the original file.

A tibble: 8 x 5
  resp          euRefVoteW1 euRefVoteW2 euRefVoteW3 Paper    
  <fct>               <int>       <int>       <int> <fct>    
1 Remain                316         290         313 Times    
2 Leave                 157         123         159 Times    
3 Will Not Vote           2           3           3 Times    
4 Don't Know             56          51          55 Times    
5 Remain                190         175         199 Telegraph
6 Leave                 339         282         334 Telegraph
7 Will Not Vote           4           3           4 Telegraph
8 Don't Know             70          62          69 Telegraph

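These are totals broken down by two different factors. I'm trying to convert the response counts into percentages, so that it looks like this: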

A tibble: 8 x 5
  resp          euRefVoteW1 euRefVoteW2 euRefVoteW3 Paper    
1 Remain                52%         53%          .. Times    
2 Leave                 43%         42%          .. Times    
3 Will Not Vote          1%          2%          .. Times    
4 Don't Know             4%          3%          .. Times    
5 Remain                35%         35%          .. Telegraph
6 Leave                 52%         52%          .. Telegraph
7 Will Not Vote          2%          2%          .. Telegraph
8 Don't Know            11%         11%          .. Telegraph

(Obviously these numbers are not the real ones, but I hope this shows that each 4x1 block should sum to 100%.)
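Written as a formula (the symbols are introduced here only for illustration: n is a count, r a response, w a wave column, p a paper), each percentage divides a count by its column total within the same paper, using the Times / euRefVoteW1 counts from the sample above as a worked case:

```latex
\text{pct}_{r,w,p} = 100 \cdot \frac{n_{r,w,p}}{\sum_{r'} n_{r',w,p}},
\qquad
\text{pct}_{\text{Remain},\,W1,\,\text{Times}} = 100 \cdot \frac{316}{316 + 157 + 2 + 56} = 100 \cdot \frac{316}{531} \approx 59.5
```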

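The data frame is already laid out much like a table, so is there a way to apply prop.table to the df? When I try, it refuses because the df is not a clean array. Is there any way around this?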

for_stack <- combined_tallies %>%
  group_by(Paper, resp) %>%
  prop.table(margin = 2)
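prop.table() expects a numeric matrix or table: the factor columns (resp, Paper) get in the way, and it does not know about group_by() groups. If you do want to stay with prop.table(), one option is to hand it a numeric matrix one Paper at a time. A minimal base R sketch, assuming the count columns all start with `euRefVote` and using a hypothetical `combined_tallies` rebuilt from the 8-row sample above:

```r
# Hypothetical stand-in for combined_tallies, rebuilt from the 8-row sample above
combined_tallies <- data.frame(
  resp = factor(rep(c("Remain", "Leave", "Will Not Vote", "Don't Know"), times = 2)),
  euRefVoteW1 = c(316L, 157L, 2L, 56L, 190L, 339L, 4L, 70L),
  euRefVoteW2 = c(290L, 123L, 3L, 51L, 175L, 282L, 3L, 62L),
  euRefVoteW3 = c(313L, 159L, 3L, 55L, 199L, 334L, 4L, 69L),
  Paper = factor(rep(c("Times", "Telegraph"), each = 4))
)

# Only the count columns can go into prop.table(); resp and Paper are factors
vote_cols <- grep("^euRefVote", names(combined_tallies), value = TRUE)

pct <- combined_tallies
for (p in levels(pct$Paper)) {
  rows <- pct$Paper == p
  # Numeric matrix holding this paper's 4 x 3 block of counts
  block <- as.matrix(pct[rows, vote_cols])
  # margin = 2 divides each column (wave) by its own total within the block
  pct[rows, vote_cols] <- prop.table(block, margin = 2) * 100
}

pct
```

Each paper's wave columns then sum to 100; for example, Times / Remain in euRefVoteW1 becomes 316 / 531 * 100, about 59.5.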

Here is an rds copy of the dataframe if this helps!

The best answers I could find elsewhere on SO were of no use.

tidyverse

2 answers

Stack Overflow user

Accepted answer

Answered on 2020-07-19 00:09:39

Maybe this is what you are looking for:

library(tidyverse)
combined_tallies %>% 
  group_by(Paper) %>% 
  # within each Paper, divide every numeric (count) column by its column total
  mutate(across(where(is.numeric), ~ .x / sum(.x, na.rm = TRUE) * 100))

# A tibble: 20 x 10
# Groups:   Paper [5]
   resp  euRefVoteW1 euRefVoteW2 euRefVoteW3 euRefVoteW4 euRefVoteW6 euRefVoteW7 euRefVoteW8
   <fct>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
 1 Rema~      59.5        62.1        59.1        61.0        63.7        60.3        61.2  
 2 Leave      29.6        26.3        30          29.0        25.2        35.6        35.2  
 3 Will~       0.377       0.642       0.566       0.565       0.377       0.377       0.377
 4 Don'~      10.5        10.9        10.4         9.42       10.7         3.77        3.20 
...
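The question's target layout shows whole-number labels such as 52%. A sketch of the same grouped calculation with the result formatted as text, assuming dplyr 1.0 or later (for across()) and a hypothetical `combined_tallies` rebuilt from the question's 8-row sample:

```r
library(dplyr)

# Hypothetical stand-in for combined_tallies, rebuilt from the question's 8-row sample
combined_tallies <- tibble(
  resp = rep(c("Remain", "Leave", "Will Not Vote", "Don't Know"), times = 2),
  euRefVoteW1 = c(316L, 157L, 2L, 56L, 190L, 339L, 4L, 70L),
  euRefVoteW2 = c(290L, 123L, 3L, 51L, 175L, 282L, 3L, 62L),
  euRefVoteW3 = c(313L, 159L, 3L, 55L, 199L, 334L, 4L, 69L),
  Paper = rep(c("Times", "Telegraph"), each = 4)
)

pct_labels <- combined_tallies %>%
  group_by(Paper) %>%
  # Same per-Paper share as the accepted answer, rounded to a whole percent and
  # formatted as text such as "60%"
  mutate(across(where(is.numeric),
                ~ sprintf("%.0f%%", .x / sum(.x, na.rm = TRUE) * 100))) %>%
  ungroup()

pct_labels
```

Because each label is rounded on its own, a block need not add up to exactly 100%: the Times euRefVoteW1 labels here are 60 + 30 + 0 + 11 = 101. Keeping the numeric result for further calculations and formatting only for display avoids surprises.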

Votes: 2

Stack Overflow user

Answered on 2020-07-19 00:16:12

I recreated your dataset with dput(); you can use dput() yourself to share reproducible data when asking on StackOverflow.

votes <- structure(list(resp = c("Remain", "Leave", "Will Not Vote", "Don’t Know", 
"Remain", "Leave", "Will Not Vote", "Don’t Know"), ref1 = c(316, 
157, 2, 56, 190, 339, 4, 70), ref2 = c(290, 123, 3, 51, 175, 
282, 3, 62), ref3 = c(313, 159, 3, 55, 199, 334, 4, 69), paper = c("Times", 
"Times", "Times", "Times", "Telegraph", "Telegraph", "Telegraph", 
"Telegraph")), .Names = c("resp", "ref1", "ref2", "ref3", "paper"
), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))

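Another approach is to change the structure of the dataset before the analysis. You are trying to compute relative values not over a whole column or row, but over subsets of it. One way to handle this is to reshape the data into long format with the tidyverse packages and do the analysis in that format. Once the percentages are computed, you can always convert back to the original structure.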

library(tidyverse)
vote_long <- votes %>% 
  pivot_longer(cols = c(ref1, ref2, ref3), names_to = "ref", values_to = "votes")

vote_long

# A tibble: 24 x 4
   resp          paper ref   votes
   <chr>         <chr> <chr> <dbl>
 1 Remain        Times ref1    316
 2 Remain        Times ref2    290
 3 Remain        Times ref3    313
 4 Leave         Times ref1    157
 5 Leave         Times ref2    123
 6 Leave         Times ref3    159
 7 Will Not Vote Times ref1      2
 8 Will Not Vote Times ref2      3
 9 Will Not Vote Times ref3      3
10 Don’t Know    Times ref1     56
# … with 14 more rows

# compute relative values (percent) within each paper/ref group

vote_long_relative <- vote_long %>% 
  group_by(paper, ref) %>% 
  mutate(rel_votes = votes/sum(votes) * 100)

vote_wide_relative <- vote_long_relative %>% 
  select(-votes) %>% 
  pivot_wider(id_cols = c(resp, paper), names_from = "ref", values_from = "rel_votes")

vote_wide_relative

# Groups:   paper [2]
  resp          paper       ref1   ref2   ref3
  <chr>         <chr>      <dbl>  <dbl>  <dbl>
1 Remain        Times     59.5   62.1   59.1  
2 Leave         Times     29.6   26.3   30    
3 Will Not Vote Times      0.377  0.642  0.566
4 Don’t Know    Times     10.5   10.9   10.4  
5 Remain        Telegraph 31.5   33.5   32.8  
6 Leave         Telegraph 56.2   54.0   55.1  
7 Will Not Vote Telegraph  0.663  0.575  0.660
8 Don’t Know    Telegraph 11.6   11.9   11.4

Votes: 3

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/62970305
