I'm trying to get the relative proportions of counts that fall into two separate categories. Here is a sample of the original data.
A tibble: 8 x 5
resp euRefVoteW1 euRefVoteW2 euRefVoteW3 Paper
<fct> <int> <int> <int> <fct>
1 Remain 316 290 313 Times
2 Leave 157 123 159 Times
3 Will Not Vote 2 3 3 Times
4 Don't Know 56 51 55 Times
5 Remain 190 175 199 Telegraph
6 Leave 339 282 334 Telegraph
7 Will Not Vote 4 3 4 Telegraph
8 Don't Know 70 62 69 Telegraph
These are tallies across two different factors. I'm trying to convert the response counts into percentages, so that it looks like this:
A tibble: 8 x 5
resp euRefVoteW1 euRefVoteW2 euRefVoteW3 Paper
1 Remain 52% 53% .. Times
2 Leave 43% 42% .. Times
3 Will Not Vote 1% 2% . Times
4 Don't Know 4% 3% . Times
5 Remain 35% 35% . Telegraph
6 Leave 52% 52% . Telegraph
7 Will Not Vote 2% 2% . Telegraph
8 Don't Know 11% 11% . Telegraph
(Obviously these figures are not correct, but I hope they show that each 4x1 block should sum to 100%.)
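The data frame is already laid out much like a table, so is there a way to apply prop.table to the df? When I try, it refuses because the df is not a clean array. Is there any way around this?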
for_stack <- combined_tallies %>%
  group_by(Paper, resp) %>%
  prop.table(margin=2)
Here is an rds copy of the dataframe if this helps!
The best answers I could find elsewhere here on SO were of no use.
Posted on 2020-07-19 00:09:39
Perhaps this is what you are looking for:
library(tidyverse)
combined_tallies %>%
  # within each Paper, divide every numeric column by that group's column total
  group_by(Paper) %>%
  mutate(across(where(is.numeric), ~ .x / sum(.x, na.rm = TRUE) * 100))
# A tibble: 20 x 10
# Groups: Paper [5]
resp euRefVoteW1 euRefVoteW2 euRefVoteW3 euRefVoteW4 euRefVoteW6 euRefVoteW7 euRefVoteW8
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Rema~ 59.5 62.1 59.1 61.0 63.7 60.3 61.2
2 Leave 29.6 26.3 30 29.0 25.2 35.6 35.2
3 Will~ 0.377 0.642 0.566 0.565 0.377 0.377 0.377
4 Don'~ 10.5 10.9 10.4 9.42 10.7 3.77 3.20
...
Posted on 2020-07-19 00:16:12
I have recreated your dataset using dput(), which you can use to provide reproducible data on StackOverflow.
votes <- structure(list(resp = c("Remain", "Leave", "Will Not Vote", "Don’t Know",
"Remain", "Leave", "Will Not Vote", "Don’t Know"), ref1 = c(316,
157, 2, 56, 190, 339, 4, 70), ref2 = c(290, 123, 3, 51, 175,
282, 3, 62), ref3 = c(313, 159, 3, 55, 199, 334, 4, 69), paper = c("Times",
"Times", "Times", "Times", "Telegraph", "Telegraph", "Telegraph",
"Telegraph")), .Names = c("resp", "ref1", "ref2", "ref3", "paper"
), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
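))
An alternative approach is to change the structure of the dataset before performing the analysis. You are trying to compute relative values not over a whole column or row, but over subsets. One way to tackle this is to use the tidyverse packages to reshape the data into long format and perform the analysis in that format. Once the percentages have been calculated, you can always revert to the original structure.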
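As an aside, the "relative within a subset" idea can also be sketched in base R with prop.table(), which is what the question asked about. This is only a minimal sketch and not part of the reshaping approach: `votes_df`, `num_cols`, `by_paper` and `votes_pct` are names assumed here for illustration, and the counts are copied from the sample data. prop.table() wants a numeric matrix, so the numeric columns of each paper are converted first.

```r
# Hypothetical stand-in for the sample data (same counts, plain data.frame)
votes_df <- data.frame(
  resp  = rep(c("Remain", "Leave", "Will Not Vote", "Don't Know"), 2),
  ref1  = c(316, 157, 2, 56, 190, 339, 4, 70),
  ref2  = c(290, 123, 3, 51, 175, 282, 3, 62),
  ref3  = c(313, 159, 3, 55, 199, 334, 4, 69),
  paper = rep(c("Times", "Telegraph"), each = 4)
)

num_cols <- c("ref1", "ref2", "ref3")

# Split into one block per paper, then turn each block's numeric columns
# into column-wise percentages (margin = 2 means "divide by column sums")
by_paper <- lapply(split(votes_df, votes_df$paper), function(block) {
  pct <- prop.table(as.matrix(block[num_cols]), margin = 2) * 100
  block[num_cols] <- as.data.frame(pct)
  block
})

# Stack the per-paper blocks back into a single data frame
votes_pct <- do.call(rbind, by_paper)
votes_pct
```

Because split() orders the blocks by paper name, the Telegraph rows come first in `votes_pct`. The tidyverse route continues below.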
library(tidyverse)
vote_long <- votes %>%
  pivot_longer(cols = c(ref1, ref2, ref3), names_to = "ref", values_to = "votes")
vote_long
# A tibble: 24 x 4
resp paper ref votes
<chr> <chr> <chr> <dbl>
1 Remain Times ref1 316
2 Remain Times ref2 290
3 Remain Times ref3 313
4 Leave Times ref1 157
5 Leave Times ref2 123
6 Leave Times ref3 159
7 Will Not Vote Times ref1 2
8 Will Not Vote Times ref2 3
9 Will Not Vote Times ref3 3
10 Don’t Know Times ref1 56
# … with 14 more rows
# create grouped relative values
vote_long_relative <- vote_long %>%
  group_by(paper, ref) %>%
  mutate(rel_votes = votes/sum(votes) * 100)
# reshape back to the original wide layout
vote_wide_relative <- vote_long_relative %>%
  select(-votes) %>%
  pivot_wider(id_cols = c(resp, paper), names_from = "ref", values_from = "rel_votes")
vote_wide_relative
# Groups: paper [2]
resp paper ref1 ref2 ref3
<chr> <chr> <dbl> <dbl> <dbl>
1 Remain Times 59.5 62.1 59.1
2 Leave Times 29.6 26.3 30
3 Will Not Vote Times 0.377 0.642 0.566
4 Don’t Know Times 10.5 10.9 10.4
5 Remain Telegraph 31.5 33.5 32.8
6 Leave Telegraph 56.2 54.0 55.1
7 Will Not Vote Telegraph 0.663 0.575 0.660
8 Don’t Know Telegraph 11.6 11.9 11.4
https://stackoverflow.com/questions/62970305
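The question asked for values displayed like 52% rather than plain numbers. A minimal sketch of one way to do that with base R's sprintf(), assuming a small stand-in data frame `pct` (hypothetical, built from the Times ref1 counts) and an added `ref1_label` column:

```r
# Hypothetical stand-in: Times percentages for the first wave only
pct <- data.frame(
  resp  = c("Remain", "Leave", "Will Not Vote", "Don't Know"),
  paper = "Times",
  ref1  = c(316, 157, 2, 56) / 531 * 100  # 531 is the Times ref1 total
)

# "%.1f" keeps one decimal place; "%%" prints a literal percent sign
pct$ref1_label <- sprintf("%.1f%%", pct$ref1)
pct
```

The labels are character strings, so it is probably best to keep the numeric column for any further calculation and format only for display.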