I have a data frame of rating-survey results expressed as percentages. All percentage columns are dbl:
sample_size club bad(%) below_avg(%) neutral(%) good(%) very_good(%)
134 A 10 30 45 5 10
1586 B 12 30 24 4 30
588 C 43 10 17 16 14
345 B 30 51 10 5 4
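2500 C 21 19 30 15 15

I want to produce a table showing the overall percentage of respondents who answered good, neutral, and bad, grouped by club. Do I need to create extra columns holding the response count for each rating, or is there a more direct way to build the table using arithmetic inside table()?
Edit: for clarity, I need the following output: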
club bad neutral good
A 13.4 60.3 6.7
B 15.21 22.02 4.19
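C 25.18 27.52 15.19

These values are computed as follows:
response_count for each club (each percentage applied to the sample size of its row)
Overall percentage for each club = sum(response_count) / sum(sample_size of the club) * 100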
For reference:
sum(sample_size of A) = 134
sum(sample_size of B) = 1586 + 345 = 1931
sum(sample_size of C) = 588 + 2500 = 3088
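
The calculation described above is a sample-size-weighted average of the row percentages. As a sketch, with symbols introduced here only for illustration ($n_i$ is the sample size of row $i$, $p_{i,r}$ is the percentage for rating $r$ in that row, and $c$ is a club):

```latex
P_{c,r}
  = \frac{\sum_{i \in c} n_i \, p_{i,r} / 100}{\sum_{i \in c} n_i} \times 100
  = \frac{\sum_{i \in c} n_i \, p_{i,r}}{\sum_{i \in c} n_i}

% Worked check for club B, rating "bad":
P_{B,\mathrm{bad}}
  = \frac{1586 \cdot 12 + 345 \cdot 30}{1586 + 345}
  = \frac{29382}{1931}
  \approx 15.2
```

For a club with a single row, such as A, the weighted average is just that row's percentage.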
Posted on 2021-03-10 16:49:04
library(dplyr)
dat %>%
group_by(club) %>%
summarize(across(c(bad..., neutral..., good...), ~ 100*sum(.*sample_size/100)/sum(sample_size)))
# # A tibble: 3 x 4
# club bad... neutral... good...
# * <chr> <dbl> <dbl> <dbl>
# 1 A 10 45 5
# 2 B 15.2 21.5 4.18
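# 3 C 25.2 27.5 15.2

Since you said you are getting NA in your results, the sample data here must not be representative (and you did not mention that). You should be able to fix it by adding na.rm=TRUE to sum. As a walk-through: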
dat$bad...[3] <- NA
dat %>%
group_by(club) %>%
summarize(across(c(bad..., neutral..., good...), ~ 100*sum(.*sample_size/100)/sum(sample_size)))
# # A tibble: 3 x 4
# club bad... neutral... good...
# * <chr> <dbl> <dbl> <dbl>
# 1 A 10 45 5
# 2 B 15.2 21.5 4.18
# 3 C NA 27.5 15.2

dat %>%
group_by(club) %>%
summarize(across(c(bad..., neutral..., good...), ~ 100*sum(.*sample_size/100, na.rm = TRUE)/sum(sample_size, na.rm = TRUE)))
# # A tibble: 3 x 4
# club bad... neutral... good...
# * <chr> <dbl> <dbl> <dbl>
# 1 A 10 45 5
# 2 B 15.2 21.5 4.18
# 3 C 17.0 27.5 15.2

(Of course, this "bad" rating for club "C" is wrong, because I removed good data; that is just part of the example. Try it with your real data.)
Data
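
In the na.rm version above, the denominator still counts the sample size of the row whose rating is NA, which is why club C drops to 17.0. If the intent is to leave such rows out of the denominator as well (an assumption about what the real data needs), one option is weighted.mean() with na.rm = TRUE, which discards the matching weights together with the NA values. A minimal sketch, using simplified column names (bad, neutral, good) instead of the bad... style names above and rebuilding a small data frame so it runs on its own:

```r
library(dplyr)

# Example data from this thread with one "bad" rating set to NA.
# Column names are simplified for illustration.
dat_na <- data.frame(
  sample_size = c(134, 1586, 588, 345, 2500),
  club        = c("A", "B", "C", "B", "C"),
  bad         = c(10, 12, NA, 30, 21),
  neutral     = c(45, 24, 17, 10, 30),
  good        = c(5, 4, 16, 5, 15)
)

res <- dat_na %>%
  group_by(club) %>%
  # weighted.mean() drops NA values and their weights when na.rm = TRUE,
  # so a row with a missing rating contributes to neither sum.
  summarize(across(c(bad, neutral, good),
                   ~ weighted.mean(.x, sample_size, na.rm = TRUE)))

res
```

With this data, the "bad" value for club C is 21, because only the 2500-respondent row contributes to it; the columns without NA give the same weighted percentages as before.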
dat <- structure(list(sample_size = c(134L, 1586L, 588L, 345L, 2500L), club = c("A", "B", "C", "B", "C"), bad... = c(10, 12, 43, 30, 21), below_avg... = c(30L, 30L, 10L, 51L, 19L), neutral... = c(45L, 24L, 17L, 10L, 30L), good... = c(5L, 4L, 16L, 5L, 15L), very_good... = c(10L, 30L, 14L, 4L, 15L)), row.names = c(NA, -5L), class = "data.frame")

Posted on 2021-03-10 16:39:19
library(tidyverse)
tribble(
~sample_size, ~club, ~bad, ~below_avg, ~neutral, ~good, ~very_good,
134, "A", 10, 30, 45, 5, 10,
1586, "B", 12, 30, 24, 4, 30,
588, "C", 43, 10, 17, 16, 14,
345, "B", 30, 51, 10, 5, 4,
2500, "C", 21, 19, 30, 15, 15) %>%
group_by(club) %>%
summarise(total_percent = sum(bad, neutral, good))
# Output
# A tibble: 3 x 2
club total_percent
<chr> <dbl>
1 A 60
2 B 85
3 C 142

Or:
library(tidyverse)
tribble(
~sample_size, ~club, ~bad, ~below_avg, ~neutral, ~good, ~very_good,
134, "A", 10, 30, 45, 5, 10,
1586, "B", 12, 30, 24, 4, 30,
588, "C", 43, 10, 17, 16, 14,
345, "B", 30, 51, 10, 5, 4,
2500, "C", 21, 19, 30, 15, 15) %>%
group_by(club) %>%
summarise(across(where(is.numeric), sum)) %>%
select(-below_avg, -very_good)
# Output
# A tibble: 3 x 5
club sample_size bad neutral good
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 134 10 45 5
2 B 1931 42 34 9
3 C 3088 64 47 31

https://stackoverflow.com/questions/66568682