I have a dataset that looks like this:
df <- tribble(
~id, ~price, ~number_of_book,
"1", 10, 3,
"1", 5, 1,
"2", 7, 4,
"2", 6, 2,
"2", 3, 4,
"3", 4, 1,
"4", 5, 1,
"4", 6, 1,
"5", 1, 2,
"5", 9, 3,
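)
As you can see in the dataset, for id "1" there are 3 books that cost $10 each and 1 book priced at $5. Basically, I want to see the share (%) of the number of books in each price bin. Here is the dataset I want: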
df <- tribble(
~id, ~less_than_three, ~three_to_five, ~five_to_six, ~more_than_six,
"1", "0%", "25%", "0%", "75%",
"2", "0%", "40%", "20%", "40%",
"3", "0%", "100%", "0%", "0%",
"4", "0%", "50%", "50%", "0%",
"5", "40%", "0%", "0%", "60%",
)
As a first step, I bin the prices. To do that, I run the following code:
out <- cut(df$price, breaks = c(0, 3, 5, 6, 10),
labels = c("<3","3-5","5-6", ">6"))
out = table(out) / sum(table(out))
Unfortunately, because of my limited coding knowledge, I could not get any further. Could you help me get the desired data?
Answered 2019-12-03 19:52:09
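With dplyr, you can add a column cols, which will later supply the column names. Then you sum the number of books within each id and bin. Next, you compute the percentages by dividing those sums by the total for that id, and apply scales::percent to format them as percentages rather than decimals. After that, pivot_wider only needs the variables to take the names and values from, and the columns are reordered to match the original label order. (This is a bit more involved than the other answer because it handles the case where a given (id, cols/interval) pair has more than one row; the janitor approach is simpler when that case does not arise.)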
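To make the arithmetic concrete, here is a minimal base R sketch for id "1" alone. The name `bin_labels` and the use of `xtabs`/`prop.table` are choices made for this illustration, not part of the answer's pipeline. It also shows that `cut()` uses right-closed intervals by default, which may be why a price of exactly 3 ends up in the first bin rather than the second.

```r
# Illustrative values: the two rows for id "1" in the question's data
price <- c(10, 5)
number_of_book <- c(3, 1)

# bin_labels is a name chosen for this sketch
bin_labels <- c("less_than_three", "three_to_five", "five_to_six", "more_than_six")
bin <- cut(price, breaks = c(0, 3, 5, 6, 10), labels = bin_labels)

# Total books per bin (empty bins count as 0), then each bin's share of the total
books_per_bin <- xtabs(number_of_book ~ bin)
share <- prop.table(books_per_bin)

# Format the shares as whole-number percentages
pct <- setNames(paste0(round(100 * as.vector(share)), "%"), bin_labels)
print(pct)

# cut() intervals are right-closed by default, so a price of exactly 3
# lands in the first bin, (0, 3]
boundary_bin <- as.character(cut(3, breaks = c(0, 3, 5, 6, 10), labels = bin_labels))
print(boundary_bin)
```

The pipeline that follows does the same thing per id, for all ids at once.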
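library(dplyr)
library(tidyr)

labels <- c("less_than_three", "three_to_five", "five_to_six", "more_than_six")
df %>%
  # bin each price; the bin becomes part of the grouping key
  group_by(id, cols = cut(price, breaks = c(0, 3, 5, 6, 10), labels = labels)) %>%
  # total books per (id, bin)
  summarise(n = sum(number_of_book)) %>%
  group_by(id) %>%
  # share of each bin within its id, formatted as a whole percent
  mutate(pct = scales::percent(n / sum(n), accuracy = 1)) %>%
  pivot_wider(id_cols = id, names_from = cols, values_from = pct) %>%
  # restore the original label order
  select_at(c('id', labels)) %>%
  ungroup()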
# # A tibble: 5 x 5
# id less_than_three three_to_five five_to_six more_than_six
# <chr> <chr> <chr> <chr> <chr>
# 1 1 NA 25% NA 75%
# 2 2 40% NA 20% 40%
# 3 3 NA 100% NA NA
# 4 4 NA 50% 50% NA
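# 5 5 40% NA NA 60%
If you want to replace the NAs with 0% (which I think makes sense in this context, and matches the output shown in the question), you can use the values_fill argument of pivot_wider, as suggested in the comments below.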
df %>%
group_by(id, cols = cut(price, breaks = c(0, 3, 5, 6, 10), labels = labels)) %>%
summarise(n = sum(number_of_book)) %>%
group_by(id) %>%
mutate(pct = scales::percent(n/sum(n), 1)) %>%
pivot_wider(id_cols = id, names_from = cols, values_from = pct,
values_fill = list(pct = '0%')) %>%
select_at(c('id', labels)) %>%
ungroup
# # A tibble: 5 x 5
# id less_than_three three_to_five five_to_six more_than_six
# <chr> <chr> <chr> <chr> <chr>
# 1 1 0% 25% 0% 75%
# 2 2 40% 0% 20% 40%
# 3 3 0% 100% 0% 0%
# 4 4 0% 50% 50% 0%
# 5 5 40% 0% 0% 60%
Answered 2019-12-03 19:47:37
We can use cut to get the intervals, then use tidyr to reshape the data to wide format, and finally use janitor to add the percentages.
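This shorter approach relies on each (id, interval) pair appearing in at most one row; otherwise pivot_wider would receive several values for a single cell. A quick way to check that assumption is sketched below. The data frame is rebuilt with base `data.frame` so the snippet is self-contained, and `dupes` is a name chosen for this illustration.

```r
library(dplyr)

# The question's data, rebuilt so that this check runs on its own
df <- data.frame(
  id = c("1", "1", "2", "2", "2", "3", "4", "4", "5", "5"),
  price = c(10, 5, 7, 6, 3, 4, 5, 6, 1, 9),
  number_of_book = c(3, 1, 4, 2, 4, 1, 1, 1, 2, 3)
)

# Count rows per (id, interval) and keep only pairs that occur more than once
dupes <- df %>%
  mutate(interval = cut(price, c(0, 3, 5, 6, Inf))) %>%
  count(id, interval) %>%
  filter(n > 1)

# Zero rows means every (id, interval) pair is unique
print(nrow(dupes))
```

If the check returns zero rows, the pipeline can be applied directly; if not, the books would need to be summed per (id, interval) first.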
library(dplyr)
library(tidyr)
library(janitor)
df %>%
mutate(interval = cut(price, c(0,3,5,6,Inf))) %>%
select(-price) %>%
pivot_wider(names_from = interval, values_from = number_of_book) %>%
adorn_percentages()
#> id (6,Inf] (3,5] (5,6] (0,3]
#> 1 0.75 0.25 NA NA
#> 2 0.40 NA 0.2 0.4
#> 3 NA 1.00 NA NA
#> 4 NA 0.50 0.5 NA
#> 5 0.60 NA NA 0.4
https://stackoverflow.com/questions/59164019