The dataset to use:
df <- tibble::tribble(
  ~person, ~age, ~height,
  "John",  1,  20,
  "Mike",  3,  50,
  "Maria", 3,  52,
  "Elena", 6,  90,
  "Biden", 9, 120
)
I am trying to get a data frame with the following structure:
age | height(cm) | number of people
0-5 | 0-50 | 2
0-5 | 50-100 | 1
0-5 | 100-200 | 0
5-10 | 0-50 | 0
5-10 | 50-100 | 1
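5-10 | 100-200 | 1
Basically, I have a dataset containing a lot of information about a certain number of people. I want to group them by age first, then by height within each age group, and finally count the people who fall into each combination.
Any suggestions?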
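Answered 2020-11-11 00:38:36
You can use cut() to generate bins from the continuous variables, then count the rows in each combination of the new categories.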
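Before binning, it may help to see how cut() treats a value that sits exactly on a bin edge, since Mike's height of 50 does. The following is a minimal sketch with hypothetical values (x, right_closed and left_closed are illustrative names, not part of the answer's code), comparing right = TRUE, which is the default, against right = FALSE:

```r
# Hypothetical values chosen to sit on and around the bin edges.
x <- c(20, 50, 52, 100)
breaks <- c(-Inf, 50, 100, 200)
labels <- c("0-50", "50-100", "100-200")

# right = TRUE (the default): each bin is (lower, upper],
# so 50 falls in "0-50" and 100 falls in "50-100".
right_closed <- cut(x, breaks = breaks, labels = labels, right = TRUE)

# right = FALSE: each bin is [lower, upper),
# so 50 falls in "50-100" and 100 falls in "100-200".
left_closed <- cut(x, breaks = breaks, labels = labels, right = FALSE)

print(data.frame(x, right_closed, left_closed))
```

With right = TRUE, a height of exactly 50 is counted in the 0-50 bin, which is what the expected output in the question assumes. Applied to the question's df: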
library(dplyr)
df %>%
  mutate(
    age_c = cut(
      age,
      breaks = c(-Inf, 5, 10),
      labels = c("0-5", "5-10"),
      right = TRUE
    ),
    height_c = cut(
      height,
      breaks = c(-Inf, 50, 100, 200),
      labels = c("0-50", "50-100", "100-200"),
      right = TRUE
    )
  ) %>%
  count(age_c, height_c, .drop = FALSE)
# A tibble: 6 x 3
  age_c height_c     n
  <fct> <fct>    <int>
1 0-5   0-50         2
2 0-5   50-100       1
3 0-5   100-200      0
4 5-10  0-50         0
5 5-10  50-100       1
6 5-10  100-200      1
Answered 2020-11-11 00:41:45
In base R, you can do it like this:
data.frame(with(df, table(age=cut(age, c(0,5,10)), height=cut(height, c(0,50,100,200)))))
     age    height Freq
1  (0,5]    (0,50]    2
2 (5,10]    (0,50]    0
3  (0,5]  (50,100]    1
4 (5,10]  (50,100]    1
5  (0,5] (100,200]    0
6 (5,10] (100,200]    1
https://stackoverflow.com/questions/64772711