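I've searched everywhere, but I can't work out how to get summary counts across multiple entries with group_by when the variable is a factor rather than an integer. I'm sure I'm missing a simple trick.
It is common to have multiple time periods associated with the same patient. To keep the data tidy, variables that do not change (such as gender) are repeated for each time period.
Example: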
library(tidyverse)  # provides tibble(), as_factor() and %>%
df <- tibble(patient_id = rep(1:4, each = 3),
             time_period = as_factor(rep(c("0 weeks", "6 weeks", "12 weeks"), times = 4)),
             gender = as_factor(rep(c("female", "male"), each = 3, times = 2)))
This gives the following tibble:
# A tibble: 12 × 3
   patient_id time_period gender
        <int> <fct>       <fct>
 1          1 0 weeks     female
 2          1 6 weeks     female
 3          1 12 weeks    female
 4          2 0 weeks     male
 5          2 6 weeks     male
 6          2 12 weeks    male
 7          3 0 weeks     female
 8          3 6 weeks     female
 9          3 12 weeks    female
10          4 0 weeks     male
11          4 6 weeks     male
12          4 12 weeks    male
Trying the following:
df %>%
  select(!time_period) %>%
  group_by(patient_id) %>%
  count(gender)
gives only:
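# A tibble: 4 × 3
# Groups:   patient_id [4]
  patient_id gender     n
       <int> <fct>  <int>
1          1 female     3
2          2 male       3
3          3 female     3
4          4 male       3
What I'm looking for instead is the total number of female and male patients once the repeated time periods are collapsed to a single row per patient, i.e. 2 females and 2 males overall.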
Posted on 2022-02-20 20:57:19
df %>% distinct(patient_id, gender) %>% count(gender)
# A tibble: 2 x 2
  gender     n
  <fct>  <int>
1 female     2
2 male       2
https://stackoverflow.com/questions/71198651
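The distinct() approach in the answer works by first collapsing the data to one row per patient and then counting. A possible alternative, sketched below, is to group by gender and count unique patient IDs directly with dplyr's n_distinct(). This is not part of the original answer. The data frame is rebuilt here with base factor() instead of as_factor() so the block only needs dplyr, and the name `result` is introduced purely for illustration.

```r
library(dplyr)

# Same shape as the example data: 4 patients, 3 time periods each.
# factor() is used instead of forcats::as_factor() (an assumption made
# here only to keep the sketch dependent on dplyr alone).
df <- tibble(
  patient_id  = rep(1:4, each = 3),
  time_period = factor(rep(c("0 weeks", "6 weeks", "12 weeks"), times = 4)),
  gender      = factor(rep(c("female", "male"), each = 3, times = 2))
)

# Count unique patients per gender without dropping duplicate rows first.
result <- df %>%
  group_by(gender) %>%
  summarise(n = n_distinct(patient_id))

result
```

On this example data both approaches give 2 female and 2 male patients. The distinct() version is arguably easier to read, while n_distinct() can be convenient when other per-group summaries are computed in the same summarise() call.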