I have a large data frame with 1 004 490 observations, and I want to analyse the success of treatments.
ID POSITIONS TREATMENT
1 0 A
1 1 A
1 2 B
2 0 C
2 1 D
3 0 B
3 1 B
3 2 C
3 3 A
3 4 A
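3 5 B

So, first, I want to count how many patients (ID) each treatment was applied to, but a treatment can be applied several times to the same ID. Do I need to remove all the duplicates first and then count, or is there a function that ignores the duplicates?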
What I want to have :
A : 2
B : 2
C : 2
D : 1

Then, I want to know how many times each treatment was in the last position, but the last position is different for each ID.
What I want to have :
A : 0
B : 2 (for ID = 1 and 3)
C : 0
D : 1 (for ID = 2)

Thank you for your help, I am a new R user!
Answered 2017-08-07 10:11:18
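Using base R, we can do the following, where ID.x is the number of distinct IDs per treatment and ID.y lists the IDs whose last row has that treatment,
merge(aggregate(ID ~ TREATMENT, df, FUN = function(i) length(unique(i))),
      aggregate(ID ~ TREATMENT, df[!duplicated(df$ID, fromLast = TRUE), ], toString),
      by = 'TREATMENT', all = TRUE)
which gives,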
  TREATMENT ID.x ID.y
1         A    2 <NA>
2         B    2 1, 3
3         C    2 <NA>
4         D    1    2
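The two steps can also be kept separate in base R, which returns explicit zeros instead of NA for treatments that are never last. This is only a minimal sketch: the names `treatments`, `pairs`, `last_rows`, `n_patients`, `n_last` and `result` are illustrative, the data frame is rebuilt from the sample in the question, and the rows are assumed to be ordered by POSITIONS within each ID.

```r
# Sample data copied from the question (11 rows, 3 patients).
df <- data.frame(
  ID        = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3),
  POSITIONS = c(0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 5),
  TREATMENT = c("A", "A", "B", "C", "D", "B", "B", "C", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Fix the set of treatments up front so that zero counts are kept.
treatments <- sort(unique(df$TREATMENT))

# Step 1: keep one row per (ID, TREATMENT) pair, then count patients per treatment.
pairs <- unique(df[, c("ID", "TREATMENT")])
n_patients <- table(factor(pairs$TREATMENT, levels = treatments))

# Step 2: keep the last row of each ID, then count how often each treatment is last.
# Assumption: rows are already ordered by POSITIONS within each ID.
last_rows <- df[!duplicated(df$ID, fromLast = TRUE), ]
n_last <- table(factor(last_rows$TREATMENT, levels = treatments))

result <- data.frame(
  TREATMENT  = treatments,
  n_patients = as.vector(n_patients),
  n_last     = as.vector(n_last)
)
print(result)
```

If the real data are not sorted, running `df <- df[order(df$ID, df$POSITIONS), ]` first should make the "last row per ID" step pick the highest position.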
Answered 2017-08-07 10:16:02
Here is one approach with tidyverse, where we get the distinct rows based on 'ID' and 'TREATMENT' and then get the count of 'TREATMENT'
library(tidyverse)
df1 %>%
  distinct(ID, TREATMENT) %>%
  count(TREATMENT)
# A tibble: 4 x 2
# TREATMENT n
# <chr> <int>
#1 A 2
#2 B 2
#3 C 2
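#4 D 1

For the second output, after grouping by 'ID', slice the last row (n()), create a column 'ind' set to 1, use complete to add every missing 'TREATMENT' combination with 'ind' filled as 0, and then get the sum of 'ind' after grouping by 'TREATMENT'.
df1 %>%
  group_by(ID) %>%
  slice(n()) %>%
  mutate(ind = 1) %>%
  complete(TREATMENT = unique(df1$TREATMENT), fill = list(ind = 0)) %>%
  group_by(TREATMENT) %>%
  summarise(n = sum(ind))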
# A tibble: 4 x 2
# TREATMENT n
# <chr> <dbl>
#1 A 0
#2 B 2
#3 C 0
#4 D 1

Data
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L), POSITIONS = c(0L, 1L, 2L, 0L, 1L, 0L, 1L, 2L, 3L, 4L, 5L
), TREATMENT = c("A", "A", "B", "C", "D", "B", "B", "C", "A",
"A", "B")), .Names = c("ID", "POSITIONS", "TREATMENT"),
class = "data.frame", row.names = c(NA, -11L))

https://stackoverflow.com/questions/45544035