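Hi, I need to find duplicates. I have attached an image of the dataset and an example of a duplicate: the same id, and the same Result as on the preceding date.
Any help would be greatly appreciated.
[Image: screenshot of the dataset]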

structure(list(id = c(1010001, 1010001, 1010001, 1010001, 1010001,
1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 1010001,
1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 1010001,
1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 1010001,
1010001, 1010001, 1010001), DateCollected = structure(c(1145664000,
1145750400, 1145836800, 1145923200, 1146009600, 1146096000, 1146096000,
1146096000, 1146096000, 1146096000, 1146096000, 1146182400, 1146268800,
1146355200, 1146441600, 1146528000, 1146614400, 1146700800, 1146787200,
1146787200, 1146787200, 1146787200, 1146787200, 1146787200, 1146873600,
1146960000, 1147046400, 1147132800, 1147219200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Test = c("Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)"
), Result = c(3, 4.1, 5.9, 8.1, 4.6, 7, 7.8, 11.2, 18.1, 18.4,
27, 4, 7.8, 8.4, 8.4, 6.1, 6.8, 5.4, 5.4, 6.5, 6.7, 8.1, 14.2,
32.4, 7.2, 8.6, 8.9, 7.2, 9.6), Units = c("ug/L", "ug/L", "ug/L",
"ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L",
"ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L",
"ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L",
"ug/L", "ug/L")), row.names = c(NA, -29L), class = c("tbl_df",
"tbl", "data.frame"))

Posted on 2020-04-28 12:08:26
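We can write a function that takes the difference between consecutive values of Result and returns the row indices where duplicates are found.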
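find_duplicates <- function(x) {
  # positions whose value equals the value that follows it
  inds <- which(diff(x) == 0)
  # keep both members of each equal pair, without repeats
  sort(unique(c(inds, inds + 1)))
}

We can apply this function by group.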
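As a quick check of what `find_duplicates` returns, here is a minimal sketch on a made-up vector. The values in `x` are illustrative and not taken from the dataset. `diff(x) == 0` marks positions whose next value is identical, and adding `inds + 1` brings in the second member of each pair.

```r
# Same helper as above, repeated so the snippet runs on its own
find_duplicates <- function(x) {
  inds <- which(diff(x) == 0)
  sort(unique(c(inds, inds + 1)))
}

# Hypothetical values: positions 2-3 and 5-7 hold consecutive repeats
x <- c(3, 8.4, 8.4, 6.1, 5.4, 5.4, 5.4, 7)

diff(x) == 0          # TRUE where a value equals the one after it
find_duplicates(x)    # indices of every element that sits in a run of equal values
```

The comparison is exact, so values that differ only by floating-point rounding would not be flagged. If that matters for your data, something like `abs(diff(x)) < tol` with a tolerance of your choosing is one possible variation.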
To get the duplicated rows, we can do:
library(dplyr)
df %>% group_by(id) %>% slice(find_duplicates(Result))
# id DateCollected Test Result Units
# <dbl> <dttm> <chr> <dbl> <chr>
#1 1010001 2006-04-30 00:00:00 Tacrolimus (FK506) 8.4 ug/L
#2 1010001 2006-05-01 00:00:00 Tacrolimus (FK506) 8.4 ug/L
#3 1010001 2006-05-04 00:00:00 Tacrolimus (FK506) 5.4 ug/L
#4 1010001 2006-05-05 00:00:00 Tacrolimus (FK506) 5.4 ug/L

To get an additional flag column instead, we can use:
df %>%
  group_by(id) %>%
  mutate(is_duplicate = row_number() %in% find_duplicates(Result))

Posted on 2020-04-28 07:23:33
We can group by 'id' and create a flag by checking whether each Result equals the adjacent value, using lag or lead.
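To see what that comparison computes on its own, here is a minimal base R sketch. The `prev` and `nxt` vectors are hand-built stand-ins for what `dplyr::lag()` and `dplyr::lead()` return with their default arguments, and the values in `x` are made up for illustration.

```r
# Hypothetical values; positions 2-3 hold a consecutive repeat
x <- c(3, 8.4, 8.4, 6.1, 5.4)

# Base R stand-ins for dplyr::lag(x) and dplyr::lead(x):
# shift by one position and pad the open end with NA
prev <- c(NA, head(x, -1))
nxt  <- c(tail(x, -1), NA)

# TRUE when a value matches its predecessor or its successor;
# the first and last positions can come out as NA
flag <- x == prev | x == nxt
flag

# Like filter(), which() keeps only the TRUE positions and drops NA and FALSE
which(flag)
```

The NA at either end is harmless in the pipeline that follows, because `filter()` only keeps rows where the condition is TRUE.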
library(dplyr)
# df1 is the data frame from the question
df1 %>%
  group_by(id) %>%
  mutate(flag = Result == lag(Result) | Result == lead(Result)) %>%
  filter(flag)
# A tibble: 4 x 6
# Groups: id [1]
# id DateCollected Test Result Units flag
# <dbl> <dttm> <chr> <dbl> <chr> <lgl>
#1 1010001 2006-04-30 00:00:00 Tacrolimus (FK506) 8.4 ug/L TRUE
#2 1010001 2006-05-01 00:00:00 Tacrolimus (FK506) 8.4 ug/L TRUE
#3 1010001 2006-05-04 00:00:00 Tacrolimus (FK506) 5.4 ug/L TRUE
#4 1010001 2006-05-05 00:00:00 Tacrolimus (FK506) 5.4 ug/L TRUE

https://stackoverflow.com/questions/61470052