I have looked through many threads but could not find what I need. I have a dataset with multiple ids and dates, as shown below.
id date code
1 2000-10-08 690
1 2000-10-08 75
1 2000-10-08 35
1 2001-01-01 315
1 2001-01-01 70
1 2008-09-05 690
1 2008-09-05 5
1 2008-09-05 60
2 2006-02-01 188
2 2006-02-01 198
2 2006-02-01 555
2 2006-02-01 690
3 2010-10-10 120
3 2010-10-10 75
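3 2010-10-10 25

I do not want repeated dates within an id, and I want to keep, for each date, the row with the lowest code value, so the result would look like this: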
id date code
1 2000-10-08 35
1 2001-01-01 70
1 2008-09-05 5
2 2006-02-01 188
3 2010-10-10 25

I used group_by so that the data is processed by id and date:
df %>%
group_by(id, date) %>%
arrange(code)

However, I am struggling to work out what code to use so that only the lowest value for each id/date combination is kept.
Can anyone help?
Thanks
Answered 2019-01-04 18:30:00
With the tidyverse, you can do:
library(tidyverse)
dt %>%
group_by(id, date) %>%
summarise(code = min(code))
id date code
1 1 2000-10-08 35
2 1 2001-01-01 70
3 1 2008-09-05 5
4 2 2006-02-01 188
5 3 2010-10-10 25

Answered 2019-01-04 18:20:08
data.table approach
library( data.table )
dt[, .( min = min( code ) ), by = .( id, date )]
# id date min
# 1: 1 2000-10-08 35
# 2: 1 2001-01-01 70
# 3: 1 2008-09-05 5
# 4: 2 2006-02-01 188
# 5: 3 2010-10-10 25

Sample data
dt <- fread("id date code
1 2000-10-08 690
1 2000-10-08 75
1 2000-10-08 35
1 2001-01-01 315
1 2001-01-01 70
1 2008-09-05 690
1 2008-09-05 5
1 2008-09-05 60
2 2006-02-01 188
2 2006-02-01 198
2 2006-02-01 555
2 2006-02-01 690
3 2010-10-10 120
3 2010-10-10 75
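3 2010-10-10 25", header = TRUE)

Answered 2019-01-04 18:46:07
From a tidyverse perspective, if there are many columns and you need the whole row holding the minimum 'code' value in each group, filter with a logical condition is the better choice. Note that this keeps every row tied for the minimum: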
library(tidyverse)
df %>%
group_by(id, date) %>%
filter(code == min(code))
# A tibble: 5 x 3
# Groups: id, date [5]
# id date code
# <int> <chr> <int>
#1 1 2000-10-08 35
#2 1 2001-01-01 70
#3 1 2008-09-05 5
#4 2 2006-02-01 188
#5 3 2010-10-10 25

If there are ties for the minimum within a group and you only want the first occurrence of the minimum 'code':
df %>%
group_by(id, date) %>%
slice(which.min(code))

Another option is top_n:
df %>%
group_by(id, date) %>%
top_n(1, -code)

Using aggregate from base R:
aggregate(code ~ id + date, df, min)
# id date code
#1 1 2000-10-08 35
#2 1 2001-01-01 70
#3 2 2006-02-01 188
#4 1 2008-09-05 5
#5 3 2010-10-10 25

Data
df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L), date = c("2000-10-08", "2000-10-08", "2000-10-08",
"2001-01-01", "2001-01-01", "2008-09-05", "2008-09-05", "2008-09-05",
"2006-02-01", "2006-02-01", "2006-02-01", "2006-02-01", "2010-10-10",
"2010-10-10", "2010-10-10"), code = c(690L, 75L, 35L, 315L, 70L,
690L, 5L, 60L, 188L, 198L, 555L, 690L, 120L, 75L, 25L)),
class = "data.frame", row.names = c(NA, -15L))

https://stackoverflow.com/questions/54044120
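As a follow-up to the top_n option above: in dplyr 1.0.0 and later, top_n is superseded by slice_min, which makes the handling of ties explicit. The following is only a minimal sketch, assuming dplyr >= 1.0.0 is installed. The shortened data frame `df_ties` and its `note` column are made up for illustration and are not part of the original question.

```r
library(dplyr)

# Hypothetical, shortened data with a deliberate tie (two rows with code 35)
# and an extra column `note` to show that whole rows are kept.
df_ties <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L),
  date = c("2000-10-08", "2000-10-08", "2000-10-08",
           "2006-02-01", "2006-02-01"),
  code = c(690L, 35L, 35L, 188L, 198L),
  note = c("a", "b", "c", "d", "e")
)

# Exactly one row per id/date group, even when the minimum is tied
one_per_group <- df_ties %>%
  group_by(id, date) %>%
  slice_min(code, n = 1, with_ties = FALSE) %>%
  ungroup()

# Default behaviour (with_ties = TRUE): all tied minimum rows are kept
all_ties <- df_ties %>%
  group_by(id, date) %>%
  slice_min(code, n = 1) %>%
  ungroup()
```

With with_ties = FALSE the result should have one row per group, similar to slice(which.min(code)); with the default it behaves more like filter(code == min(code)).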
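One thing to be aware of with the aggregate call above is that it returns only id, date and code, so any other columns are lost. If whole rows are needed without extra packages, a possible base R sketch is to sort by code within each id/date and then drop the duplicated id/date pairs. The `source_row` column here is a hypothetical addition, used only to show which original rows survive.

```r
# Same values as the question's data, plus a hypothetical `source_row` column
df_full <- data.frame(
  id   = rep(c(1L, 2L, 3L), times = c(8, 4, 3)),
  date = rep(c("2000-10-08", "2001-01-01", "2008-09-05",
               "2006-02-01", "2010-10-10"),
             times = c(3, 2, 3, 4, 3)),
  code = c(690L, 75L, 35L, 315L, 70L, 690L, 5L, 60L,
           188L, 198L, 555L, 690L, 120L, 75L, 25L),
  source_row = 1:15
)

# Sort so the lowest code comes first within each id/date combination
df_sorted <- df_full[order(df_full$id, df_full$date, df_full$code), ]

# Keep the first row of each id/date pair, i.e. the one with the lowest code
lowest <- df_sorted[!duplicated(df_sorted[c("id", "date")]), ]
```

Because duplicated() keeps only the first occurrence, ties for the minimum are resolved to a single row here.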