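I have events with dates that I want to match against periods, but I only have the start date of each period. As a simplified illustration, say I want to know who was president during certain events, but I only have the inauguration dates.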
pres <- data.frame(pres = c("Ronald Reagan", "George H. W. Bush",
                            "Bill Clinton", "George W. Bush",
                            "Barack Obama", "Donald Trump"),
                   inaugdate = structure(c(4037, 6959, 8420, 11342, 14264,
                                           17186), class = "Date"))
events <- data.frame(event = c("Challenger explosion", "Chernobyl explosion",
                               "Hurricane Katrina", "9-11"),
                     date = structure(c(5871, 5959, 13024, 11576), class = "Date"))
Obviously, a simple left_join won't work, because these events did not happen on an inauguration day.
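library(dplyr)
events %>%
  left_join(pres, by = c("date" = "inaugdate"))
In Excel, vlookup used to give you the option of TRUE (match the closest previous value) or FALSE (match exactly). Is there something similar in the tidyverse?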
Posted on 2019-02-04 20:06:16
Probably not the most efficient approach, but we can use a non-equi join in sqldf:
library(sqldf)
sqldf('select a.event, a.date, b.pres
from events a
left join pres b
on a.date >= b.inaugdate
group by a.event
having min(a.date - b.inaugdate)
order by date, event')
Output:
                 event       date           pres
1 Challenger explosion 1986-01-28  Ronald Reagan
2  Chernobyl explosion 1986-04-26  Ronald Reagan
3                 9-11 2001-09-11 George W. Bush
4    Hurricane Katrina 2005-08-29 George W. Bush
Posted on 2019-02-04 21:13:04
This may not be efficient (it depends on the number of rows and columns), but it is another way to solve the problem.
library(dplyr)
pres <- data.frame(pres = c("Ronald Reagan", "George H. W. Bush",
"Bill Clinton", "George W. Bush", "Barack Obama", "Donald Trump"),
inaugdate = structure(c(4037, 6959, 8420, 11342, 14264,
17186), class = "Date")) %>%
#lead date to get interval
mutate(enddt = lead(inaugdate, default = Sys.Date())-1)
events <- data.frame(event = c("Challenger explosion", "Chernobyl explosion", "Hurricane Katrina", "9-11"),
date = structure(c(5871, 5959, 13024, 11576), class = "Date"))
#get every combination of rows
newdf <- merge(pres, events, all = TRUE) %>%
  # enddt is the last day of each term, so the upper bound is inclusive
  filter(date >= inaugdate, date <= enddt)
https://stackoverflow.com/questions/54522162
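For comparison, the "closest previous match" behaviour that the question describes for vlookup can also be sketched in base R with findInterval(). This is not taken from either answer above. It is only a minimal illustration, assuming the pres and events data from the question and that an event before the first inauguration should get NA.

```r
# Sketch only: base R analogue of an approximate (closest previous) lookup.
# Data copied from the question; the idx helper variable is illustrative.
pres <- data.frame(
  pres = c("Ronald Reagan", "George H. W. Bush", "Bill Clinton",
           "George W. Bush", "Barack Obama", "Donald Trump"),
  inaugdate = structure(c(4037, 6959, 8420, 11342, 14264, 17186),
                        class = "Date")
)
events <- data.frame(
  event = c("Challenger explosion", "Chernobyl explosion",
            "Hurricane Katrina", "9-11"),
  date = structure(c(5871, 5959, 13024, 11576), class = "Date")
)

# findInterval() requires the lookup vector to be sorted in increasing order.
pres <- pres[order(pres$inaugdate), ]

# For each event date, the index of the last inauguration on or before it
# (0 when the event precedes every inauguration).
idx <- findInterval(as.numeric(events$date), as.numeric(pres$inaugdate))
idx[idx == 0] <- NA  # assumption: no match is reported as NA

events$pres <- pres$pres[idx]
events
```

With the data above, this assigns Ronald Reagan to both 1986 events and George W. Bush to the other two, which agrees with the sqldf output. Because findInterval() uses binary search on a sorted vector, it avoids building every combination of rows.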