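I am working with a large dataset containing weekly travel behaviour data. Over the course of one week, each person kept a diary of the individual trips they made. Individuals are identified by a unique identification number (ID). What I would like to do is randomly select two days of diary data (each of which may include one or more trips) from the weekly data available for each unique ID, and put them in a new data frame. An example data frame is shown below: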
Df1 <- data.frame(ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
                  date = c("1st Nov", "1st Nov", "3rd Nov", "4th Nov", "4th Nov", "5th Nov",
                           "2nd Nov", "2nd Nov", "3nd Nov", "4th Nov", "5th Nov", "5th Nov",
                           "2nd Nov", "2nd Nov", "3nd Nov", "4th Nov", "5th Nov"))

Any help with the above would be greatly appreciated.
Many thanks,
Katie
Posted on 2011-12-07 19:35:20
Sounds like a job for plyr. To randomly sample two days for each ID:
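library(plyr)

# For each ID, pick two distinct days at random and keep every trip made on those days.
ddply(Df1, .(ID), function(x) {
  unique_days <- as.character(unique(x$date))
  if (length(unique_days) < 2) {
    # Only one day is recorded for this ID, so keep that day.
    randomSelDays <- unique_days
  } else {
    randomSelDays <- sample(unique_days, 2)
  }
  return(x[x$date %in% randomSelDays, ])
})

This returns all of the rows for the two selected days for each unique ID. If an ID has only one day of data, that day is returned. Because the days are drawn at random, the result differs from run to run. For example: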
  ID    date
1  1 1st Nov
2  1 1st Nov
3  1 3rd Nov
4  2 3nd Nov
5  2 5th Nov
6  2 5th Nov
7  3 2nd Nov
8  3 2nd Nov
9  3 3nd Nov

https://stackoverflow.com/questions/8414484
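
For readers who would rather not depend on plyr, the same per-ID sampling can be sketched in base R. This is only an illustrative variant under stated assumptions, not part of the original answer: `sample_two_days` and its `n_days` argument are hypothetical names introduced here, and `set.seed()` is added purely so the random draw can be repeated.

```r
# Base R sketch of the per-ID day sampling (no plyr needed).
# sample_two_days() and n_days are hypothetical names, not from the original answer.
sample_two_days <- function(df, n_days = 2) {
  # Split the rows by ID, sample days within each piece, then recombine.
  pieces <- lapply(split(df, df$ID), function(x) {
    unique_days <- unique(as.character(x$date))
    # Draw at most n_days days; IDs with fewer days keep all of them.
    n_keep <- min(n_days, length(unique_days))
    keep <- unique_days[sample.int(length(unique_days), n_keep)]
    x[x$date %in% keep, , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "ID.row" names that rbind creates
  out
}

# Example data frame from the question.
Df1 <- data.frame(ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
                  date = c("1st Nov", "1st Nov", "3rd Nov", "4th Nov", "4th Nov", "5th Nov",
                           "2nd Nov", "2nd Nov", "3nd Nov", "4th Nov", "5th Nov", "5th Nov",
                           "2nd Nov", "2nd Nov", "3nd Nov", "4th Nov", "5th Nov"))

set.seed(42)  # assumption: fixed seed only to make the draw repeatable
Df2 <- sample_two_days(Df1)
```

Sampling positions with `sample.int()` rather than the values themselves means an ID with a single day needs no separate branch, which is the case the `if` statement handles in the plyr version.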