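I am trying to split my data into 5-second intervals and group them using dplyr.
Below is my raw data. The date and time are in separate columns, which I later combined using as.POSIXct in R.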
structure(list(Date = c("10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013"), Time = c("20:06:57", "20:07:13", "20:07:25", "20:07:30", "20:08:16", "20:08:17", "20:08:26", "20:09:05", "20:09:06", "20:09:07", "20:09:37", "20:09:38", "20:09:55", "20:12:34", "20:14:15"), ID = c("M1", "M1", "M1", "M3", "M1", "M1", "M8", "M9", "M9", "M9", "M1", "M1", "M1", "M5", "M1")), .Names = c("Date", "Time", "ID"), class = "data.frame", row.names = c(NA, -15L))
Here is the code I used:
data$datetime <- as.POSIXct(paste(data$Date, data$Time), format="%m/%d/%Y %H:%M:%S")
data_order <- data %>% arrange(datetime,ID)
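data_order$group <- data_order %>% group_by(by5sec=cut(datetime, breaks= "5 secs",right =T),ID) %>% group_indices()
Some observations are grouped correctly, but others are not. I tried two versions, one with right = T and one without it, and got different groups, but both contain errors. I also tried converting with as.numeric, as.POSIXct and so on before cutting, all to no avail.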
The output of both versions is attached. The rows marked in red are incorrectly coded as two different groups.
*Version 1: cut with right = T*

*Version 2: cut with right = F*

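I would appreciate any help with this. I have already spent quite some time on it and, given my knowledge of R, got nowhere. What I want is 5-second breaks within the same ID (the group should also change whenever a new ID appears).
Expected output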

Posted on 2018-03-09 12:35:34
The output images you show are not entirely clear to me. Based on your problem description, how about something like this?
library(tidyverse);
df %>%
unite(datetime, 1:2, sep = " ", remove = FALSE) %>%
mutate(
datetime = as.POSIXct(datetime, format = "%m/%d/%Y %H:%M:%S"),
datetime.by5sec = as.numeric(cut(datetime, "sec")) %/% 5 + 1);
# datetime Date Time ID datetime.by5sec
#1 2013-10-30 20:06:57 10/30/2013 20:06:57 M1 1
#2 2013-10-30 20:07:13 10/30/2013 20:07:13 M1 4
#3 2013-10-30 20:07:25 10/30/2013 20:07:25 M1 6
#4 2013-10-30 20:07:30 10/30/2013 20:07:30 M3 7
#5 2013-10-30 20:08:16 10/30/2013 20:08:16 M1 17
#6 2013-10-30 20:08:17 10/30/2013 20:08:17 M1 17
#7 2013-10-30 20:08:26 10/30/2013 20:08:26 M8 19
#8 2013-10-30 20:09:05 10/30/2013 20:09:05 M9 26
#9 2013-10-30 20:09:06 10/30/2013 20:09:06 M9 27
#10 2013-10-30 20:09:07 10/30/2013 20:09:07 M9 27
#11 2013-10-30 20:09:37 10/30/2013 20:09:37 M1 33
#12 2013-10-30 20:09:38 10/30/2013 20:09:38 M1 33
#13 2013-10-30 20:09:55 10/30/2013 20:09:55 M1 36
#14 2013-10-30 20:12:34 10/30/2013 20:12:34 M5 68
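#15 2013-10-30 20:14:15 10/30/2013 20:14:15 M1 88
Explanation: datetime.by5sec gives the index of the 5-second bin that datetime falls into, counted from the first timestamp. The first entry is in bin 1; the second entry falls into the 4th 5-second bin, i.e. within 20 seconds of the first entry, and so on. The integer division %/% 5 collapses the one-second bins produced by cut(datetime, "sec") into 5-second bins. Because the one-second index starts at 1 rather than 0, bin 1 spans only four seconds.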
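The binning arithmetic can also be written directly on elapsed seconds, which makes it easier to see why fixed-width bins split observations that are only one second apart. The following is a minimal base R sketch, not part of the original answer; it uses the three M9 timestamps from the question, and the variable names (x, origin, offset, bin) and the UTC time zone are assumptions chosen for illustration.

```r
# Three consecutive M9 timestamps from the question, parsed in UTC
# (the time zone is an assumption made for reproducibility)
x <- as.POSIXct(c("2013-10-30 20:09:05",
                  "2013-10-30 20:09:06",
                  "2013-10-30 20:09:07"), tz = "UTC")

# First timestamp in the data, used as the start of bin 1
origin <- as.POSIXct("2013-10-30 20:06:57", tz = "UTC")

# Seconds elapsed since the first observation: 128, 129, 130
offset <- as.numeric(difftime(x, origin, units = "secs"))

# Zero-based offsets: bin 1 covers offsets 0-4, bin 2 covers 5-9, ...
bin <- offset %/% 5 + 1
bin
# 26 26 27
```

Even with zero-based offsets, 20:09:06 and 20:09:07 end up in different bins, because fixed bins depend only on the distance from the origin and not on the gap to the previous observation.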
Update
The following reproduces your expected output:
df %>%
    unite(datetime, 1:2, sep = " ", remove = FALSE) %>%
    group_by(ID) %>%
    mutate(
        datetime = as.POSIXct(datetime, format = "%m/%d/%Y %H:%M:%S"),
        # gap to the previous row of the same ID, always measured in seconds
        difftime = difftime(datetime, lag(datetime, default = 0), units = "secs")) %>%
    ungroup() %>%
    mutate(
        # start a new group whenever the gap is at least 5 seconds
        group = cumsum(abs(difftime) >= 5)) %>%
    select(Date, Time, ID, datetime, group);
## A tibble: 15 x 5
# Date Time ID datetime group
# <chr> <chr> <chr> <dttm> <int>
# 1 10/30/2013 20:06:57 M1 2013-10-30 20:06:57 1
# 2 10/30/2013 20:07:13 M1 2013-10-30 20:07:13 2
# 3 10/30/2013 20:07:25 M1 2013-10-30 20:07:25 3
# 4 10/30/2013 20:07:30 M3 2013-10-30 20:07:30 4
# 5 10/30/2013 20:08:16 M1 2013-10-30 20:08:16 5
# 6 10/30/2013 20:08:17 M1 2013-10-30 20:08:17 5
# 7 10/30/2013 20:08:26 M8 2013-10-30 20:08:26 6
# 8 10/30/2013 20:09:05 M9 2013-10-30 20:09:05 7
# 9 10/30/2013 20:09:06 M9 2013-10-30 20:09:06 7
#10 10/30/2013 20:09:07 M9 2013-10-30 20:09:07 7
#11 10/30/2013 20:09:37 M1 2013-10-30 20:09:37 8
#12 10/30/2013 20:09:38 M1 2013-10-30 20:09:38 8
#13 10/30/2013 20:09:55 M1 2013-10-30 20:09:55 9
#14 10/30/2013 20:12:34 M5 2013-10-30 20:12:34 10
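#15 10/30/2013 20:14:15 M1 2013-10-30 20:14:15 11
Explanation: we calculate the time difference between consecutive datetime entries within each ID; group is then the cumulative sum over all rows whose time difference is >= 5 seconds.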
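For readers who want to check the gap logic without the tidyverse, here is a minimal base R sketch of the same idea. It is an illustration under stated assumptions rather than the answer's own code: it uses five rows taken from the question's data, parses them in UTC, and marks the first row of each ID explicitly with NA instead of relying on a default value for the lag. The names secs and gap are hypothetical.

```r
# Five rows from the question's data (rows 2 to 6), parsed in UTC
df <- data.frame(
  ID = c("M1", "M1", "M3", "M1", "M1"),
  datetime = as.POSIXct(c("2013-10-30 20:07:13",
                          "2013-10-30 20:07:25",
                          "2013-10-30 20:07:30",
                          "2013-10-30 20:08:16",
                          "2013-10-30 20:08:17"), tz = "UTC")
)

# Timestamps as seconds since the epoch
secs <- as.numeric(df$datetime)

# Gap in seconds to the previous row of the same ID;
# NA marks the first row of each ID
gap <- ave(secs, df$ID, FUN = function(s) c(NA, diff(s)))

# A new group starts at the first row of an ID or after a gap of >= 5 seconds
df$group <- cumsum(is.na(gap) | gap >= 5)
df$group
# 1 2 3 4 4
```

Working on plain seconds sidesteps any question about which unit a difftime value is reported in, and the is.na() test makes the "first row of an ID starts a new group" rule visible in the code.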
https://stackoverflow.com/questions/49193380