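I have timestamp data recording when an individual animal (ID) was detected at a particular location. Here is a short sample of the data (sorted by location, then by timestampUTC, for clarity):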
df <- read.table(text =
"timestampUTC location ID
'2017-10-11 04:57:16' JB12 A69-1602-2429
'2017-10-11 04:58:25' JB12 A69-1602-2429
'2017-10-11 04:59:08' JB12 A69-1602-2429
'2017-10-11 05:00:01' JB12 A69-1602-2429
'2017-10-11 05:03:04' JB12 A69-1602-2429
'2017-10-11 05:04:38' JB12 A69-1602-2429
'2017-10-11 04:56:20' JB13 A69-1601-47280
'2017-10-11 04:57:44' JB13 A69-1601-47280
'2017-10-11 04:59:00' JB13 A69-1601-47280
'2017-10-11 04:59:56' JB13 A69-1601-47280
'2017-10-11 05:01:16' JB13 A69-1601-47280
'2017-10-11 05:02:20' JB13 A69-1601-47280
'2017-10-11 05:02:56' JB13 A69-1601-47280
'2017-10-11 05:03:34' JB13 A69-1601-47280
'2017-10-11 05:08:29' JB13 A69-1602-2429
'2017-10-11 04:56:43' WBR A69-1602-2428
'2017-10-11 04:57:09' WBR A69-1602-2425
'2017-10-11 04:58:11' WBR A69-1602-2428
'2017-10-11 04:58:42' WBR A69-1602-2425
'2017-10-11 04:59:34' WBR A69-1602-2428
'2017-10-11 05:00:13' WBR A69-1602-2425
'2017-10-11 05:00:47' WBR A69-1602-2428
'2017-10-11 05:01:39' WBR A69-1602-2428
'2017-10-11 05:01:43' WBR A69-1602-2425
'2017-10-11 05:02:49' WBR A69-1602-2428
'2017-10-11 05:03:36' WBR A69-1602-2428
'2017-10-11 05:04:38' WBR A69-1602-2428
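'2017-10-11 05:07:32' WBR A69-1602-2428", header = T)
From these data I would like to generate a data frame in which, for each hourly bin, each location and each ID, one column says whether that ID was present (1) or absent (0) in that bin, and another column holds the count of observations of that ID in that bin. Put slightly differently: for each ID, I want to know whether it was present (1) or absent (0), and how many times it was observed, at each location in each hour of the study period.
For example, the data above have 4 unique IDs, 3 unique locations, and observations that all fall within two hourly bins (2017-10-11 04:00:00 and 2017-10-11 05:00:00). The output would be a df with 24 rows (4 IDs x 3 locations x 2 hourly bins). The output should look like this: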
output <- read.table(text =
"HourlyBinUTC location ID count present
'2017-10-11 04:00:00' JB13 A69-1601-47280 4 1
'2017-10-11 04:00:00' JB13 A69-1602-2429 0 0
'2017-10-11 04:00:00' JB13 A69-1602-2428 0 0
'2017-10-11 04:00:00' JB13 A69-1602-2425 0 0
'2017-10-11 04:00:00' JB12 A69-1601-47280 0 0
'2017-10-11 04:00:00' JB12 A69-1602-2429 3 1
'2017-10-11 04:00:00' JB12 A69-1602-2428 0 0
'2017-10-11 04:00:00' JB12 A69-1602-2425 0 0
'2017-10-11 04:00:00' WBR A69-1601-47280 0 0
'2017-10-11 04:00:00' WBR A69-1602-2429 0 0
'2017-10-11 04:00:00' WBR A69-1602-2428 3 1
'2017-10-11 04:00:00' WBR A69-1602-2425 2 1
'2017-10-11 05:00:00' JB13 A69-1601-47280 4 1
'2017-10-11 05:00:00' JB13 A69-1602-2429 1 1
'2017-10-11 05:00:00' JB13 A69-1602-2428 0 0
'2017-10-11 05:00:00' JB13 A69-1602-2425 0 0
'2017-10-11 05:00:00' JB12 A69-1601-47280 0 0
'2017-10-11 05:00:00' JB12 A69-1602-2429 3 1
'2017-10-11 05:00:00' JB12 A69-1602-2428 0 0
'2017-10-11 05:00:00' JB12 A69-1602-2425 0 0
'2017-10-11 05:00:00' WBR A69-1601-47280 0 0
'2017-10-11 05:00:00' WBR A69-1602-2429 0 0
'2017-10-11 05:00:00' WBR A69-1602-2428 6 1
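'2017-10-11 05:00:00' WBR A69-1602-2425 2 1", header = T)
(My real df spans 19,887 hourly bins, with 17 individuals and 4 locations, so I expect the output to have 1,352,316 rows.)
I am an R novice and have played with reshape() and dcast(), but I am not familiar enough with their arguments to get what I need. I suspect this will be a job for something that I (clearly) very much need to learn!
Thanks in advance!
Posted on 2019-10-22 12:30:03
You don't need any packages. I read in your df, but with the option stringsAsFactors set to FALSE.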
df <- read.table(text =
"timestampUTC location ID
'2017-10-11 04:57:16' JB12 A69-1602-2429
## ...
'2017-10-11 05:07:32' WBR A69-1602-2428",
header = TRUE, stringsAsFactors = FALSE)
Now tell R that the first column holds timestamps, and then round the timestamps down to the hour.
df[["timestampUTC"]] <- as.POSIXct(df[["timestampUTC"]], tz = "UTC")
df[["timestampUTC"]] <- trunc(df[["timestampUTC"]], "hours")
Then call table and convert the result to a data frame.
result <- as.data.frame(table(df))
## timestampUTC location ID Freq
## 1 2017-10-11 04:00:00 JB12 A69-1601-47280 0
## 2 2017-10-11 05:00:00 JB12 A69-1601-47280 0
## 3 2017-10-11 04:00:00 JB13 A69-1601-47280 4
## 4 2017-10-11 05:00:00 JB13 A69-1601-47280 4
## 5 2017-10-11 04:00:00 WBR A69-1601-47280 0
## 6 2017-10-11 05:00:00 WBR A69-1601-47280 0
## 7 2017-10-11 04:00:00 JB12 A69-1602-2425 0
## 8 2017-10-11 05:00:00 JB12 A69-1602-2425 0
## 9 2017-10-11 04:00:00 JB13 A69-1602-2425 0
## 10 2017-10-11 05:00:00 JB13 A69-1602-2425 0
## 11 2017-10-11 04:00:00 WBR A69-1602-2425 2
## 12 2017-10-11 05:00:00 WBR A69-1602-2425 2
## 13 2017-10-11 04:00:00 JB12 A69-1602-2428 0
## 14 2017-10-11 05:00:00 JB12 A69-1602-2428 0
## 15 2017-10-11 04:00:00 JB13 A69-1602-2428 0
## 16 2017-10-11 05:00:00 JB13 A69-1602-2428 0
## 17 2017-10-11 04:00:00 WBR A69-1602-2428 3
## 18 2017-10-11 05:00:00 WBR A69-1602-2428 6
## 19 2017-10-11 04:00:00 JB12 A69-1602-2429 3
## 20 2017-10-11 05:00:00 JB12 A69-1602-2429 3
## 21 2017-10-11 04:00:00 JB13 A69-1602-2429 0
## 22 2017-10-11 05:00:00 JB13 A69-1602-2429 1
## 23 2017-10-11 04:00:00 WBR A69-1602-2429 0
## 24 2017-10-11 05:00:00 WBR A69-1602-2429 0
If you want to include an ID that never appears in the data, you need to tell table which ID values are possible. You can do that by making ID a factor.
df[["ID"]] <- factor(df[["ID"]],
                     levels = c("A69-1602-2429", "A69-1601-47280",
                                "A69-1602-2428", "A69-1602-2425",
                                "foo"))
Or extend the range of hours:
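## Levels for every hour in the range; the range has to cover the data
hours <- seq(from = as.POSIXct("2017-10-01 00:00:00", tz = "UTC"),
             to = as.POSIXct("2017-10-13 00:00:00", tz = "UTC"),
             by = "1 hour")
## Compare as text in one fixed format so the timestamps match the levels
fmt <- "%Y-%m-%d %H:%M:%S"
df[["timestampUTC"]] <-
  factor(format(df[["timestampUTC"]], format = fmt),
         levels = format(hours, format = fmt))
The result will include the zero counts.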
## ...
## 25 2017-10-11 04:00:00 JB12 foo 0
## 26 2017-10-11 05:00:00 JB12 foo 0
## 27 2017-10-11 04:00:00 JB13 foo 0
## 28 2017-10-11 05:00:00 JB13 foo 0
## 29 2017-10-11 04:00:00 WBR foo 0
## 30 2017-10-11 05:00:00 WBR foo 0
I would suggest using a logical vector rather than a zero-one variable for present:
present <- result$Freq > 0
cbind(result, present)
## timestampUTC location ID Freq present
## 1 2017-10-11 04:00:00 JB12 A69-1601-47280 0 FALSE
## 2 2017-10-11 05:00:00 JB12 A69-1601-47280 0 FALSE
## 3 2017-10-11 04:00:00 JB13 A69-1601-47280 4 TRUE
## 4 2017-10-11 05:00:00 JB13 A69-1601-47280 4 TRUE
## 5 2017-10-11 04:00:00 WBR A69-1601-47280 0 FALSE
## ...
https://stackoverflow.com/questions/58496963
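
To get from the table() result to the exact layout asked for in the question (columns HourlyBinUTC, count and a 0/1 present), something like the following might work. It is only a sketch under a few assumptions: it uses a five-row subset of the question's sample data, it takes the hourly bins from the range of the data itself (a fixed study period could be substituted), and the names `ts`, `bins` and `fmt` are mine rather than the answer's.

```r
## Five rows taken from the question's sample data
df <- data.frame(
  timestampUTC = c("2017-10-11 04:57:16", "2017-10-11 04:58:25",
                   "2017-10-11 05:00:01", "2017-10-11 04:56:43",
                   "2017-10-11 05:08:29"),
  location = c("JB12", "JB12", "JB12", "WBR", "JB13"),
  ID = c("A69-1602-2429", "A69-1602-2429", "A69-1602-2429",
         "A69-1602-2428", "A69-1602-2429"),
  stringsAsFactors = FALSE)

## Parse the timestamps and round them down to the hour
ts <- as.POSIXct(df[["timestampUTC"]], tz = "UTC")
ts <- as.POSIXct(trunc(ts, "hours"))

## One level per hour between the first and last detection (assumption:
## the study period is the range of the data), compared as text
fmt <- "%Y-%m-%d %H:%M:%S"
bins <- seq(from = min(ts), to = max(ts), by = "1 hour")
df[["timestampUTC"]] <- factor(format(ts, format = fmt, tz = "UTC"),
                               levels = format(bins, format = fmt, tz = "UTC"))

## Count every hour x location x ID combination, including empty ones
result <- as.data.frame(table(df), stringsAsFactors = FALSE)

## Use the column names from the question and add a 0/1 presence flag
names(result) <- c("HourlyBinUTC", "location", "ID", "count")
result[["present"]] <- as.integer(result[["count"]] > 0)
```

On the real data, `nrow(result)` should equal the number of hourly bins times the number of locations times the number of IDs, which gives a quick sanity check.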