R: assigning datetime events to datetime intervals
Tags: r, date, intervals, lubridate

I have a table of more than 1M individual datetime detection intervals (i.e. periods during which an individual was continuously detected) and a table of datetime intervals representing the periods when a gate was closed. Each detection is classified as either "in" or "out".

Using %within% I can determine whether a detection falls inside any of the intervals.

However, I want to do something different. For each gate-closure interval, I want to know how many individuals were detected outside and how many were detected inside. I believe the simplest approach is to assign each detection event to a gate period, but I can't figure out how to write that function without a very messy nested ifelse statement.

Expected output (not real data):

Gate closure intervals:
new("Interval", .Data = c(-81060, -117060, -59400, -16200, -76680,
-51000, -81120), start = structure(c(1412238660, 1412362800,
1412434800, 1412454600, 1412542980, 1412602200, 1412690400), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), tzone = "UTC")
So, there are a few problems with your example data that make your question difficult to answer.
library(tidyverse) # needed because your data is in tibble format
library(lubridate) # needed for time interval class used in your example
# I called your example data "Detections" and "Intervals", not importing them here
set.seed(914)
row.sample <- c(1,5,2, sample(1:100, 10)) # to include the samples not in the given intervals mixed with good values
use.cols <- c(1,4,6,7,9)
Detections.use <- Detections[row.sample, use.cols]
Detections.use
# A tibble: 13 x 5
   Tag     site  species      StartDateTime_UTC   EndDateTime_UTC
   <fct>   <fct> <chr>        <dttm>              <dttm>
 1 5004.24 IC1   Striped Bass 2014-09-29 22:40:40 2014-09-29 22:46:35
 2 5004.24 RGD1  Striped Bass 2014-10-02 00:15:47 2014-10-02 00:30:08
 3 5004.24 IC1   Striped Bass 2014-09-29 22:49:15 2014-09-29 22:50:05
 4 5004.24 RGD1  Striped Bass 2014-10-04 18:38:40 2014-10-04 18:52:30
 5 5004.24 RGD1  Striped Bass 2014-10-05 07:15:43 2014-10-05 08:05:47
 6 5004.24 RGD1  Striped Bass 2014-10-03 02:21:36 2014-10-03 02:24:01
 7 5004.24 RGD1  Striped Bass 2014-10-03 12:13:46 2014-10-03 12:18:02
 8 5004.24 RGD1  Striped Bass 2014-10-03 09:30:15 2014-10-03 10:14:19
 9 5004.24 RGD1  Striped Bass 2014-10-04 14:09:39 2014-10-04 14:09:45
10 5004.24 RGD1  Striped Bass 2014-10-03 02:17:05 2014-10-03 02:18:05
11 5004.24 RGD1  Striped Bass 2014-10-05 03:08:26 2014-10-05 03:12:31
12 5004.24 RGD1  Striped Bass 2014-10-03 01:42:02 2014-10-03 01:58:18
13 5004.24 RGD1  Striped Bass 2014-10-02 00:33:12 2014-10-02 01:10:21
# Convert your intervals to a table, then correct them to make them positive time intervals
# (the spans in the dput are negative, so the stored "start" is really the end of each closure)
Interval.table <- tibble(Intervals, "ID" = seq(from = 1, to = length(Intervals))) %>%
  mutate(end_time = Intervals$start,
         start_time = end_time + Intervals$.Data,
         new_Interval = start_time %--% end_time) %>%
  select(ID, new_Interval)
Interval.table
# A tibble: 7 x 2
     ID new_Interval
  <int> <Interval>
1     1 2014-10-01 10:00:00 UTC--2014-10-02 08:31:00 UTC
2     2 2014-10-02 10:29:00 UTC--2014-10-03 19:00:00 UTC
3     3 2014-10-03 22:30:00 UTC--2014-10-04 15:00:00 UTC
4     4 2014-10-04 16:00:00 UTC--2014-10-04 20:30:00 UTC
5     5 2014-10-04 23:45:00 UTC--2014-10-05 21:03:00 UTC
6     6 2014-10-05 23:20:00 UTC--2014-10-06 13:30:00 UTC
7     7 2014-10-06 15:28:00 UTC--2014-10-07 14:00:00 UTC
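The correction step above can be checked by hand with plain epoch arithmetic. The following is only a sketch in base R, using the first two span/start pairs copied from the dput in the question; the variable names are made up for illustration. lubridate's int_standardize() is documented as performing this kind of flip directly on an Interval, which may be more convenient than rebuilding the intervals manually.

```r
# First two gate closures from the question's dput(): spans in seconds
# (negative) and the stored "start" values as seconds since 1970-01-01 UTC
span_sec   <- c(-81060, -117060)
stored_num <- c(1412238660, 1412362800)

# With a negative span the stored start is the later endpoint, so the true
# beginning is stored + span; pmin/pmax make this work for either sign
other_num <- stored_num + span_sec
begin <- as.POSIXct(pmin(stored_num, other_num), origin = "1970-01-01", tz = "UTC")
end   <- as.POSIXct(pmax(stored_num, other_num), origin = "1970-01-01", tz = "UTC")

format(begin, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2014-10-01 10:00:00" "2014-10-02 10:29:00"
format(end, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2014-10-02 08:31:00" "2014-10-03 19:00:00"
```

These values agree with rows 1 and 2 of Interval.table.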
Edit to add an sapply solution
---
This can also be done with a slightly cleaner sapply, which leaves no stray objects behind in the workspace:
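# Return the index of the interval in y that contains datetime x, or NA if there is none
which.join <- function(x, y) {
  z <- which(x %within% y)
  # isTRUE() is only TRUE for a single match, so zero (or several) matches give NA
  z <- ifelse(isTRUE(z > 0), z, NA)
  z
}
Detections.use$Interval <- sapply(Detections.use$StartDateTime_UTC,
                                  function(x) which.join(x, Interval.table$new_Interval))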
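For a table with more than a million detections, calling %within% once per row may be slow. If the gate intervals are sorted and do not overlap (as they appear to be in Interval.table), a vectorised lookup with base R's findInterval() is one possible alternative. The sketch below uses made-up toy timestamps, and assign_interval is a hypothetical helper name, not something from the original answer.

```r
# Hypothetical helper: for each time, return the index of the closed interval
# [starts[i], ends[i]] that contains it, or NA if it falls in none.
# Assumes the intervals are sorted by start and do not overlap.
assign_interval <- function(times, starts, ends) {
  stopifnot(!is.unsorted(as.numeric(starts)), all(ends >= starts))
  # index of the last interval whose start is <= each time (0 if none)
  idx <- findInterval(as.numeric(times), as.numeric(starts))
  idx[idx == 0L] <- NA_integer_
  # keep the index only if the time is also no later than that interval's end
  inside <- !is.na(idx) & times <= ends[idx]
  idx[!inside] <- NA_integer_
  idx
}

# Toy data (not the poster's real tables)
starts <- as.POSIXct(c("2014-10-01 10:00:00", "2014-10-02 10:29:00",
                       "2014-10-03 22:30:00"), tz = "UTC")
ends   <- as.POSIXct(c("2014-10-02 08:31:00", "2014-10-03 19:00:00",
                       "2014-10-04 15:00:00"), tz = "UTC")
det_time <- as.POSIXct(c("2014-09-29 22:40:40", "2014-10-02 00:15:47",
                         "2014-10-03 02:21:36", "2014-10-04 14:09:39",
                         "2014-10-02 09:00:00"), tz = "UTC")

assigned <- assign_interval(det_time, starts, ends)
assigned
# NA  1  2  3 NA
```

The last toy detection falls in the gap between the first and second closures, so it gets NA just like the one that precedes every interval. Both endpoints are treated as inclusive, which is intended to mirror %within%.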
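Comments:

Could you provide an example of the output you are looking for (dummy data is fine)? Seeing the format would help. When running the example data code I also get an error: getClass(Class, where = topenv(parent.frame())): "Period" is not a defined class. Maybe strip it down to something more minimal.

It sounds like you could use a database-style join to join the observation data to the interval data. This assumes they are in two separate tables.

@BrianFisher I have updated the question and fixed the dput for the gate closure periods.

I think you should be able to use something like (Detections$StartDateTime_UTC %in% Interval) to get the index, which you can then convert to an ID. I can't test it, though, because the two sets of data you provided don't overlap (the intervals are all in March/April 2013, while the detections are all in September/October 2014).

@BrianFisher That is why I tried to subset the data. Unfortunately, it only returns integer(0). I will try to get the correct intervals for that individual.

This is very close to the answer I came up with: for(I in 1:length(dat_overlap$StartDateTime_UTC)){tryCatch({dat_overlap$interval.id[I]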
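# Loop solution: give each detection the ID of the interval that contains its start time
y <- rep(NA_integer_, nrow(Detections.use))  # preallocate instead of growing y
for (i in seq_along(Detections.use$StartDateTime_UTC)) {
  z <- which(Detections.use$StartDateTime_UTC[i] %within% Interval.table$new_Interval)
  # exactly one matching interval: keep its index; otherwise NA
  y[i] <- ifelse(isTRUE(z > 0), z, NA)
}
Detections.use$Interval <- y
Detections.use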
# A tibble: 13 x 6
   Tag     site  species      StartDateTime_UTC   EndDateTime_UTC     Interval
   <fct>   <fct> <chr>        <dttm>              <dttm>                 <int>
 1 5004.24 IC1   Striped Bass 2014-09-29 22:40:40 2014-09-29 22:46:35       NA
 2 5004.24 RGD1  Striped Bass 2014-10-02 00:15:47 2014-10-02 00:30:08        1
 3 5004.24 IC1   Striped Bass 2014-09-29 22:49:15 2014-09-29 22:50:05       NA
 4 5004.24 RGD1  Striped Bass 2014-10-04 18:38:40 2014-10-04 18:52:30        4
 5 5004.24 RGD1  Striped Bass 2014-10-05 07:15:43 2014-10-05 08:05:47        5
 6 5004.24 RGD1  Striped Bass 2014-10-03 02:21:36 2014-10-03 02:24:01        2
 7 5004.24 RGD1  Striped Bass 2014-10-03 12:13:46 2014-10-03 12:18:02        2
 8 5004.24 RGD1  Striped Bass 2014-10-03 09:30:15 2014-10-03 10:14:19        2
 9 5004.24 RGD1  Striped Bass 2014-10-04 14:09:39 2014-10-04 14:09:45        3
10 5004.24 RGD1  Striped Bass 2014-10-03 02:17:05 2014-10-03 02:18:05        2
11 5004.24 RGD1  Striped Bass 2014-10-05 03:08:26 2014-10-05 03:12:31        5
12 5004.24 RGD1  Striped Bass 2014-10-03 01:42:02 2014-10-03 01:58:18        2
13 5004.24 RGD1  Striped Bass 2014-10-02 00:33:12 2014-10-02 01:10:21        1
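Once every detection carries an interval ID, the per-gate counts the question asks for reduce to a cross-tabulation. A minimal sketch, assuming a direction column holding "in"/"out": that column is not part of the sample data shown here, so both it and the toy values below are hypothetical.

```r
# Hypothetical toy data: interval IDs like those assigned above, plus an
# assumed "in"/"out" classification for each detection
det <- data.frame(
  Interval  = c(NA, 1, 1, 2, 2, 2, 3),
  direction = c("in", "in", "out", "in", "in", "out", "out")
)

# Fixing the factor levels keeps intervals or directions with zero
# detections in the result; detections with Interval == NA are dropped
counts <- table(
  Interval  = factor(det$Interval, levels = 1:3),
  direction = factor(det$direction, levels = c("in", "out"))
)
counts

# Long format (one row per interval/direction pair), easier to join back
# onto the interval table
counts_long <- as.data.frame(counts, responseName = "n")
counts_long
```

If the goal is the number of distinct individuals rather than the number of detection events, one option would be to drop duplicate Tag/Interval/direction combinations before tabulating.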