Calculating the duration of behavioral states from timestamps in R


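I have a dataset on animal behavior, and I'm trying to calculate the percentage of time an animal spends in several different "states". These are recorded in the character variable Observation, which describes proximity to a partner: out of arm's reach, within arm's reach, or in contact. Each observation period lasts one hour; observations from the same period are delimited by session and focal_start_timeStamp. The column behavior_timeStamp gives the time at which the animal entered a different "state", i.e. moved toward or away from its partner. Here are the first 20 rows: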

DF <- structure(list(focal_start_timeStamp = c("2019-02-25 10:23:06", 
"2019-02-25 10:23:06", "2019-02-25 10:23:06", "2019-02-25 10:23:06", 
"2019-02-25 10:23:06", "2019-02-25 10:23:06", "2019-02-25 10:23:06", 
"2019-02-25 10:23:06", "2019-02-25 10:23:06", "2019-02-25 10:23:06", 
"2019-02-25 10:23:06", "2019-02-25 10:23:06", "2019-02-25 10:23:06", 
"2019-02-25 10:23:06", "2019-02-25 10:23:06", "2019-02-25 10:23:06", 
"2019-02-26 10:26:43", "2019-02-26 10:26:43", "2019-02-26 10:26:43", 
"2019-02-26 10:26:43"), session = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), actor = c("SIE", 
"BER", "SIE", "SIE", "SIE", "SIE", "BER", "SIE", "BER", "SIE", 
"SIE", "SIE", "SIE", "BER", "SIE", "SIE", "BER", "SIE", "SIE", 
"BER"), behavior_timeStamp = c("2019-02-25 10:23:28", "2019-02-25 10:25:19", 
"2019-02-25 10:35:52", "2019-02-25 10:36:04", "2019-02-25 10:38:12", 
"2019-02-25 10:39:32", "2019-02-25 10:39:48", "2019-02-25 10:58:34", 
"2019-02-25 10:58:48", "2019-02-25 10:58:52", "2019-02-25 10:59:28", 
"2019-02-25 11:00:18", "2019-02-25 11:00:27", "2019-02-25 11:01:00", 
"2019-02-25 11:01:40", "2019-02-25 11:02:13", "2019-02-26 10:27:37", 
"2019-02-26 10:29:06", "2019-02-26 10:29:12", "2019-02-26 10:29:28"
), Observation = c("Proximity_Approach to contact", "Proximity_Withdraw to out of arm`s reach", 
"Proximity_Approach to arm`s reach", "Proximity_Approach to contact", 
"Proximity_Withdraw to out of arm`s reach", "Proximity_Approach to arm`s reach", 
"Proximity_Withdraw to out of arm`s reach", "Proximity_Approach to arm`s reach", 
"Proximity_Withdraw to out of arm`s reach", "Proximity_Approach to arm`s reach", 
"Proximity_Withdraw to out of arm`s reach", "Proximity_Approach to arm`s reach", 
"Proximity_Approach to contact", "Proximity_Withdraw to out of arm`s reach", 
"Proximity_Approach to arm`s reach", "Proximity_Approach to contact", 
"Proximity_Approach to arm`s reach", "Proximity_Approach to contact", 
"Proximity_Withdraw to arm`s reach", "Proximity_Withdraw to arm`s reach"
)), row.names = c(NA, 20L), class = "data.frame")
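
Ultimately, I'd like a summary table showing how much time was spent in each of the three proximity states per session: for example, in a given session the animal was in contact with its partner for 5 minutes, within arm's reach for 20 minutes, and out of arm's reach for 35 minutes.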


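I found some other questions about computing state durations based on changes in the level of another variable, but those solutions didn't help much because they rely on a numeric variable and use commands like cumsum.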

Here is one approach:

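library(dplyr)
library(lubridate)

# convert the character timestamps to POSIXct
DF$behavior_timeStamp <- ymd_hms(DF$behavior_timeStamp)
DF$focal_start_timeStamp <- ymd_hms(DF$focal_start_timeStamp)

# minutes elapsed since the previous row (over the whole data frame, not per session)
DF$diff <- c(NA, round(as.numeric(diff(DF$behavior_timeStamp), units = "mins"), 2))

# total of those minutes for each Observation label
DF %>% 
  group_by(Observation) %>% 
  mutate(sumtime = sum(diff, na.rm = TRUE)) %>% 
  select(Observation, diff, sumtime)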

If I understand correctly, there are four types of Observation:

unique(DF$Observation)
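But the OP asked for a summary of three different states:

"time spent in each of the three different proximity states per session: for example, in a given session the animal was in contact with their partner for 5 minutes, within arm's reach for 20 minutes, and out of arm's reach for 35 minutes"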
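
A small base R sketch of how the four labels collapse into three states: stripping the direction prefix makes "Approach to arm`s reach" and "Withdraw to arm`s reach" the same state. It assumes every label starts with `Proximity_Approach to ` or `Proximity_Withdraw to `, as in the 20 sample rows; `obs_labels` is just an illustrative name.

```r
# The four Observation labels that occur in the question's sample rows
obs_labels <- c("Proximity_Approach to contact",
                "Proximity_Approach to arm`s reach",
                "Proximity_Withdraw to arm`s reach",
                "Proximity_Withdraw to out of arm`s reach")

# Strip the "Proximity_Approach to " / "Proximity_Withdraw to " prefix
# (assumed to be the only prefixes), so both directions ending at
# arm's reach map to the same state
states <- sub("^Proximity_(Approach|Withdraw) to ", "", obs_labels)
unique(states)
```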

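This task can be done in 3 steps:

1. compute the duration of each observation within a session, which includes coercing behavior_timeStamp from character to POSIXct,
2. derive the proximity state from Observation,
3. create a summary table of state durations by session.

Below is an implementation using my preferred tool, data.table; a dplyr/tidyr version follows further down: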

library(data.table)
library(stringr)
setDT(DF)[
  , .(duration = c(diff(lubridate::as_datetime(behavior_timeStamp)), 0), Observation), 
  by = .(focal_start_timeStamp, session)][
    , dcast(.SD, focal_start_timeStamp + session ~ str_remove(Observation, "^Proxi.+?to "), 
            sum, value.var = "duration")]

Although not explicitly requested, the result is returned in wide format, with one row per session.

Below is an implementation using dplyr/tidyr, as requested by the tags:

library(dplyr)
library(tidyr)
DF %>% 
  group_by(focal_start_timeStamp, session) %>% 
  # time until the next timestamp in the same session; the last state gets 0
  mutate(duration = c(diff(lubridate::as_datetime(behavior_timeStamp)), 0)) %>% 
  group_by(proximity_state = stringr::str_remove(Observation, "^Proxi.+?to "), .add = TRUE) %>% 
  summarise(duration = sum(duration)) %>% 
  pivot_wider(names_from = proximity_state, values_from = duration)

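Hi Yolo, sorry, I should have been clearer. The behavior_timeStamp column contains all of the duration information; focal_start_timeStamp only separates the different observation periods. Taking my first few rows as an example: I'm looking for a way to calculate that the first state lasted 1:51, between 10:23:28 and 10:25:19 AM. The goal is then to do this for each of the three possible states!

The duration calculation is still a bit off, though. The final proximity state in a session isn't counted, because it simply lasts until the session ends one hour after focal_start_timeStamp and has no marked end time. For example, the durations for session 1 should be 120 secs, 1525 secs and 1933 secs (arm's reach, contact and out of arm's reach). I could go in and manually add rows marking the end of each session so that your script works as is, but is there a more streamlined way to do this?

@greben, I have edited my answer to calculate the duration of the last state correctly. Thank you for providing the expected result.

This is fantastic. Thanks so much, @Uwe!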

Output of the original data.table version (the last state of each session is counted as 0 secs):

   focal_start_timeStamp session arm`s reach  contact out of arm`s reach
1:   2019-02-25 10:23:06       1    120 secs 272 secs          1933 secs
2:   2019-02-26 10:26:43       2    105 secs   6 secs             0 secs

Output of the original dplyr/tidyr version:

# A tibble: 2 x 5
# Groups:   focal_start_timeStamp, session [2]
  focal_start_timeStamp session `arm\`s reach` contact  `out of arm\`s reach`
  <chr>                   <int> <drtn>         <drtn>   <drtn>               
1 2019-02-25 10:23:06         1 120 secs       272 secs 1933 secs            
2 2019-02-26 10:26:43         2 105 secs         6 secs   NA secs
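
Corrected data.table version, which closes the last state of each session one hour after focal_start_timeStamp:

library(data.table)
library(stringr)
library(lubridate)
setDT(DF)[
  # a state lasts until the next timestamp; the last one lasts until focal start + 1 hour
  , .(duration = diff(c(as_datetime(behavior_timeStamp), 
                        as_datetime(focal_start_timeStamp) + hours(1))), 
      Observation), by = .(focal_start_timeStamp, session)][
    # reshape to one row per session and one column per proximity state
    , dcast(.SD, focal_start_timeStamp + session ~ str_remove(Observation, "^Proxi.+?to "), 
            sum, value.var = "duration")]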
   focal_start_timeStamp session arm`s reach   contact out of arm`s reach
1:   2019-02-25 10:23:06       1    120 secs 1525 secs          1933 secs
2:   2019-02-26 10:26:43       2   3540 secs    6 secs             0 secs

Corrected dplyr/tidyr version:

library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)
DF %>% 
  group_by(focal_start_timeStamp, session) %>% 
  # a state lasts until the next timestamp; the last one lasts until focal start + 1 hour
  mutate(duration = diff(c(as_datetime(behavior_timeStamp), 
                           as_datetime(focal_start_timeStamp[1L]) + hours(1)))) %>% 
  group_by(proximity_state = str_remove(Observation, "^Proxi.+?to "), .add = TRUE) %>% 
  summarise(duration = sum(duration)) %>% 
  pivot_wider(names_from = proximity_state, values_from = duration)
# A tibble: 2 x 5
# Groups:   focal_start_timeStamp, session [2]
  focal_start_timeStamp session `arm\`s reach` contact   `out of arm\`s reach`
  <chr>                   <int> <drtn>         <drtn>    <drtn>               
1 2019-02-25 10:23:06         1  120 secs      1525 secs 1933 secs            
2 2019-02-26 10:26:43         2 3540 secs         6 secs   NA secs
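
For comparison, here is a dependency-free base R sketch of the same three steps that also produces the percentages the question originally asked for. The function name `state_durations()` and its `session_secs` argument are hypothetical. It assumes rows are sorted by behavior_timeStamp within each session, that every session ends exactly one hour after focal_start_timeStamp, and that all timestamps share one time zone (UTC is used here). The 20 sample rows from the question are rebuilt without the actor column.

```r
# Hypothetical helper: seconds spent in each proximity state per session.
# Assumes rows are sorted by time within each session and that each session
# ends session_secs seconds after focal_start_timeStamp.
state_durations <- function(df, session_secs = 3600) {
  start <- as.POSIXct(df$focal_start_timeStamp, tz = "UTC")
  event <- as.POSIXct(df$behavior_timeStamp, tz = "UTC")
  # collapse "Approach to X" / "Withdraw to X" into the state X
  df$state <- sub("^Proximity_(Approach|Withdraw) to ", "", df$Observation)
  df$duration <- NA_real_
  for (idx in split(seq_len(nrow(df)), df$session)) {
    # a state lasts until the next event of the same session;
    # the last one lasts until the assumed session end
    ends <- c(event[idx][-1], start[idx][1] + session_secs)
    df$duration[idx] <- as.numeric(difftime(ends, event[idx], units = "secs"))
  }
  # sessions as rows, states as columns, values in seconds
  xtabs(duration ~ session + state, data = df)
}

# The question's 20 sample rows (actor column omitted)
DF <- data.frame(
  focal_start_timeStamp = rep(c("2019-02-25 10:23:06", "2019-02-26 10:26:43"), c(16, 4)),
  session = rep(c(1L, 2L), c(16, 4)),
  behavior_timeStamp = c(
    "2019-02-25 10:23:28", "2019-02-25 10:25:19", "2019-02-25 10:35:52",
    "2019-02-25 10:36:04", "2019-02-25 10:38:12", "2019-02-25 10:39:32",
    "2019-02-25 10:39:48", "2019-02-25 10:58:34", "2019-02-25 10:58:48",
    "2019-02-25 10:58:52", "2019-02-25 10:59:28", "2019-02-25 11:00:18",
    "2019-02-25 11:00:27", "2019-02-25 11:01:00", "2019-02-25 11:01:40",
    "2019-02-25 11:02:13", "2019-02-26 10:27:37", "2019-02-26 10:29:06",
    "2019-02-26 10:29:12", "2019-02-26 10:29:28"),
  Observation = paste0("Proximity_", c(
    "Approach to contact", "Withdraw to out of arm`s reach",
    "Approach to arm`s reach", "Approach to contact",
    "Withdraw to out of arm`s reach", "Approach to arm`s reach",
    "Withdraw to out of arm`s reach", "Approach to arm`s reach",
    "Withdraw to out of arm`s reach", "Approach to arm`s reach",
    "Withdraw to out of arm`s reach", "Approach to arm`s reach",
    "Approach to contact", "Withdraw to out of arm`s reach",
    "Approach to arm`s reach", "Approach to contact",
    "Approach to arm`s reach", "Approach to contact",
    "Withdraw to arm`s reach", "Withdraw to arm`s reach")),
  stringsAsFactors = FALSE
)

secs <- state_durations(DF)
print(secs)
# percentage of the recorded time per session (each row sums to 100)
pct <- round(100 * prop.table(secs, margin = 1), 1)
print(pct)
```

With the sample rows, session 1 comes out at 120, 1525 and 1933 secs for arm's reach, contact and out of arm's reach, matching the values given in the comments. `xtabs()` fills combinations that never occur (session 2 has no out-of-arm's-reach bout in these rows) with 0. The percentages use the recorded time as the denominator, so the interval between focal_start_timeStamp and the first behavior_timeStamp is not attributed to any state; divide `secs` by 3600 instead if the share of the full hour is wanted.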