Finding the duration of each group in a grouped R data frame

I have an R data frame like this:

timestamp         Value ref
19-07-2019  02:46   7   1
19-07-2019  02:47   5   1
19-07-2019  02:48   2   1
19-07-2019  02:49   4   1
19-07-2019  02:50   7   1
19-07-2019  02:51   0   1
19-07-2019  02:52   3   1
19-07-2019  02:53   3   1
19-07-2019  02:54   10  1
19-07-2019  02:55   1   0
19-07-2019  02:56   3   0
19-07-2019  02:57   10  2
19-07-2019  02:58   7   3
19-07-2019  02:59   0   3
19-07-2019  03:00   9   3
19-07-2019  03:01   7   3
19-07-2019  03:02   10  3
19-07-2019  03:03   7   4
19-07-2019  03:04   10  4
19-07-2019  03:05   0   0

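I want to find the difference between the first and last timestamp in each group other than 0. For group 1, for example, the start time is 19-07-2019 02:46 and the end time is 19-07-2019 02:54.

The output should be a data frame with three columns: duration, start_value, end_value

where duration is the time difference, start_value is the first value in the group, and end_value is the last value in the group. For this example the output will have 4 rows, because there are 4 groups other than 0.

Convert the timestamp to POSIXct, then take the maximum and minimum in each group and the difference between them: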

library(dplyr)

df %>%
  mutate(timestamp = as.POSIXct(timestamp, format = "%d-%m-%Y %H:%M")) %>%
  group_by(ref) %>%
  summarise(start_value = min(timestamp), 
            end_value = max(timestamp), 
            duration = end_value - start_value)

# A tibble: 5 x 4
#    ref start_value         end_value           duration
#  <int> <dttm>              <dttm>              <drtn>  
#1     0 2019-07-19 02:55:00 2019-07-19 03:05:00 10 mins 
#2     1 2019-07-19 02:46:00 2019-07-19 02:54:00  8 mins 
#3     2 2019-07-19 02:57:00 2019-07-19 02:57:00  0 mins 
#4     3 2019-07-19 02:58:00 2019-07-19 03:02:00  4 mins 
#5     4 2019-07-19 03:03:00 2019-07-19 03:04:00  1 mins
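A comment further down reports an "unsupported class POSIXlt" error, which points at the difference between the two base R date-time classes. The sketch below is only an illustration on two timestamps from the question. The variable names and tz = "UTC" are assumptions made here so the result does not depend on the local time zone.

```r
# Two timestamps in the question's format
x <- c("19-07-2019 02:46", "19-07-2019 02:54")

# strptime() returns POSIXlt: a list of components (sec, min, hour, ...)
lt <- strptime(x, format = "%d-%m-%Y %H:%M", tz = "UTC")

# as.POSIXct() returns POSIXct: one number per timestamp,
# which is the form the answer stores in the data frame column
ct <- as.POSIXct(x, format = "%d-%m-%Y %H:%M", tz = "UTC")

class(lt)
class(ct)

# An existing POSIXlt vector can be converted instead of re-parsed
ct2 <- as.POSIXct(lt)

# Asking for explicit units avoids relying on the automatic choice
dur <- difftime(ct[2], ct[1], units = "mins")
as.numeric(dur)
```

If a column was created with strptime() or as.POSIXlt(), wrapping it in as.POSIXct() before the dplyr pipeline is one way to get back to the class the answer expects.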

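Data: the dput output for df is given at the end of this post.

Comments:

"Column timestamp is of unsupported class POSIXlt." I got this error. I converted the column with as.numeric and then it started working. How can I arrange the output in this column format: start time, end time, max value, min value, duration?

@VictorJohnzon Are you sure you are using as.POSIXct and not as.POSIXlt? It works for me without any error. As for your other question, I have updated the answer.

It is working now. Sorry, my mistake: I was using as.POSIXlt, not as.POSIXct. Also, what I want are the values at the max and min timestamps, not the max and min values in the group.

@VictorJohnzon I updated the answer to remove the 0 groups, and added %>% data.frame at the end to remove the extra printed information.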

df %>%
  filter(ref != 0) %>%
  mutate(timestamp = as.POSIXct(timestamp, format = "%d-%m-%Y %H:%M")) %>%
  group_by(ref) %>%
  summarise(start_time = min(timestamp), 
            end_time = max(timestamp), 
            max_value = max(Value), 
            min_value = min(Value), 
            duration = end_time - start_time) %>%
   data.frame()
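In the comments the asker says they want the Value at the first and last timestamp of each group, not the group's maximum and minimum Value. The following is a minimal sketch of that variant, not the answerer's code. It assumes the same column names as the question, rebuilds the 20 rows inline so that it runs on its own, and uses tz = "UTC" and an explicit units = "mins" as choices made here for reproducibility.

```r
library(dplyr)

# Rebuild the question's 20 one-minute rows (tz = "UTC" is an assumption)
times <- seq(as.POSIXct("2019-07-19 02:46", tz = "UTC"),
             by = "1 min", length.out = 20)
df <- data.frame(
  timestamp = format(times, "%d-%m-%Y %H:%M"),
  Value = c(7, 5, 2, 4, 7, 0, 3, 3, 10, 1, 3, 10, 7, 0, 9, 7, 10, 7, 10, 0),
  ref   = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 2, 3, 3, 3, 3, 3, 4, 4, 0)
)

res <- df %>%
  filter(ref != 0) %>%                        # drop the 0 groups
  mutate(timestamp = as.POSIXct(timestamp,
                                format = "%d-%m-%Y %H:%M", tz = "UTC")) %>%
  arrange(timestamp) %>%                      # so first()/last() follow time order
  group_by(ref) %>%
  summarise(
    start_value = first(Value),               # Value at the earliest timestamp
    end_value   = last(Value),                # Value at the latest timestamp
    duration    = as.numeric(difftime(max(timestamp), min(timestamp),
                                      units = "mins"))
  ) %>%
  data.frame()

res
```

On the question's data this gives durations of 8, 0, 4 and 1 minutes for groups 1 to 4, matching the tibble printed earlier, with start values 7, 10, 7, 7 and end values 10, 10, 10, 10. Returning duration as a plain number of minutes is a design choice here: it keeps every group in the same unit regardless of how long each group lasts.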

df <- structure(list(timestamp = structure(1:20, .Label = c("19-07-2019 02:46", 
"19-07-2019 02:47", "19-07-2019 02:48", "19-07-2019 02:49", "19-07-2019 02:50", 
"19-07-2019 02:51", "19-07-2019 02:52", "19-07-2019 02:53", "19-07-2019 02:54", 
"19-07-2019 02:55", "19-07-2019 02:56", "19-07-2019 02:57", "19-07-2019 02:58", 
"19-07-2019 02:59", "19-07-2019 03:00", "19-07-2019 03:01", "19-07-2019 03:02", 
"19-07-2019 03:03", "19-07-2019 03:04", "19-07-2019 03:05"), class = "factor"), 
Value = c(7L, 5L, 2L, 4L, 7L, 0L, 3L, 3L, 10L, 1L, 3L, 10L, 
7L, 0L, 9L, 7L, 10L, 7L, 10L, 0L), ref = c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 0L, 0L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 
0L)), class = "data.frame", row.names = c(NA, -20L))